Question about fine-tuning #136
Comments
Sorry, I've been too busy lately, and I'll need to delay the open-sourcing by a few more weeks. The VPTQ paper includes the learning rate; make sure to tune it carefully, as I remember it being quite sensitive. Additionally, the trainable parameters are the RMSNorm weights, the centroids, and the LM head. Remember to keep the embeddings frozen.
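For anyone reimplementing this before the release, here is a minimal sketch of how that parameter selection could look in PyTorch. The name patterns ("norm", "centroid", "lm_head", "embed") are assumptions and may not match the actual VPTQ module names:

```python
import torch

def set_trainable_params(model: torch.nn.Module) -> None:
    # Assumed name patterns; adjust to the real module names in your checkpoint.
    trainable_keys = ("norm", "centroid", "lm_head")
    for name, param in model.named_parameters():
        lname = name.lower()
        # Train RMSNorm weights, centroids, and the LM head; freeze everything else.
        param.requires_grad = any(k in lname for k in trainable_keys)
        # Keep embeddings frozen even if their name happens to match a pattern.
        if "embed" in lname:
            param.requires_grad = False
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {trainable:,}")
```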
I suspect your experiment is overfitting. Remember to increase the gradient-accumulation (effective) batch size.
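A minimal sketch of gradient accumulation to raise the effective batch size; `model`, `optimizer`, and `train_loader` are placeholders for whatever the fine-tuning pipeline already provides (assuming an HF-style model that returns `.loss`), and `accumulation_steps` is illustrative only:

```python
def train_epoch(model, optimizer, train_loader, accumulation_steps=16):
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(train_loader):
        # Scale so gradients average over the accumulated micro-batches.
        loss = model(**batch).loss / accumulation_steps
        loss.backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```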
Thank you for your kind reply. I have loaded a packed model into my existing QAT pipeline, but I'm getting a multi-GPU error. It seems to be caused by the int16 precision. Should I train with a saved model instead of a packed one?
Yes, I encountered the same issue at the time. It seemed to be related to the data type handling in NCCL. My workaround was to save the unpacked model as a .pt file and to use approaches like viewing the index tensor as a float. This was also one of the reasons for the delay in releasing the fine-tuning code. Apologies for the inconvenience; this definitely needs to be addressed.
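For reference, a minimal sketch of that bit-level workaround (an assumption of how it could look, not the released code): `Tensor.view(dtype)` reinterprets the same 16-bit storage without copying, so the index values round-trip exactly when saving or moving the tensor around:

```python
import torch

# Hypothetical int16 index tensor standing in for the quantization indices.
indices_i16 = torch.randint(0, 4096, (1024,), dtype=torch.int16)

# Reinterpret the same 16-bit storage as float16 before saving (or before
# dtype-sensitive collective ops that only move bits, such as broadcasts).
indices_as_f16 = indices_i16.view(torch.float16)
torch.save({"indices": indices_as_f16}, "indices.pt")

# Restore the original integer view on load.
loaded = torch.load("indices.pt")["indices"].view(torch.int16)
assert torch.equal(loaded, indices_i16)
```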
Firstly, thank you for your great effort in building this project.
When will the fine-tuning code be released? If it's delayed, could you please let me know the learning rate used in the end-to-end training? I've tried to reimplement the fine-tuning using the model output logits, but while the training loss decreases, the evaluation loss keeps increasing.
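In case it helps to compare notes, here is a minimal sketch of what such a logit-based (distillation-style) objective might look like, assuming a frozen full-precision teacher and the quantized model as the student; the temperature and scaling are illustrative only:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # KL divergence between the softened teacher and student distributions.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```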