
Question about fine-tuning #136

Open
kimwin2 opened this issue Dec 3, 2024 · 4 comments
Labels: alogrithm, enhancement (New feature or request), question (Further information is requested)

Comments


kimwin2 commented Dec 3, 2024

Firstly, thank you for your great effort to make this project.

When will the fine-tuning code be released? If it's delayed, could you please let me know the learning rate used in the end-to-end training? I've tried to reimplement the fine-tuning using the model output logits, but while the train loss decreases, the evaluation loss keeps increasing.

YangWang92 (Contributor) commented


Sorry, I've been too busy lately, and I'll need to delay the open-sourcing by a few more weeks. The VPTQ paper includes the learning rate; make sure to tune it carefully, as I remember it being quite sensitive. Additionally, the trainable parameters are the RMSNorm weights, the centroids, and the LM head. Remember to keep the embeddings frozen.

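For anyone trying to reproduce this before the code is released, a minimal sketch of the parameter selection described above might look like the following. The substring keys ("norm", "centroids", "lm_head", "embed") are assumptions about parameter naming; adjust them to your model's actual names.

```python
import torch
import torch.nn as nn

def set_trainable(model: nn.Module) -> list:
    """Enable grads only for RMSNorm weights, codebook centroids, and the
    LM head; freeze everything else, including the embeddings."""
    trainable = []
    for name, param in model.named_parameters():
        if any(key in name for key in ("norm", "centroids", "lm_head")):
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False  # embeddings and all other weights stay frozen
    return trainable
```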

YangWang92 (Contributor) commented


I suspect your experiment is overfitting. Remember to increase the gradient accumulation batch size.
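Gradient accumulation as suggested above can be sketched as follows; scaling the loss by the number of accumulation steps makes the accumulated gradient an average over micro-batches, so the effective batch size grows without extra memory. Function and variable names here are illustrative, not from the VPTQ code.

```python
import torch
import torch.nn.functional as F

def train_with_accumulation(model, optimizer, batches, accum_steps):
    """Step the optimizer once per `accum_steps` micro-batches."""
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        loss = F.mse_loss(model(x), y)
        # Scale so accumulated gradients average rather than sum.
        (loss / accum_steps).backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```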

YangWang92 added the enhancement, question, and alogrithm labels on Dec 3, 2024

kimwin2 commented Dec 4, 2024

Thank you for your kind reply.

I have loaded a packed model into my existing QAT pipeline, but I'm getting a multi-GPU error. It seems to be caused by the int16 precision. Should I train with a saved model instead of a packed one?

YangWang92 (Contributor) commented

Yes, I encountered the same issue at the time. It seemed to be related to the data type handling in NCCL. My workaround was to save the unpacked model as a .pt file and use tricks such as viewing the int16 index tensor as a float during communication. This was also one of the reasons for the delay in releasing the fine-tuning code. Apologies for the inconvenience; this definitely needs to be addressed.

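A sketch of the reinterpret-as-float trick mentioned above, assuming the problem is that the collective op rejects int16 tensors: `Tensor.view(dtype)` reinterprets the underlying bits (int16 and float16 are both 2 bytes), so the indices survive the round trip unchanged. The function name and the commented-out broadcast call are illustrative.

```python
import torch

def broadcast_int16(indices: torch.Tensor) -> torch.Tensor:
    """Reinterpret int16 indices as float16 for an NCCL collective,
    then view them back. No values are converted, only the dtype tag."""
    as_fp16 = indices.contiguous().view(torch.float16)  # bit-level reinterpret
    # torch.distributed.broadcast(as_fp16, src=0)       # collective op would go here
    return as_fp16.view(torch.int16)                    # reinterpret back
```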
