
Question about fine-tuning #136

Open
kimwin2 opened this issue Dec 3, 2024 · 4 comments
Labels: alogrithm, enhancement (New feature or request), question (Further information is requested)

Comments


kimwin2 commented Dec 3, 2024

Firstly, thank you for your great effort to make this project.

When will the fine-tuning code be released? If it's delayed, could you please let me know the learning rate used in the end-to-end training? I've tried to reimplement the fine-tuning using the model output logits, but while the train loss decreases, the evaluation loss keeps increasing.

YangWang92 (Contributor) commented


Sorry, I've been too busy lately, and I'll need to delay the open-sourcing by a few more weeks. The VPTQ paper includes the learning rate; make sure to tune it carefully, as I remember it being quite sensitive. Additionally, the trainable parameters are the RMSNorm weights, the centroids, and the LM head. Remember to keep the embeddings frozen.

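For anyone trying to reproduce this before the code is released, a minimal sketch of the parameter selection described above might look like the following. The substring keys ("norm", "centroids", "lm_head", "embed") are assumptions about parameter naming; adjust them to your model's actual names.

```python
import torch
import torch.nn as nn

def set_trainable(model: nn.Module) -> list:
    """Enable grads only for RMSNorm weights, codebook centroids, and the
    LM head; freeze everything else, including the embeddings."""
    trainable = []
    for name, param in model.named_parameters():
        if any(key in name for key in ("norm", "centroids", "lm_head")):
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False  # embeddings and all other weights stay frozen
    return trainable
```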

YangWang92 (Contributor) commented


I suspect your experiment is overfitting. Remember to increase the gradient accumulation batch size.
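Gradient accumulation as suggested above can be sketched as follows; scaling the loss by the number of accumulation steps makes the accumulated gradient an average over micro-batches, so the effective batch size grows without extra memory. Function and variable names here are illustrative, not from the VPTQ code.

```python
import torch
import torch.nn.functional as F

def train_with_accumulation(model, optimizer, batches, accum_steps):
    """Step the optimizer once per `accum_steps` micro-batches."""
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        loss = F.mse_loss(model(x), y)
        # Scale so accumulated gradients average rather than sum.
        (loss / accum_steps).backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```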

YangWang92 added the enhancement, question, and alogrithm labels on Dec 3, 2024

kimwin2 commented Dec 4, 2024

Thank you for your kind reply.

I have loaded a packed model into my existing QAT pipeline, but I'm getting a multi-GPU error. It seems to be caused by the int16 precision. Should I train with a saved model instead of a packed one?

YangWang92 (Contributor) commented

Yes, I encountered the same issue at the time. It seemed to be related to the data type handling in NCCL. My workaround was to save the unpacked model as a .pt file and use tricks such as viewing the int16 index tensor as a float during communication. This was also one of the reasons for the delay in releasing the fine-tuning code. Apologies for the inconvenience; this definitely needs to be addressed.

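A sketch of the reinterpret-as-float trick mentioned above, assuming the problem is that the collective op rejects int16 tensors: `Tensor.view(dtype)` reinterprets the underlying bits (int16 and float16 are both 2 bytes), so the indices survive the round trip unchanged. The function name and the commented-out broadcast call are illustrative.

```python
import torch

def broadcast_int16(indices: torch.Tensor) -> torch.Tensor:
    """Reinterpret int16 indices as float16 for an NCCL collective,
    then view them back. No values are converted, only the dtype tag."""
    as_fp16 = indices.contiguous().view(torch.float16)  # bit-level reinterpret
    # torch.distributed.broadcast(as_fp16, src=0)       # collective op would go here
    return as_fp16.view(torch.int16)                    # reinterpret back
```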
