Model exhibits abnormal generation behavior, such as repetitive or nonsensical outputs, despite successful pretraining performance.
Question
I am trying to change the model architecture, and the model now exhibits abnormal generation behavior, such as repetitive or nonsensical outputs, despite successful pretraining performance.
During pretraining, the model performed well on scene description tasks and demonstrated strong zero-shot classification capabilities, indicating effective alignment between the visual encoder and the LLM.
Example generation after pretraining:

I've identified the one applicable category for this image. The category is: Arable land.

At that stage, the model had been pretrained only on scene descriptions and did not know anything about the scene class names.
However, after fine-tuning on the classification dataset, the model fails to produce meaningful outputs.
Example generation after fine-tuning on the single-label classification dataset:
-xResSeawayResSeaidentialSeaidentialSeaSeaSeawaySeaSeaSeawaySeaSeawayResSeaSeaSeaidentialSeaSeaSeaSeaSeaSeaidentialSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSeaSea
So at a high level, I am trying to change the vision encoder to process images from a different domain (a dataset I curated myself), and in the fine-tuning stage I am using a single-label classification dataset (following the LLaVA conversation template).
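For reference, each fine-tuning sample follows the LLaVA conversation JSON format, roughly as in the sketch below (the prompt wording, category names, and file paths here are placeholders, not my actual data):

```python
import json

# Rough shape of one single-label classification sample in the LLaVA
# conversation format. The "id"/"image" values, the prompt wording, and the
# category names are placeholders; only the overall structure matches my data.
sample = {
    "id": "scene_000001",
    "image": "images/scene_000001.jpg",
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nClassify this scene into exactly one of the "
                     "dataset categories (e.g. Arable land, Residential, Seaway).",
        },
        {"from": "gpt", "value": "Arable land"},
    ],
}

# The fine-tuning JSON is a list of such samples.
with open("finetune_single_label.json", "w") as f:
    json.dump([sample], f, indent=2)
```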
I think the good results after the pretraining stage indicate we are getting a successful signal from pretraining, meaning nothing is wrong with the vision_tower section. A possible reason I am considering for this issue: misalignment between the pretraining task and the fine-tuning task.
Should I increase the difficulty level of the instruction-tuning dataset?
PS: There were some other issues reported about abnormal and repetitive generation, so I checked the images and the dataset folder. They were all fine.
PS: The loss decreased gradually over the training steps.
PS: I tried to make the model overfit the training set by decreasing the sample size (training on only 3 percent of the fine-tuning data) and increasing the number of epochs, but it still cannot perform well even on that training set.
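For context, the "perform well on the train set" check was essentially the loop below: greedily decode the same training samples and compare the output to the ground-truth label. This is a minimal sketch assuming a Hugging Face-style model/processor; the variable and field names are placeholders, not actual LLaVA code.

```python
import torch

def check_overfit(model, processor, train_subset, device="cuda"):
    """Greedy-decode training samples and measure exact-match label accuracy.

    `model`, `processor`, and `train_subset` are placeholders for however the
    fine-tuned checkpoint and data are actually loaded; this is only a sketch.
    """
    model.eval()
    correct = 0
    for example in train_subset:
        # Build the same prompt used during fine-tuning for this image.
        inputs = processor(
            images=example["image"],
            text=example["prompt"],
            return_tensors="pt",
        ).to(device)
        with torch.no_grad():
            output_ids = model.generate(
                **inputs,
                max_new_tokens=16,   # a class name should only be a few tokens
                do_sample=False,     # greedy decoding removes sampling noise
            )
        # Strip the prompt tokens; keep only the newly generated answer.
        prediction = processor.batch_decode(
            output_ids[:, inputs["input_ids"].shape[1]:],
            skip_special_tokens=True,
        )[0].strip()
        correct += int(prediction == example["label"])
    return correct / len(train_subset)
```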