
Model loses information very quickly #25

Open
Lazy3valuation opened this issue May 28, 2024 · 2 comments

Comments

@Lazy3valuation

Hi! I trained the model with LoRA at 8-bit precision down to a training loss of 1.5–2.5. Generation is segment-wise, but the model does not seem to generate correct text. It cannot pass a needle-in-a-haystack test even in small settings (fewer tokens than the segment size, which is 400 for me). It starts to spit out nonsense very quickly. For example:
I've tried a NIAS test with this pattern:
"There is an important info hidden inside a lot of irrelevant text. Find it and memorize it. I will quiz you about the important information there."
Then the filler "\nThe grass is green. The sky is blue. The sun is yellow. Here we go. There and back again." is repeated many times (I repeated it until the prompt reached 400 tokens, 3600 tokens, and 10k tokens).
At a random position inside the loop there is a needle: "\nThe pass key is 72498. Remember it. 72498 is the pass key.". At the end of the prompt there is "What is the pass key? The pass key is ", and the base model completes it correctly with 72498 up to 3600 tokens (beyond that my GPU runs out of memory).
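For reference, the test prompt described above can be sketched as follows. This is an illustrative reconstruction, not the exact script I used; the function name and the filler/needle counts are assumptions.

```python
# Sketch of the needle-in-a-haystack passkey prompt described above.
# FILLER, NEEDLE, and the header/question strings are taken from the
# description; build_passkey_prompt and its parameters are illustrative.
FILLER = ("\nThe grass is green. The sky is blue. The sun is yellow. "
          "Here we go. There and back again.")
NEEDLE = "\nThe pass key is 72498. Remember it. 72498 is the pass key."
HEADER = ("There is an important info hidden inside a lot of irrelevant "
          "text. Find it and memorize it. I will quiz you about the "
          "important information there.")
QUESTION = "\nWhat is the pass key? The pass key is "

def build_passkey_prompt(n_filler: int, needle_pos: int) -> str:
    """Repeat the filler n_filler times, inserting the needle at needle_pos."""
    parts = [FILLER] * n_filler
    parts.insert(needle_pos, NEEDLE)
    return HEADER + "".join(parts) + QUESTION

# Vary n_filler to hit roughly 400 / 3600 / 10k tokens for a given tokenizer.
prompt = build_passkey_prompt(n_filler=40, needle_pos=20)
```

A correct completion should append "72498" to the prompt regardless of where the needle sits inside the filler.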

With infini attention, the model can't complete correctly even once. Moreover, the pattern repeated many times gets "broken", here's a completion example:
" The sun is yellow. Here we go. There and back again.\nThe grass is green. The sky is bluer. The sun is yellow. Here we go. There and back again.\nThe grass is green. The sky is bluer. The sun is yellow. Here we go. There and back again.\nThe grass is green. The sky is blue. The sun is yellow. Here we will. They will be a bit of the distance, at least we"

It behaves as if the model can't retain information at all, or only for a very short time. Has anyone tested how well these models perform? I sadly noticed that the repo has not been updated in a month :-(

@Thirvin

Thirvin commented Jun 6, 2024

I trained Infini-llama on arXiv papers.
My results are similar to yours.
The model can't handle the attention compressed into memory: its outputs have little relation to the content I provided.
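For context, the compressive memory being discussed is the linear-attention update from the Infini-attention paper: per segment, M ← M + σ(K)ᵀV and z ← z + Σσ(K)ᵀ, with retrieval A_mem = σ(Q)M / (σ(Q)z), where σ is ELU+1. A minimal NumPy sketch of that rule (a toy illustration, not this repo's implementation):

```python
import numpy as np

def elu1(x):
    # sigma(x) = ELU(x) + 1, the nonlinearity used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def memory_update(M, z, K, V):
    """One segment's update: M <- M + sigma(K)^T V, z <- z + sum sigma(K)^T.
    K: (n, d_k), V: (n, d_v), M: (d_k, d_v), z: (d_k,)."""
    sK = elu1(K)
    return M + sK.T @ V, z + sK.sum(axis=0)

def memory_retrieve(M, z, Q):
    """A_mem = sigma(Q) M / (sigma(Q) z): read the compressed memory.
    Q: (m, d_k) -> (m, d_v)."""
    sQ = elu1(Q)
    return (sQ @ M) / (sQ @ z)[:, None]
```

With a single stored key/value pair the retrieval is exact for any query (the σ(q)·σ(k) factors cancel), so retrieval quality degrades only as more segments are compressed into M; the failures reported in this thread suggest the learned gating or the memory contents are not being used correctly downstream.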

@LWL-cpu

LWL-cpu commented Jun 17, 2024

I also encountered a similar issue. For example, for the question-answering task, I input the passage into the model and save the memory, then provide the model with the question query along with the passage memory. However, the model seems to just repeat my query and is unable to answer the question.
