Issues after upgrading to 1.79 #1248
Comments
Which API format are you using? Could you list the field that was expected to be a string but was null? As for inference speeds, that could be due to Vulkan kernel changes upstream and the backend refactor. It should have nothing to do with offloading; that error concerns only a single layer.
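As a quick way to answer the question above about which field came back null, here is a minimal sketch. It assumes a locally running KoboldCpp instance exposing the native `/api/v1/generate` endpoint on the default port 5001; adjust the URL and payload to whichever API format you are actually using.

```python
# Hedged sketch: send a small request and print the JSON paths of any fields
# that came back as null, to pinpoint "the field that was expected to be a
# string, but was null". Endpoint and payload shape are assumptions.
import requests

payload = {"prompt": "Hello", "max_length": 16}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
data = resp.json()

def find_nulls(obj, path="$"):
    """Recursively print the JSON path of every null (None) value."""
    if obj is None:
        print(f"null value at: {path}")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            find_nulls(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            find_nulls(value, f"{path}[{i}]")

find_nulls(data)
```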
Can confirm that this exact same issue (token embed weight) is happening when using an Nvidia GeForce 3060 12 GB, so it doesn't seem to be Vulkan-exclusive. I've also noticed that the CPU is now being used more heavily than the GPU during response generation, on all models I have tried, regardless of whether the model is partially offloaded or small enough to fit entirely into VRAM.
Can also confirm, 4090.
I have had this issue since version koboldcpp-rocm-1.78.yr0-ROCm (7900 XTX + 2x 7600 XT). Version 1.77.yr1-ROCm was the last version without this behavior.
Same here (using a BLAS batch size of 256) on an RTX 3090 + RTX 4070 Ti Super.
token_embd.weight never offloads to the GPU; this is normal behavior. What matters is the rest of the layers that do or don't offload.
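For anyone wanting to check that everything except token_embd.weight is being offloaded, here is a minimal launch sketch. The flag names (`--model`, `--usevulkan`, `--gpulayers`) follow koboldcpp's command-line interface as I understand it, and the model path is a placeholder; this is not an official recommendation from the thread.

```python
# Hedged sketch: launch KoboldCpp with an explicit GPU layer count.
# token_embd.weight stays on the CPU by design; --gpulayers controls
# how many of the remaining layers are offloaded.
import subprocess

cmd = [
    "python", "koboldcpp.py",
    "--model", "models/ministral-8b-instruct.Q4_K_M.gguf",  # placeholder path
    "--usevulkan",        # Vulkan backend, as in the reports above
    "--gpulayers", "99",  # request full offload of the repeating layers
]
subprocess.run(cmd, check=True)
```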
Please try v1.80, should be less messy looking now |
I'm using a Ministral 8B Instruct Q4_K_M model fully offloaded onto an Arc A380 GPU with the Vulkan backend on Linux. Everything was working fine on KoboldCPP 1.76, except that for longer prompts I was getting a DeviceLost error, so I upgraded to 1.79.1 to see if it was a bug that had since been fixed. Setting a lower BLAS batch size seems to have fixed it, but I encountered two other problems which didn't appear in 1.76.
Any help with these would be greatly appreciated.
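As a hedged sketch of checking the BLAS batch-size workaround mentioned above: after relaunching with a smaller prompt-processing batch (e.g. `--blasbatchsize 256`, the value another commenter reported), one could replay a long prompt against the running server and see whether the DeviceLost crash recurs. The URL and response shape assume the native `/api/v1/generate` endpoint on the default port.

```python
# Hedged sketch: replay a long prompt to verify that prompt processing no
# longer triggers DeviceLost after lowering the BLAS batch size.
import requests

long_prompt = "lorem ipsum " * 2000  # long enough to exercise BLAS prompt processing
try:
    resp = requests.post(
        "http://localhost:5001/api/v1/generate",
        json={"prompt": long_prompt, "max_length": 8},
        timeout=600,
    )
    resp.raise_for_status()
    # Assumed response shape: {"results": [{"text": "..."}]}
    print("Prompt processed OK:", resp.json()["results"][0]["text"])
except requests.RequestException as exc:
    print("Request failed (server may have crashed with DeviceLost):", exc)
```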