
[Bug] Cannot run bitsandbytes llama models #2600

Closed
merrymercy opened this issue Dec 26, 2024 · 3 comments
Labels
good first issue, help wanted

Comments

@merrymercy
Contributor

merrymercy commented Dec 26, 2024

The issue is the same as #2556, but for llama models. We should be able to fix it with a similar approach.

The following command crashes.

python3 -m sglang.bench_one_batch --model unsloth/llama-3-8b-bnb-4bit --load-format bitsandbytes
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]:     return _run_code(code, main_globals, None,
[rank0]:   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]:     exec(code, run_globals)
[rank0]:   File "/root/sglang/python/sglang/bench_one_batch.py", line 470, in <module>
[rank0]:     main(server_args, bench_args)
[rank0]:   File "/root/sglang/python/sglang/bench_one_batch.py", line 434, in main
[rank0]:     work_func(server_args, port_args, bench_args, 0)
[rank0]:   File "/root/sglang/python/sglang/bench_one_batch.py", line 369, in latency_test
[rank0]:     model_runner, tokenizer = load_model(server_args, port_args, tp_rank)
[rank0]:   File "/root/sglang/python/sglang/bench_one_batch.py", line 121, in load_model
[rank0]:     model_runner = ModelRunner(
[rank0]:   File "/root/sglang/python/sglang/srt/model_executor/model_runner.py", line 158, in __init__
[rank0]:     self.load_model()
[rank0]:   File "/root/sglang/python/sglang/srt/model_executor/model_runner.py", line 258, in load_model
[rank0]:     self.model = get_model(
[rank0]:   File "/root/sglang/python/sglang/srt/model_loader/__init__.py", line 22, in get_model
[rank0]:     return loader.load_model(
[rank0]:   File "/root/sglang/python/sglang/srt/model_loader/loader.py", line 1029, in load_model
[rank0]:     self._load_weights(model_config, model)
[rank0]:   File "/root/sglang/python/sglang/srt/model_loader/loader.py", line 960, in _load_weights
[rank0]:     model.load_weights(qweight_iterator)
[rank0]:   File "/root/sglang/python/sglang/srt/models/llama.py", line 442, in load_weights
[rank0]:     param = params_dict[name]
[rank0]: KeyError: 'model.layers.0.mlp.down_proj.qweight'
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:01<?, ?it/s]
@merrymercy merrymercy added the good first issue Good for newcomers label Dec 26, 2024
@merrymercy merrymercy added the help wanted Extra attention is needed label Dec 26, 2024
@upskyy
Contributor

upskyy commented Dec 27, 2024

@merrymercy
Looking at the code, I think this issue will also be resolved by PR #2557. The problem occurs because the weight names are changed to qweight during the bitsandbytes 4-bit load; PR #2557 corrects this not only for the gemma model but for the loading of other models as well.
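
For reference, a minimal sketch of the idea, assuming the fix amounts to remapping checkpoint names: the bitsandbytes 4-bit loader yields names ending in .qweight, while the llama model registers its parameters under .weight, so load_weights needs to translate the name before the params_dict[name] lookup. The helper name below is hypothetical, not code from #2557.

# Hypothetical sketch (not the actual #2557 diff): remap the ".qweight"
# names produced by the bitsandbytes weight iterator back to the ".weight"
# parameter names that the llama model registers, so the
# params_dict[name] lookup no longer raises KeyError.
from typing import Dict


def remap_bnb_weight_name(name: str, params_dict: Dict[str, object]) -> str:
    """Return a parameter name that exists in params_dict.

    'model.layers.0.mlp.down_proj.qweight'
        -> 'model.layers.0.mlp.down_proj.weight'
    Names that already match are returned unchanged.
    """
    if name in params_dict:
        return name
    if name.endswith(".qweight"):
        candidate = name[: -len(".qweight")] + ".weight"
        if candidate in params_dict:
            return candidate
    return name


# Minimal demonstration with a stand-in params_dict.
params_dict = {"model.layers.0.mlp.down_proj.weight": None}
print(remap_bnb_weight_name("model.layers.0.mlp.down_proj.qweight", params_dict))
# -> model.layers.0.mlp.down_proj.weight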

@bkodes

bkodes commented Dec 29, 2024

Did #2557 resolve this? @upskyy

@merrymercy
Contributor Author

Yes, it is solved! Thanks for the fix @upskyy. It turns out I was using some old code.
