
Issue with 4-bit Quantization for LLaVA-NeXT-Video-32B Model on A100-40GB GPU #1791

Rachel0901 opened this issue Dec 7, 2024 · 0 comments
Describe the issue

Hello, I am trying to run the lmms-lab/LLaVA-NeXT-Video-32B-Qwen model on an A100-40GB GPU. However, I hit an out-of-memory (OOM) error when loading the model in its default configuration. To work around this, I attempted to enable 4-bit quantization via the bitsandbytes library by modifying my script as follows:

from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/LLaVA-NeXT-Video-32B-Qwen"
model_name = "llava_qwen"
device_map = "auto"

# Load the model with proper configuration
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained,
    None,
    model_name,
    load_in_8bit=False,  # Ensure 8-bit quantization is disabled
    load_in_4bit=True    # Enable 4-bit quantization
)
model.eval()

When I run this script, I get the following error:

ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models.
Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype.

Could you clarify how to properly enable 4-bit quantization for the lmms-lab/LLaVA-NeXT-Video-32B-Qwen model in Python scripts?
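
For reference, here is an untested variant I am considering. It assumes that load_pretrained_model forwards extra keyword arguments (such as quantization_config and device_map) on to the underlying from_pretrained call; if that assumption is wrong, please point me to the supported way of passing a bitsandbytes configuration:

import torch
from transformers import BitsAndBytesConfig
from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/LLaVA-NeXT-Video-32B-Qwen"
model_name = "llava_qwen"

# Hypothetical: wrap the 4-bit settings in an explicit BitsAndBytesConfig
# instead of passing the load_in_4bit flag directly
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained,
    None,
    model_name,
    device_map="auto",               # let accelerate place the quantized weights
    quantization_config=bnb_config,  # assumed to be forwarded to from_pretrained
)
model.eval()  # no .to()/.cuda() calls afterwards on a bitsandbytes-quantized model

The only difference from my script above is that the 4-bit settings are wrapped in a BitsAndBytesConfig and device_map is passed explicitly, so no .to() call should be needed after loading.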
