Hello, I am trying to run the lmms-lab/LLaVA-NeXT-Video-32B-Qwen model on an A100-40GB GPU. However, I encounter an OOM issue when loading the model in its default configuration. To address this, I attempted to enable 4-bit quantization using the bitsandbytes library by modifying my script as follows:
from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/LLaVA-NeXT-Video-32B-Qwen"
model_name = "llava_qwen"
device_map = "auto"

# Load the model with 4-bit quantization enabled
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained,
    None,                 # model_base: no separate base checkpoint
    model_name,
    load_in_8bit=False,   # ensure 8-bit quantization is disabled
    load_in_4bit=True,    # enable 4-bit quantization
    device_map=device_map,
)
model.eval()
However, when I run the script, I encounter the following error message:
ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models.
Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype.
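For reference, my understanding from the transformers documentation is that 4-bit loading is normally requested through a BitsAndBytesConfig passed to from_pretrained, with device placement handled by device_map rather than an explicit .to() or .cuda() call on the returned model. Below is a minimal sketch of that pattern; whether this particular checkpoint can be loaded directly with AutoModelForCausalLM, instead of going through load_pretrained_model, is an assumption on my part:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# device_map lets accelerate/bitsandbytes place the quantized weights;
# calling .to(...) or .cuda() on the returned model is what raises the ValueError above.
model = AutoModelForCausalLM.from_pretrained(
    "lmms-lab/LLaVA-NeXT-Video-32B-Qwen",  # assumption: resolvable via AutoModelForCausalLM
    quantization_config=bnb_config,
    device_map="auto",
)
model.eval()  # .eval() is fine on a quantized model; .to()/.cuda() is not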
Could you clarify how to properly enable 4-bit quantization for the lmms-lab/LLaVA-NeXT-Video-32B-Qwen model in Python scripts?