[New Model]: QVQ-72B-Preview #11479
Comments
This model uses the same architecture as regular Qwen2-VL, so it should be supported already in vLLM. Just try it!
Video input is already supported - how are you passing them to the model?
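To clarify what "passing video to the model" usually looks like, here is a minimal sketch of the OpenAI-style chat payload that vLLM's server accepts for multimodal input. The exact content type for video (the `video_url` name below) is an assumption that may vary by vLLM version; the helper function and URL are illustrative only.

```python
# Build an OpenAI-compatible multimodal chat message for a vLLM server.
# NOTE: "video_url" as a content type is an assumption; check your vLLM
# version's docs for the exact multimodal content types it accepts.

def build_multimodal_message(prompt: str, media_urls: list[str],
                             media_type: str = "image_url") -> dict:
    """Build one user message mixing a text part with media parts."""
    content = [{"type": "text", "text": prompt}]
    for url in media_urls:
        # Each media part nests the URL under a key matching its type.
        content.append({"type": media_type, media_type: {"url": url}})
    return {"role": "user", "content": content}

# Example: one text part plus one (hypothetical) video part.
msg = build_multimodal_message(
    "Describe what happens in this clip.",
    ["file:///tmp/clip.mp4"],
    media_type="video_url",
)
```

The resulting `msg` dict would be sent as one element of the `messages` list in a `/v1/chat/completions` request.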
I think there is currently some bug with video processing in
Let me report this issue to them.
I hadn't tested the video input before, but thank you for the reply!
Well, the official documentation doesn't mention support for video input, which is why I hadn't tested it.
@ZB052-A Hello, can you tell me how to load this model with vLLM? I get
Are you able to run other 72B models with your setup? Maybe it's just that you don't have enough GPU memory. You can use tensor parallelism (
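As a concrete illustration of the tensor-parallelism suggestion above, here is a sketch of a launch command that shards the 72B weights across several GPUs. The GPU count and context length below are assumptions; pick values that match your hardware.

```shell
# Hypothetical launch: serve QVQ-72B-Preview sharded across 4 GPUs so the
# 72B weights fit in aggregate GPU memory rather than on a single device.
vllm serve Qwen/QVQ-72B-Preview \
    --tensor-parallel-size 4 \
    --max-model-len 8192
```

The same setting is available in the offline API as the `tensor_parallel_size` argument to `vllm.LLM`.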
QVQ-72B-Preview has achieved remarkable performance on various benchmarks. It scored 70.3% on the Multimodal Massive Multi-task Understanding (MMMU) benchmark, showcasing QVQ's powerful ability in multidisciplinary understanding and reasoning. Furthermore, the significant improvements on MathVision highlight the model's progress in mathematical reasoning tasks, and OlympiadBench demonstrates its enhanced ability to tackle challenging problems.

But it's not all perfect: while QVQ-72B-Preview exhibits promising performance that surpasses expectations, it's important to acknowledge several limitations. Language mixing and code-switching: the model might occasionally mix different languages or unexpectedly switch between them, potentially affecting the clarity of its responses. Also, QVQ does not support video inputs.
Oh, I missed that part, thanks for bringing this up! Nevertheless, this is still an issue for regular Qwen2-VL.
The model to consider.
https://huggingface.co/Qwen/QVQ-72B-Preview
The closest model vllm already supports.
Qwen2-VL-72B or Qwen2-VL-72B-Instruct
What's your difficulty of supporting the model you want?
At present, it seems that the model only supports single-round dialogue and images, and does not support video input. Hopefully the latest version can at least run inference on this model, thanks!