Qwen2VLProcessor cannot handle odd number of video frames #35412

DarkLight1337 · 2024-12-25T06:58:11Z

System Info

- `transformers` version: 4.47.1
- Platform: Linux-5.4.0-174-generic-x86_64-with-glibc2.31
- Python version: 3.9.20
- Huggingface_hub version: 0.26.2
- Safetensors version: 0.4.5
- Accelerate version: 1.0.1
- Accelerate config:    not found
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: Yes
- GPU type: NVIDIA A10

Who can help?

@ArthurZucker @zucchini-nlp

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

I found that the processor for Qwen2-VL cannot handle input videos with an odd number of frames (except for videos with a single frame). This occurs regardless of the channel format and image dimensions of each frame.

import numpy as np
from transformers import AutoProcessor

# The processor fails when num_frames = 3, 5, 7, ...
num_frames = 3
video = np.random.randint(0, 255, size=(num_frames, 256, 256, 3), dtype=np.uint8)

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor(text="<|vision_start|><|video_pad|><|vision_end|>", videos=[video])

Error when num_frames = 3

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/qwen2_vl/processing_qwen2_vl.py", line 124, in __call__
    videos_inputs = self.image_processor(images=None, videos=videos, **output_kwargs["videos_kwargs"])
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/image_processing_utils.py", line 41, in __call__
    return self.preprocess(images, **kwargs)
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 439, in preprocess
    patches, video_grid_thw = self._preprocess(
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 299, in _preprocess
    patches = patches.reshape(
ValueError: cannot reshape array of size 571536 into shape (1,2,3,9,2,14,9,2,14)
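
The numbers in the error message can be checked by hand. The sketch below uses only the values in the traceback and the reproduction script. It assumes each 256x256 frame is resized to 252x252 (9 * 2 * 14 per side), which is inferred from the target shape and not confirmed against the library source. Under that assumption the array holds 3 frames while the reshape target only has room for 2, which would be consistent with the frame count being floor-divided by a temporal patch size of 2.

```python
import math

# Values read off the traceback and the reproduction script.
array_size = 571536
target_shape = (1, 2, 3, 9, 2, 14, 9, 2, 14)

num_frames = 3
channels = 3
# Assumption: each frame ends up 252x252 after resizing (9 * 2 * 14 per side),
# inferred from the target shape rather than from the library source.
side = 9 * 2 * 14

per_frame = channels * side * side
actual = num_frames * per_frame      # elements actually in the array
expected = math.prod(target_shape)   # elements the reshape target can hold

print(actual)                 # 571536
print(expected)               # 381024
print(actual // per_frame)    # 3 frames present
print(expected // per_frame)  # 2 frames expected by the reshape
```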

Expected behavior

The processor should be able to handle videos with an odd number of frames.
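
Until the processor handles this itself, one possible caller-side workaround is to pad the video to an even frame count before passing it in. This is a minimal sketch: `pad_to_multiple` is a hypothetical helper, not part of `transformers`, and repeating the last frame is an assumption about what padding is acceptable, not taken from the library or from any proposed fix.

```python
import numpy as np


def pad_to_multiple(video: np.ndarray, multiple: int = 2) -> np.ndarray:
    """Repeat the last frame until the frame count is a multiple of `multiple`.

    Hypothetical helper for illustration; expects frames on axis 0.
    """
    remainder = video.shape[0] % multiple
    if remainder == 0:
        return video
    # Slice with [-1:] to keep the frame axis, then repeat along it.
    pad = np.repeat(video[-1:], multiple - remainder, axis=0)
    return np.concatenate([video, pad], axis=0)


video = np.random.randint(0, 256, size=(3, 256, 256, 3), dtype=np.uint8)
padded = pad_to_multiple(video)
print(padded.shape)  # (4, 256, 256, 3)
```

The padded array can then be passed as `videos=[padded]` in the reproduction script above. Note that this also turns a single-frame video into two frames, a case the processor already handles without padding.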


DarkLight1337 added the bug label Dec 25, 2024

This was referenced Dec 25, 2024

[New Model]: QVQ-72B-Preview vllm-project/vllm#11479 (Open)

[Usage]: Qwen/Qwen2-VL-7B-Instruct vllm-project/vllm#10994 (Closed)

jla524 linked a pull request Dec 27, 2024 that will close this issue

Fix Qwen2VL processor to handle odd number of frames #35431 (Open)
