An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
I found that the processor for Qwen2-VL cannot handle input videos with an odd number of frames (except for videos with a single frame). This occurs regardless of the channel format and image dimensions of each frame.
import numpy as np
from transformers import AutoProcessor
# The processor fails when num_frames = 3, 5, 7, ...
num_frames = 3
video = np.random.randint(0, 255, size=(num_frames, 256, 256, 3), dtype=np.uint8)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor(text="<|vision_start|><|video_pad|><|vision_end|>", videos=[video])
Error when num_frames = 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/qwen2_vl/processing_qwen2_vl.py", line 124, in __call__
    videos_inputs = self.image_processor(images=None, videos=videos, **output_kwargs["videos_kwargs"])
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/image_processing_utils.py", line 41, in __call__
    return self.preprocess(images, **kwargs)
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 439, in preprocess
    patches, video_grid_thw = self._preprocess(
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 299, in _preprocess
    patches = patches.reshape(
ValueError: cannot reshape array of size 571536 into shape (1,2,3,9,2,14,9,2,14)
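The numbers in the error message suggest where the mismatch comes from. The sketch below only does arithmetic on the values in the traceback. It assumes that each frame ends up as 3 channels of 252 x 252 pixels (9 * 2 * 14 per side, read off the target shape) and that the first two entries of the target shape are a temporal grid size and a temporal patch size of 2. Both readings are my interpretation of the shape, not something the traceback states.

```python
import math

# Values copied from the ValueError above.
array_size = 571536
target_shape = (1, 2, 3, 9, 2, 14, 9, 2, 14)

# Assumption: 3 channels, and each spatial side is 9 * 2 * 14 = 252 pixels.
channels = 3
side = 9 * 2 * 14
elements_per_frame = channels * side * side

# How many frames the array holds vs. how many the target shape has room for.
frames_in_array = array_size // elements_per_frame
frames_in_target = math.prod(target_shape) // elements_per_frame

print(frames_in_array, frames_in_target)
```

Under these assumptions the array holds 3 frames while the target shape only has room for 2, which is consistent with the frame count being floor-divided by 2 before the reshape.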
Expected behavior
The processor should be able to handle videos with an odd number of frames.
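Until the processor handles this case, one possible workaround on the caller's side is to pad the video to an even number of frames before passing it in. The helper below is a hypothetical sketch, not part of `transformers`: the name `pad_frames_to_multiple` and the choice to repeat the last frame are my own assumptions, and repeating a frame may not be appropriate for every use case.

```python
import numpy as np


def pad_frames_to_multiple(video: np.ndarray, multiple: int = 2) -> np.ndarray:
    """Hypothetical helper: repeat the last frame until the frame count
    (axis 0) is a multiple of `multiple`."""
    remainder = video.shape[0] % multiple
    if remainder == 0:
        # Already a multiple; return the input unchanged.
        return video
    # Repeat the final frame as many times as needed and append it.
    pad = np.repeat(video[-1:], multiple - remainder, axis=0)
    return np.concatenate([video, pad], axis=0)


# Same layout as the reproduction: (num_frames, height, width, channels).
video = np.random.randint(0, 256, size=(3, 256, 256, 3), dtype=np.uint8)
padded = pad_frames_to_multiple(video)
print(padded.shape)  # (4, 256, 256, 3)
```

The padded array could then be passed as `videos=[padded]` in the reproduction. Note that this helper would also pad a single-frame video to two frames, even though single-frame inputs are reported to work without it.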
System Info
Who can help?
@ArthurZucker @zucchini-nlp