Text Only input using LlaVa Next #35421

sinngam-khaidem · 2024-12-26T14:57:33Z

System Info

transformers version: 4.47.1
Platform: Linux-5.4.0-113-generic-x86_64-with-glibc2.39
Python version: 3.11.11
Huggingface_hub version: 0.27.0
Safetensors version: 0.4.5
Accelerate version: 1.2.1
Accelerate config: not found
PyTorch version (GPU?): 2.5.1 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?: No
Using GPU in script?: Yes
GPU type: NVIDIA A100-SXM4-80GB

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Code Snippet

from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")

model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", torch_dtype=torch.float16)
model.to("cuda:0")  # Move the model to the first GPU

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "What is the capital of France?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "The capital of France is Paris."}]},
    {"role": "user", "content": [{"type": "text", "text": "What is the capital of India?"}]},
]

# Build the prompt from the text-only conversation.
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(text=prompt, images=None, return_tensors="pt").to("cuda:0")

output = model.generate(**inputs, max_new_tokens=100)

generated_text = processor.decode(output[0], skip_special_tokens=True)
print(generated_text)

Error Message

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  6.30it/s]
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.
Traceback (most recent call last):
  File "/home/gpuuser3/reddit_data_annotation/Llava_play.py", line 20, in <module>
    output = model.generate(**inputs, max_new_tokens=100)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gpuuser3/miniconda3/envs/mmsd_annotation/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/gpuuser3/miniconda3/envs/mmsd_annotation/lib/python3.11/site-packages/transformers/generation/utils.py", line 2252, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/home/gpuuser3/miniconda3/envs/mmsd_annotation/lib/python3.11/site-packages/transformers/generation/utils.py", line 3251, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gpuuser3/miniconda3/envs/mmsd_annotation/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gpuuser3/miniconda3/envs/mmsd_annotation/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gpuuser3/miniconda3/envs/mmsd_annotation/lib/python3.11/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 874, in forward
    inputs_embeds = inputs_embeds.to(image_features.dtype)
                                     ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'dtype'

Expected behavior

I was trying to generate a response from text-only input. I assumed that passing images=None to the processor() call would be enough, but model.generate() fails with the AttributeError shown above. Is there something I'm missing here?
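
Reading the traceback, the failure seems to reduce to dereferencing `image_features` when no image was supplied. The sketch below reproduces that pattern in plain Python and shows one possible guard. `FakeTensor`, `cast_unguarded`, and `cast_guarded` are hypothetical stand-ins invented for illustration; this is not the transformers implementation and not a confirmed fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only tracks a dtype string."""
    dtype: str

    def to(self, dtype: str) -> "FakeTensor":
        # Return a new object carrying the requested dtype.
        return FakeTensor(dtype=dtype)


def cast_unguarded(inputs_embeds: FakeTensor,
                   image_features: Optional[FakeTensor]) -> FakeTensor:
    # Same shape as the failing line in the traceback:
    # it assumes image_features is never None.
    return inputs_embeds.to(image_features.dtype)


def cast_guarded(inputs_embeds: FakeTensor,
                 image_features: Optional[FakeTensor]) -> FakeTensor:
    # Assumed guard: leave the text embeddings untouched when there is no image.
    if image_features is None:
        return inputs_embeds
    return inputs_embeds.to(image_features.dtype)


text_only = FakeTensor(dtype="float16")

try:
    cast_unguarded(text_only, None)
    unguarded_error = None
except AttributeError as exc:
    unguarded_error = str(exc)

guarded_result = cast_guarded(text_only, None)
print(unguarded_error)
print(guarded_result.dtype)
```

If this reading is right, text-only input reaches a code path that expects image features, which would explain why images=None alone does not help.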


sinngam-khaidem added the bug label Dec 26, 2024

sinngam-khaidem changed the title from "Text Only generation using LlaVa Next" to "Text Only input using LlaVa Next" Dec 26, 2024

giobin mentioned this issue Dec 26, 2024: LLaVa 1.5 and 1.6 not working with text-only inputs #35424 (Open)
