InternVL3_5 Flash models: Flash mode doesn't support batches

Hi all, whenever I try batched inference with an InternVL3_5 Flash model, I get this error:

`RuntimeError: Tensors must have same number of dimensions: got 2 and 1`

Looking further into it, `modeling_internvl_chat.py` does not appear to be written to handle a batch size greater than 1. Is this by design?

When `flash_mode` is used, this function is called: https://huggingface.co/OpenGVLab/InternVL3_5-2B-Flash/blob/main/modeling_internvl_chat.py#L562

That function then uses `self.get_image_num_per_sample(input_ids) / 256`.

That method, in turn, tries to reduce its input down to 1 dimension here:
https://huggingface.co/OpenGVLab/InternVL3_5-2B-Flash/blob/main/modeling_internvl_chat.py#L284

So when the input still has more than 1 dimension after that step (i.e. batch size greater than 1), I get the error above. I am unsure whether `flash_mode` is meant to support batches and, if it is not, why.
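To illustrate what I think is happening, here is a minimal sketch. It is not the model's code: `count_image_tokens` and `IMG_CONTEXT_ID` are hypothetical stand-ins, and I am assuming the method squeezes dim 0 and then concatenates the result with 1-D padding. Under that assumption, a batch of 1 works, while a batch of 2 raises an error of the same kind as the one above.

```python
import torch

IMG_CONTEXT_ID = 7  # hypothetical token id, for illustration only


def count_image_tokens(input_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in: length of each run of image-context tokens."""
    ids = input_ids.squeeze(0)  # drops dim 0 only when its size is 1
    selected = (ids == IMG_CONTEXT_ID).int()
    pad = torch.zeros(1, dtype=selected.dtype)
    # torch.cat needs all inputs to have the same number of dimensions,
    # so this only works if `selected` is 1-D.
    padded = torch.cat([pad, selected, pad])
    diff = torch.diff(padded)
    starts = (diff == 1).nonzero(as_tuple=True)[0]
    ends = (diff == -1).nonzero(as_tuple=True)[0]
    return ends - starts


single = torch.tensor([[1, 7, 7, 7, 2, 7, 7, 3]])  # shape (1, 8)
batch = torch.cat([single, single], dim=0)         # shape (2, 8)

# Batch size 1: squeeze(0) yields a 1-D tensor and the call succeeds.
single_lengths = count_image_tokens(single)

# Batch size 2: squeeze(0) is a no-op, so torch.cat sees 1-D and 2-D inputs.
try:
    count_image_tokens(batch)
    batch_error = None
except RuntimeError as exc:
    batch_error = str(exc)

print(single_lengths.tolist())
print(batch_error)
```

If this matches what the real method does, then calling it once per row of `input_ids` would be one way to handle batches, but I have not verified that against the rest of the flash path.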
