[Bug]: Mistral 3.1 Small Image inference is broken on 0.8.4 #16675

@mgoin

Your current environment

(The output of `python collect_env.py` was not provided.)

πŸ› Describe the bug

I tested that vllm==0.8.3 works fine, while vllm==0.8.4 fails.

Server:

vllm serve nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic --disable-log-requests

Client:

from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
model_id = client.models.list().data[0].id

# Text inference
chat_response = client.chat.completions.create(
    model=model_id,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Who are you?"},
        ],
    }],
)
print("Text Chat completion output:", chat_response.choices[0].message.content)

# Single-image input inference
image_url = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
]
prompt = "What's in this image?"
for img in image_url:
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": img}},
        ],
    }]
    chat_response = client.chat.completions.create(model=model_id, messages=messages)
    print("Single image Chat completion output:", chat_response.choices[0].message.content)

Here is the stack trace for the failure when the image request is sent to vllm==0.8.4:

INFO:     127.0.0.1:52340 - "POST /v1/chat/completions HTTP/1.1" 200 OK
ERROR 04-15 17:28:02 [core.py:387] EngineCore hit an exception: Traceback (most recent call last):
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 380, in run_engine_core
ERROR 04-15 17:28:02 [core.py:387]     engine_core.run_busy_loop()
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 402, in run_busy_loop
ERROR 04-15 17:28:02 [core.py:387]     self._process_engine_step()
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 431, in _process_engine_step
ERROR 04-15 17:28:02 [core.py:387]     outputs = self.step_fn()
ERROR 04-15 17:28:02 [core.py:387]               ^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 207, in step
ERROR 04-15 17:28:02 [core.py:387]     output = self.model_executor.execute_model(scheduler_output)
ERROR 04-15 17:28:02 [core.py:387]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 77, in execute_model
ERROR 04-15 17:28:02 [core.py:387]     output = self.collective_rpc("execute_model",
ERROR 04-15 17:28:02 [core.py:387]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-15 17:28:02 [core.py:387]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-15 17:28:02 [core.py:387]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/utils.py", line 2378, in run_method
ERROR 04-15 17:28:02 [core.py:387]     return func(*args, **kwargs)
ERROR 04-15 17:28:02 [core.py:387]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-15 17:28:02 [core.py:387]     return func(*args, **kwargs)
ERROR 04-15 17:28:02 [core.py:387]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 242, in execute_model
ERROR 04-15 17:28:02 [core.py:387]     output = self.model_runner.execute_model(scheduler_output)
ERROR 04-15 17:28:02 [core.py:387]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-15 17:28:02 [core.py:387]     return func(*args, **kwargs)
ERROR 04-15 17:28:02 [core.py:387]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1002, in execute_model
ERROR 04-15 17:28:02 [core.py:387]     self._execute_mm_encoder(scheduler_output)
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 888, in _execute_mm_encoder
ERROR 04-15 17:28:02 [core.py:387]     self.encoder_cache[req_id][input_id] = scatter_mm_placeholders(
ERROR 04-15 17:28:02 [core.py:387]                                            ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387]   File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/worker/utils.py", line 58, in scatter_mm_placeholders
ERROR 04-15 17:28:02 [core.py:387]     placeholders[is_embed] = embeds
ERROR 04-15 17:28:02 [core.py:387]     ~~~~~~~~~~~~^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] RuntimeError: shape mismatch: value tensor of shape [1980, 5120] cannot be broadcast to indexing result of shape [7920, 5120]
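
For context, the failing line in `scatter_mm_placeholders` writes the vision encoder's output into the placeholder rows selected by a boolean mask, and the error says the mask selects 7920 rows while the encoder only produced 1980 (note that 7920 = 4 × 1980, so the placeholder accounting for this image looks off by a factor of four). Below is a minimal standalone sketch of the failing assignment; the tensor names, sizes, and the way the mask is built are illustrative, not taken from vLLM internals:

import torch

hidden_size = 5120

# Rows reserved for this image's embeddings in the encoder cache (illustrative size).
placeholders = torch.zeros(7920, hidden_size)
# Boolean mask marking which placeholder rows should receive embeddings.
is_embed = torch.ones(7920, dtype=torch.bool)
# What the vision encoder actually returned for the image (illustrative size).
embeds = torch.randn(1980, hidden_size)

# Mirrors `placeholders[is_embed] = embeds` in vllm/v1/worker/utils.py and raises:
# RuntimeError: shape mismatch: value tensor of shape [1980, 5120] cannot be
# broadcast to indexing result of shape [7920, 5120]
placeholders[is_embed] = embeds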

