Closed
Labels
bug: Something isn't working
Description
Your current environment
The output of `python collect_env.py`
Your output of `python collect_env.py` here
Describe the bug
I tested that `vllm==0.8.3` works fine and `vllm==0.8.4` fails.
Server:
vllm serve nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic --disable-log-requests
Client:
from openai import OpenAI
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
model_id = client.models.list().data[0].id
# Text inference
chat_response = client.chat.completions.create(
    model=model_id,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Who are you?"},
        ],
    }],
)
print("Text Chat completion output:", chat_response.choices[0].message.content)
# Single-image input inference
image_url = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
]
prompt = "What's in this image?"
for img in image_url:
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": img}},
        ],
    }]
    chat_response = client.chat.completions.create(model=model_id, messages=messages)
    print("Single image Chat completion output:", chat_response.choices[0].message.content)
Here is the stacktrace for the failure when the image request is sent to vllm==0.8.4
INFO: 127.0.0.1:52340 - "POST /v1/chat/completions HTTP/1.1" 200 OK
ERROR 04-15 17:28:02 [core.py:387] EngineCore hit an exception: Traceback (most recent call last):
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 380, in run_engine_core
ERROR 04-15 17:28:02 [core.py:387] engine_core.run_busy_loop()
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 402, in run_busy_loop
ERROR 04-15 17:28:02 [core.py:387] self._process_engine_step()
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 431, in _process_engine_step
ERROR 04-15 17:28:02 [core.py:387] outputs = self.step_fn()
ERROR 04-15 17:28:02 [core.py:387] ^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 207, in step
ERROR 04-15 17:28:02 [core.py:387] output = self.model_executor.execute_model(scheduler_output)
ERROR 04-15 17:28:02 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 77, in execute_model
ERROR 04-15 17:28:02 [core.py:387] output = self.collective_rpc("execute_model",
ERROR 04-15 17:28:02 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-15 17:28:02 [core.py:387] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-15 17:28:02 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/utils.py", line 2378, in run_method
ERROR 04-15 17:28:02 [core.py:387] return func(*args, **kwargs)
ERROR 04-15 17:28:02 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-15 17:28:02 [core.py:387] return func(*args, **kwargs)
ERROR 04-15 17:28:02 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 242, in execute_model
ERROR 04-15 17:28:02 [core.py:387] output = self.model_runner.execute_model(scheduler_output)
ERROR 04-15 17:28:02 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-15 17:28:02 [core.py:387] return func(*args, **kwargs)
ERROR 04-15 17:28:02 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1002, in execute_model
ERROR 04-15 17:28:02 [core.py:387] self._execute_mm_encoder(scheduler_output)
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 888, in _execute_mm_encoder
ERROR 04-15 17:28:02 [core.py:387] self.encoder_cache[req_id][input_id] = scatter_mm_placeholders(
ERROR 04-15 17:28:02 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/vllm/v1/worker/utils.py", line 58, in scatter_mm_placeholders
ERROR 04-15 17:28:02 [core.py:387] placeholders[is_embed] = embeds
ERROR 04-15 17:28:02 [core.py:387] ~~~~~~~~~~~~^^^^^^^^^^
ERROR 04-15 17:28:02 [core.py:387] RuntimeError: shape mismatch: value tensor of shape [1980, 5120] cannot be broadcast to indexing result of shape [7920, 5120]
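To make the final error easier to read, here is a minimal pure-Python sketch of what a masked assignment like `placeholders[is_embed] = embeds` requires. The helper `scatter_rows` is hypothetical and is not vLLM's implementation; it only mimics the rule that the number of `True` entries in the mask must equal the number of rows being written. The two sizes are copied from the `RuntimeError` above. The mask selects exactly 4 times as many positions as there are embedding rows, which may be a useful debugging hint, although the trace alone does not establish the cause.

```python
def scatter_rows(is_embed, embeds, fill=None):
    """Hypothetical stand-in for `placeholders[is_embed] = embeds`.

    is_embed: list of bools, one per placeholder position.
    embeds:   list of rows to write where the mask is True.
    """
    n_selected = sum(is_embed)
    if n_selected != len(embeds):
        # Same failure mode as in the trace: the mask selects a different
        # number of rows than were produced.
        raise ValueError(
            f"shape mismatch: {len(embeds)} embedding rows for "
            f"{n_selected} masked positions"
        )
    rows = iter(embeds)
    return [next(rows) if flag else fill for flag in is_embed]


# Sizes taken from the RuntimeError in the stack trace.
produced_rows, masked_positions = 1980, 7920
ratio = masked_positions / produced_rows
print(ratio)  # 4.0

# Small-scale version of the same mismatch: 4 masked positions, 1 row.
try:
    scatter_rows([True, True, True, True], [[0.0]])
except ValueError as exc:
    print(exc)
```

When the counts agree, for example `scatter_rows([True, False, True], ["a", "b"])`, the rows land in the masked positions and the remaining positions keep the fill value.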
DarkLight1337