vLLM 0.14.0: _preprocess_chat Return Value Mismatch Causes ValueError
Problem Description
When using vLLM 0.14.0, the OpenAIServingChat._preprocess_chat method returns a different number of values than this code expects, causing ValueError: not enough values to unpack (expected 3, got 2).
Error Message
ValueError: not enough values to unpack (expected 3, got 2)
Error Location:
- File: src/prime_rl/inference/vllm/serving_chat_with_tokens.py, line 100 (before fix)
- Method: OpenAIServingChatWithTokens.create_chat_completion_with_tokens
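For reference, the same ValueError can be reproduced with a plain tuple, independent of vLLM; this is purely illustrative of the unpacking mismatch, not the actual vLLM call:

```python
# Illustrative only: unpacking a 2-tuple into three names raises the same error.
result = (["<conversation>"], [{"prompt_token_ids": [1, 2, 3]}])  # shape of the 2-value return
try:
    conversation, request_prompts, engine_prompts = result  # code expects 3 values
except ValueError as e:
    print(e)  # not enough values to unpack (expected 3, got 2)
```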
Problem Details
Expected Return Value Format
The code expects _preprocess_chat to return 3 values:

```python
conversation, request_prompts, engine_prompts = await self._preprocess_chat(...)
```

Actual Return Value Format
In vLLM 0.14.0, _preprocess_chat actually returns only 2 values:

```python
# Actual return type: tuple[list[ConversationMessage], list[TokensPrompt]]
conversation, engine_prompts = await self._preprocess_chat(...)
```

Verification
Verified by inspecting vLLM 0.14.0 source code and runtime behavior:
```python
from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
import inspect

sig = inspect.signature(OpenAIServingChat._preprocess_chat)
print('Return annotation:', sig.return_annotation)
# Output: tuple[list[vllm.entrypoints.chat_utils.ConversationMessage], list[vllm.inputs.data.TokensPrompt]]

# Actual return statement (from serving_engine.py:855)
# return conversation, [engine_prompt]
```
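To confirm which vLLM version the inspection above is running against, a quick check of the installed package works:

```python
# Print the installed vLLM version (0.14.0 in the environment reported here)
import vllm
print(vllm.__version__)
```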
Environment Information
- vLLM Version: 0.14.0
- Python Version: 3.12.0
- Operating System: Linux
- Related Code: OpenAIServingChatWithTokens class in the prime-rl project
Reproduction Steps
- Start the inference server with vLLM 0.14.0
- Call the /v1/chat/completions/tokens endpoint (see the request sketch below)
- The create_chat_completion_with_tokens method is triggered
- The unpacking error occurs when executing the _preprocess_chat call
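A minimal request against the custom endpoint might look like the sketch below; the host, port, model name, and payload fields are assumptions (the exact schema is defined by prime-rl's OpenAIServingChatWithTokens, not by upstream vLLM):

```python
# Hypothetical request to the custom endpoint; URL, model name, and payload
# fields are assumptions and may not match prime-rl's actual schema.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions/tokens",
    json={
        "model": "placeholder-model",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
print(resp.status_code, resp.text)
```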
Workaround
Temporary fix implemented in serving_chat_with_tokens.py:

```python
# vLLM 0.14.0's _preprocess_chat returns (conversation, [engine_prompt])
# instead of (conversation, request_prompts, engine_prompts)
conversation, engine_prompts = await self._preprocess_chat(...)

# Construct request_prompts from engine_prompts for compatibility
request_prompts = []
for engine_prompt in engine_prompts:
    request_prompts.append({
        "prompt_token_ids": engine_prompt.get("prompt_token_ids", []),
    })
```
Suggestions
- Documentation Update: vLLM documentation should clearly state the return value format of _preprocess_chat across different versions
- API Stability: Recommend that vLLM maintain backward compatibility, or provide migration guides when return values change between versions
- Type Annotations: Ensure type annotations match actual return values
Related Files
- vLLM Source: vllm/entrypoints/openai/serving_engine.py (lines 740-856)
- Affected Code: src/prime_rl/inference/vllm/serving_chat_with_tokens.py (lines 100-124)
Additional Information
This issue affects functionality using the custom /v1/chat/completions/tokens endpoint, which supports pre-tokenized prompt input. This is important for certain RL training scenarios.