
vLLM 0.14.0: _preprocess_chat Return Value Mismatch Causes ValueError #1641

@kelvin715

Description


Problem Description

When using vLLM 0.14.0, the OpenAIServingChat._preprocess_chat method returns fewer values than the calling code expects, causing ValueError: not enough values to unpack (expected 3, got 2).

Error Message

ValueError: not enough values to unpack (expected 3, got 2)

Error Location:

  • File: src/prime_rl/inference/vllm/serving_chat_with_tokens.py
  • Line: 100 (before fix)
  • Method: OpenAIServingChatWithTokens.create_chat_completion_with_tokens

Problem Details

Expected Return Value Format

The code expects _preprocess_chat to return 3 values:

conversation, request_prompts, engine_prompts = await self._preprocess_chat(...)

Actual Return Value Format

In vLLM 0.14.0, _preprocess_chat actually returns only 2 values:

# Actual return type: tuple[list[ConversationMessage], list[TokensPrompt]]
conversation, engine_prompts = await self._preprocess_chat(...)
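For reference, the returned engine prompts are TokensPrompt objects, which are TypedDicts and can be handled as plain dicts. A minimal sketch (only the prompt_token_ids field is shown, since that is the one the workaround below relies on; other fields may vary by version):

from vllm.inputs.data import TokensPrompt

# Minimal TokensPrompt instance; fields beyond prompt_token_ids may vary by vLLM version.
engine_prompt: TokensPrompt = {"prompt_token_ids": [1, 2, 3]}
print(engine_prompt.get("prompt_token_ids"))  # [1, 2, 3]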

Verification

Verified by inspecting vLLM 0.14.0 source code and runtime behavior:

from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
import inspect

sig = inspect.signature(OpenAIServingChat._preprocess_chat)
print('Return annotation:', sig.return_annotation)
# Output: tuple[list[vllm.entrypoints.chat_utils.ConversationMessage], list[vllm.inputs.data.TokensPrompt]]

# Actual return statement (from serving_engine.py:855)
# return conversation, [engine_prompt]
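A quick programmatic arity check is also possible. This is a sketch; it assumes the return annotation is a resolved parameterized tuple (as printed above) rather than a postponed string annotation:

import inspect
import typing

from vllm.entrypoints.openai.serving_chat import OpenAIServingChat

# Count the elements of the annotated return tuple; on vLLM 0.14.0 this prints 2.
annotation = inspect.signature(OpenAIServingChat._preprocess_chat).return_annotation
print("Annotated return values:", len(typing.get_args(annotation)))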

Environment Information

  • vLLM Version: 0.14.0
  • Python Version: 3.12.0
  • Operating System: Linux
  • Related Code: OpenAIServingChatWithTokens class in prime-rl project

Reproduction Steps

  1. Start the inference server with vLLM 0.14.0
  2. Call the /v1/chat/completions/tokens endpoint
  3. This triggers the create_chat_completion_with_tokens method
  4. The unpacking error occurs at the _preprocess_chat call (see the request sketch below)
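A minimal request for step 2 might look like the following. This is a sketch only: the request schema of the custom /v1/chat/completions/tokens endpoint is defined by prime-rl and is assumed here to mirror the standard OpenAI chat payload; the host, port, and model name are placeholders.

import requests

# Hypothetical request; endpoint schema, host, and model name are assumptions.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions/tokens",
    json={
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
print(resp.status_code, resp.text)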

Workaround

Temporary fix implemented in serving_chat_with_tokens.py:

# vLLM 0.14.0's _preprocess_chat returns (conversation, [engine_prompt])
# instead of (conversation, request_prompts, engine_prompts)
conversation, engine_prompts = await self._preprocess_chat(...)

# Construct request_prompts from engine_prompts for compatibility
request_prompts = []
for engine_prompt in engine_prompts:
    request_prompts.append({
        "prompt_token_ids": engine_prompt.get("prompt_token_ids", []),
    })
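A more defensive variant, not used in the current fix, would branch on the number of returned values so the same call site works across vLLM versions. This is a sketch that assumes older versions return the 3-tuple shown earlier:

# Sketch of version-agnostic unpacking (assumption: older vLLM versions return 3 values).
result = await self._preprocess_chat(...)
if len(result) == 3:
    conversation, request_prompts, engine_prompts = result
else:
    conversation, engine_prompts = result
    request_prompts = [
        {"prompt_token_ids": ep.get("prompt_token_ids", [])} for ep in engine_prompts
    ]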

Suggestions

  1. Documentation Update: vLLM documentation should clearly state the return value format of _preprocess_chat across versions
  2. API Stability: vLLM should maintain backward compatibility for this method, or provide a migration guide when the return format changes
  3. Type Annotations: Ensure type annotations match the actual return values

Related Files

  • vLLM Source: vllm/entrypoints/openai/serving_engine.py (lines 740-856)
  • Affected Code: src/prime_rl/inference/vllm/serving_chat_with_tokens.py (lines 100-124)

Additional Information

This issue affects functionality using the custom /v1/chat/completions/tokens endpoint, which supports pre-tokenized prompt input. This is important for certain RL training scenarios.
