Skip to content

EPP scheduling missing important Chat Completion fields #1868

@bbrowning

Description

@bbrowning

What would you like to be added:

The current Chat Completion types in the EPP scheduling code is missing fields that are required to accurately create a prompt like the model will actually see from the Chat Completion request. This means downstream projects, like llm-d-inference-scheduler, also cannot accurately do things like prefix-aware routing because they cannot reconstruct the actual prompts that models will see.

Important fields missing from Message that make their way into prompts for most newer models:

  • reasoning
  • tool_calls
  • tool_call_id

There are also additional content types supported by inference servers like vLLM for multi-modal content other than just the text and image_url implemented here - see https://docs.vllm.ai/en/v0.11.0/examples/online_serving/openai_chat_completion_client_for_multimodal.html for an example of some of those.

To be future-proof, you'd want to support the entire Chat Completions API surface as well as additional fields, like reasoning that are commonly used in model chat templates and/or libraries that turn requests to prompts like MistralTokenizer or openai-harmony used by vLLM. Basically, a superset of all fields accepted by every supported inference server that can influence the prompt itself is what's required to do this accurately.

Why is this needed:

Without supporting all the Chat Completions fields (both part of the official spec and otherwise implemented by vLLM or other inference servers), the usefulness of prefix-aware routing and similar concepts in real production scenarios will be quite limited as the projects calculating the prefixes (such as llm-d-inference-scheduler) will not have all the necessary information available to accurately reconstruct prompts as the inference server will actually see them.

Metadata

Metadata

Assignees

No one assigned

    Labels

    needs-triageIndicates an issue or PR lacks a `triage/foo` label and requires one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions