[MODEL] Fix handling of multiple channels for gpt-oss with speculative decoding #26291
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request effectively addresses a bug in handling multiple channels for gpt-oss with speculative decoding, particularly in streaming mode. The changes introduce a more robust mechanism by tracking the state for each token, grouping them by channel and recipient, and then constructing the delta messages. This ensures that channel switches within a single decoding step are handled correctly. The related logging improvements are also a good addition, providing more comprehensive output. I have one suggestion to improve code readability and reduce the risk of future bugs by removing a magic number.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
if group_text:
    tool_messages.append(DeltaToolCall(
        index=base_index - 1,  # Use the index of the current tool call
```
Avoid off-by-one when streaming ongoing tool call arguments

Within the Harmony streaming branch, `base_index` is derived from `harmony_parser.messages` to count previously completed tool calls; it is the same value the prior implementation used directly both for opening a call and for streaming additional arguments. The new logic subtracts one when emitting argument deltas (`index=base_index - 1`). When a tool call spans multiple decoding iterations, `base_index` remains 0 (no completed calls yet), so subsequent argument chunks are sent with `index=-1`, which clients cannot associate with the active tool call. This breaks streaming for any tool call whose arguments arrive over multiple chunks.
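A minimal, runnable sketch of the failure mode described above; all values are hypothetical, and the clamp shown at the end is one possible repair, not the PR's actual change:

```python
# Hypothetical illustration of the off-by-one; not the actual vLLM code.
completed_messages = []               # stand-in for harmony_parser.messages
base_index = len(completed_messages)  # 0 while the first call is in flight

# The reviewed code emits continuation argument deltas at base_index - 1:
print(base_index - 1)  # -1, which clients cannot attach to the open call

# One possible repair (an assumption, not the PR's fix): never go below the
# index of the call that is still streaming.
print(max(base_index - 1, 0))  # 0
```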
Force-pushed from a953d9c to ede4584.
Force-pushed from 704867c to 3c1bf55.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
groups = []
for channel, recipient, text in token_states:
    if not text:
        continue
    if groups and groups[-1]['channel'] == channel and groups[-1]['recipient'] == recipient:
        groups[-1]['text'] += text
    else:
```
Emit tool call deltas even when first tokens carry no text

The new grouping logic skips any token whose `last_content_delta` is empty before building `groups`. When a speculative chunk switches channels (e.g., starts a new `commentary` function call), the parser often changes `current_channel`/`current_recipient` using structural tokens that have no textual delta. Previously, the code still emitted a `DeltaToolCall` whenever `prev_recipient != cur_recipient` even if `delta_text` was empty, so clients immediately saw the tool name. With the `if not text: continue` filter, those structural tokens are ignored and no `DeltaToolCall` is streamed until arguments appear; calls with empty arguments are never surfaced at all. This regression breaks streaming compatibility for Harmony tool calls that begin with metadata-only tokens.
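A minimal sketch of a grouping loop that keeps such metadata-only switches visible; the `token_states` values are hypothetical, the dict layout mirrors the snippet above, and the rest is assumption:

```python
# Sketch: group consecutive tokens by (channel, recipient), starting a new
# group on any switch, even for text-less structural tokens, so a
# DeltaToolCall can still be emitted as soon as the recipient changes.
token_states = [
    ("final", None, "Sure thing. "),              # assistant-visible text
    ("commentary", "functions.get_weather", ""),  # structural token, no text
    ("commentary", "functions.get_weather", '{"city": "Oslo"}'),
]

groups = []
for channel, recipient, text in token_states:
    if groups and groups[-1]["channel"] == channel \
            and groups[-1]["recipient"] == recipient:
        groups[-1]["text"] += text  # extend the current group
    else:
        # New group even when text is empty: the switch itself is the signal
        # that a tool call opened, so it must survive to the delta builder.
        groups.append({"channel": channel, "recipient": recipient,
                       "text": text})

print(groups)  # two groups; the tool-call group exists from its first token
```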
Force-pushed from 3c1bf55 to e1f14dd.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
# next_tool_index tracks the index for the next NEW tool call
next_tool_index = base_index
```
Increment tool index when new call follows ongoing call

`next_tool_index` is initialised from `base_index` and only increased when `prev_recipient != group_recipient`. If this chunk first streams more arguments for an already-active tool call and then starts a second tool call, the continuation leaves `next_tool_index` unchanged, so the new call is emitted with the same `index` as the ongoing call and its arguments are also written under that index. The OpenAI streaming API expects tool call indexes to be unique and monotonically increasing, so this mislabels calls and causes clients to merge two separate calls. Consider advancing `next_tool_index` when a continuation is emitted, or computing `base_index` to include the in-flight call.
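A minimal sketch of the first suggestion, advancing the counter on continuations as well; the recipients and argument deltas are hypothetical, and this is one possible scheme rather than the PR's code:

```python
# Sketch: index assignment across one chunk that first streams more
# arguments for an in-flight tool call, then opens a second call.
groups = [
    ("functions.search", '"query": "vllm"}'),              # continuation
    ("functions.fetch", '{"url": "https://example.com"'),  # new call
]

base_index = 0                # completed tool calls so far (none yet)
next_tool_index = base_index  # the in-flight call was opened at this index

for recipient, args_delta in groups:
    emit_index = next_tool_index
    # Advance after every group, continuation or new call, so the second
    # call cannot collide with the in-flight one.
    next_tool_index += 1
    print(emit_index, recipient, args_delta)
# -> 0 for the continuation (keeps the open call's index)
# -> 1 for the new call (unique, monotonically increasing)
```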
Signed-off-by: Aleksandr Samarin <[email protected]>
Force-pushed from e4f6360 to 3ad1d7b.
…-multiple-channels
@codex review
Codex Review: Didn't find any major issues. You're on a roll.
/gemini review
Code Review

This pull request provides a much-needed fix for handling multiple channels in `gpt-oss` streaming with speculative decoding. The previous implementation had a flaw where it only considered the state after the last token in a chunk, which could lead to data loss or incorrect message construction if the channel or recipient changed within the chunk.

The new approach is robust and correctly handles this complex scenario. Key improvements include:

- Tracking the state (channel, recipient, delta) for each individual token.
- Grouping consecutive tokens with the same state for efficient processing.
- Refactoring the logic to build a single, comprehensive `DeltaMessage` that can contain content, reasoning, and tool calls from a single chunk (see the sketch below).
- Improving the indexing logic for tool calls, correctly handling calls that span multiple streamed chunks.
- Enhancing logging to be more comprehensive.

The changes significantly increase the correctness and reliability of streaming for `gpt-oss` models. The implementation is well-structured, and the added complexity is justified by the problem it solves. I don't see any issues with the proposed changes.
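To make the third bullet concrete, here is a minimal, self-contained sketch of one delta carrying reasoning, content, and a tool call from a single mixed chunk. The dataclasses are simplified stand-ins for vLLM's `DeltaMessage` and `DeltaToolCall` (the field names here are assumptions, not the real protocol):

```python
from dataclasses import dataclass, field
from typing import Optional

# Simplified stand-ins for vLLM's protocol classes; used only to illustrate
# combining the three kinds of output into one delta.
@dataclass
class ToolCallDelta:
    index: int
    name: Optional[str] = None
    arguments: str = ""

@dataclass
class MessageDelta:
    content: str = ""
    reasoning: str = ""
    tool_calls: list = field(default_factory=list)

# One speculative chunk whose accepted tokens span three channels.
groups = [
    {"channel": "analysis", "recipient": None, "text": "User wants weather. "},
    {"channel": "final", "recipient": None, "text": "Let me check."},
    {"channel": "commentary", "recipient": "functions.get_weather",
     "text": '{"city": "Oslo"}'},
]

delta = MessageDelta()
for g in groups:
    if g["recipient"] is not None:    # tool-call channel
        delta.tool_calls.append(ToolCallDelta(
            index=0, name=g["recipient"], arguments=g["text"]))
    elif g["channel"] == "analysis":  # reasoning channel
        delta.reasoning += g["text"]
    else:                             # "final": user-visible content
        delta.content += g["text"]

print(delta)  # one delta carrying reasoning, content, and a tool call
```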
Hi team!

Purpose

We've noticed that the recent PR doesn't fully fix the gpt-oss + streaming + speculative-decoding issue; for example, generated messages end abruptly. This happens because the multiple tokens accepted in a single decoding step can belong to different channels (e.g. `<final>`, `<analysis>`, `None`). This PR handles that case.

Test Plan

Test Result