[CI] Fix tests/evals/gsm8k/test_gsm8k_correctness.py for Qwen3-Next-80B-A3B-NVFP4-EP2 #34999
Conversation
…es in GDN attention

The GDN attention metadata builder had an assertion that prevented batches containing both regular decode requests and speculative decode requests. This assertion was introduced in vllm-project#34077 as a defensive check, but it is overly conservative. Mixed batches naturally occur during MTP speculative decoding when a request enters its first decode step (no draft tokens yet) while other requests are already spec-decoding.

The metadata builder's else branch (line 247) already computes separate spec/non-spec tensors correctly for this case, and the model forward pass in qwen3_next.py handles mixed batches by separating, processing, and merging tokens independently.

CUDAGraphs are unaffected: the two CUDAGraph preparation blocks already exclude mixed batches via their guard conditions (num_decodes == 0 and num_spec_decodes == 0 respectively), so mixed batches fall back to eager execution.

Fixes vllm-project#34993

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
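For readers not familiar with this file, a minimal sketch of the shape of the change, using hypothetical names (`GDNDecodeMetadata`, `build_decode_metadata` are illustrative, not the actual vLLM source): the removed assert rejected exactly the mixed case that the builder's else branch already handles.

```python
# Minimal sketch, assuming simplified hypothetical names -- not the
# actual vLLM metadata builder.
from dataclasses import dataclass


@dataclass
class GDNDecodeMetadata:
    num_decodes: int       # regular decodes: first decode step, no draft tokens yet
    num_spec_decodes: int  # speculative decodes: verifying MTP draft tokens


def build_decode_metadata(num_decodes: int,
                          num_spec_decodes: int) -> GDNDecodeMetadata:
    # Removed by this PR -- it rejected mixed batches even though the
    # else branch below already handles them:
    #   assert num_decodes == 0 or num_spec_decodes == 0
    if num_spec_decodes == 0:
        # Pure non-spec batch: only regular decode tensors are needed.
        pass
    else:
        # Spec (and possibly mixed) batch: separate spec/non-spec tensors
        # are computed for each partition of the batch.
        pass
    return GDNDecodeMetadata(num_decodes, num_spec_decodes)
```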
Code Review
This pull request addresses a bug in the GDN attention backend by removing an assertion that was overly conservative. The assertion prevented batches from containing both speculative and non-speculative decode requests simultaneously. My analysis of the surrounding code confirms that the logic is designed to handle such mixed batches by partitioning requests and preparing separate metadata for each type. Therefore, removing the assertion is the correct fix. The change is minimal, targeted, and seems to resolve the issue as described.
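To make the separate-process-merge pattern concrete, here is a hedged sketch (a hypothetical helper, not the qwen3_next.py source) of how a mixed batch can be partitioned by request type, each partition processed independently, and the outputs scattered back into the original token order:

```python
# Hedged sketch of the separate -> process -> merge pattern; the tensor
# layout and function names are assumptions, not the vLLM implementation.
import torch


def forward_mixed_batch(tokens: torch.Tensor, is_spec: torch.Tensor,
                        spec_fn, non_spec_fn) -> torch.Tensor:
    # Partition the flattened token batch by request type.
    spec_tokens = tokens[is_spec]
    non_spec_tokens = tokens[~is_spec]

    # Each partition runs its own attention path with its own metadata.
    spec_out = spec_fn(spec_tokens)
    non_spec_out = non_spec_fn(non_spec_tokens)

    # Merge the outputs back into the original token order.
    out = torch.empty_like(tokens)
    out[is_spec] = spec_out
    out[~is_spec] = non_spec_out
    return out
```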
As much as I understand this code (and the code around it), there shouldn't be a case where both spec and non-spec decodes end up in the same batch. Maybe some prefill was counted as a decode?
Ya, decode is a loose term; short enough prefill chunks are considered "decodes" from a reordering perspective.
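As an illustration of that classification, a hedged sketch assuming a hypothetical reorder threshold (the function and parameter names here are made up for illustration): any request scheduling at most `threshold` tokens in this step lands in the decode bucket, so a short prefill chunk can be counted as a "decode".

```python
# Hypothetical sketch of decode-vs-prefill bucketing during batch
# reordering; not the actual vLLM reorder logic.
def classify_for_reordering(num_scheduled_tokens: list[int],
                            threshold: int = 1) -> tuple[list[int], list[int]]:
    decodes, prefills = [], []
    for req_idx, n in enumerate(num_scheduled_tokens):
        # Short requests (including short prefill chunks) go in the
        # decode bucket so they can share the decode kernel path.
        (decodes if n <= threshold else prefills).append(req_idx)
    return decodes, prefills


# Example: a 1-token chunked prefill is bucketed with the decodes.
assert classify_for_reordering([1, 512, 1], threshold=1) == ([0, 2], [1])
```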
I am not sure the code of this function will work correctly if we count a prefill as a decode, and the same goes for the model implementation. What is the reason we consider a short prefill a decode?
FIX: #34993
#34077 broke
`pytest -s -v tests/evals/gsm8k/test_gsm8k_correctness.py -k "Qwen3-Next-80B-A3B-NVFP4-EP2" --config-list-file=tests/evals/gsm8k/configs/models-blackwell.txt` with what appears to be an overly conservative assert.

Test Plan:
Now passes
cc @vadiklyutiy