[CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests #26663
Conversation
Could you explain exactly which test is failing? Could it just be that the test is using the "wrong" layout?
Thanks @tdoublep! The failing tests are from https://github.com/vllm-project/vllm/blob/a6049be73cb965bad04f6657de6c4d98261a5237/tests/v1/attention/test_attention_backends.py where all 15 tests fail on H100 with the same unpack error.
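For reference, a minimal sketch (with hypothetical dimensions, not the test's actual fixtures) of how the unpack error arises when the size-2 K/V axis is not where the backend's `unbind` call expects it:

```python
import torch

# Hypothetical sizes for illustration only.
num_blocks, block_size, num_kv_heads, head_dim = 4, 16, 2, 8

# Layout the Triton backend unbinds on dim 1: (num_blocks, 2, ...).
kv_cache = torch.zeros(num_blocks, 2, block_size, num_kv_heads, head_dim)
key_cache, value_cache = kv_cache.unbind(1)  # works: dim 1 has size 2

# If the K/V axis sits elsewhere, unbind(1) yields num_blocks tensors and
# unpacking them into two names raises the error seen in the failing tests.
kv_cache_bad = torch.zeros(2, num_blocks, block_size, num_kv_heads, head_dim)
try:
    key_cache, value_cache = kv_cache_bad.unbind(1)
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```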
OK. I think the test should be modified to provide the KV cache with the correct layout; e.g., we can look at how it works for FlashInfer, which has the same (num_blocks, 2, ...) layout.
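A rough sketch of what the test-side fix could look like, assuming the test previously built the cache with the K/V axis leading (the real test constructs the cache through its own helpers, and the trailing shape order may differ):

```python
import torch

# Hypothetical dimensions; the actual test uses its own fixtures.
num_blocks, block_size, num_kv_heads, head_dim = 4, 16, 2, 8

# Cache built with the K/V axis leading: (2, num_blocks, ...).
kv_cache = torch.randn(2, num_blocks, block_size, num_kv_heads, head_dim)

# Reorder to the FlashInfer-style (num_blocks, 2, ...) layout before handing it
# to the Triton backend, so its unbind on dim 1 sees the size-2 axis.
kv_cache_nb2 = kv_cache.transpose(0, 1).contiguous()
assert kv_cache_nb2.shape[:2] == (num_blocks, 2)
```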
Force-pushed from 7798745 to 81e81de. Signed-off-by: Huamin Li <[email protected]>
Thanks @tdoublep for the suggestions! I updated my PR to only change the test. PTAL
Thank you. Curious now why this test failure wasn't caught as part of CI. Are we failing to trigger this test when we change the attention backend code?
I don't think this test is currently running in CI. We are trying to enable it in #26649, which is how I found the failure.
This seems to be covered already by #26597.
LGTM
How come we can't run the attention tests on L4 where the other tests run?
LGTM (needs follow-up to enable the test to run in CI)
Hmm, the CI failure suggests we are trying to run this test on CPU now?? Looking at the test job definition, I don't understand why the attention test is running.
The CPU tests pick up the attention tests after this PR :(
@yeqcharlotte Do you understand how that can be happening? I'm a bit baffled tbh
It's like the CI job is trying to execute commands that are different from what is checked into the branch. It's really weird. I tried to create a clean branch (added you as co-author @hl475) with this change and triggered the failing job in CI to see if it is reproducible.
So after some investigation, it looks like we are now generating the test pipeline automatically based on the files that have changed (vllm-project/ci-infra#184). This PR changes a single test that should be run on GPU, and the pipeline is trying to run it in the CPU jobs.
I have been investigating this. This isn't really a test filtering issue. The main problem is that these tests are orphaned and not being run anywhere in the first place. Test filtering is doing a "best guess" but ends up putting them in the wrong test group.
The CI issue above is now resolved in vllm-project/ci-infra#194.
LGTM
@rzabarazesh Thank you for the investigation and fix! @yeqcharlotte It looks like we can't merge until you approve, since you requested changes.
Purpose
This PR makes K/V cache unbinding robust across cache layouts by detecting the axis of size 2 at runtime instead of assuming it sits at `dim=1`. This fixes unpacking errors seen when `kv_cache` is shaped with the K/V dimension elsewhere (e.g., `dim=0`).

When running `tests/v1/attention/test_attention_backends.py` on H100, the K/V cache unbind failed with `ValueError: too many values to unpack (expected 2)`.

This change avoids relying on a specific layout and works with both older and newer cache shapes. Per the review discussion, we updated the test to provide Triton the same (num_blocks, 2, …) KV cache layout that FlashInfer uses.
Test Plan
Run `tests/v1/attention/test_attention_backends.py` on H100.
Test Result