minor: zero workspace buffer init for flashinfer trtllm-gen attn #22603

yyihuang · 2025-08-10T20:59:13Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

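Follow-up to FlashInfer v0.2.11.post3 (flashinfer-ai/flashinfer#1463): the workspace buffer passed to the trtllm-gen attention kernels must now be zero-initialized, so allocate it with torch.zeros instead of torch.empty.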

cc @elvischenv

Test Plan

Test Result

(Optional) Documentation Update

github-actions · 2025-08-10T20:59:20Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request addresses a critical correctness issue by initializing the FlashInfer workspace buffer with zeros using torch.zeros instead of torch.empty. This is necessary for the proper functioning of the TensorRT-LLM attention kernels in FlashInfer. The change is applied correctly in both the core library code and the test suite. My review includes suggestions to align the data type of the workspace buffer in the tests with the main implementation (torch.uint8) to ensure consistency and prevent potential data interpretation bugs.
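A minimal sketch of the difference the review describes, assuming only standard PyTorch: torch.empty returns uninitialized memory with unspecified contents, while torch.zeros guarantees every byte is 0. The helper name allocate_workspace is hypothetical and is not a vLLM or FlashInfer API.

```python
import torch


def allocate_workspace(nbytes: int, device: str = "cpu") -> torch.Tensor:
    """Hypothetical helper: return a zero-initialized byte workspace.

    torch.empty would hand back uninitialized memory with unspecified
    contents, which is unsafe for kernels that expect zeroed memory.
    """
    return torch.zeros(nbytes, dtype=torch.uint8, device=device)


n = 4096
a = allocate_workspace(n)
# Allocating with torch.empty and then zeroing in place is equivalent.
b = torch.empty(n, dtype=torch.uint8).zero_()
print(torch.equal(a, b))  # True
```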

gemini-code-assist · 2025-08-10T21:00:16Z

tests/kernels/attention/test_flashinfer_trtllm_attention.py

@@ -113,7 +113,7 @@ def test_flashinfer_trtllm_decode_with_baseline(
    kv_indices = torch.tensor(kv_indices, dtype=torch.int32)
    kv_last_page_lens = torch.tensor(kv_last_page_lens, dtype=torch.int32)

-    workspace_buffer = torch.empty(128 * 1024 * 1024, dtype=torch.int8)
+    workspace_buffer = torch.zeros(128 * 1024 * 1024, dtype=torch.int8)


The workspace buffer is created with torch.int8 dtype, while the main implementation in vllm/v1/attention/backends/flashinfer.py uses torch.uint8. While this might not cause issues with a zero-initialized buffer, using an inconsistent data type can lead to subtle bugs if the underlying kernel has specific expectations about the data being signed or unsigned. For consistency and to prevent potential correctness issues, it's recommended to use torch.uint8 here.

Suggested change

-    workspace_buffer = torch.zeros(128 * 1024 * 1024, dtype=torch.int8)
+    workspace_buffer = torch.zeros(128 * 1024 * 1024, dtype=torch.uint8)
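To illustrate the signed/unsigned point, here is a small sketch that uses only PyTorch's Tensor.view(dtype) reinterpretation: an all-zero buffer has identical bytes under int8 and uint8, but any nonzero byte is read differently (0xFF is 255 as uint8 and -1 as int8).

```python
import torch

n = 16
zeros_i8 = torch.zeros(n, dtype=torch.int8)
zeros_u8 = torch.zeros(n, dtype=torch.uint8)

# Reinterpret the int8 bytes as uint8: an all-zero buffer is bit-identical.
same_bits = torch.equal(zeros_i8.view(torch.uint8), zeros_u8)

# A nonzero byte is where signedness starts to matter.
byte = torch.tensor([0xFF], dtype=torch.uint8)
as_unsigned = int(byte.item())                 # 255
as_signed = int(byte.view(torch.int8).item())  # -1

print(same_bits, as_unsigned, as_signed)  # True 255 -1
```

So with zero initialization the dtype mismatch is harmless for the initial contents; the suggestion is about consistency with the main implementation.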

gemini-code-assist · 2025-08-10T21:00:17Z

tests/kernels/attention/test_flashinfer_trtllm_attention.py

@@ -247,7 +247,7 @@ def test_flashinfer_trtllm_prefill_with_baseline(
    kv_indices = torch.tensor(kv_indices, dtype=torch.int32)
    kv_last_page_lens = torch.tensor(kv_last_page_lens, dtype=torch.int32)

-    workspace_buffer = torch.empty(128 * 1024 * 1024, dtype=torch.int8)
+    workspace_buffer = torch.zeros(128 * 1024 * 1024, dtype=torch.int8)


The workspace buffer here is created with torch.int8, which is inconsistent with the torch.uint8 used in the main implementation. To ensure consistency across the codebase and avoid potential issues related to signed versus unsigned byte interpretation by the FlashInfer kernel, it is advisable to use torch.uint8 for this buffer as well.

Suggested change

-    workspace_buffer = torch.zeros(128 * 1024 * 1024, dtype=torch.int8)
+    workspace_buffer = torch.zeros(128 * 1024 * 1024, dtype=torch.uint8)

elvischenv · 2025-08-11T01:27:20Z

vllm/v1/attention/backends/flashinfer.py

@@ -251,7 +251,7 @@ def __init__(self, kv_cache_spec: AttentionSpec, layer_names: list[str],

    def _get_workspace_buffer(self):
        if self._workspace_buffer is None:
-            self._workspace_buffer = torch.empty(
+            self._workspace_buffer = torch.zeros(


This also needs to be updated in vllm/attention/backends/flashinfer.py.

yyihuang · Aug 11, 2025: Updated. Thanks for your review!

yyihuang · Aug 11, 2025: Sorry for accidentally pushing to another PR. It's added now.
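The diff above changes a lazily created, cached buffer. Below is a simplified stand-alone sketch of that pattern, under stated assumptions: the class WorkspaceHolder, its constructor arguments, and DEFAULT_WORKSPACE_BYTES are hypothetical, and the real vLLM builder carries much more state. Only the allocate-once-with-zeros pattern is illustrated.

```python
import torch

# Hypothetical default, matching the 128 MiB test buffers in this PR;
# vLLM defines its own constant for the real buffer size.
DEFAULT_WORKSPACE_BYTES = 128 * 1024 * 1024


class WorkspaceHolder:
    """Hypothetical stand-in for the metadata builder: lazily allocates one
    zero-initialized byte buffer and reuses it on every later call."""

    def __init__(self, nbytes: int = DEFAULT_WORKSPACE_BYTES,
                 device: str = "cpu") -> None:
        self.nbytes = nbytes
        self.device = torch.device(device)
        self._workspace_buffer = None

    def _get_workspace_buffer(self) -> torch.Tensor:
        if self._workspace_buffer is None:
            # torch.zeros instead of torch.empty: the trtllm-gen kernels
            # require the workspace to start out zeroed.
            self._workspace_buffer = torch.zeros(
                self.nbytes, dtype=torch.uint8, device=self.device)
        return self._workspace_buffer


holder = WorkspaceHolder(nbytes=1024)
buf = holder._get_workspace_buffer()
assert buf is holder._get_workspace_buffer()  # allocated once, then reused
```

Because the buffer is zeroed only at first allocation, any later consumer must rely on the kernels leaving it in a valid state; the sketch does not model that.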

IwakuraRein · 2025-08-11T23:03:40Z

I tested this PR on B200 together with flashinfer-ai/flashinfer#1463, and benchmarks/benchmark_serving.py ran successfully with the following arguments:

python3 ./benchmarks/benchmark_serving.py --model gpt-oss-120b --dataset-name random --ignore-eos --num-prompts 12288 --random-input-len 1024 --random-output-len 1024 --max-concurrency 4096

nvpohanh · 2025-08-13T01:08:05Z

@yyihuang Please fix the DCO check (i.e., sign off your commits by running git commit with the -s flag).

Signed-off-by: Avery Yingyi Huang <[email protected]>

yyihuang · 2025-08-13T06:42:46Z

@nvpohanh I re-committed and DCO should pass now. Thank you!

flashinfer-ai/flashinfer#1463 description:

## 📌 Description
The redundant zero_init needs to be removed. An earlier removal led to crashes reported from DLFW, so it was reverted in #1459 and released as 0.2.11.post1. After this fix, **the workspace buffer passed into any trtllm-gen attention interface must be zero-initialized**. This PR re-applies the optimization. It should be merged and released only after the following two PRs have been tested:
- sgl-project/sglang#9065
- vllm-project/vllm#22603
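The zero-initialization requirement above could be checked in tests with a guard like the one below. This is a hypothetical helper (assert_zero_initialized is not part of FlashInfer or vLLM). It performs a full reduction over the buffer, plus a device sync on GPU, so it suits tests and debugging rather than the serving hot path.

```python
import torch


def assert_zero_initialized(workspace: torch.Tensor) -> None:
    """Hypothetical guard: raise if any element of the workspace is nonzero."""
    nonzero = int(torch.count_nonzero(workspace))
    if nonzero != 0:
        raise ValueError(
            f"workspace buffer must be zero-initialized; "
            f"found {nonzero} nonzero elements")


ok = torch.zeros(1024, dtype=torch.uint8)
assert_zero_initialized(ok)  # passes silently

dirty = torch.zeros(1024, dtype=torch.uint8)
dirty[7] = 3  # simulate leftover data in the buffer
try:
    assert_zero_initialized(dirty)
except ValueError as err:
    print(err)
```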

mgoin

LGTM

yewentao256

Looks good to me, thanks for the work!

nvpohanh · 2025-08-15T05:57:14Z

The CI failure doesn't appear to be caused by this change.

yyihuang requested review from tlrmchlsmth, WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners August 10, 2025 20:59

mergify bot added the v1 label Aug 10, 2025

gemini-code-assist bot reviewed Aug 10, 2025

yyihuang force-pushed the init_zero_workspace branch from fc85b03 to 2f56b6a August 10, 2025 21:08

elvischenv suggested changes Aug 11, 2025

yyihuang marked this pull request as draft August 11, 2025 08:36

yyihuang mentioned this pull request Aug 11, 2025 in flashinfer-ai/flashinfer#1463, "fix: remove redundant zero_init reverted by #1459" (Merged; 5 tasks)

yyihuang force-pushed the init_zero_workspace branch from 3d73616 to 2f56b6a August 11, 2025 21:16

yyihuang marked this pull request as ready for review August 11, 2025 22:09

nvpohanh approved these changes Aug 13, 2025

yyihuang added 3 commits August 13, 2025 02:38, each with Signed-off-by: Avery Yingyi Huang <[email protected]>
- 379f9a6 init
- 3e3e279 upd test
- 1a3a34e upd

yyihuang force-pushed the init_zero_workspace branch from 963518a to 1a3a34e August 13, 2025 06:39

yyihuang requested a review from yewentao256 as a code owner August 13, 2025 06:39

mgoin approved these changes Aug 13, 2025

mgoin enabled auto-merge (squash) August 13, 2025 12:45

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 13, 2025

yewentao256 approved these changes Aug 13, 2025

- bda441a Merge branch 'main' into init_zero_workspace
- 9db5edc Merge branch 'main' into init_zero_workspace

mgoin merged commit 1723ef1 into vllm-project:main Aug 15, 2025
40 of 41 checks passed

Commits in other repositories that referenced this pull request, all titled "minor: zero workspace buffer init for flashinfer trtllm-gen attn (vllm-project#22603)":
- Aug 18, 2025: 666even666 pushed 6199971 to 666even666/vllm (Signed-off-by: Yiwen Chen <[email protected]>)
- Aug 18, 2025: juuice-lee pushed a512986 to juuice-lee/vllm-moe.code
- Aug 19, 2025: yiliu30 pushed 627d061 to yiliu30/vllm-fork
- Aug 20, 2025: divakar-amd pushed a5ef63d to divakar-amd/vllm_upstream
- Aug 21, 2025: djmmoss pushed 4ab6bd4 to djmmoss/vllm (Signed-off-by: Duncan Moss <[email protected]>)
- Aug 21, 2025: BoyuanFeng pushed 13dede3 to BoyuanFeng/vllm (Signed-off-by: Boyuan Feng <[email protected]>)
- Aug 28, 2025: epwalsh pushed 02fd5a0 to epwalsh/vllm
- Aug 28, 2025: xiao-llm pushed 4de2a63 to xiao-llm/vllm (Signed-off-by: Xiao Yu <[email protected]>)
- Aug 28, 2025: xiao-llm pushed 704655b to xiao-llm/vllm (Signed-off-by: Xiao Yu <[email protected]>)
- Aug 28, 2025: zhewenl pushed 7a2fa04 to zhewenl/vllm
- Aug 28, 2025: dumb0002 pushed a2fe222 to dumb0002/vllm
- Aug 29, 2025: googlercolin pushed a6f0086 to googlercolin/vllm
