
Conversation

@yyihuang (Collaborator) commented on Aug 24, 2025

📌 Description

🔍 Related Issues

sgl-project/sglang#9471

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

yyihuang marked this pull request as a draft on August 24, 2025, 22:34
@gemini-code-assist (Contributor, bot) left a comment

Summary of Changes

Hello @yyihuang, I'm Gemini Code Assist^1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the capability to return the Log-Sum-Exp (LSE) value from the trtllm-gen attention kernels. This enhancement provides access to intermediate attention computation results, which can be crucial for debugging, advanced analysis, or implementing specific loss functions that rely on these values. The changes span both the CUDA kernel implementations and their Python bindings, ensuring that LSE can be optionally returned from both prefill and decode operations.

Highlights

  • CUDA Kernel Update: The core trtllm_paged_attention_launcher CUDA kernel now accepts an lse (Log-Sum-Exp) output parameter, enabling the direct return of these intermediate attention computation values.
  • Python API Enhancement: The Python API functions, specifically trtllm_batch_decode_with_kv_cache, trtllm_batch_context_with_kv_cache, and their MLA (Multi-head Latent Attention) counterparts, have been enhanced to optionally return the lse tensor.
  • Automatic Tensor Allocation: Automatic allocation of the lse tensor is now supported when the return_lse flag is enabled and no lse tensor is explicitly provided, simplifying usage (see the sketch after this list).
  • Comprehensive Test Coverage: Unit tests have been updated across various attention functions to include verification of the returned lse values against established reference implementations, ensuring correctness and numerical stability.
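A minimal sketch of the allocation logic named in the highlights above, assuming an LSE buffer of shape [num_tokens, num_qo_heads] in float32. The helper `maybe_allocate_lse` is hypothetical and for illustration only; it is not FlashInfer's actual code.

```python
import torch
from typing import Optional


def maybe_allocate_lse(
    query: torch.Tensor,          # [num_tokens, num_qo_heads, head_dim]
    lse: Optional[torch.Tensor],  # user-provided LSE buffer, or None
    return_lse: bool,
) -> Optional[torch.Tensor]:
    # If the caller does not request LSE, nothing is allocated or returned.
    if not return_lse:
        return None
    # If return_lse=True and no buffer was supplied, allocate one per
    # (token, head) on the same device as the query (assumed float32).
    if lse is None:
        num_tokens, num_qo_heads, _ = query.shape
        lse = torch.empty(
            num_tokens, num_qo_heads,
            dtype=torch.float32, device=query.device,
        )
    return lse
```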
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Contributor, bot) commented

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

yyihuang added a commit that referenced this pull request on Aug 26, 2025:

…n attention (#1584)


## 📌 Description

**workspace_buffer arrangement**

- on main branch:
  - paged_attention: multiCtasKvCounter or semaphores | multiCtasKvScratch
  - ragged_attention: softmax | multiCtasKvCounter or semaphores | multiCtasKvScratch

- on PR branch:
  - softmax (optional) | multiCtasKvScratch | multiCtasKvCounter or semaphores (last 8 MB of the 128 MB buffer)

The range of the semaphores must stay fixed across executions, since we no longer explicitly clear the buffer with zeros.

Related PR: #1463

And #1566 (WIP) depends on this.
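A rough sketch (not part of the commit) of how the fixed tail region for semaphores could be carved out of the workspace layout described above; the names, sizes, and slicing below are illustrative only.

```python
import torch

# Assumed sizes from the description above: a 128 MB workspace whose last
# 8 MB is reserved for multiCtasKvCounter / semaphores, so that region's
# offset stays fixed across executions.
WORKSPACE_BYTES = 128 * 1024 * 1024
SEMAPHORE_BYTES = 8 * 1024 * 1024

workspace = torch.empty(WORKSPACE_BYTES, dtype=torch.uint8)

# Fixed tail region: counters / semaphores.
semaphores = workspace[WORKSPACE_BYTES - SEMAPHORE_BYTES:]
# Remaining front region: the optional softmax buffer is carved from the
# front, and multiCtasKvScratch takes the rest, per the layout above.
scratch_and_softmax = workspace[: WORKSPACE_BYTES - SEMAPHORE_BYTES]
```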

## 🔍 Related Issues


## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

@zhyncs (Member) commented on Aug 29, 2025

@yzh119 @yyihuang what's the progress on this PR?

@zhyncs (Member) commented on Aug 29, 2025

Can we merge this before v0.3.0? @yzh119

@yzh119 (Collaborator) commented on Sep 5, 2025

@yyihuang fyi, c5822e7 should make the bf16 MLA tests pass (but unfortunately, the fp8 UT will fail).

@yzh119 (Collaborator) commented on Sep 5, 2025

Commit 91e3b83 should fix fp8.

The fundamental reason is that the trtllm kernels use an internal scale for rowsum/rowmax, which might not align with the provided bmm scale. For fp8, the reason we need to subtract log2(448) is that the logits are multiplied by 448 (the maximum value of fp8 e4m3) to improve numerical stability.
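As an illustration of the correction described above, here is a hypothetical helper, assuming the kernel reports a base-2 LSE computed on logits scaled by 448 (the fp8 e4m3 maximum); it is not the actual kernel or binding code.

```python
import math
import torch

FP8_E4M3_MAX = 448.0


def correct_fp8_lse(lse_from_kernel: torch.Tensor) -> torch.Tensor:
    # Scaling the logits by 448 before the softmax reduction shifts the
    # base-2 log-sum-exp by log2(448); subtracting it recovers the LSE of
    # the unscaled logits for comparison with a reference implementation.
    return lse_from_kernel - math.log2(FP8_E4M3_MAX)
```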

@pavanimajety (Contributor) commented

@yyihuang Is there a plan to revive this PR? @yzh119

@yyihuang (Collaborator, Author) commented

> @yyihuang Is there a plan to revive this PR? @yzh119

We can move it to #2332.
