
Conversation

@nvchenghaoz (Collaborator) commented Jan 7, 2026:

Draft PR to check whether TRTLLM can use the latest FlashInfer (FI)

Signed-off-by: Chenghao Zhang <[email protected]>
@nvchenghaoz nvchenghaoz requested a review from a team as a code owner January 7, 2026 21:46
@nvchenghaoz nvchenghaoz marked this pull request as draft January 7, 2026 21:46
@nvchenghaoz (Collaborator, Author):

/bot run

@nvchenghaoz (Collaborator, Author) commented Jan 7, 2026:

Marked as draft; this PR is to see whether updating to the latest flashinfer causes any regressions.

@tensorrt-cicd (Collaborator):

PR_Github #30933 [ run ] triggered by Bot. Commit: 26f84d9

@coderabbitai (Contributor, bot) commented Jan 7, 2026:

📝 Walkthrough

Updates the flashinfer-python dependency across multiple configuration files from the 0.3.x range to version 0.5.3, a significant version bump. The updates use different version-specification formats: exact pinning (==0.5.3) in requirements.txt and caret syntax (^0.5.3) in pyproject.toml.

Changes

Cohort / File(s): flashinfer-python dependency updates — ATTRIBUTIONS-Python.md, requirements.txt, security_scanning/pyproject.toml
Summary: Updated flashinfer-python from the 0.3.x range to 0.5.3 across all three files. requirements.txt uses exact pinning (==0.5.3), pyproject.toml uses caret syntax (^0.5.3, allowing compatible versions), and ATTRIBUTIONS-Python.md updates its documented version reference.
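
As a quick illustration of the difference between the two specifier styles (the caret expansion below follows the convention noted later in the review, where ^0.5.3 admits ≥0.5.3 and <0.6.0; the snippet is illustrative and not part of the PR):

```python
# Illustrative: which versions each specifier style admits.
# Caret "^0.5.3" is not PEP 440 syntax; it expands to ">=0.5.3,<0.6.0".
from packaging.specifiers import SpecifierSet

exact = SpecifierSet("==0.5.3")         # requirements.txt style
caret = SpecifierSet(">=0.5.3,<0.6.0")  # pyproject.toml "^0.5.3", expanded

print("0.5.4" in exact)  # False: the exact pin rejects patch updates
print("0.5.4" in caret)  # True:  the caret admits any 0.5.x >= 0.5.3
print("0.6.0" in caret)  # False: the caret stops below the next minor
```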

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
  • Description check ⚠️ Warning — The PR description contains only the template structure, with none of the required sections (Description, Test Coverage) filled in. Resolution: add a Description section explaining why flashinfer was updated to 0.5.3 and a Test Coverage section listing the tests that validate the dependency upgrade.
✅ Passed checks (2 passed)
  • Title check ✅ Passed — The title clearly summarizes the main change: updating the flashinfer-python dependency from 0.3.x to 0.5.3 across multiple configuration files.
  • Docstring coverage ✅ Passed — No functions were found in the changed files, so the docstring coverage check was skipped.


@coderabbitai (Contributor, bot) left a comment:

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @security_scanning/pyproject.toml:
- Line 59: The package version for flashinfer-python is inconsistent between
requirements.txt (exact pin 0.5.3) and security_scanning/pyproject.toml
(^0.5.3); align them to avoid environment drift by using the same exact pin in
both files—update the security_scanning/pyproject.toml entry for
flashinfer-python to match the exact version 0.5.3 (or alternatively update
requirements.txt to the caret/range if you prefer allowing patch updates),
ensuring both files reference the identical version format for
flashinfer-python.
📜 Review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7187afe and 26f84d9.

📒 Files selected for processing (3)
  • ATTRIBUTIONS-Python.md
  • requirements.txt
  • security_scanning/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{md,rst}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When documenting CLI commands for TensorRT-LLM tools like trtllm-serve, trtllm-bench, or trtllm-eval, prefer using --config over --extra_llm_api_options for specifying configuration files

Files:

  • ATTRIBUTIONS-Python.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
ATTRIBUTIONS-Python.md (1)

5264-5264: Attribution update correctly reflects the dependency version bump to 0.5.3.

The flashinfer-python version in ATTRIBUTIONS-Python.md is consistent with the versions specified in requirements.txt (==0.5.3, an exact pin) and pyproject.toml (^0.5.3, a compatible range). The Apache-2.0 license designation is accurate.

requirements.txt (2)

55-55: Flashinfer-python 0.5.3 is appropriately pinned.

The codebase has extensive flashinfer integration across attention backends, sampling utilities, and custom operators. The exact version constraint (==0.5.3) is justified by version-specific workarounds in tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py, which copies fast_decode_plan logic from flashinfer >0.5 (not available in 0.3.1). The pinned version satisfies the documented requirement in that file's TODO comment.
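
For illustration, a minimal sketch of the kind of version gate such a workaround implies, assuming the vendored logic must not run against a pre-0.5 flashinfer; the actual check in flashinfer_attention.py may look different:

```python
# Hypothetical guard for a vendored workaround that assumes flashinfer >= 0.5
# behavior (a sketch; the real gating in flashinfer_attention.py may differ).
from importlib.metadata import version

from packaging.version import Version

_FI_VERSION = Version(version("flashinfer-python"))

if _FI_VERSION < Version("0.5.0"):
    raise ImportError(
        f"flashinfer-python {_FI_VERSION} is too old: the vendored "
        "fast_decode_plan logic mirrors flashinfer > 0.5 and is not valid "
        "against 0.3.x plan() semantics (see the TODO in that file)."
    )
```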


55-55: Verify flashinfer-python 0.5.3 compatibility with codebase changes.

This is a significant version jump from 0.3.x to 0.5.3, skipping the entire 0.4.x series. The version exists and is stable on PyPI, but multiple breaking changes between v0.3 and v0.5.3 affect APIs used in the codebase:

  • Paged KV API changes: get_batch_indices_positions() signature and semantics changed (lines 304-306 in flashinfer.py)
  • plan/run API changes: APIs now expect host tensors with automatic host→device transfers, while code may be passing device tensors
  • JIT compilation rework: Custom-attention and kernel compilation lifecycle changed
  • RoPE standardization: sin/cos now computed as fp32
  • CUDAGraph/torch.compile compatibility: Kernel invocation requirements changed

The codebase actively uses these APIs in tensorrt_llm/_torch/attention_backend/flashinfer.py. Verify that the implementation has been updated to handle these API changes, particularly the plan/run host tensor expectations and get_batch_indices_positions call signatures.

No security advisories found for this version.
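
A hedged starting point for that verification is a smoke check that the pinned version is installed and that the flagged symbol still resolves; the names come from the review above, and this is no substitute for running the attention-backend tests:

```python
# Illustrative smoke check for the APIs the review flags as changed.
# It only asserts that the pinned version is installed and that the named
# symbol still resolves; signature and semantics changes need the real
# attention-backend tests to catch.
from importlib.metadata import version

import flashinfer
from packaging.version import Version

assert Version(version("flashinfer-python")) == Version("0.5.3")

# Named in the review as having a changed signature between 0.3.x and 0.5.x.
assert hasattr(flashinfer, "get_batch_indices_positions"), (
    "get_batch_indices_positions did not resolve at the top level; "
    "check the flashinfer 0.5.x release notes for where it moved."
)
```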

In security_scanning/pyproject.toml:

 patchelf = "^0.17.2.4"
 einops = "^0.8.1"
-flashinfer-python = ">=0.3.0,<0.4.0"
+flashinfer-python = "^0.5.3"
@coderabbitai (Contributor, bot) commented on security_scanning/pyproject.toml, line 59:

⚠️ Potential issue | 🟠 Major

Align version specification with requirements.txt for consistency.

There's an inconsistency in version specifications between the two files:

  • requirements.txt (line 55): flashinfer-python==0.5.3 (exact pin)
  • security_scanning/pyproject.toml (line 59): flashinfer-python = "^0.5.3" (caret, allows 0.5.x)

The caret syntax ^0.5.3 permits any version ≥0.5.3 and <0.6.0, while the exact pin locks to 0.5.3. This can lead to different versions being installed across environments, potentially causing environment drift or CI/CD inconsistencies.

Consider either:

  1. Using exact pinning in both files for strict reproducibility: "0.5.3" or "==0.5.3"
  2. Using caret syntax in both files if patch updates are acceptable: Change requirements.txt to flashinfer-python>=0.5.3,<0.6.0
Proposed fix: Align to exact version
-flashinfer-python = "^0.5.3"
+flashinfer-python = "0.5.3"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:
-flashinfer-python = "^0.5.3"
+flashinfer-python = "0.5.3"
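
To keep the two files from drifting apart again, a small repository check along the following lines could run in CI (hypothetical, not part of this PR; the [tool.poetry.dependencies] table path is an assumption about the pyproject layout):

```python
# Hypothetical drift guard: fail if requirements.txt and
# security_scanning/pyproject.toml disagree on flashinfer-python.
import re
import tomllib  # Python 3.11+
from pathlib import Path

req_text = Path("requirements.txt").read_text()
match = re.search(r"^flashinfer-python==(\S+)\s*$", req_text, re.MULTILINE)
assert match, "flashinfer-python is not exact-pinned in requirements.txt"
pinned = match.group(1)

pyproject = tomllib.loads(Path("security_scanning/pyproject.toml").read_text())
# Assumed table path; adjust if the file uses a different dependency section.
deps = pyproject["tool"]["poetry"]["dependencies"]
spec = deps["flashinfer-python"]

# Crude normalization: strips a leading caret or "==" before comparing.
assert spec.lstrip("^=") == pinned, (
    f"version drift: requirements.txt pins {pinned}, "
    f"security_scanning/pyproject.toml has {spec}"
)
```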

@tensorrt-cicd (Collaborator):

PR_Github #30933 [ run ] completed with state SUCCESS. Commit: 26f84d9
/LLM/main/L0_MergeRequest_PR pipeline #23895 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@nvchenghaoz (Collaborator, Author):

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #30940 [ run ] triggered by Bot. Commit: 26f84d9

@tensorrt-cicd (Collaborator):

PR_Github #30940 [ run ] completed with state SUCCESS. Commit: 26f84d9
/LLM/main/L0_MergeRequest_PR pipeline #23902 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@yibinl-nvidia (Collaborator):

/bot run --disable-fail-fast --post-merge

@tensorrt-cicd (Collaborator):

PR_Github #31119 [ run ] triggered by Bot. Commit: 26f84d9

@tensorrt-cicd (Collaborator):

PR_Github #31119 [ run ] completed with state SUCCESS. Commit: 26f84d9
/LLM/main/L0_MergeRequest_PR pipeline #24036 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Chenghao Zhang <[email protected]>
@nvchenghaoz (Collaborator, Author):

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #31147 [ run ] triggered by Bot. Commit: 06722c4

@tensorrt-cicd (Collaborator):

PR_Github #31147 [ run ] completed with state SUCCESS. Commit: 06722c4
/LLM/main/L0_MergeRequest_PR pipeline #24063 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Chenghao Zhang <[email protected]>
@nvchenghaoz (Collaborator, Author):

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #31286 [ run ] triggered by Bot. Commit: 4dc785a

@tensorrt-cicd (Collaborator):

PR_Github #31286 [ run ] completed with state SUCCESS. Commit: 4dc785a
/LLM/main/L0_MergeRequest_PR pipeline #24180 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@nvchenghaoz (Collaborator, Author):

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #31297 [ run ] triggered by Bot. Commit: 4dc785a

@tensorrt-cicd (Collaborator):

PR_Github #31297 [ run ] completed with state SUCCESS. Commit: 4dc785a
/LLM/main/L0_MergeRequest_PR pipeline #24191 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again
