
Conversation

@nvchenghaoz (Collaborator) commented Jan 7, 2026:

Draft PR to check whether TRTLLM can use the latest FlashInfer (FI)

Signed-off-by: Chenghao Zhang <[email protected]>
@nvchenghaoz nvchenghaoz requested a review from a team as a code owner January 7, 2026 21:46
@nvchenghaoz nvchenghaoz marked this pull request as draft January 7, 2026 21:46
@nvchenghaoz (Collaborator, Author):

/bot run

@nvchenghaoz (Collaborator, Author) commented Jan 7, 2026:

Marked as draft; this PR is to see whether updating to the latest flashinfer causes any regressions.

@tensorrt-cicd (Collaborator):

PR_Github #30933 [ run ] triggered by Bot. Commit: 26f84d9

@coderabbitai (Contributor, bot) commented Jan 7, 2026:

📝 Walkthrough

Updates the flashinfer-python dependency across multiple configuration files from the 0.3.x range to version 0.5.3, a significant version bump. The updates use different version-specification formats: exact pinning (==0.5.3) in requirements.txt and caret syntax (^0.5.3) in pyproject.toml.

Changes

Cohort / File(s): flashinfer-python dependency updates — ATTRIBUTIONS-Python.md, requirements.txt, security_scanning/pyproject.toml
Summary: Updated flashinfer-python from the 0.3.x range to 0.5.3 across all three files. requirements.txt uses exact pinning (==0.5.3), pyproject.toml uses caret syntax (^0.5.3, allowing compatible versions), and ATTRIBUTIONS-Python.md updates its documented version reference.
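
As a quick illustration of the difference between the two specifier styles (the caret expansion below follows the convention noted later in the review, where ^0.5.3 admits ≥0.5.3 and <0.6.0; the snippet is illustrative and not part of the PR):

```python
# Illustrative: which versions each specifier style admits.
# Caret "^0.5.3" is not PEP 440 syntax; it expands to ">=0.5.3,<0.6.0".
from packaging.specifiers import SpecifierSet

exact = SpecifierSet("==0.5.3")         # requirements.txt style
caret = SpecifierSet(">=0.5.3,<0.6.0")  # pyproject.toml "^0.5.3", expanded

print("0.5.4" in exact)  # False: the exact pin rejects patch updates
print("0.5.4" in caret)  # True:  the caret admits any 0.5.x >= 0.5.3
print("0.6.0" in caret)  # False: the caret stops below the next minor
```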

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
  • Description check ⚠️ Warning — The PR description contains only the template structure, with none of the required sections (Description, Test Coverage) filled in. Resolution: add a Description section explaining why flashinfer was updated to 0.5.3 and a Test Coverage section listing the tests that validate the dependency upgrade.
✅ Passed checks (2 passed)
  • Title check ✅ Passed — The title clearly summarizes the main change: updating the flashinfer-python dependency from 0.3.x to 0.5.3 across multiple configuration files.
  • Docstring coverage ✅ Passed — No functions were found in the changed files, so the docstring coverage check was skipped.


@coderabbitai (Contributor, bot) left a comment:

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @security_scanning/pyproject.toml:
- Line 59: The package version for flashinfer-python is inconsistent between
requirements.txt (exact pin 0.5.3) and security_scanning/pyproject.toml
(^0.5.3); align them to avoid environment drift by using the same exact pin in
both files—update the security_scanning/pyproject.toml entry for
flashinfer-python to match the exact version 0.5.3 (or alternatively update
requirements.txt to the caret/range if you prefer allowing patch updates),
ensuring both files reference the identical version format for
flashinfer-python.
📜 Review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7187afe and 26f84d9.

📒 Files selected for processing (3)
  • ATTRIBUTIONS-Python.md
  • requirements.txt
  • security_scanning/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{md,rst}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When documenting CLI commands for TensorRT-LLM tools like trtllm-serve, trtllm-bench, or trtllm-eval, prefer using --config over --extra_llm_api_options for specifying configuration files

Files:

  • ATTRIBUTIONS-Python.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
ATTRIBUTIONS-Python.md (1)

5264-5264: Attribution update correctly reflects the dependency version bump to 0.5.3.

The flashinfer-python version in ATTRIBUTIONS-Python.md is consistent with the versions specified in requirements.txt (==0.5.3, an exact pin) and pyproject.toml (^0.5.3, a compatible range). The Apache-2.0 license designation is accurate.

requirements.txt (2)

55-55: Flashinfer-python 0.5.3 is appropriately pinned.

The codebase has extensive flashinfer integration across attention backends, sampling utilities, and custom operators. The exact version constraint (==0.5.3) is justified by version-specific workarounds in tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py, which copies fast_decode_plan logic from flashinfer >0.5 (not available in 0.3.1). The pinned version satisfies the documented requirement in that file's TODO comment.
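
For illustration, a minimal sketch of the kind of version gate such a workaround implies, assuming the vendored logic must not run against a pre-0.5 flashinfer; the actual check in flashinfer_attention.py may look different:

```python
# Hypothetical guard for a vendored workaround that assumes flashinfer >= 0.5
# behavior (a sketch; the real gating in flashinfer_attention.py may differ).
from importlib.metadata import version

from packaging.version import Version

_FI_VERSION = Version(version("flashinfer-python"))

if _FI_VERSION < Version("0.5.0"):
    raise ImportError(
        f"flashinfer-python {_FI_VERSION} is too old: the vendored "
        "fast_decode_plan logic mirrors flashinfer > 0.5 and is not valid "
        "against 0.3.x plan() semantics (see the TODO in that file)."
    )
```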


55-55: Verify flashinfer-python 0.5.3 compatibility with codebase changes.

This is a significant version jump from 0.3.x to 0.5.3, skipping the entire 0.4.x series. The version exists and is stable on PyPI, but multiple breaking changes between v0.3 and v0.5.3 affect APIs used in the codebase:

  • Paged KV API changes: get_batch_indices_positions() signature and semantics changed (lines 304-306 in flashinfer.py)
  • plan/run API changes: APIs now expect host tensors with automatic host→device transfers, while code may be passing device tensors
  • JIT compilation rework: Custom-attention and kernel compilation lifecycle changed
  • RoPE standardization: sin/cos now computed as fp32
  • CUDAGraph/torch.compile compatibility: Kernel invocation requirements changed

The codebase actively uses these APIs in tensorrt_llm/_torch/attention_backend/flashinfer.py. Verify that the implementation has been updated to handle these API changes, particularly the plan/run host tensor expectations and get_batch_indices_positions call signatures.

No security advisories found for this version.
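
A hedged starting point for that verification is a smoke check that the pinned version is installed and that the flagged symbol still resolves; the names come from the review above, and this is no substitute for running the attention-backend tests:

```python
# Illustrative smoke check for the APIs the review flags as changed.
# It only asserts that the pinned version is installed and that the named
# symbol still resolves; signature and semantics changes need the real
# attention-backend tests to catch.
from importlib.metadata import version

import flashinfer
from packaging.version import Version

assert Version(version("flashinfer-python")) == Version("0.5.3")

# Named in the review as having a changed signature between 0.3.x and 0.5.x.
assert hasattr(flashinfer, "get_batch_indices_positions"), (
    "get_batch_indices_positions did not resolve at the top level; "
    "check the flashinfer 0.5.x release notes for where it moved."
)
```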

In security_scanning/pyproject.toml:

 patchelf = "^0.17.2.4"
 einops = "^0.8.1"
-flashinfer-python = ">=0.3.0,<0.4.0"
+flashinfer-python = "^0.5.3"
@coderabbitai (Contributor, bot) commented on security_scanning/pyproject.toml, line 59:

⚠️ Potential issue | 🟠 Major

Align version specification with requirements.txt for consistency.

There's an inconsistency in version specifications between the two files:

  • requirements.txt (line 55): flashinfer-python==0.5.3 (exact pin)
  • security_scanning/pyproject.toml (line 59): flashinfer-python = "^0.5.3" (caret, allows 0.5.x)

The caret syntax ^0.5.3 permits any version ≥0.5.3 and <0.6.0, while the exact pin locks to 0.5.3. This can lead to different versions being installed across environments, potentially causing environment drift or CI/CD inconsistencies.

Consider either:

  1. Using exact pinning in both files for strict reproducibility: "0.5.3" or "==0.5.3"
  2. Using caret syntax in both files if patch updates are acceptable: Change requirements.txt to flashinfer-python>=0.5.3,<0.6.0
Proposed fix: Align to exact version
-flashinfer-python = "^0.5.3"
+flashinfer-python = "0.5.3"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:
-flashinfer-python = "^0.5.3"
+flashinfer-python = "0.5.3"
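
To keep the two files from drifting apart again, a small repository check along the following lines could run in CI (hypothetical, not part of this PR; the [tool.poetry.dependencies] table path is an assumption about the pyproject layout):

```python
# Hypothetical drift guard: fail if requirements.txt and
# security_scanning/pyproject.toml disagree on flashinfer-python.
import re
import tomllib  # Python 3.11+
from pathlib import Path

req_text = Path("requirements.txt").read_text()
match = re.search(r"^flashinfer-python==(\S+)\s*$", req_text, re.MULTILINE)
assert match, "flashinfer-python is not exact-pinned in requirements.txt"
pinned = match.group(1)

pyproject = tomllib.loads(Path("security_scanning/pyproject.toml").read_text())
# Assumed table path; adjust if the file uses a different dependency section.
deps = pyproject["tool"]["poetry"]["dependencies"]
spec = deps["flashinfer-python"]

# Crude normalization: strips a leading caret or "==" before comparing.
assert spec.lstrip("^=") == pinned, (
    f"version drift: requirements.txt pins {pinned}, "
    f"security_scanning/pyproject.toml has {spec}"
)
```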

@tensorrt-cicd (Collaborator):

PR_Github #30933 [ run ] completed with state SUCCESS. Commit: 26f84d9
/LLM/main/L0_MergeRequest_PR pipeline #23895 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@nvchenghaoz (Collaborator, Author):

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #30940 [ run ] triggered by Bot. Commit: 26f84d9

@tensorrt-cicd (Collaborator):

PR_Github #30940 [ run ] completed with state SUCCESS. Commit: 26f84d9
/LLM/main/L0_MergeRequest_PR pipeline #23902 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@yibinl-nvidia (Collaborator):

/bot run --disable-fail-fast --post-merge

@tensorrt-cicd (Collaborator):

PR_Github #31119 [ run ] triggered by Bot. Commit: 26f84d9

@tensorrt-cicd (Collaborator):

PR_Github #31119 [ run ] completed with state SUCCESS. Commit: 26f84d9
/LLM/main/L0_MergeRequest_PR pipeline #24036 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Chenghao Zhang <[email protected]>
@nvchenghaoz (Collaborator, Author):

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #31147 [ run ] triggered by Bot. Commit: 06722c4

@tensorrt-cicd (Collaborator):

PR_Github #31147 [ run ] completed with state SUCCESS. Commit: 06722c4
/LLM/main/L0_MergeRequest_PR pipeline #24063 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Chenghao Zhang <[email protected]>
@nvchenghaoz (Collaborator, Author):

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #31286 [ run ] triggered by Bot. Commit: 4dc785a

@tensorrt-cicd (Collaborator):

PR_Github #31286 [ run ] completed with state SUCCESS. Commit: 4dc785a
/LLM/main/L0_MergeRequest_PR pipeline #24180 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@nvchenghaoz (Collaborator, Author):

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #31297 [ run ] triggered by Bot. Commit: 4dc785a

@tensorrt-cicd (Collaborator):

PR_Github #31297 [ run ] completed with state SUCCESS. Commit: 4dc785a
/LLM/main/L0_MergeRequest_PR pipeline #24191 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again
