[None][chore] test latest flashinfer #10522
base: main
Conversation
Signed-off-by: Chenghao Zhang <[email protected]>
/bot run
Mark as draft. This PR is to check whether updating to the latest flashinfer causes any regressions.
PR_Github #30933 [ run ] triggered by Bot. Commit:
📝 Walkthrough
Updates the flashinfer-python dependency across multiple configuration files from 0.3.x to version 0.5.3, a significant version bump. The updates use different version specification formats: exact pinning (==0.5.3) in requirements.txt and caret syntax (^0.5.3) in pyproject.toml.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @security_scanning/pyproject.toml:
- Line 59: The package version for flashinfer-python is inconsistent between
requirements.txt (exact pin 0.5.3) and security_scanning/pyproject.toml
(^0.5.3); align them to avoid environment drift by using the same exact pin in
both files—update the security_scanning/pyproject.toml entry for
flashinfer-python to match the exact version 0.5.3 (or alternatively update
requirements.txt to the caret/range if you prefer allowing patch updates),
ensuring both files reference the identical version format for
flashinfer-python.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- ATTRIBUTIONS-Python.md
- requirements.txt
- security_scanning/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{md,rst}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When documenting CLI commands for TensorRT-LLM tools like trtllm-serve, trtllm-bench, or trtllm-eval, prefer using --config over --extra_llm_api_options for specifying configuration files
Files:
ATTRIBUTIONS-Python.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
ATTRIBUTIONS-Python.md (1)
5264-5264: Attribution update correctly reflects the dependency version bump to 0.5.3. The flashinfer-python version in ATTRIBUTIONS-Python.md is consistent with the pinned versions in requirements.txt (==0.5.3) and pyproject.toml (^0.5.3). The Apache-2.0 license designation is accurate.
requirements.txt (2)
55-55: Flashinfer-python 0.5.3 is appropriately pinned. The codebase has extensive flashinfer integration across attention backends, sampling utilities, and custom operators. The exact version constraint (==0.5.3) is justified by version-specific workarounds in tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py, which copies fast_decode_plan logic from flashinfer >0.5 (not available in 0.3.1). The pinned version satisfies the documented requirement in that file's TODO comment.
55-55: Verify flashinfer-python 0.5.3 compatibility with codebase changes. This is a significant version jump from 0.3.x to 0.5.3, skipping the entire 0.4.x series. The version exists and is stable on PyPI, but multiple breaking changes between v0.3 and v0.5.3 affect APIs used in the codebase:
- Paged KV API changes: get_batch_indices_positions() signature and semantics changed (lines 304-306 in flashinfer.py)
- plan/run API changes: APIs now expect host tensors with automatic host→device transfers, while code may be passing device tensors
- JIT compilation rework: custom-attention and kernel compilation lifecycle changed
- RoPE standardization: sin/cos now computed as fp32
- CUDAGraph/torch.compile compatibility: kernel invocation requirements changed
The codebase actively uses these APIs in tensorrt_llm/_torch/attention_backend/flashinfer.py. Verify that the implementation has been updated to handle these API changes, particularly the plan/run host tensor expectations and get_batch_indices_positions call signatures. No security advisories found for this version.
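As a lightweight guard against drift between the pinned and installed versions discussed above, a startup check could compare installed distributions against the expected pins. This is a hypothetical helper (not part of this PR), a sketch using only the standard library; the `EXPECTED_PINS` contents mirror the exact pin this PR adds to requirements.txt:

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed pins; "flashinfer-python" == 0.5.3 mirrors the requirements.txt pin.
EXPECTED_PINS = {"flashinfer-python": "0.5.3"}

def find_pin_mismatches(pins: dict[str, str]) -> list[str]:
    """Compare installed distribution versions against exact pins.

    Returns one human-readable message per missing or mismatched
    distribution; an empty list means everything matches.
    """
    problems = []
    for dist, expected in pins.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist}: not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{dist}: installed {installed}, expected {expected}")
    return problems
```

Calling `find_pin_mismatches(EXPECTED_PINS)` early (e.g. in CI) and logging any returned entries would surface caret-induced version drift before it reaches runtime.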
  patchelf = "^0.17.2.4"
  einops = "^0.8.1"
- flashinfer-python = ">=0.3.0,<0.4.0"
+ flashinfer-python = "^0.5.3"
Align version specification with requirements.txt for consistency.
There's an inconsistency in version specifications between the two files:
- requirements.txt (line 55): flashinfer-python==0.5.3 (exact pin)
- security_scanning/pyproject.toml (line 59): flashinfer-python = "^0.5.3" (caret, allows 0.5.x)
The caret syntax ^0.5.3 permits any version ≥0.5.3 and <0.6.0, while the exact pin locks to 0.5.3. This can lead to different versions being installed across environments, potentially causing environment drift or CI/CD inconsistencies.
Consider either:
- Using exact pinning in both files for strict reproducibility: "0.5.3" or "==0.5.3"
- Using caret syntax in both files if patch updates are acceptable: change requirements.txt to flashinfer-python>=0.5.3,<0.6.0
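To make the difference between the two ranges concrete, here is a small stdlib-only sketch of Poetry-style caret semantics. It is simplified (numeric components only, no pre-releases or epochs), and the helper names are illustrative, not from any real tool:

```python
def parse(v: str) -> tuple[int, ...]:
    """Parse a simple dotted version like '0.5.3' into (0, 5, 3)."""
    return tuple(int(part) for part in v.split("."))

def caret_allows(base: str, candidate: str) -> bool:
    """Poetry caret rule: allow upgrades that keep the left-most
    non-zero component of `base` unchanged, so ^0.5.3 means
    >=0.5.3,<0.6.0 and ^1.2.3 means >=1.2.3,<2.0.0."""
    b, c = parse(base), parse(candidate)
    if c < b:  # candidate older than the base: never allowed
        return False
    for i, part in enumerate(b):
        if part != 0:
            # upper bound bumps the left-most non-zero component
            upper = b[:i] + (part + 1,) + (0,) * (len(b) - i - 1)
            return c < upper
    return True  # base like 0.0.0: treat as unbounded (simplification)

# ^0.5.3 admits 0.5.9 but not 0.6.0; an exact ==0.5.3 pin admits neither.
print(caret_allows("0.5.3", "0.5.9"))  # True
print(caret_allows("0.5.3", "0.6.0"))  # False
```

This is why two environments installing from the two files can end up on different 0.5.x patch releases unless the specifiers are aligned.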
Proposed fix: Align to exact version
-flashinfer-python = "^0.5.3"
+flashinfer-python = "0.5.3"
PR_Github #30933 [ run ] completed with state
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #30940 [ run ] triggered by Bot. Commit:
PR_Github #30940 [ run ] completed with state
/bot run --disable-fail-fast --post-merge
PR_Github #31119 [ run ] triggered by Bot. Commit:
PR_Github #31119 [ run ] completed with state
Signed-off-by: Chenghao Zhang <[email protected]>
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #31147 [ run ] triggered by Bot. Commit:
PR_Github #31147 [ run ] completed with state
Signed-off-by: Chenghao Zhang <[email protected]>
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #31286 [ run ] triggered by Bot. Commit:
PR_Github #31286 [ run ] completed with state
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #31297 [ run ] triggered by Bot. Commit:
PR_Github #31297 [ run ] completed with state
Draft PR to check whether TRTLLM can use the latest FI