[#9525][feat] add L2 norm pattern matcher and fusion transform #10767
Conversation
📝 Walkthrough

This PR introduces L2Norm pattern matching and fusion transforms for TensorRT-LLM's PyTorch auto-deploy system. It adds configuration entries, implements pattern detection and backend-specific fusion logic, and includes comprehensive test coverage for the new transforms.
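For context, here is a minimal sketch of the eager-mode subgraphs such a matcher typically targets; the function names are illustrative, and the exact patterns registered by this PR may differ:

```python
import torch

def l2norm_eager(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """A common eager spelling of L2 normalization along the last dim."""
    return x * torch.rsqrt(x.pow(2).sum(dim=-1, keepdim=True) + eps)

def l2norm_eager_with_cast(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Variant that upcasts to fp32 for the reduction (the "with dtype cast" case)."""
    x32 = x.to(torch.float32)
    out = x32 * torch.rsqrt(x32.pow(2).sum(dim=-1, keepdim=True) + eps)
    return out.to(x.dtype)
```

Both spellings get canonicalized into a single `torch_l2norm` op so that a later pass can select a backend implementation.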
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor Input as GraphModule
participant PMatcher as MatchL2NormPattern
participant Registry as PatternRegistry
participant Fusion as FuseL2Norm
participant Backend as Backend Ops<br/>(fla/torch)
actor Output as GraphModule
Input->>PMatcher: Apply pattern matching
PMatcher->>Registry: Register L2Norm patterns<br/>(with/without dtype cast)
Registry-->>PMatcher: Pattern matches found
PMatcher->>PMatcher: Replace matches with<br/>torch_l2norm op
PMatcher-->>Output: Intermediate graph
Output->>Fusion: Apply fusion transform
Fusion->>Fusion: Validate backend<br/>("fla" or "torch")
Fusion->>Fusion: Traverse graph nodes
Fusion->>Backend: Swap torch_l2norm ops
Backend-->>Fusion: Backend-specific ops
Fusion->>Fusion: Recompile graph
Fusion-->>Output: Fused GraphModule
```
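A rough sketch of the fusion step shown above, assuming hypothetical op registrations (the `torch.ops.auto_deploy.*` names are illustrative, not necessarily the PR's actual namespaces):

```python
import torch
from torch import fx

def fuse_l2norm(gm: fx.GraphModule, backend: str) -> fx.GraphModule:
    """Swap canonical torch_l2norm calls for the configured backend op.

    Assumes the custom ops were registered beforehand (e.g., by importing the
    auto-deploy custom_ops module); the op names here are hypothetical.
    """
    backend_ops = {
        "torch": torch.ops.auto_deploy.torch_l2norm,
        "fla": torch.ops.auto_deploy.fla_l2norm,
    }
    if backend not in backend_ops:
        raise ValueError(f"Unsupported l2norm backend: {backend!r}")
    for node in gm.graph.nodes:
        # Replace every canonical torch_l2norm call with the backend-specific op.
        if node.op == "call_function" and node.target == torch.ops.auto_deploy.torch_l2norm:
            node.target = backend_ops[backend]
    gm.recompile()
    return gm
```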
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
In `tensorrt_llm/_torch/auto_deploy/transform/library/l2_norm.py`:
- Line 1: The file-level docstring in l2_norm.py is missing the required NVIDIA
copyright header; add the standard multi-line NVIDIA copyright/header block
(matching the header used in adjacent TensorRT-LLM Python sources) at the top of
the file above the existing module docstring, update the modification year to
2026, and ensure the header formatting and SPDX/license lines exactly match the
project's other source files (use l2_norm.py and the module-level docstring to
locate the insertion point).
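For reference, a header block of the kind the comment asks for; the exact wording should be copied from adjacent TensorRT-LLM sources rather than from this sketch:

```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```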
🧹 Nitpick comments (2)
tensorrt_llm/_torch/auto_deploy/transform/library/l2_norm.py (1)
3-21: Use module-namespace imports for internal modules

The file uses multiple `from ... import ...` statements (internal modules, `pydantic`, `torch.fx`, `typing`). Please switch to module imports and qualify usages (e.g., `node_utils.is_op`, `pattern_matcher.ADPatternMatcherPass`) to preserve namespaces. As per coding guidelines. A sketch of the suggested style follows.
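Roughly, the suggested style (the module paths here are assumptions for illustration):

```python
# Before: flat imports drop the namespace.
#   from tensorrt_llm._torch.auto_deploy.utils.node_utils import is_op
#   if is_op(node, target): ...

# After: module imports keep usages qualified.
#   from tensorrt_llm._torch.auto_deploy.utils import node_utils, pattern_matcher
#   if node_utils.is_op(node, target): ...
#   patterns = pattern_matcher.ADPatternMatcherPass()
```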
tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_l2norm.py (1)

6-6: Avoid wildcard import; keep namespace for custom ops

Switch to a module import to preserve the namespace while still registering the ops.
♻️ Proposed change
```diff
-from tensorrt_llm._torch.auto_deploy.custom_ops.l2norm import *  # noqa
+import tensorrt_llm._torch.auto_deploy.custom_ops.l2norm as l2norm_ops  # noqa: F401
```

Note: tests under tests/ don't require NVIDIA headers. As per coding guidelines; based on learnings.
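A module import still works here because registering the custom ops only requires the module's import-time side effects to run; nothing from its namespace is referenced directly, hence the `# noqa: F401`.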
Force-pushed from 13bfe98 to 8e55921
lucaslie left a comment:
looks great. just a small comment. Please update it and then feel free to get it merged :)
Force-pushed from 8e55921 to 40d2f50
/bot run

PR_Github #32779 [ run ] triggered by Bot. Commit:

PR_Github #32779 [ run ] completed with state
Force-pushed from b2f95de to 52d4bb6
/bot run --add-multi-gpu-test

PR_Github #32992 [ run ] triggered by Bot. Commit:

PR_Github #32992 [ run ] completed with state
Force-pushed from 52d4bb6 to c69136c
/bot run --add-multi-gpu-test

PR_Github #33006 [ run ] triggered by Bot. Commit:

PR_Github #33006 [ run ] completed with state
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
Force-pushed from c69136c to 47c2fea
/bot run --add-multi-gpu-test

PR_Github #33190 [ run ] triggered by Bot. Commit:

PR_Github #33190 [ run ] completed with state
Description
This PR adds L2 normalization pattern matching and fusion transforms to the TensorRT-LLM AutoDeploy system, following the established two-stage pattern matching approach used by RMS norm (see #9969).
Configuration
The transforms are configured in default.yaml.

PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
Summary by CodeRabbit
New Features
- L2 norm pattern matching and fusion transforms for the PyTorch auto-deploy system, with configurable backends ("fla" or "torch").

Tests
- Unit test coverage for the new pattern-matching and fusion transforms.