
Conversation

@karthikvetrivel
Member

@karthikvetrivel karthikvetrivel commented Jan 16, 2026

Description

This PR adds L2 normalization pattern matching and fusion transforms to the TensorRT-LLM AutoDeploy system, following the established two-stage pattern matching approach used by RMS norm (see #9969).

Configuration

The transforms are configured in default.yaml:

transforms:
  match_l2norm_pattern:
    stage: pattern_matcher
  fuse_l2norm:
    stage: post_load_fusion
    l2norm_backend: fla  # Options: 'fla' or 'torch'
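
For context, the matcher targets the eager-mode form of L2 normalization. Below is a minimal sketch of the kind of computation being matched, with and without the dtype cast (function names here are illustrative, not the actual helpers in l2_norm.py):

import torch

def l2norm_ref(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Plain L2 normalization along the last dimension: x / ||x||_2.
    return x * torch.rsqrt(x.pow(2).sum(dim=-1, keepdim=True) + eps)

def l2norm_ref_with_cast(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Variant that upcasts to fp32 before normalizing and casts back,
    # corresponding to the "with dtype cast" pattern.
    x_f32 = x.to(torch.float32)
    out = x_f32 * torch.rsqrt(x_f32.pow(2).sum(dim=-1, keepdim=True) + eps)
    return out.to(x.dtype)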

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions).

  • Any new dependencies have been scanned for license and vulnerabilities.

  • CODEOWNERS updated if ownership changes.

  • Documentation updated as needed.

  • Update tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

Summary by CodeRabbit

  • New Features

    • L2Norm pattern matching and fusion optimization now available for model acceleration
    • Configurable L2Norm backend support with FLA and Torch options
  • Tests

    • Comprehensive test suite added for L2Norm fusion with multiple backend variants


@coderabbitai
Contributor

coderabbitai bot commented Jan 16, 2026

📝 Walkthrough

This PR introduces L2Norm pattern matching and fusion transforms for TensorRT-LLM's PyTorch auto-deploy system. It adds configuration entries, implements pattern detection and backend-specific fusion logic, and includes comprehensive test coverage for the new transforms.

Changes

  • Configuration — tensorrt_llm/_torch/auto_deploy/config/default.yaml
    Added two new transform definitions: match_l2norm_pattern (pattern_matcher stage) and fuse_l2norm (post_load_fusion stage with the l2norm_backend parameter set to "fla").

  • L2Norm Transform Implementation — tensorrt_llm/_torch/auto_deploy/transform/library/l2_norm.py
    Introduced MatchL2NormPattern and FuseL2Norm transform classes for identifying and fusing L2Norm patterns. Includes a FuseL2NormConfig dataclass, a _BACKEND_OPS mapping for backend selection ("fla" or "torch"), and three helper functions implementing L2Norm computation patterns with optional dtype casting.

  • L2Norm Fusion Tests — tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_l2norm.py
    Added a test module with L2Norm and L2NormNoCast model variants, a TestModel combining linear layers with L2Norm blocks, and a parametrized test_l2norm_fusion() covering both fusion backends ("fla" and "torch") with multiple epsilon values; a sketch of this model structure follows below.
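
A rough sketch of what such a test model looks like (class and parameter names are hypothetical stand-ins, not the actual test code in test_fuse_l2norm.py):

import torch
from torch import nn

class L2Norm(nn.Module):
    # Hypothetical stand-in for the test's L2Norm block (fp32-cast variant).
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_f32 = x.float()
        out = x_f32 * torch.rsqrt(x_f32.pow(2).sum(dim=-1, keepdim=True) + self.eps)
        return out.to(x.dtype)

class TestModel(nn.Module):
    # Linear layers sandwiching an L2Norm block, so the normalization
    # pattern shows up in the traced graph for the matcher to find.
    def __init__(self, hidden: int = 64, eps: float = 1e-6):
        super().__init__()
        self.fc1 = nn.Linear(hidden, hidden)
        self.norm = L2Norm(eps)
        self.fc2 = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.norm(self.fc1(x)))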

Sequence Diagram(s)

sequenceDiagram
    actor Input as GraphModule
    participant PMatcher as MatchL2NormPattern
    participant Registry as PatternRegistry
    participant Fusion as FuseL2Norm
    participant Backend as Backend Ops<br/>(fla/torch)
    actor Output as GraphModule

    Input->>PMatcher: Apply pattern matching
    PMatcher->>Registry: Register L2Norm patterns<br/>(with/without dtype cast)
    Registry-->>PMatcher: Pattern matches found
    PMatcher->>PMatcher: Replace matches with<br/>torch_l2norm op
    PMatcher-->>Output: Intermediate graph
    
    Output->>Fusion: Apply fusion transform
    Fusion->>Fusion: Validate backend<br/>("fla" or "torch")
    Fusion->>Fusion: Traverse graph nodes
    Fusion->>Backend: Swap torch_l2norm ops
    Backend-->>Fusion: Backend-specific ops
    Fusion->>Fusion: Recompile graph
    Fusion-->>Output: Fused GraphModule
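The fusion stage in the diagram amounts to a node-retargeting pass over the FX graph. A minimal sketch of that idea (the op lookup and helper name are placeholders, not the actual FuseL2Norm internals):

import torch.fx as fx

def swap_l2norm_ops(gm: fx.GraphModule, backend_op) -> int:
    # Retarget every matched l2norm call_function node to the chosen backend op.
    num_swapped = 0
    for node in gm.graph.nodes:
        # In the real transform the target would be the registered torch_l2norm
        # custom op; here we match by name for illustration only.
        if node.op == "call_function" and getattr(node.target, "__name__", "") == "torch_l2norm":
            node.target = backend_op
            num_swapped += 1
    gm.graph.lint()
    gm.recompile()  # matches the "Recompile graph" step in the diagram
    return num_swapped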

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 17.65%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed. The title clearly and specifically describes the main change: adding an L2 norm pattern matcher and fusion transform, which aligns with the changeset that introduces these two transforms and their configuration.

  • Description check — ✅ Passed. The description covers the main feature, configuration details, and includes a completed PR checklist. However, the "Test Coverage" section from the template is not explicitly filled out, though tests are referenced and the checklist mentions test cases.



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/auto_deploy/transform/library/l2_norm.py`:
- Line 1: The file-level docstring in l2_norm.py is missing the required NVIDIA
copyright header; add the standard multi-line NVIDIA copyright/header block
(matching the header used in adjacent TensorRT-LLM Python sources) at the top of
the file above the existing module docstring, update the modification year to
2026, and ensure the header formatting and SPDX/license lines exactly match the
project's other source files (use l2_norm.py and the module-level docstring to
locate the insertion point).
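
For reference, a header of roughly this shape is what's being asked for (assuming the repository's usual SPDX style; copy the exact block from a neighboring source file rather than this sketch):

# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0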
🧹 Nitpick comments (2)
tensorrt_llm/_torch/auto_deploy/transform/library/l2_norm.py (1)

3-21: Use module‑namespace imports for internal modules

The file uses multiple from ... import ... statements (internal modules, pydantic, torch.fx, typing). Please switch to module imports and qualify usages (e.g., node_utils.is_op, pattern_matcher.ADPatternMatcherPass) to preserve namespaces. As per coding guidelines.
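
A minimal illustration of the requested style, using torch.fx as the example since the internal module paths here are project-specific:

# Discouraged: from torch.fx import GraphModule  (drops the namespace)
import torch.fx as fx  # module import keeps usages qualified

def node_count(gm: fx.GraphModule) -> int:
    # fx.GraphModule makes the origin of the type obvious at the call site.
    return sum(1 for _ in gm.graph.nodes)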

tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_l2norm.py (1)

6-6: Avoid wildcard import; keep namespace for custom ops

Switch to a module import to preserve the namespace while still registering the ops.

♻️ Proposed change
-from tensorrt_llm._torch.auto_deploy.custom_ops.l2norm import *  # noqa
+import tensorrt_llm._torch.auto_deploy.custom_ops.l2norm as l2norm_ops  # noqa: F401

Note: tests under tests/ don’t require NVIDIA headers. As per coding guidelines; based on learnings.

@karthikvetrivel karthikvetrivel force-pushed the feature/l2norm-pattern-matcher branch 4 times, most recently from 13bfe98 to 8e55921 on January 20, 2026 at 14:29
Member

@lucaslie lucaslie left a comment

looks great. just a small comment. Please update it and then feel free to get it merged :)

@karthikvetrivel karthikvetrivel force-pushed the feature/l2norm-pattern-matcher branch from 8e55921 to 40d2f50 on January 20, 2026 at 14:47
@karthikvetrivel
Member Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #32779 [ run ] triggered by Bot. Commit: b2f95de

@tensorrt-cicd
Collaborator

PR_Github #32779 [ run ] completed with state SUCCESS. Commit: b2f95de
/LLM/main/L0_MergeRequest_PR pipeline #25373 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@karthikvetrivel karthikvetrivel force-pushed the feature/l2norm-pattern-matcher branch from b2f95de to 52d4bb6 on January 21, 2026 at 18:07
@karthikvetrivel
Member Author

/bot run --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #32992 [ run ] triggered by Bot. Commit: 52d4bb6

@tensorrt-cicd
Collaborator

PR_Github #32992 [ run ] completed with state SUCCESS. Commit: 52d4bb6
/LLM/main/L0_MergeRequest_PR pipeline #25508 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@karthikvetrivel karthikvetrivel force-pushed the feature/l2norm-pattern-matcher branch from 52d4bb6 to c69136c on January 21, 2026 at 20:37
@karthikvetrivel
Member Author

/bot run --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #33006 [ run ] triggered by Bot. Commit: c69136c

@tensorrt-cicd
Collaborator

PR_Github #33006 [ run ] completed with state SUCCESS. Commit: c69136c
/LLM/main/L0_MergeRequest_PR pipeline #25516 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
@karthikvetrivel karthikvetrivel force-pushed the feature/l2norm-pattern-matcher branch from c69136c to 47c2fea on January 22, 2026 at 14:31
@karthikvetrivel
Member Author

/bot run --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #33190 [ run ] triggered by Bot. Commit: 47c2fea

@tensorrt-cicd
Collaborator

PR_Github #33190 [ run ] completed with state SUCCESS. Commit: 47c2fea
/LLM/main/L0_MergeRequest_PR pipeline #25645 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again
