
[triton][beta] [Cherry-pick] '[BACKEND] Add hook for configurable/overridable compiler pass pipeline (#8137)'#1014

Closed
agron911 wants to merge 3 commits into facebookexperimental:main from agron911:export-D94678547

Contributor

@agron911 agron911 commented Mar 2, 2026

Summary:
This is a cherry-pick of an upstream PR: triton-lang/triton#8137

Upstream commit message:

> [BACKEND] Add hook for configurable/overridable compiler pass pipeline (#8137)

> Triton’s existing pass pipelines are explicitly defined in the various
> extended compiler.py files that live in Triton’s backends. Currently
> when we require insertion of passes either for instrumentation or for
> the addition of downstream optimization and custom lowering it is
> required for the compiler.py file itself to be modified.

> In order to allow for more downstream configurability and as a first
> step toward more custom MLIR level pass plugins, we add a hook into the
> compiler stages to allow for a more configurable pass manager system
> setup.

> Using Python inspection routines coupled with the hook allows for more
> fine grained control of things like enabling/disabling passes for
> specific kernels with eventually being able to load and insert
> completely out of tree ops/passes in arbitrary places in the stages
> pipeline.

> Co-authored with plotfi

> ---------

> Co-authored-by: Puyan Lotfi <puyan@puyan.org>

Do not remove the following line from this commit
Reactor Cherry-pick Revision: 905b3d1

This diff was generated by running:

buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1

Reviewed By: dshi7

Differential Revision: D94678547
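
To make the upstream description concrete, here is a minimal, self-contained sketch of the general pattern it describes: a backend assembles its default stages, then hands them to an optional hook that may edit them per kernel. This is not Triton's actual API. Every name here (`set_stages_hook`, `build_stages`, `compile_kernel`, `instrument_hook`, and the pass and stage names) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins: a "module" is just the list of pass names applied so far.
Module = List[str]
Stage = Callable[[Module], Module]
StagesHook = Callable[[Dict[str, Stage], str], None]

# Hypothetical global hook; None means "use the default pipeline unchanged".
_stages_hook: Optional[StagesHook] = None


def set_stages_hook(hook: Optional[StagesHook]) -> None:
    """Install (or clear, with None) the hook consulted by build_stages."""
    global _stages_hook
    _stages_hook = hook


def make_stage(*passes: str) -> Stage:
    """Build a stage that records each pass it would run."""
    def run(module: Module) -> Module:
        return module + list(passes)
    return run


def build_stages(kernel_name: str) -> Dict[str, Stage]:
    """Assemble the default stages, then let the hook edit them in place."""
    stages: Dict[str, Stage] = {
        "ttir": make_stage("inline", "cse"),
        "ttgir": make_stage("coalesce", "pipeline"),
    }
    if _stages_hook is not None:
        _stages_hook(stages, kernel_name)
    return stages


def compile_kernel(kernel_name: str) -> Module:
    """Run every stage in insertion order and return the recorded passes."""
    module: Module = []
    for stage in build_stages(kernel_name).values():
        module = stage(module)
    return module


def instrument_hook(stages: Dict[str, Stage], kernel_name: str) -> None:
    """Append an instrumentation pass to one stage, for one kernel only."""
    if kernel_name != "matmul":
        return
    original = stages["ttgir"]
    stages["ttgir"] = lambda module: original(module) + ["instrument"]
```

Calling `set_stages_hook(instrument_hook)` and then `compile_kernel("matmul")` runs the extra pass, while other kernels keep the default pipeline. The point of the design is that the pipeline definition itself is never edited; downstream code only supplies the hook.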

agron911 added 3 commits March 2, 2026 03:34
…lowering (#8225)' (facebookexperimental#1012)

Summary:

This is a cherry-pick of an upstream PR: triton-lang/triton#8225

Upstream commit message:
```
> [BACKEND] Generic tcgen05.cp lowering (#8225)

> We also fix a ton of issues here and there that we found while working
> on this.

> - We add full support for `memdesc_trans` and `memdesc_reshape` using
> the newly minted `SharedLinearLayout`.
> - We fix a few issues we left out in `SharedLinearLayout`'s initial
> implementation.
> - We now make `tcgen05.cp` take the correct layout, and we fix the
> OptimizeDotOperands
> pass to use `memdesc_trans/reshape` to reflect this.
> - We fix a number of previously broken tests

> We still need to tighten the memdesc_copy verifier to make it a bit more
> user-friendly tho.
```

Conflict Resolution:
- File: third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/TensorMemoryToLLVM.cpp:953-1002
  Action: Kept HEAD's createCommit 2CTA support code and adopted upstream's new createTcgen05Cp with TMemCopyAtom signature
  Reason: The upstream refactored createTcgen05Cp from (bool scales, bool useTwoCTAs) to (TMemCopyAtom atom). HEAD had TLX 2CTA commit code that needed preservation. The new generic createTcgen05Cp uses TMemCopyAtom for all copy patterns.

- File: third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/TensorMemoryToLLVM.cpp:1008-1100
  Action: Removed the copyScales function (kept upstream's empty side)
  Reason: Upstream removed copyScales because the generic copySharedToTmem now handles scales via TMemCopyAtom. The old copyScales used the removed createTcgen05Cp signature.

- File: third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/TensorMemoryToLLVM.cpp:1296-1307
  Action: Adopted upstream's tmemAddr calculation with elementBytes/4 scaling and new createTcgen05Cp(atom) call
  Reason: Upstream corrected the tmem address offset calculation and uses the new generic API.

- File: third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/TensorMemoryToLLVM.cpp:1326-1349
  Action: Kept HEAD's TLX 2CTA leader CTA predicate logic, removed copyScales branch, call only copySharedToTmem
  Reason: The 2CTA predicate is a Meta/TLX-specific feature. The copyScales branch was removed since copySharedToTmem now handles all copy types generically.

- File: lib/Dialect/TritonNvidiaGPU/IR/Ops.cpp:833-843
  Action: Merged HEAD's DummyTMEMLayoutAttr early-return (TLX-specific) with upstream's null-guard on nvmmaEnc
  Reason: Both changes are additive — the DummyTMEMLayoutAttr check is a TLX feature, and the nvmmaEnc null guard fixes a potential null dereference.

- File: test/Conversion/tritongpu_to_llvm_blackwell.mlir:351-358
  Action: Used upstream's shared_linear layout definition with blocked layout
  Reason: Upstream changed the test to use shared_linear layout, which matches the new generic tcgen05.cp lowering path.

Raw Conflicts: https://www.internalfb.com/intern/paste/P2209497547/

Diff Versions Comparison
v2 → v3: Restore TLX-compatible scale copy path
https://www.internalfb.com/phabricator/paste/view/P2212980635

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 70e69cb

Reviewed By: dshi7

Differential Revision: D94673495
… x Mxfp4 MoE Kernel (#8176)' (facebookexperimental#1013)

Summary:

This is a cherry-pick of an upstream PR: triton-lang/triton#8176

Upstream commit message:
```
> [Bench][AMD] Update Parameters for Bf16 x Mxfp4 MoE Kernel (#8176)
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 7d92894
---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1
```

Reviewed By: dshi7

Differential Revision: D94678510
…rridable compiler pass pipeline (#8137)'

Summary:
This is a cherry-pick of an upstream PR: triton-lang/triton#8137

Upstream commit message:
```
> [BACKEND] Add hook for configurable/overridable compiler pass pipeline (#8137)

> Triton’s existing pass pipelines are explicitly defined in the various
> extended compiler.py files that live in Triton’s backends. Currently
> when we require insertion of passes either for instrumentation or for
> the addition of downstream optimization and custom lowering it is
> required for the compiler.py file itself to be modified.

> In order to allow for more downstream configurability and as a first
> step toward more custom MLIR level pass plugins, we add a hook into the
> compiler stages to allow for a more configurable pass manager system
> setup.

> Using Python inspection routines coupled with the hook allows for more
> fine grained control of things like enabling/disabling passes for
> specific kernels with eventually being able to load and insert
> completely out of tree ops/passes in arbitrary places in the stages
> pipeline.

> Co-authored with plotfi

> ---------

> Co-authored-by: Puyan Lotfi <puyan@puyan.org>
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 905b3d1
---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1
```

Reviewed By: dshi7

Differential Revision: D94678547
@meta-cla meta-cla bot added the CLA Signed label Mar 2, 2026
@meta-codesync

meta-codesync bot commented Mar 2, 2026

@agron911 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94678547.

agron911 added a commit to agron911/triton that referenced this pull request Mar 2, 2026
…rridable compiler pass pipeline (#8137)' (facebookexperimental#1014)

Summary:

This is a cherry-pick of an upstream PR: triton-lang/triton#8137

Upstream commit message:
```
> [BACKEND] Add hook for configurable/overridable compiler pass pipeline (#8137)

> Triton’s existing pass pipelines are explicitly defined in the various
> extended compiler.py files that live in Triton’s backends. Currently
> when we require insertion of passes either for instrumentation or for
> the addition of downstream optimization and custom lowering it is
> required for the compiler.py file itself to be modified.

> In order to allow for more downstream configurability and as a first
> step toward more custom MLIR level pass plugins, we add a hook into the
> compiler stages to allow for a more configurable pass manager system
> setup.

> Using Python inspection routines coupled with the hook allows for more
> fine grained control of things like enabling/disabling passes for
> specific kernels with eventually being able to load and insert
> completely out of tree ops/passes in arbitrary places in the stages
> pipeline.

> Co-authored with plotfi

> ---------

> Co-authored-by: Puyan Lotfi <puyan@puyan.org>
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 905b3d1
---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1
```

Reviewed By: dshi7

Differential Revision: D94678547
@meta-codesync meta-codesync bot closed this in f73e9a4 Mar 3, 2026
@meta-codesync

meta-codesync bot commented Mar 3, 2026

This pull request has been merged in f73e9a4.


Labels

CLA Signed, fb-exported, Merged, meta-exported
