[triton][beta] [Cherry-pick] '[BACKEND] Add hook for configurable/overridable compiler pass pipeline (#8137)' #1014
Closed
agron911 wants to merge 3 commits into facebookexperimental:main from
…lowering (#8225)' (facebookexperimental#1012)

Summary:
This is a cherry-pick of an upstream PR: triton-lang/triton#8225

Upstream commit message:
```
> [BACKEND] Generic tcgen05.cp lowering (#8225)
> We also fix a ton of issues here and there that we found while working
> on this.
> - We add full support for `memdesc_trans` and `memdesc_reshape` using
> the newly minted `SharedLinearLayout`.
> - We fix a few issues we left out in `SharedLinearLayout`'s initial
> implementation.
> - We now make `tcgen05.cp` take the correct layout, and we fix the
> OptimizeDotOperands
> pass to use `memdesc_trans/reshape` to reflect this.
> - We fix a number of previously broken tests
> We still need to tighten the memdesc_copy verifier to make it a bit more
> user-friendly tho.
```

Conflict Resolution:
- File: third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/TensorMemoryToLLVM.cpp:953-1002
  Action: Kept HEAD's createCommit 2CTA support code and adopted upstream's new createTcgen05Cp with TMemCopyAtom signature
  Reason: Upstream refactored createTcgen05Cp from (bool scales, bool useTwoCTAs) to (TMemCopyAtom atom). HEAD had TLX 2CTA commit code that needed preservation. The new generic createTcgen05Cp uses TMemCopyAtom for all copy patterns.
- File: third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/TensorMemoryToLLVM.cpp:1008-1100
  Action: Removed the copyScales function (kept upstream's empty side)
  Reason: Upstream removed copyScales because the generic copySharedToTmem now handles scales via TMemCopyAtom. The old copyScales used the removed createTcgen05Cp signature.
- File: third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/TensorMemoryToLLVM.cpp:1296-1307
  Action: Adopted upstream's tmemAddr calculation with elementBytes/4 scaling and new createTcgen05Cp(atom) call
  Reason: Upstream corrected the tmem address offset calculation and uses the new generic API.
- File: third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/TensorMemoryToLLVM.cpp:1326-1349
  Action: Kept HEAD's TLX 2CTA leader CTA predicate logic, removed copyScales branch, call only copySharedToTmem
  Reason: The 2CTA predicate is a Meta/TLX-specific feature. The copyScales branch was removed since copySharedToTmem now handles all copy types generically.
- File: lib/Dialect/TritonNvidiaGPU/IR/Ops.cpp:833-843
  Action: Merged HEAD's DummyTMEMLayoutAttr early-return (TLX-specific) with upstream's null-guard on nvmmaEnc
  Reason: Both changes are additive: the DummyTMEMLayoutAttr check is a TLX feature, and the nvmmaEnc null guard fixes a potential null dereference.
- File: test/Conversion/tritongpu_to_llvm_blackwell.mlir:351-358
  Action: Used upstream's shared_linear layout definition with blocked layout
  Reason: Upstream changed the test to use shared_linear layout, which matches the new generic tcgen05.cp lowering path.

Raw Conflicts: https://www.internalfb.com/intern/paste/P2209497547/

Diff Versions Comparison v2 → v3: Restore TLX-compatible scale copy path https://www.internalfb.com/phabricator/paste/view/P2212980635

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 70e69cb

Reviewed By: dshi7

Differential Revision: D94673495
… x Mxfp4 MoE Kernel (#8176)' (facebookexperimental#1013)

Summary:
This is a cherry-pick of an upstream PR: triton-lang/triton#8176

Upstream commit message:
```
> [Bench][AMD] Update Parameters for Bf16 x Mxfp4 MoE Kernel (#8176)
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 7d92894

---
This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1
```

Reviewed By: dshi7

Differential Revision: D94678510
…rridable compiler pass pipeline (#8137)'

Summary:
This is a cherry-pick of an upstream PR: triton-lang/triton#8137

Upstream commit message:
```
> [BACKEND] Add hook for configurable/overridable compiler pass pipeline (#8137)
> Triton’s existing pass pipelines are explicitly defined in the various
> extended compiler.py files that live in Triton’s backends. Currently
> when we require insertion of passes either for instrumentation or for
> the addition of downstream optimization and custom lowering it is
> required for the compiler.py file itself to be modified.
> In order to allow for more downstream configurability and as a first
> step toward more custom MLIR level pass plugins, we add a hook into the
> compiler stages to allow for a more configurable pass manager system
> setup.
> Using Python inspection routines coupled with the hook allows for more
> fine grained control of things like enabling/disabling passes for
> specific kernels with eventually being able to load and insert
> completely out of tree ops/passes in arbitrary places in the stages
> pipeline.
> Co-authored with plotfi
> ---------
> Co-authored-by: Puyan Lotfi <puyan@puyan.org>
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 905b3d1

---
This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1
```

Reviewed By: dshi7

Differential Revision: D94678547
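For readers unfamiliar with the idea, the hook described in the upstream message can be pictured roughly as follows. This is a toy sketch, not Triton's actual API: `set_stages_hook`, `build_stages`, `compile_kernel`, the stage names, and the pass names are all hypothetical stand-ins, and a "module" is modeled as a plain list of the pass names that have run on it. It only illustrates how a downstream callback can rewrite a stage for one kernel without editing the backend's compiler.py.

```python
from typing import Callable, Dict, List, Optional

# All names here are hypothetical; this is NOT Triton's real hook API.
Module = List[str]                      # toy "IR": the passes applied so far
Stage = Callable[[Module], Module]      # a stage maps a module to a new module
StagesHook = Callable[[Dict[str, Stage], str], None]

_stages_hook: Optional[StagesHook] = None


def set_stages_hook(hook: Optional[StagesHook]) -> None:
    """Register (or clear, with None) the downstream stages hook."""
    global _stages_hook
    _stages_hook = hook


def make_pass_stage(passes: List[str]) -> Stage:
    """Build a stage that 'runs' the given passes by recording their names."""
    def run(module: Module) -> Module:
        return module + passes
    return run


def build_stages(kernel_name: str) -> Dict[str, Stage]:
    """Stand-in for the stage table a backend's compiler.py would define."""
    stages: Dict[str, Stage] = {
        "ttir": make_pass_stage(["inline", "canonicalize"]),
        "ttgir": make_pass_stage(["coalesce", "pipeline"]),
    }
    # The hook sees the stage table before compilation and may edit it.
    if _stages_hook is not None:
        _stages_hook(stages, kernel_name)
    return stages


def compile_kernel(kernel_name: str) -> Module:
    """Run every stage in insertion order on an initially empty module."""
    module: Module = []
    for stage in build_stages(kernel_name).values():
        module = stage(module)
    return module


def add_instrumentation(stages: Dict[str, Stage], kernel_name: str) -> None:
    """Example hook: append an extra pass to one stage for one kernel only."""
    if kernel_name != "matmul":
        return
    original = stages["ttgir"]
    stages["ttgir"] = lambda module: original(module) + ["instrument"]
```

With `set_stages_hook(add_instrumentation)` registered, `compile_kernel("matmul")` picks up the extra pass while other kernels compile unchanged; passing `None` restores the default pipeline.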
agron911 added a commit to agron911/triton that referenced this pull request on Mar 2, 2026
This pull request has been merged in f73e9a4.