Merge OpenAI Triton commit `6294db5` #5583

anmyachev · 2025-11-28T16:30:32Z

This PR changes the Triton base from e7fb841 to 6294db5 (Nov 21).

Pass rate: _

Blocked on #5582

…(#8770)" This reverts commit 64bcc99.

Signed-off-by: Anatoly Myachev <[email protected]>

This commit modifies the denorm behavior for precise sqrt: switching from FTZ (Flush To Zero) to denorm preservation.
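The difference between the two modes can be illustrated with a minimal sketch in plain Python. It models only flushing of denormal float32 *inputs* for non-negative values; the helper names (`to_f32`, `sqrt_ftz`, `sqrt_preserve_denorm`) are hypothetical and are not taken from the Triton or AMD backend code.

```python
import math
import struct

# Smallest positive normal float32; any value in (0, this) is a denormal.
F32_MIN_NORMAL = 2.0 ** -126


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sqrt_ftz(x: float) -> float:
    """Square root with flush-to-zero applied to a denormal input (assumed model)."""
    x = to_f32(x)
    if 0.0 < x < F32_MIN_NORMAL:
        x = 0.0  # the denormal input is flushed before the operation
    return to_f32(math.sqrt(x))


def sqrt_preserve_denorm(x: float) -> float:
    """Square root that keeps the denormal input as-is."""
    return to_f32(math.sqrt(to_f32(x)))


tiny = 1e-40  # representable in float32 only as a denormal
print(sqrt_ftz(tiny), sqrt_preserve_denorm(tiny))
```

Under this model the FTZ variant returns exactly zero for `tiny`, while the denorm-preserving variant returns a small positive value close to `1e-20`.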

This change addresses a compiler crash that occurs when a LoadOp and an AddfOp sit between two dots in a loop. Such a LoadOp is not streamable in the AMDGPUPipeline pass, and the pass crashed by erasing a LoadOp that still had uses. The solution is to replace `loadToInfo` with `loadToStreamOps`, so that only LoadOps that were converted to stream ops are erased.
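The idea of erasing only converted loads can be sketched with a toy model in Python. This is not the pass itself: `Op`, `erase`, and `cleanup_loads` are hypothetical stand-ins, and only the name `load_to_stream_ops` mirrors the `loadToStreamOps` map mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    users: list = field(default_factory=list)  # names of ops that use this result


def erase(op: Op) -> None:
    """Erasing an op that still has uses is an error (the crash described above)."""
    if op.users:
        raise RuntimeError(f"cannot erase {op.name}: it still has uses")


def cleanup_loads(loads, load_to_stream_ops):
    """Erase only the loads that were converted to stream ops; return their names."""
    erased = []
    for load in loads:
        stream = load_to_stream_ops.get(load.name)
        if stream is None:
            continue  # not converted: the load keeps its uses and must stay
        stream.users.extend(load.users)  # redirect uses to the stream op
        load.users.clear()
        erase(load)
        erased.append(load.name)
    return erased


streamable = Op("load_a", users=["dot_0"])
plain = Op("load_b", users=["addf"])  # feeds the AddfOp between the two dots
stream_a = Op("stream_a")

print(cleanup_loads([streamable, plain], {"load_a": stream_a}))
```

Iterating over every load and erasing it unconditionally would hit the `RuntimeError` for `load_b` in this model, which is the failure mode the change avoids.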

This PR enables buffer atomic on RDNA4 for supported data types.

Fixes the proton tests broken by the upgrade to ROCm 7, and implements CircularStoreOp for gmem.

- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [ ] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [ ] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [ ] This PR does not need a test because `FILL THIS IN`.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

---------

Co-authored-by: danial javady <[email protected]>

Currently we limit WMMA v3's kWidth to {2, 8, 16}, which matches the hardware view for all possible WMMA instructions. In the case of wmma_scaled, we assume kWidth is always 16. But in an attention kernel we can use kWidth = 8, which removes the layout conversion between the two dots. This does not match the hardware view of contiguous elements along the k dimension, but we still get correct results as long as the kWidth of the two operands is the same. This PR removes the kWidth check for WMMA v3 and makes kWidth mandatory, same as for MFMA.

…788) Broadcasts in the `block` dimensions are not redundant, so we should not mask them. This way each CTA has its own copy in shared memory. Note that the multicast mask will be set in such cases to load the data efficiently.

We currently force initialisation of operands that have not yet been visited with `setToEntryState`. This means that the order in which values are visited can change the results of the analysis, which can be a source of bugs. For example, the lowering for `AsyncCopyGlobalToLocalOp` validates that the load addresses permit sufficient vectorisation. However, this relies on the analysis recovering the same information it had when the async copy was created. Otherwise, we crash during lowering. I have an actual repro for this, but it has been very difficult to minimise it enough to make it suitable for a lit test: https://gist.github.com/neildhar/7eea6a312afa39d1cc83dc12627c2ba3

Populating the operands in this way also means that we have to handle control flow like `ForOp` and `IfOp` explicitly in `setToEntryState`, because we may attempt to populate their results when we visit their users.

Instead, when we encounter an operation whose operands have not yet been visited, we skip over the operation entirely. We can revisit it once the operands have actually been visited. This improves the quality of the analysis, and leaves the handling of control flow to the dataflow framework.

This reland adds handling for the case where the dataflow analysis fails to initialise a particular value (likely because it is determined to be dead).
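The skip-and-revisit strategy can be sketched as a small worklist loop in Python. This is an illustration only, not the MLIR dataflow framework or the actual AxisInfo analysis: the function `analyze`, the tuple encoding of operations, and the gcd-based transfer function are all assumptions made for the example.

```python
import math
from collections import deque
from functools import reduce


def analyze(ops, entry_state, transfer):
    """Toy forward analysis that skips ops whose operands are not yet known.

    ops: list of (result_name, [operand_names]) in arbitrary order.
    entry_state: values known up front (for example, function arguments).
    """
    state = dict(entry_state)
    pending = deque(ops)
    stalled = 0  # consecutive ops skipped without any progress
    while pending and stalled < len(pending):
        result, operands = pending.popleft()
        if any(o not in state for o in operands):
            pending.append((result, operands))  # revisit later; do not guess
            stalled += 1
            continue
        state[result] = transfer([state[o] for o in operands])
        stalled = 0
    return state  # values that were never reached are simply absent


def gcd_transfer(values):
    """Hypothetical transfer function: divisibility of a sum of operands."""
    return reduce(math.gcd, values)


ops = [
    ("d", ["c", "b"]),       # listed before its operand "c" is computed
    ("c", ["a", "a"]),
    ("e", ["undefined"]),    # operand never initialised (e.g. dead code)
]
print(analyze(ops, {"a": 16, "b": 8}, gcd_transfer))
```

In this model the result does not depend on the order of `ops`, and a value such as `e` that is never initialised has no entry in the returned state, so a consumer has to handle its absence explicitly.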

Any mxfp, where natively supported, requires using the persistent matmul kernel. In these cases, do not use heuristics to resolve `is_persistent`.
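A minimal sketch of this rule, assuming a boolean flag for native mxfp support and a callable heuristic. The function `resolve_is_persistent` and both of its parameters are hypothetical and do not reflect the actual kernel code; only the name `is_persistent` comes from the commit message.

```python
from typing import Callable


def resolve_is_persistent(native_mxfp: bool, heuristic: Callable[[], bool]) -> bool:
    """Decide whether to use the persistent matmul kernel (illustrative only)."""
    if native_mxfp:
        # Natively supported mxfp requires the persistent kernel,
        # so the heuristic is not consulted at all.
        return True
    return heuristic()


# The heuristic would say "no", but native mxfp overrides it.
print(resolve_is_persistent(True, lambda: False))
# Without native mxfp, the heuristic decides.
print(resolve_is_persistent(False, lambda: False))
```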

…is (#8758)" This reverts commit 31281bc.

anmyachev added 4 commits November 28, 2025 16:17

Reapply "[LAYOUTS] Make CTALayout an honest-to-goodness LinearLayout …

b82b6a8

…(#8770)" This reverts commit 64bcc99.

[Intel] Use 'CTAEncodingAttr' after '49b7472'

8dc24ec

Signed-off-by: Anatoly Myachev <[email protected]>

fix tests

0de3a86

Signed-off-by: Anatoly Myachev <[email protected]>

more fixes && new 'rank' function for 'IntelDPASLayout'

e5d0ec4

Signed-off-by: Anatoly Myachev <[email protected]>

anmyachev force-pushed the amyachev/merge111 branch from 8ae983f to e186c0f on December 1, 2025 10:38

ThomasRaoux and others added 11 commits December 1, 2025 10:59

[BACKEND] run remove backward prop until a fix point (#8776)

ce3d636

[AMD] Preserve Denorms for precise sqrt (#8697)

9fb66fb

This commit modifies the denorm behavior for precise sqrt: switching from FTZ (Flush To Zero) to denorm preservation.

[consan] Handle all tmem allocations (#8787)

20095e6

[AMD] Enabling Buffer Atomic for RDNA4 (#8778)

f7a199d

This PR enables buffer atomic on RDNA4 for supported data types.

[KERNELS] fix persistent matmul heuristics (#8791)

6de4a5d

Any mxfp, where natively supported, requires using the persistent matmul kernel. In these cases, do not use heuristics to resolve `is_persistent`.

Revert "[Reland] Fix handling of unvisited operands in AxisInfoAnalys…

84fbef0

…is (#8758)" This reverts commit 31281bc.

anmyachev force-pushed the amyachev/merge111 branch from e186c0f to 84fbef0 on December 1, 2025 10:59

anmyachev closed this Dec 1, 2025
