Merge OpenAI Triton commit `00cf53f` #5424

whitneywhtsang · 2025-11-04T01:14:52Z

This PR changes the Triton base from c172d53 to 00cf53f (Oct 23).
Pass rate: 94.55%->94.54%

This commit adds tests for `unrealized_conversion_cast` and makes the visitor in the axis analysis more robust. The input AxisInfo should not be propagated if doing so would inject an AxisInfo with an incorrect rank, which could cause a crash. `unrealized_conversion_cast` ops typically appear during a dialect conversion, but they are also useful for debugging and rapid prototyping: they allow programmers to hand-write the expected low-level IR and connect it with high-level IR that will be lowered as usual. In such a scenario, programmers write IR such as "Case 2" in the added test case. Such IR currently crashes the axis analysis. This commit also fixes a crash when a function call with a multi-dimensional function argument is analyzed. (This is now triggered due to the improved `unrealized_conversion_cast` handling.)
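The rank guard described above can be pictured with a small Python sketch. This is not the upstream code (the real analysis lives in Triton's C++ sources); the `AxisInfo` dataclass, `pessimistic`, and `propagate_through_cast` are hypothetical names, and falling back to a conservative all-ones AxisInfo on a rank mismatch is an assumption, since the commit message only says the input must not be propagated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AxisInfo:
    # Hypothetical, simplified model: one entry per tensor dimension.
    contiguity: List[int]
    divisibility: List[int]
    constancy: List[int]

    @property
    def rank(self) -> int:
        return len(self.contiguity)


def pessimistic(rank: int) -> AxisInfo:
    # Assumed fallback: claim nothing about any dimension.
    return AxisInfo([1] * rank, [1] * rank, [1] * rank)


def propagate_through_cast(src: AxisInfo, result_rank: int) -> AxisInfo:
    # Forward the operand's info only when its rank matches the result;
    # otherwise an AxisInfo of the wrong rank would be injected.
    if src.rank == result_rank:
        return src
    return pessimistic(result_rank)
```

With this guard, a rank-2 operand feeding a rank-1 result yields a rank-1 conservative AxisInfo instead of a mismatched one.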

@ThomasRaoux

This PR adds device-side TMA support for gluon. cc @ThomasRaoux

Use the correct condition value when other value exists.

We fix a number of cases where the constancy analysis could be improved. The code is quite messy, and the whole pass could do with a full rewrite, but we are not doing so ATM. This PR was mostly vibecoded, with a cleaning pass afterwards from me.

…s. (#8512) A few tests in tensor descriptor use "cuda" as the device rather than a `device` fixture in the test arguments. This PR changes those tests to use the `device` fixture instead, so that third-party users without a CUDA runtime can run these tests.

# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because it is editing the test file only.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

Co-authored-by: Micah Weston <[email protected]>

Add shared memory capacity for `gfx1250` which is 320 kbyte.
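To give a sense of what a 320 kbyte capacity means in practice, here is a small budget check. It assumes "kbyte" means 1024 bytes, and the helper names and tile sizes are illustrative only; they do not come from the commit.

```python
# Assumption: 320 kbyte is interpreted as 320 * 1024 bytes.
GFX1250_SHARED_MEM_BYTES = 320 * 1024


def tile_bytes(rows, cols, elem_bytes, num_buffers=1):
    # Footprint of num_buffers copies of a rows x cols tile.
    return rows * cols * elem_bytes * num_buffers


def fits_in_shared_memory(required_bytes, capacity_bytes=GFX1250_SHARED_MEM_BYTES):
    return required_bytes <= capacity_bytes


# Two 256x256 16-bit tiles need 262144 bytes and fit; three need 393216 and do not.
double_buffered = fits_in_shared_memory(tile_bytes(256, 256, 2, num_buffers=2))
triple_buffered = fits_in_shared_memory(tile_bytes(256, 256, 2, num_buffers=3))
```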

This PR fixes the layout and lowering for wmma scaled with a small k dim, where the tensor's k dimension is smaller than a single wmma scaled instruction's k dimension. It also adds corresponding lit tests for common cases.

Prevent crash when lowering memdesc of pointer

…n (#8493) TP > 1 is not supported in this mode

Signed-off-by: Whitney Tsang <[email protected]>

matthias-springer and others added 10 commits October 22, 2025 13:37

[GLUON] add device-side TMA (#8505)

56c6468

This PR adds device-side TMA support for gluon. cc @ThomasRaoux

[AMD] Fix branch condition in BufferLoadToLocalOpConversion (#8501)

bad2576

Use the correct condition value when other value exists.

[AMD] Update shared memory size for gfx1250 from TargetInfo (#8517)

c07886c

Add shared memory capacity for `gfx1250` which is 320 kbyte.

[AMD] Fix wmma scaled with small k dim on gfx1250 (#8487)

4d6ce4e

This PR fixes the layout and lowering for wmma scaled with a small k dim, where the tensor's k dimension is smaller than a single wmma scaled instruction's k dimension. It also adds corresponding lit tests for common cases.

[BACKEND] Fix memdesc of pointers (#8515)

3a832d6

Prevent crash when lowering memdesc of pointer

[NFC] Remove legacy TODO (#8520)

1c72fb6

[BENCH] Incorporate EP sharding and deprecate the legacy communicatio…

00cf53f

…n (#8493) TP > 1 is not supported in this mode

whitneywhtsang self-assigned this Nov 4, 2025

whitneywhtsang added the keep-going label Nov 4, 2025

whitneywhtsang force-pushed the whitneywhtsang/merge branch from b61c0b1 to 2ae6dd8 Compare November 4, 2025 03:26

whitneywhtsang removed the keep-going label Nov 4, 2025

whitneywhtsang requested a review from chengjunlu November 4, 2025 04:17

whitneywhtsang marked this pull request as ready for review November 4, 2025 04:19

chengjunlu approved these changes Nov 4, 2025

whitneywhtsang added 2 commits November 4, 2025 19:37

Merge commit '00cf53fe57332b463f02a427be65e36c91f544bc'

f694fd7

[TEST] xfail HW specific tests

0f499b9

Signed-off-by: Whitney Tsang <[email protected]>

whitneywhtsang force-pushed the whitneywhtsang/merge branch from 2ff238c to 0f499b9 Compare November 4, 2025 19:37

12 participants
