[RELAND][LAYOUTS] Generate distributed layouts for tcgen05.ld/st generically (#8421)
#8495
Conversation
```cpp
    ttng::TritonNvidiaGPUDialect, gluon::GluonDialect>();

MLIRContext context(MLIRContext::Threading::DISABLED);
context.appendDialectRegistry(registry);
context.loadAllAvailableDialects();
```
Can you add an overload of `getDistributedLayoutForTmemLdSt` that doesn't require `MemDesc` and `CTALayoutAttr`, so we don't need to load the dialects?
Context creation alone is cheap (~30 µs), but once we load the dialects as well it goes up to ~380 µs.
Hrm actually I guess we would still need the dialects for the layout attrs either way.
Perhaps we could have a global "default context" that is used here?
This function is going to be called at most three times, even in very complex kernels. I'd say we leave it as-is, and if we ever find that this is an issue, we can look at how to fix it?
Do you need to load the dialects if all we care about are string attributes?
We could implement what Peter suggested in the first comment. Let's do it in a follow-up, though.
We move the previously handwritten logic that infers a good distributed layout for TMEM layouts into a generic implementation. This proves more robust than the previous one: as the lit tests show, we now get full vectorisation in many cases where we didn't before. Writing this generically also lets us add support for the two remaining tcgen05.ld/st instructions that were missing. We align the verifier and the lowering so that verification now errors out if and only if we would not be able to lower the given layout. We also expose this function in Gluon and remove the duplicated logic that Gluon had, in favour of the generic implementation.

**There is just one semantic change** (the rest is generalisation/strengthening): we now generate distributed layouts that produce fully vectorised load/store instructions (i.e. a single load/store instruction moves all the registers). As the `_blackwell.mlir` cases show, this was not previously the case. That said, in some cases the previous heuristics allowed `tt.split` of the tensor along the second dimension, while now this may not be possible. If one wants to perform this splitting, one needs to modify the layout slightly, as we do when calling `32x32b_splitn` in the test `test_tmem_subslice_block_m_64`. This change is in line with the rest of the heuristics, where we default to full vectorisation and expose other layouts, like `splitLongM`, for other use cases.
This PR relands #8386.
It depends on #8492 to avoid regressing in some workloads.
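As a rough illustration of the vectorisation point in the description (this is not Triton's actual implementation, and the register counts are hypothetical): the number of tcgen05.ld/st messages needed for a thread's registers is a ceiling division, and a "fully vectorised" layout is one where that count is 1.

```python
def num_messages(regs_per_thread: int, regs_per_message: int) -> int:
    """Illustrative only: how many ld/st messages are needed if each
    message can move at most `regs_per_message` 32-bit registers."""
    # Ceiling division: any leftover registers still cost a full message.
    return -(-regs_per_thread // regs_per_message)

# A layout matched to the full tile needs a single instruction...
assert num_messages(128, 128) == 1
# ...while a narrower message shape multiplies the instruction count.
assert num_messages(128, 16) == 8
```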