[AMDGPU][LDS] Enable global load DMA by default on gfx950+ #23230

Draft

lialan wants to merge 7 commits into main from users/lialan/avoid_dma_when_pad

Conversation

@lialan
Contributor

@lialan lialan commented Jan 21, 2026

  • Automatically use coalesced global load DMA for matmul/IGEMM on CDNA4+ architectures.
  • Fall back to standard promotion and avoid using LDS DMA when the source comes from a tensor.pad or when padding is required (see the sketch below).
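
A minimal sketch of that fallback condition, assuming the MLIR tensor dialect; the helper name and the `needsPadding` flag are illustrative stand-ins, not the PR's actual code:

#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/Value.h"

/// Returns true when global load LDS DMA should be skipped for `source`,
/// i.e. when the operand is produced by a tensor.pad or would need padding.
static bool shouldFallBackToRegularPromotion(mlir::Value source,
                                             bool needsPadding) {
  // Per the description above: fall back to the standard shared-memory
  // promotion path when the source is produced by a tensor.pad or when
  // padding would be required.
  if (source.getDefiningOp<mlir::tensor::PadOp>())
    return true;
  return needsPadding;
}

In the actual pass this decision is made during promotion; the sketch only illustrates the condition described above.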

ci-extra: test_amd_mi355

@lialan lialan force-pushed the users/lialan/avoid_dma_when_pad branch 2 times, most recently from de197c4 to 761329c on January 22, 2026 00:02
@lialan lialan changed the title [AMDGPU][LDS] Do not use DMA in the presence of tensor.pad [AMDGPU][LDS] Turn on coalesced gather dma by default Jan 22, 2026
@lialan lialan force-pushed the users/lialan/avoid_dma_when_pad branch 4 times, most recently from 1885d10 to 198bdd3 on January 23, 2026 19:48
@lialan lialan changed the title [AMDGPU][LDS] Turn on coalesced gather dma by default [AMDGPU][LDS] Enable global load DMA by default Jan 23, 2026
@lialan lialan changed the title [AMDGPU][LDS] Enable global load DMA by default [AMDGPU][LDS] Enable global load DMA by default on gfx950+ Jan 23, 2026
@lialan lialan marked this pull request as ready for review January 24, 2026 01:34
Member

@kuhar kuhar left a comment


There are some clang-tidy warnings

Comment on lines 439 to 441
if (*maybeChipset < kGfx950) {
LDBG() << "Target arch " << targetArch << " is not CDNA4+, skipping pass";
return;
Member


Can we make sure this is not accidentally enabled on RDNA cards? Would it be possible to have a lit test for this?

Contributor Author


Indeed, so let's just restrict it to gfx950 only. I will add a test.

Contributor Author


Added a guard to limit this to gfx950+, and also to architectures that have global load LDS instructions.

Member


Is there something else that tells us if DMA is available? Maybe we could check for the dma_sizes target attribute?

Contributor Author


Now it is only enabled when both:

  • the target is gfx950+, and
  • the DMA size is >= 128 bits (see the sketch below).
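
A minimal sketch of the combined guard, assuming MLIR's amdgpu::Chipset utility; the `supportedDmaBitWidths` parameter is a hypothetical stand-in for reading the target's dma_sizes attribute, not IREE's actual accessor:

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringRef.h"
#include "mlir/Dialect/AMDGPU/Utils/Chipset.h"

/// Returns true if the target may use coalesced global load LDS DMA:
/// gfx950 or newer, with at least one supported DMA width of 128+ bits.
static bool canUseGlobalLoadLdsDma(llvm::StringRef targetArch,
                                   llvm::ArrayRef<int64_t> supportedDmaBitWidths) {
  const mlir::amdgpu::Chipset kGfx950(9, 5, 0);
  mlir::FailureOr<mlir::amdgpu::Chipset> maybeChipset =
      mlir::amdgpu::Chipset::parse(targetArch);
  if (mlir::failed(maybeChipset) || *maybeChipset < kGfx950)
    return false;
  // The dma_sizes target attribute is assumed to be surfaced here as a list
  // of supported transfer widths in bits.
  return llvm::any_of(supportedDmaBitWidths,
                      [](int64_t bits) { return bits >= 128; });
}

Checking the DMA width via the target attribute keeps the architecture hardcoding down to the gfx950 floor alone.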

Contributor


We could actually remove the gfx950 check: "DMA sizes >= 128 bits" might be a good condition to use here.

It could make things a touch awkward if we later want to avoid this op on gfx1250 and need to handle the phase ordering, but for now it is a sufficient condition that doesn't involve mentioning an architecture by name.

@krzysz00
Contributor

Just going to stick a call for benchmarks here

Contributor

@qedawkins qedawkins left a comment


I'm closing my eyes to the amdgpu chipset hardcoding in the GPU dialect since the whole op needs to move anyway, but there's no way to do that right now.

I'll also echo Krzysztof's request. Providing some basic benchmark results alongside a feature enablement acts as proof that the feature works as intended. Easily reproducible results are best, but even a small hand-picked sweep is enough.

@lialan lialan force-pushed the users/lialan/avoid_dma_when_pad branch 2 times, most recently from 21519bf to 49eafdf on January 28, 2026 00:20
@lialan
Contributor Author

lialan commented Jan 28, 2026

@qedawkins @krzysz00 here are the benchmark numbers using turbine.

Column Z and Column AA are the baseline and the new results, respectively.

Overall it is positive, but the gains are very much diminished by a number of regressions, so I am investigating those regressions.

@lialan lialan marked this pull request as draft January 28, 2026 02:43
@qedawkins
Contributor

@qedawkins @krzysz00 here are the benchmark numbers using turbine.

Column Z and Column AA are the baseline and the new results, respectively.

Overall it is positive, but the gains are very much diminished by a number of regressions, so I am investigating those regressions.

Awesome, thanks for running the sweep. Getting a head start on the regressions sounds great, thanks!

@lialan
Contributor Author

lialan commented Feb 5, 2026

#23365 tries to enable DMA for unaligned cases, so we should see if we can merge that before we merge this one.

@Yu-Zhewen Yu-Zhewen force-pushed the users/lialan/avoid_dma_when_pad branch from 344d4bd to 5a5b9af on February 6, 2026 21:14