[Codegen] Drop the workaround from EmulateNarrowType pass. #23319

hanhanW · 2026-01-29T02:26:41Z

llvm/llvm-project@2de936b is a reasonable change, but it breaks IREE. The reason is that it is conservative about offsets, while IREE usually tiles the dimensions to be aligned with the native vector size. In this context, the stores are always aligned. Upstream has an assumeAligned mode, so we drop the workaround.

Below is an example (from #20645) showing that the dynamic offset %arg2 is always byte-aligned.

func.func @main(%arg0: memref<8xf32>, %arg1: memref<8xi4>) {
  %c4 = arith.constant 4 : index
  %c8 = arith.constant 8 : index
  %c0 = arith.constant 0 : index
  scf.for %arg2 = %c0 to %c8 step %c4 {
    %0 = vector.load %arg0[%arg2] : memref<8xf32>, vector<4xf32>
    %1 = arith.fptoui %0 : vector<4xf32> to vector<4xi32>
    %2 = arith.trunci %1 : vector<4xi32> to vector<4xi4>
    vector.store %2, %arg1[%arg2] : memref<8xi4>, vector<4xi4>
  }
  return
}
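
The alignment claim for this example can be checked with a little arithmetic. The following is a minimal sketch, not code from IREE or LLVM; the helper names `isByteAligned` and `allOffsetsByteAligned` are hypothetical and exist only for illustration. It assumes i4 elements are packed contiguously, so element offset `k` starts at bit `4 * k`.

```cpp
#include <cstdint>

// Hypothetical helper (illustration only): true when an element offset into
// a buffer of `elemBits`-wide, contiguously packed elements starts on a byte
// boundary.
constexpr bool isByteAligned(int64_t elemOffset, int64_t elemBits) {
  return (elemOffset * elemBits) % 8 == 0;
}

// Hypothetical helper (illustration only): walks the same induction values
// as `scf.for %arg2 = lb to ub step step` and checks every store offset.
constexpr bool allOffsetsByteAligned(int64_t lb, int64_t ub, int64_t step,
                                     int64_t elemBits) {
  for (int64_t iv = lb; iv < ub; iv += step) {
    if (!isByteAligned(iv, elemBits)) {
      return false;
    }
  }
  return true;
}

// The example loop: offsets 0 and 4, i4 elements. Bit offsets are 0 and 16.
static_assert(allOffsetsByteAligned(0, 8, 4, 4),
              "every vector<4xi4> store starts on a byte boundary");
// Each store writes 4 x 4 = 16 bits, a whole number of bytes.
static_assert((4 * 4) % 8 == 0, "vector<4xi4> covers whole bytes");
// A step of 1 would reach odd offsets that start mid-byte.
static_assert(!allOffsetsByteAligned(0, 8, 1, 4),
              "odd i4 offsets are not byte-aligned");
```

Because both the start offset and the store width are whole bytes, no store in the loop shares a byte with a neighboring store, which is the situation the assumeAligned mode relies on.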

Closes #20645

hanhanW · 2026-01-29T02:29:55Z

It requires llvm/llvm-project#178565

Signed-off-by: hanhanW <[email protected]>

MaheshRavishankar · 2026-01-30T01:21:46Z

compiler/src/iree/compiler/Codegen/Common/EmulateNarrowType.cpp

  populateIREEResolveExtractStridedMetadataPatterns(patterns);
-  vector::populateVectorNarrowTypeEmulationPatterns(typeConverter, patterns);
+  vector::populateVectorNarrowTypeEmulationPatterns(typeConverter, patterns,
+                                                    /*disableAtomicRMW=*/false,


Are you sure you want to use AtomicRMW? That can be expensive. My understanding is that we should just vectorize to a size where we don't need atomics.

This is the default option; we already use it. We can switch in a follow-up.

I forgot to say that I agree with you, and it should be happening on the CPU side. We'll need to make the switch or expose the option.

I also have a local patch that makes VMVX happy for all the subtypes we have, and it doesn't use atomicRMW.

This is a common CPU/GPU path. We can tolerate some regression on the CPU side, but regression on the GPU side is more problematic. If there were a known correctness issue, that would be one thing, but do we have a known correctness issue here on GPU?

I did not see any failures locally when I made the switch. I need to test more, but GPU test coverage is lower than I expected. I'm not aware of any correctness issue, and I did not find any in the GitHub issues. GPU does not work for other fp8 types, btw. See the table in #23238 (comment)

Type        gfx908/90a  gfx942  gfx950  gfx11xx  gfx12xx
f8E4M3FNUZ  emu         hw      emu     emu      emu
f8E5M2FNUZ  emu         hw      emu     emu      emu
f8E4M3FN    emu         emu     hw      emu      hw
f8E5M2      emu         emu     hw      emu      hw
f4E2M1FN    emu         emu     hw      emu      hw

emu means that it is not supported without the PR.

What I wanted to say is that it has been using AtomicRMW mode for a long time. I'll create a PR to run the experiment. Can we move forward with dropping the workaround?

I think the tolerance won't be impacted if the tiling config is correct. The heuristic should take it into account, like what I observed in #20645 (comment). If it does not, I consider that a bug that we need to fix. Alternatively, we expose the option and always enable the flag on GPU.

The sharktank tests are red because of an Azure issue; the other tests look okay: #23344

hanhanW force-pushed the users/hanhanW/fix-narrow-type-issue-20645 branch from 9f2c4ab to e797a2c Compare January 29, 2026 02:28

Drop workaround

012b5d2

Signed-off-by: hanhanW <[email protected]>

hanhanW force-pushed the users/hanhanW/fix-narrow-type-issue-20645 branch from e797a2c to 012b5d2 Compare January 29, 2026 23:29

hanhanW marked this pull request as ready for review January 29, 2026 23:30

hanhanW requested review from MaheshRavishankar, Max191 and qedawkins as code owners January 29, 2026 23:30

MaheshRavishankar reviewed Jan 30, 2026


hanhanW requested a review from MaheshRavishankar January 30, 2026 01:40
