@hanhanW hanhanW commented Jan 29, 2026

llvm/llvm-project@2de936b is a reasonable upstream change, but it breaks IREE. The reason is that it is conservative about offsets, whereas IREE usually tiles dimensions so that they align with the native vector size. In that context, the stores are always aligned. Upstream now has an assumeAligned mode, so we drop the workaround.

Below is an example (from #20645) showing that the dynamic offset %arg2 is always byte-aligned.

func.func @main(%arg0: memref<8xf32>, %arg1: memref<8xi4>) {
  %c4 = arith.constant 4 : index
  %c8 = arith.constant 8 : index
  %c0 = arith.constant 0 : index
  scf.for %arg2 = %c0 to %c8 step %c4 {
    %0 = vector.load %arg0[%arg2] : memref<8xf32>, vector<4xf32>
    %1 = arith.fptoui %0 : vector<4xf32> to vector<4xi32>
    %2 = arith.trunci %1 : vector<4xi32> to vector<4xi4>
    vector.store %2, %arg1[%arg2] : memref<8xi4>, vector<4xi4>
  }
  return
}
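
The alignment claim can be checked with plain arithmetic. The sketch below is not IREE or MLIR code; it only enumerates the loop's induction values and tests whether each `vector<4xi4>` store begins and ends on a byte boundary. The helper name `is_byte_aligned` is made up for illustration.

```python
def is_byte_aligned(elem_offset: int, num_elems: int, bitwidth: int) -> bool:
    """Return True if a store of `num_elems` elements of `bitwidth` bits,
    starting at element index `elem_offset`, begins and ends on a byte boundary."""
    start_bit = elem_offset * bitwidth
    end_bit = start_bit + num_elems * bitwidth
    return start_bit % 8 == 0 and end_bit % 8 == 0


# Mirror `scf.for %arg2 = %c0 to %c8 step %c4` with vector<4xi4> stores.
offsets = list(range(0, 8, 4))
aligned = [is_byte_aligned(off, num_elems=4, bitwidth=4) for off in offsets]
print(offsets, aligned)
```

Each store covers 4 x 4 = 16 bits and the offsets advance in steps of 4 elements, so every store in this loop covers whole bytes. An odd element offset, by contrast, would start in the middle of a byte.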

Closes #20645

@hanhanW hanhanW force-pushed the users/hanhanW/fix-narrow-type-issue-20645 branch from 9f2c4ab to e797a2c Compare January 29, 2026 02:28
hanhanW commented Jan 29, 2026

It requires llvm/llvm-project#178565

Signed-off-by: hanhanW <[email protected]>
@hanhanW hanhanW force-pushed the users/hanhanW/fix-narrow-type-issue-20645 branch from e797a2c to 012b5d2 Compare January 29, 2026 23:29
@hanhanW hanhanW marked this pull request as ready for review January 29, 2026 23:30
populateIREEResolveExtractStridedMetadataPatterns(patterns);
vector::populateVectorNarrowTypeEmulationPatterns(typeConverter, patterns);
vector::populateVectorNarrowTypeEmulationPatterns(typeConverter, patterns,
/*disableAtomicRMW=*/false,
Collaborator

Are you sure you want to use AtomicRMW? That can be expensive. My understanding is that we should just vectorize to a size where we don't need atomics.
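
To illustrate why the store size matters, here is a minimal single-threaded sketch of packed 4-bit stores. It is a model, not the upstream emulation code: the function name `store_i4` and the low-nibble-first packing are assumptions made for illustration. A store that covers both nibbles of a byte can overwrite the byte directly, while a store that covers only one nibble has to read the byte, merge, and write it back. That read-modify-write is the step that would need to be atomic if another thread could write the neighbouring nibble.

```python
def store_i4(buf: bytearray, elem_offset: int, values: list[int]) -> int:
    """Store 4-bit values into a packed buffer (assumed low nibble first).

    Returns the number of bytes that required a read-modify-write because
    the store covered only one of their two nibbles.
    """
    if not values:
        return 0
    partial = 0
    end = elem_offset + len(values)
    for byte in range(elem_offset // 2, (end + 1) // 2):
        lo_idx, hi_idx = 2 * byte, 2 * byte + 1
        covers_lo = elem_offset <= lo_idx < end
        covers_hi = elem_offset <= hi_idx < end
        if covers_lo and covers_hi:
            # Whole byte is overwritten: a plain store suffices.
            lo = values[lo_idx - elem_offset] & 0xF
            hi = values[hi_idx - elem_offset] & 0xF
            buf[byte] = lo | (hi << 4)
        else:
            # Only one nibble belongs to this store: read, merge, write back.
            partial += 1
            old = buf[byte]
            if covers_lo:
                buf[byte] = (old & 0xF0) | (values[lo_idx - elem_offset] & 0xF)
            else:
                buf[byte] = (old & 0x0F) | ((values[hi_idx - elem_offset] & 0xF) << 4)
    return partial


aligned_buf = bytearray(b"\xff" * 4)
aligned_rmw = store_i4(aligned_buf, 0, [1, 2, 3, 4])    # starts on a byte boundary

unaligned_buf = bytearray(b"\xff" * 4)
unaligned_rmw = store_i4(unaligned_buf, 1, [5, 6, 7, 8])  # starts mid-byte
print(aligned_rmw, unaligned_rmw)
```

In this model the aligned store touches no partial bytes, while the store at element offset 1 needs a read-modify-write at both ends.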

Contributor Author

This is the default option; we already use it. We can switch in a follow-up.

Contributor Author

I forgot to say that I agree with you, and it should happen on the CPU side. We'll need to make the switch or expose the option.

I also have a local patch that makes VMVX happy for all the subtypes we have, and it doesn't use atomicRMW.

Collaborator

This is a common CPU/GPU path. We can tolerate some regression on the CPU side, but regression on the GPU side is more problematic. If there were a known correctness issue, that would be one thing, but do we have a known correctness issue here on GPU?

Contributor Author

I did not see any failures locally when I made the switch. I need to test more, but GPU test coverage is lower than I expected. I'm not aware of any correctness issue, and I did not find any in GitHub issues. GPU does not work for other fp8 types, btw. See the table in #23238 (comment):

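| Type       | gfx908/90a | gfx942 | gfx950 | gfx11xx | gfx12xx |
|------------|------------|--------|--------|---------|---------|
| f8E4M3FNUZ | emu        | hw     | emu    | emu     | emu     |
| f8E5M2FNUZ | emu        | hw     | emu    | emu     | emu     |
| f8E4M3FN   | emu        | emu    | hw     | emu     | hw      |
| f8E5M2     | emu        | emu    | hw     | emu     | hw      |
| f4E2M1FN   | emu        | emu    | hw     | emu     | hw      |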

emu means that it is not supported without the PR.

What I wanted to say is that it has been using AtomicRMW mode for a long time. I'll create a PR to run the experiment. Can we move forward with dropping the workaround?

I think the tolerance won't be impacted if the tiling config is correct. The heuristic should take it into account, like what I observed in #20645 (comment). If it does not, I consider that a bug that we need to fix. Alternatively, we expose the option and always enable the flag on GPU.
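
As a rough illustration of what "taking it into account" could mean, the sketch below rounds a candidate tile size up so that a tile of sub-byte elements always spans whole bytes. This is a hypothetical helper, not IREE's actual tile-size heuristic; the name `round_tile_to_byte_multiple` and the rounding policy are assumptions.

```python
import math


def round_tile_to_byte_multiple(tile_size: int, bitwidth: int) -> int:
    """Round `tile_size` (in elements) up so that a tile spans whole bytes.

    Hypothetical policy: the smallest element count whose total bit size is
    a multiple of 8 is 8 // gcd(8, bitwidth); round up to a multiple of it.
    """
    granule = 8 // math.gcd(8, bitwidth)
    return math.ceil(tile_size / granule) * granule


for tile, bits in [(3, 4), (4, 4), (5, 2), (7, 32)]:
    print(tile, bits, round_tile_to_byte_multiple(tile, bits))
```

For element types of 8 bits or wider that are byte multiples, the granule is 1 and the tile size is left unchanged; only sub-byte types such as 4-bit or 2-bit elements are rounded.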

Contributor Author

The sharktank tests are red because of an Azure issue; the other tests look okay: #23344

Development

Successfully merging this pull request may close these issues.

[Integrate] Upstream narrow type emulation is breaking iree test