-
Notifications
You must be signed in to change notification settings - Fork 830
[Codegen] Drop the workaround from EmulateNarrowType pass. #23319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Codegen] Drop the workaround from EmulateNarrowType pass. #23319
Conversation
9f2c4ab to
e797a2c
Compare
|
It requires llvm/llvm-project#178565 |
Signed-off-by: hanhanW <[email protected]>
e797a2c to
012b5d2
Compare
| populateIREEResolveExtractStridedMetadataPatterns(patterns); | ||
| vector::populateVectorNarrowTypeEmulationPatterns(typeConverter, patterns); | ||
| vector::populateVectorNarrowTypeEmulationPatterns(typeConverter, patterns, | ||
| /*disableAtomicRMW=*/false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you want to use AtomicRMW. That can be expensive? My understanding is that we should just vectorize to a size where we dont need atomics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is default option; we already use it. We can switch in a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to say that I agree with you and it should be happening on CPU side. We'll need the switch or expose the option.
I also have a local patch that makes VMVX happy for all the subtypes we have, and it doesn't use atomicRMW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THis is common CPU/GPU path. We can tolerate some regression on CPU side, but regression on GPU side is more problematic. If there was a known correctness issue, that is one thing, but do we have a known correctness issue here on GPU?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not see any failure locally when I did the switch. I need to test more, but GPU test coverage is lower than I expected. I'm not aware of any correctness issue and I did not find any on Github issues. GPU does not work for other fp8 types, btw. See the table in #23238 (comment)
| Type | gfx908/90a | gfx942 | gfx950 | gfx11xx | gfx12xx |
|---|---|---|---|---|---|
| f8E4M3FNUZ | emu | hw | emu | emu | emu |
| f8E5M2FNUZ | emu | hw | emu | emu | emu |
| f8E4M3FN | emu | emu | hw | emu | hw |
| f8E5M2 | emu | emu | hw | emu | hw |
| f4E2M1FN | emu | emu | hw | emu | hw |
emu means that it is not supported without the PR.
What I wanted to say is that it's been using AtomicRMW mode for a long time. I'll create a PR to do the experiment, let's move forward to drop the workaround?
I think the tolerance won't be impacted if the tiling config is correct. The heuristic should take it into account, like what I observed in #20645 (comment). If not, it is a bug to me and we need to fix it. Or we expose the option and we always enable the flag on GPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sharktank tests are red because of azure issue, other tests look okay: #23344
llvm/llvm-project@2de936b is a reasonable support, but it breaks IREE. The reason is that it is conversative about offsets, while IREE usually tiles the dimensions to be aligned with native vector size. In this context, the stores are always aligned store. The upstream has
assumeAlignedmode, so we drop the workaround.Below is an example (from #20645) that shows the dynamic offset
%arg2is always aligned with bytes.Closes #20645