-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[Clang][AArch64] Add fp8 variants for untyped NEON intrinsics #128019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 3 commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
c331c4c
[Clang][AArch64] Add fp8 variants for untyped NEON intrinsics
Lukacma e5873e2
Merge branch 'main' into neon-untyped-fp8
Lukacma 2e53a86
[NeonEmitter] Update ArchGuard for MFloat8 type to ensure correct arc…
Lukacma 842f197
Chnage format of Mfloat8 type in memory back to <1xi8>
Lukacma 53202b7
Simplify assertion
Lukacma cd8af9d
remove debug guards
Lukacma a3472d7
Merge branch 'main' into neon-untyped-fp8
Lukacma fd83679
Merge branch 'main' into neon-untyped-fp8
Lukacma File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this is the best way to solve this issue so would appreciate your feedback on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an issue here. That is exactly what should happen regardless of the target architecture any time the ABI for that architecture says values of type
Tare passed as<1 x T>.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the ABI say this? My understand is that values of type _mfp8 are floating-point 8-bit values that are passes as _mfp8. The pretend it's an
i8in some cases and<1 x i8>in others is purely an implementation detail within clang.This is not to say the code is invalid, but we should be cautious with how far down the rabbit hole we go.
FYI: As part of @MacDue's work to improve streaming-mode code generation I asked him to add the MVT
aarch64mfp8along with support to load and store it. I expect over time we'll migrate away from usingi8as our scalar type.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the fallout will be from this but I think the problem here is we should not have loaded a scalar in the first place. Looking at
CodeGenTypes::ConvertTypeForMem()I can see that we're using a different type for the memory representation than the normal one, which I think is a mistake.Changing this so the types are consistent will remove the need for this code but I suspect it'll prompt further work elsewhere. My hope is that work sits in target specific areas relating to modelling the builtin so seem reasonable. Please shout though if it starts to get out of control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't. Unfortunately this discussion was split and I didn't replicate all my comments here.
I consider the "natural" mapping of
__mfp8to LLVM types to bei8and<1 x i8>to be merely a hack coming from the peculiar way of implementing ABIs in clang/llvm (by implicit contracts and "mutual understading"). As such<1 x i8>out to be applicable only for values that are arguments passed in registers.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not yet confident in my understanding of the trade-offs between the two approaches, beside that one impacts target-specific code while the other affects target-independent code. As such, I don’t feel well-positioned to contribute meaningfully to this discussion. That said, I’d appreciate it if we could reach alignment here, as I’d like to merge this patch soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying storage for
__mfp8is an FPR and until we decide whether to use a dedicated target type, or LLVM gains an opaque 8-bit floating point type our only option is to represent it as an i8 vector type.The reason for using
i8was for some specific code reuse but as this PR showed, that reuse is not total and so I'd rather we just be honest and insert the relevant bitcasts when necessary. This will put us in good stead if we decide to go the target type route.