[AArch64][CostModel] Increase the cost of illegal SVE int-to-fp converts #130756
Merged
5 commits
8f23b43 [AArch64][CostModel] Increase the cost of illegal SVE int-to-fp converts (huntergr-arm)
55541fb Use pseudo-legalization from base getCastInstrCost (huntergr-arm)
dc0174d Use poison instead of undef for tests (huntergr-arm)
8d0fef2 Revert poison change for existing tests, introduce symbolic constants… (huntergr-arm)
29c7fff Update llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (huntergr-arm)
I know that the cost-model is a bit of a guessing game, but is there any rationale behind picking a factor of 3? (i.e. why the cost is 12 instead of 4)
The fcvt instructions seem to have 1/2 to 1/8 the throughput (depending on type) compared to simple arithmetic instructions, e.g. `add`, so I bumped the cost of those. The numbers may not be the best overall, but don't seem to lead to regressions at present. We may want to try a range of values at some point to see if there's a better estimate.
Should the cost of converts of the 'not too wide' types then also be increased to reflect a higher reciprocal cost?
e.g. I see a cost of 1 for:
We just discussed this offline, but just sharing my thoughts here: IMO the table should represent the cost of casts of legal types. Illegal types should be handled by generic code that multiplies the cost by the 'type legalization cost'. This is actually what happens for fixed-length types (see the code just below the table), but not (yet) for scalable types. Otherwise, any other illegal types that are not in the table (which includes types that cannot be represented by MVTs because they're "too wide") will get some default cost, which may be far too low.
It also seems that `SINT_TO_FP` records are missing in the table for scalable vector types (only FP_TO_SINT is handled). This is probably just a historical omission because this table gets updated/botched on an ad-hoc basis when people find that the cost is wrong for some workload, for some type and operation. It would be nice to clean this up.
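As a rough illustration of the split suggested above (a table that only holds costs for legal types, plus generic code that multiplies by a legalization cost), here is a minimal, self-contained sketch. It is a toy model and not the LLVM implementation: `ToyVT`, `ToyBitsPerBlock`, `toyLegalizationCost`, `ToyLegalConvertCost` and `toyIntToFPCost` are hypothetical names, the table costs are placeholders, and only same-element-width converts are modelled.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Toy model only: none of these names are LLVM APIs.
// Stands in for a scalable vector <vscale x MinElts x (i|f)EltBits>.
struct ToyVT {
  unsigned MinElts;
  unsigned EltBits;
};

// Assumed minimum width of one scalable vector register, in bits.
constexpr unsigned ToyBitsPerBlock = 128;

// Number of legal registers needed to hold the type. A type that fits in one
// register (including narrower, unpacked types) counts as a single part.
unsigned toyLegalizationCost(ToyVT VT) {
  unsigned Bits = VT.MinElts * VT.EltBits;
  assert(Bits > 0 && "empty type");
  return (Bits + ToyBitsPerBlock - 1) / ToyBitsPerBlock;
}

// Placeholder per-register costs for int-to-fp converts of *legal* types,
// keyed by (source element bits, destination element bits).
const std::map<std::pair<unsigned, unsigned>, unsigned> ToyLegalConvertCost = {
    {{16, 16}, 1}, {{32, 32}, 1}, {{64, 64}, 1}};

// Cost of the convert = (number of legal parts) * (cost of one legal convert),
// so wide illegal types scale with their width instead of hitting a default.
unsigned toyIntToFPCost(ToyVT Src, ToyVT Dst) {
  assert(Src.MinElts == Dst.MinElts && "casts keep the element count");
  auto It = ToyLegalConvertCost.find({Src.EltBits, Dst.EltBits});
  assert(It != ToyLegalConvertCost.end() && "no entry for this legal convert");
  return toyLegalizationCost(Dst) * It->second;
}
```

With these placeholder values, `toyIntToFPCost({4, 32}, {4, 32})` models a legal nxv4i32 to nxv4f32 convert, while `toyIntToFPCost({16, 64}, {16, 64})` models a type that needs eight registers and is charged eight times the legal cost.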
So I've changed the approach slightly to use the pseudo-legalization from the base `getCastInstrCost` that NEON uses (note the code below the table is about using SVE for fixed-length, so doesn't always apply).
Using this approach, we'll still get some illegal types (e.g. mapping nxv2i16 -> nxv2f64, the input would be promoted to nxv2i64 but that's not done in the current code for NEON), but I'm covering the cases where the destination type is legal.
I've decided to back away from increasing the cost of direct fcvts here – even though they have less throughput than `add`, the NEON values are not written with that in mind so we might incorrectly decide to favour NEON (or scalar) code.
I'll rerun some benchmarking with these adjusted values to see whether there's any regression from doing this.
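To show why a case like nxv2i16 -> nxv2f64 can still appear, here is a small sketch of the pseudo-legalization idea as described in this comment: split the cast until the wider of the two types fits in one register, then cost the narrowed cast once per part. This is an assumption-laden toy model, not the code in the patch; `ToyVT`, `ToyBitsPerBlock`, `PseudoLegalized`, `pseudoLegalize` and `isPacked` are hypothetical names, and element widths are assumed to be powers of two no larger than 128.

```cpp
#include <algorithm>
#include <cassert>

// Toy model only: none of these names are LLVM APIs.
// Stands in for a scalable vector <vscale x MinElts x (i|f)EltBits>.
struct ToyVT {
  unsigned MinElts;
  unsigned EltBits;
};

// Assumed minimum width of one scalable vector register, in bits.
constexpr unsigned ToyBitsPerBlock = 128;

struct PseudoLegalized {
  unsigned NumParts; // how many times the narrowed cast is repeated
  ToyVT Src;         // narrowed source type
  ToyVT Dst;         // narrowed destination type
};

// Split the cast so that the wider of the two types fills at most one register.
PseudoLegalized pseudoLegalize(ToyVT Src, ToyVT Dst) {
  assert(Src.MinElts == Dst.MinElts && "casts keep the element count");
  unsigned WiderEltBits = std::max(Src.EltBits, Dst.EltBits);
  assert(WiderEltBits > 0 && WiderEltBits <= ToyBitsPerBlock);
  unsigned EltsPerBlock = ToyBitsPerBlock / WiderEltBits;
  unsigned NumParts = std::max(1u, Src.MinElts / EltsPerBlock);
  unsigned Elts = std::min(Src.MinElts, EltsPerBlock);
  return {NumParts, {Elts, Src.EltBits}, {Elts, Dst.EltBits}};
}

// A type is "packed" here if it fills exactly one register; a narrower type
// would still need promotion before the convert.
bool isPacked(ToyVT VT) { return VT.MinElts * VT.EltBits == ToyBitsPerBlock; }
```

For example, `pseudoLegalize({8, 16}, {8, 64})` models nxv8i16 -> nxv8f64 and yields four parts of nxv2i16 -> nxv2f64. The destination of each part is packed, but the nxv2i16 source is not, which corresponds to the remaining illegal case mentioned above.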