-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[llvm] Fix crash when complex deinterleaving operates on an unrolled loop #129735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
NickGuy-Arm
merged 5 commits into
llvm:main
from
NickGuy-Arm:complex-deinterleaving-unroll-crash
Mar 19, 2025
Merged
Changes from 3 commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
4ab5fed
[llvm] Fix crash when complex deinterleaving operates on an unrolled …
NickGuy-Arm 0305d3b
Add test
NickGuy-Arm 3ed40f4
Remove debug statement
NickGuy-Arm df6dbba
Prevent ReductionSingle operations from resulting in non-scalar values
NickGuy-Arm 7a7ff04
[NFC] Add TODO
NickGuy-Arm File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I replace the
addin this test, by asub, the pass still crashes, so this is not sufficient.Does it matter what the operation (the one outside the loop) actually is?
I would have expected something like this:
to use
cdotinstructions as well, but this case also seems to crash. This suggests that the issue is not to do with unrolling, but rather with the user outside the loop being anything else than a reduction?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a separate issue; I wouldn't expect that snippet to be processed as the current implementation would require changing the function return type. Whereas the issue this patch is aimed at fixing is when it tries to change one operand of an add with a value of a different type.
If this were to instead reinterleave and store the complex result in
middle.block, instead of returning it, then I would expect the pass to process it and emitcdotinstructions.I'm not sure if the loop vectorizer would ever emit a sub here. Please do correct me if I'm wrong, but I'm not seeing any
VECREDUCE_ADDorvecreduce.addequivalent for subtraction, and the instruction of%bin.rdxin this case is derived from the reduction intrinsic.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For regular reductions (without cdot), we needed to analyse and rewrite use outside of the loop due to Real and Imaginary part extraction. See cases in
complex-deinterleaving-reductions.ll. But forcdot, we don't need to do any of that. Here's a test fromcomplex-deinterleaving-cdot.ll:It is currently transformed into:
But instead, we could ignore everything happening after the final llvm.experimental.vector.partial.reduce.add and just put one cdot on another:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @igogo-x86, I agree it makes more sense to feed the result from one
cdotinto the other, rather than changing the PHI node to be a wider type.My understanding is that the Complex Deinterleaving pass was created because for certain operations (like reductions) where there is both a PHI+reduction for the imaginary and one PHI+reduction for the real part, it is better to keep the intermediate values interleaved, because the
cmlainstruction takes interleaved tuples as input and returns interleaved tuples as output. This avoids having to deinterleave values first and it also allows using specialisedcmlainstructions to do the complex MLA operation. The reduction PHI then contains a vector of<(r, i), (r, i), ..>tuples, which need de-interleaving only when doing the final reduction.For the case of
cdotinstructions there is no need for this, because the result vector will always be deinterleaved (thecdotinstruction returns either a widened real, or a widened imaginary result).If that understanding is correct, then I don't really see a need to implement this optimization in the ComplexDeinterleave pass. This looks more like a DAGcombine of
partialreduce(mul(ext(deinterleave(a)), ext(deinterleave(b))) -> cdot(a, b, #0)(with some variation of this pattern for other rotations). With the new ISD nodeISD::PARTIAL_REDUCE_[U|S]MLAadded by @JamesChesterman this should be even easier to identify.Please let me know if I'm missing anything here though.