-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[RISCV][TTI] Enable masked interleave access for scalable vector #149981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -1359,7 +1359,9 @@ class LoopVectorizationCostModel { | |||||||||||||||||
return; | ||||||||||||||||||
// Override EVL styles if needed. | ||||||||||||||||||
// FIXME: Investigate opportunity for fixed vector factor. | ||||||||||||||||||
// FIXME: Support interleave accesses. | ||||||||||||||||||
bool EVLIsLegal = UserIC <= 1 && IsScalableVF && | ||||||||||||||||||
!InterleaveInfo.hasGroups() && | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a bit unfortunate the timing of these patches, but I just LGTM'd #150074, since I think we don't actually need the EVLIsLegal check, it should be ok to keep around the mask for now and we can optimise it away to a VP intrinsic later. I guess it looks like we also need to account for the fact that we don't handle masked.{load,store} for shufflevector [de]interleaves? In any case we still need to do the actual VP intrinsic transform so I will create an issue on github for that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I’ve completed the simplest initial implementation of VPInterleaveEVLRecipe in my local, but I found it couldn’t be converted into EVL recipe. That’s when I realized that masked interleave access hasn’t been enabled upstream yet. However, I believe enabling both tail folding by mask and EVL at the same time is unsafe. If there is an interleaved store with factor 2 using the values produced by the VPWidenInductionRecipe, and apply interleaved masks [T, T, T, T, T, T, T, T] and [T, T, F, F, F, F, F, F], we might end up storing incorrect values like [0, 0, 1, 1, 2, 2, 3(X), 3(X)] and [3, 3, X, X, X, X, X, X]. Wouldn’t that lead to storing incorrect values in memory? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIUC the mask is interleaved with itself before being fed into the widen load, i.e. see these lines: https://github.com/llvm/llvm-project/pull/150074/files#diff-8b722352f6c665b720f4318006f7ae773cfaddd722fd211fdcf8ef18d8d115dcR42-R43 So the widen load's mask is actually The interleaved access pass then checks to make sure that the mask is actually interleaved before deinterleaving it again for the vlseg/vsseg intrinsic: llvm-project/llvm/lib/CodeGen/InterleavedAccessPass.cpp Lines 536 to 543 in b7889a6
So it should emit a vlseg with mask There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, although I previously said the mask might be incorrect, it’s actually different from what you described.
In reality, the two masks should be: [T, T, T, T, T, T, T, T] and [T, T, T, T, F, F, F, F]. For the first iteration: For the second iteration: Is that correct? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you're right. This looks like an underlying issue with mixing the header masks and EVL based IV. I think we need to convert the header masks from Because on the second-to-last iteration the original header mask isn't going to take into account the possible truncation. So on the first iteration, it should really be: [0, 1, 2, 3] < 3 = [T, T, T, F] And on the second iteration [0, 1, 2, 3] < 2 = [T, T, F, F]. I'll create an issue for this, this is a good find. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
But thinking about it more carefully, since the GEP has already been adjusted based on the EVL base PHI, even though the header mask is not in sync with the EVL, the final store result in the example should still be correct. It's just that addresses which were originally written to only once may now be written to multiple times. The value written the first time might be incorrect, but it will eventually be overwritten with the correct value. However, I'm not sure if this could cause other issues. In any case, I still recommend avoiding mixing tail folding modes—it doesn’t seem entirely safe. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Just FYI, this reasoning is suspect. Introducing a duplicate write to the same location is possibly a violation of the memory model. I'd have to think that through carefully to see if it was, but it's definitely undesirable if we can reasonable avoid. (Topic switch) If I'm following this thread correctly, the issue being theorized here is a problem for any interleave group with EVL right? Not just the masked variants? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Replying to myself after reading more context - yeah, this is a generic EVL bug. @Mel-Chen Would you mind separating your EVLIsLegal change into a separate review with a test case which shows the bug? I'd like to get EVL disabled for this case while Luke's correctness change works it way through review. Edit: Once we do that, I can enable the masked interleave support (for what ends up being masking and non-predicated only), then we can remove the limitation once the root issue is properly fixed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking about this later today, and realized I'd missed the obvious here. While the bug is theoretically reachable through other paths, the combination of predicatation and interleave-groups currently only happens with this patch to enable the predication. EVL without this patch can't hit this code path; I'd missed that when thinking about this earlier. @Mel-Chen I'm tempted to just take your patch as the primary path forward here towards enabling masking interleave. Any concerns with that? On reflection, I don't know the splitting I'd suggested earlier is actually worthwhile. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||||||
TTI.hasActiveVectorLength() && !EnableVPlanNativePath; | ||||||||||||||||||
if (EVLIsLegal) | ||||||||||||||||||
return; | ||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I've landed the last change for the fixed vector path. You can delete UseMaskForCand entirely if you want, if not, I'll do it as a follow up review.