-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[CodeGen] Use getObjectPtrOffset to generate loads/stores for mem intrinsics #80184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-powerpc @llvm/pr-subscribers-backend-webassembly Author: Derek Schuff (dschuff) ChangesWhen directly generating loads/stores for small constant memset/memcpy intrinsics, See #79692 Full diff: https://github.com/llvm/llvm-project/pull/80184.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 3c1343836187a..a52bbdf92cf8d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7574,14 +7574,18 @@ static SDValue getMemcpyLoadsAndStores(SelectionDAG &DAG, const SDLoc &dl,
Value = DAG.getExtLoad(
ISD::EXTLOAD, dl, NVT, Chain,
- DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
+ isDereferenceable ? DAG.getObjectPtrOffset(dl, Src, TypeSize::getFixed(SrcOff)) :
+ DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
SrcPtrInfo.getWithOffset(SrcOff), VT,
commonAlignment(*SrcAlign, SrcOff), SrcMMOFlags, NewAAInfo);
OutLoadChains.push_back(Value.getValue(1));
+ isDereferenceable =
+ DstPtrInfo.getWithOffset(DstOff).isDereferenceable(VTSize, C, DL);
Store = DAG.getTruncStore(
Chain, dl, Value,
- DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
+ isDereferenceable ? DAG.getObjectPtrOffset(dl, Dst, TypeSize::getFixed(DstOff)) :
+ DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
DstPtrInfo.getWithOffset(DstOff), VT, Alignment, MMOFlags, NewAAInfo);
OutStoreChains.push_back(Store);
}
@@ -7715,7 +7719,7 @@ static SDValue getMemmoveLoadsAndStores(SelectionDAG &DAG, const SDLoc &dl,
MachineMemOperand::Flags SrcMMOFlags = MMOFlags;
if (isDereferenceable)
SrcMMOFlags |= MachineMemOperand::MODereferenceable;
-
+// TODO: Fix memmove too.
Value = DAG.getLoad(
VT, dl, Chain,
DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
@@ -7863,9 +7867,13 @@ static SDValue getMemsetStores(SelectionDAG &DAG, const SDLoc &dl,
Value = getMemsetValue(Src, VT, DAG, dl);
}
assert(Value.getValueType() == VT && "Value with wrong type.");
+ bool isDereferenceable = DstPtrInfo.isDereferenceable(
+ DstOff, *DAG.getContext(), DAG.getDataLayout());
SDValue Store = DAG.getStore(
Chain, dl, Value,
- DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
+ isDereferenceable
+ ? DAG.getObjectPtrOffset(dl, Dst, TypeSize::getFixed(DstOff))
+ : DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
DstPtrInfo.getWithOffset(DstOff), Alignment,
isVol ? MachineMemOperand::MOVolatile : MachineMemOperand::MONone,
NewAAInfo);
diff --git a/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll b/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll
new file mode 100644
index 0000000000000..15e68ab4122f9
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll
@@ -0,0 +1,30 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mcpu=mvp -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -tail-dup-placement=0 | FileCheck %
+
+target triple = "wasm32-unknown-unknown"
+
+define void @call_memset(ptr dereferenceable(16)) #0 {
+; CHECK-LABEL: call_memset:
+; CHECK: .functype call_memset (i32) -> ()
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: i64.const $push0=, 0
+; CHECK-NEXT: i64.store 8($0):p2align=0, $pop0
+; CHECK-NEXT: i64.const $push1=, 0
+; CHECK-NEXT: i64.store 0($0):p2align=0, $pop1
+; CHECK-NEXT: return
+ call void @llvm.memset.p0.i32(ptr align 1 %0, i8 0, i32 16, i1 false)
+ ret void
+}
+
+define void @call_memcpy(ptr dereferenceable(16) %dst, ptr dereferenceable(16) %src) #0 {
+; CHECK-LABEL: call_memcpy:
+; CHECK: .functype call_memcpy (i32, i32) -> ()
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: i64.load $push0=, 8($1):p2align=0
+; CHECK-NEXT: i64.store 8($0):p2align=0, $pop0
+; CHECK-NEXT: i64.load $push1=, 0($1):p2align=0
+; CHECK-NEXT: i64.store 0($0):p2align=0, $pop1
+; CHECK-NEXT: return
+ call void @llvm.memcpy.p0.p0.i32(ptr align 1 %dst, ptr align 1 %src, i32 16, i1 false)
+ ret void
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
This change as written should be straightforward, |
isDereferenceable | ||
? DAG.getObjectPtrOffset(dl, Src, TypeSize::getFixed(SrcOff)) | ||
: DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe should move this to a parameter of getMemBasePlusOffset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's actually the only difference between the 2 functions (getObjectPtrOffset is just implemented in terms of getMemBasePlusOffset anway). So if you think it's a good idea I wouldn't mind just collapsing them as a separate refactoring (it would also fix the annoyance that the 2 functions have the same parameters but in a different order).
What do you think about the other question though: can we just unconditionally assume that the pointers are dereferenceable and always use nuw? |
A bit more evidence in favor of this - aggregate stores already use the optimal form (godbolt link). |
The memset is dereferencing them, so yes I think this is implied |
I am getting one local test failure here, in
(where 11033905661445 is 0xA0908070605, i.e. the stored values). With this change the output for bpfel is
i.e. the 0x0A09 has been split out from the 0x8070605. I have no idea yet why this change would do that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unconditionally marking these loads/stores as dereferenceable does not seem justified to me, any more than it would for a regular load/store.
(Having said that, I don't understand the point of the MODereferenceable flag. In IR derefenceable
metadata is applied to the thing that creates the pointer, so you get UB at that point if it is not dereferenceable. Applying it to the load/store that uses the pointer seems redundant, since they would always give UB anyway if the pointer is not dereferenceable.)
I thought the point was for code motion, which is kind of useless at the use point |
It would be nice to resurrect this change... is only the I am currently working around this in a frontend, and it's a bit painful, since you need to 'unroll' your memset/memcpy using |
@jayfoad / @arsenm The code motion interpretation is the only one that makes sense to me. The description in the header says that it doesn't trap, which would allow code motion of side-effecting instructions across it. @arsenm there is one new change since I first uploaded, the one to llvm/test/CodeGen/AMDGPU/memcpy-scalar-load.ll. It looks to me like it might still be correct, but I'd appreciate if you could take a look. @yonghong-song see my comment above about the BPF test. Does any of that ring any bells to you? |
I did some investigation. 'Optimized legalized selection DAG' is responsible due to NEW code. For example, with this patch, at 'Optimized legalized selection DAG' stage:
Without this patch, at 'Optimized legalized selection DAG' stage:
But the change is OK from bpf perspective. The bpf undef.ll test diff can have
The 'memset' code is 'undefined' from linux kernel bpf verifier perspective. But from llvm compilation perspective, it is okay. |
I cannot judge your SelectionDAG change. From bpf selftest perspective, updating with new asm code is okay to me. |
The use of MODereferenceable here is incorrect, as it implies unconditional dereferenceability. The use of getObjectPtrOffset looks fine to me. |
This is sort of what I'm a bit confused about. When they are generated from memcpy, the addresses in range are in fact unconditionally dereferenced. Why is it incorrect to mark them as dereferenceable? |
If you have something like |
Got it, thanks. I've backed this PR out to just use getObjectPtrOffset. |
…rinsics (llvm#80184) This causes address arithmetic to be generated with the 'nuw' flag, allowing WebAssembly constant offset folding. Fixes llvm#79692
This causes address arithmetic to be generated with the 'nuw' flag, allowing
WebAssembly constant offset folding.
Fixes #79692