
Conversation

dschuff
Member

@dschuff dschuff commented Jan 31, 2024

This causes address arithmetic to be generated with the 'nuw' flag, allowing
WebAssembly constant offset folding.

Fixes #79692

@llvmbot
Member

llvmbot commented Jan 31, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-webassembly

Author: Derek Schuff (dschuff)

Changes

When directly generating loads/stores for small constant memset/memcpy intrinsics,
this change as written uses DAG.getObjectPtrOffset to generate address arithmetic
with 'nuw' when the src/dst pointers are known to be dereferenceable.
For WebAssembly, this allows the arithmetic to be folded directly into the load/store
constant offset field.

See #79692
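
As background for why 'nuw' matters here: a WebAssembly load/store carries an unsigned constant offset, and the effective address base + offset is computed without wrapping (out-of-bounds traps), so an address computed as `add base, C` can only be folded into the offset field when the add is known not to wrap modulo 2^32 — which is exactly what 'nuw' asserts. A minimal sketch of that fold-legality decision (illustrative Python, not LLVM code; `try_fold` is a hypothetical helper):

```python
def try_fold(base, const_off, add_has_nuw):
    """Return (base, folded_offset) if folding the add into the
    load/store offset field is sound, else None."""
    if not add_has_nuw:
        # Without nuw, base + const_off may wrap modulo 2^32, producing
        # a different address than the hardware's non-wrapping
        # base + offset computation.
        return None
    return (base, const_off)

# With nuw, a load of (add nuw %p, 8) can become "i64.load 8(%p)".
assert try_fold("%p", 8, add_has_nuw=True) == ("%p", 8)
# Without nuw, the addition must remain an explicit instruction.
assert try_fold("%p", 8, add_has_nuw=False) is None
```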


Full diff: https://github.com/llvm/llvm-project/pull/80184.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+12-4)
  • (added) llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll (+30)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 3c1343836187a..a52bbdf92cf8d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7574,14 +7574,18 @@ static SDValue getMemcpyLoadsAndStores(SelectionDAG &DAG, const SDLoc &dl,
 
       Value = DAG.getExtLoad(
           ISD::EXTLOAD, dl, NVT, Chain,
-          DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
+          isDereferenceable ? DAG.getObjectPtrOffset(dl, Src, TypeSize::getFixed(SrcOff)) :
+            DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
           SrcPtrInfo.getWithOffset(SrcOff), VT,
           commonAlignment(*SrcAlign, SrcOff), SrcMMOFlags, NewAAInfo);
       OutLoadChains.push_back(Value.getValue(1));
 
+      isDereferenceable =
+        DstPtrInfo.getWithOffset(DstOff).isDereferenceable(VTSize, C, DL);
       Store = DAG.getTruncStore(
           Chain, dl, Value,
-          DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
+          isDereferenceable ? DAG.getObjectPtrOffset(dl, Dst, TypeSize::getFixed(DstOff)) :
+            DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
           DstPtrInfo.getWithOffset(DstOff), VT, Alignment, MMOFlags, NewAAInfo);
       OutStoreChains.push_back(Store);
     }
@@ -7715,7 +7719,7 @@ static SDValue getMemmoveLoadsAndStores(SelectionDAG &DAG, const SDLoc &dl,
     MachineMemOperand::Flags SrcMMOFlags = MMOFlags;
     if (isDereferenceable)
       SrcMMOFlags |= MachineMemOperand::MODereferenceable;
-
+// TODO: Fix memmove too.
     Value = DAG.getLoad(
         VT, dl, Chain,
         DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
@@ -7863,9 +7867,13 @@ static SDValue getMemsetStores(SelectionDAG &DAG, const SDLoc &dl,
         Value = getMemsetValue(Src, VT, DAG, dl);
     }
     assert(Value.getValueType() == VT && "Value with wrong type.");
+    bool isDereferenceable = DstPtrInfo.isDereferenceable(
+        DstOff, *DAG.getContext(), DAG.getDataLayout());
     SDValue Store = DAG.getStore(
         Chain, dl, Value,
-        DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
+        isDereferenceable
+            ? DAG.getObjectPtrOffset(dl, Dst, TypeSize::getFixed(DstOff))
+            : DAG.getMemBasePlusOffset(Dst, TypeSize::getFixed(DstOff), dl),
         DstPtrInfo.getWithOffset(DstOff), Alignment,
         isVol ? MachineMemOperand::MOVolatile : MachineMemOperand::MONone,
         NewAAInfo);
diff --git a/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll b/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll
new file mode 100644
index 0000000000000..15e68ab4122f9
--- /dev/null
+++ b/llvm/test/CodeGen/WebAssembly/mem-intrinsics-offsets.ll
@@ -0,0 +1,30 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mcpu=mvp -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -tail-dup-placement=0 | FileCheck %s
+
+target triple = "wasm32-unknown-unknown"
+
+define void @call_memset(ptr dereferenceable(16)) #0 {
+; CHECK-LABEL: call_memset:
+; CHECK:         .functype call_memset (i32) -> ()
+; CHECK-NEXT:  # %bb.0:
+; CHECK-NEXT:    i64.const $push0=, 0
+; CHECK-NEXT:    i64.store 8($0):p2align=0, $pop0
+; CHECK-NEXT:    i64.const $push1=, 0
+; CHECK-NEXT:    i64.store 0($0):p2align=0, $pop1
+; CHECK-NEXT:    return
+    call void @llvm.memset.p0.i32(ptr align 1 %0, i8 0, i32 16, i1 false)
+    ret void
+}
+
+define void @call_memcpy(ptr dereferenceable(16) %dst, ptr dereferenceable(16) %src) #0 {
+; CHECK-LABEL: call_memcpy:
+; CHECK:         .functype call_memcpy (i32, i32) -> ()
+; CHECK-NEXT:  # %bb.0:
+; CHECK-NEXT:    i64.load $push0=, 8($1):p2align=0
+; CHECK-NEXT:    i64.store 8($0):p2align=0, $pop0
+; CHECK-NEXT:    i64.load $push1=, 0($1):p2align=0
+; CHECK-NEXT:    i64.store 0($0):p2align=0, $pop1
+; CHECK-NEXT:    return
+    call void @llvm.memcpy.p0.p0.i32(ptr align 1 %dst, ptr align 1 %src, i32 16, i1 false)
+    ret void
+}


github-actions bot commented Jan 31, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@dschuff
Member Author

dschuff commented Jan 31, 2024

This change as written should be straightforward,
but as pointed out in the bug, there is actually also a case to be made for using 'nuw' unconditionally (i.e. assuming that
the pointers are always dereferenceable up to the size of the memcpy). The LangRef doesn't explicitly say that it's UB if the pointers are not dereferenceable, but that's my interpretation of the LangRef and the C standard.

Comment on lines 7577 to 7579
isDereferenceable
? DAG.getObjectPtrOffset(dl, Src, TypeSize::getFixed(SrcOff))
: DAG.getMemBasePlusOffset(Src, TypeSize::getFixed(SrcOff), dl),
Contributor

Maybe should move this to a parameter of getMemBasePlusOffset

Member Author

Yeah, that's actually the only difference between the two functions (getObjectPtrOffset is just implemented in terms of getMemBasePlusOffset anyway). So if you think it's a good idea, I wouldn't mind collapsing them as a separate refactoring (it would also fix the annoyance that the two functions take the same parameters but in a different order).

@dschuff
Member Author

dschuff commented Feb 1, 2024

What do you think about the other question though: can we just unconditionally assume that the pointers are dereferenceable and always use nuw?

@SingleAccretion
Contributor

can we just unconditionally assume that the pointers are dereferenceable and always use nuw?

A bit more evidence in favor of this: aggregate stores already use the optimal form (godbolt link).

@arsenm
Contributor

arsenm commented Feb 5, 2024

What do you think about the other question though: can we just unconditionally assume that the pointers are dereferenceable and always use nuw?

The memset is dereferencing them, so yes, I think this is implied.

@dschuff dschuff changed the title [CodeGen] Generate mem intrinsic address calculations with nuw [CodeGen] Mark mem intrinsic loads and stores as dereferenceable Feb 6, 2024
@dschuff
Member Author

dschuff commented Feb 6, 2024

I am getting one local test failure here, in /test/CodeGen/BPF/undef.ll:
The test has a bunch of stores into an alloca, which I think are supposed to get converted to a single memset, so the test calls for

; EL: r1 = 11033905661445 ll
; CHECK: *(u64 *)(r10 - 8) = r1

(where 11033905661445 is 0xA0908070605, i.e. the stored values). With this change the output for bpfel is

	r1 = 2569
	*(u16 *)(r10 - 4) = r1
	r1 = 134678021
	*(u32 *)(r10 - 8) = r1

i.e. the 0x0A09 has been split out from the 0x8070605. I have no idea yet why this change would do that.
Also, there is actually a memset on the next line, which seems to be zeroing the memory after the alloca'd pointer (which I think is UB?). Removing it doesn't seem to affect the output, but maybe something weird is going on.

Contributor

@jayfoad jayfoad left a comment

Unconditionally marking these loads/stores as dereferenceable does not seem justified to me, any more than it would for a regular load/store.

(Having said that, I don't understand the point of the MODereferenceable flag. In IR, dereferenceable metadata is applied to the thing that creates the pointer, so you get UB at that point if it is not dereferenceable. Applying it to the load/store that uses the pointer seems redundant, since they would always give UB anyway if the pointer is not dereferenceable.)

@arsenm
Contributor

arsenm commented Feb 7, 2024

(Having said that, I don't understand the point of the MODereferenceable flag. In IR, dereferenceable metadata is applied to the thing that creates the pointer, so you get UB at that point if it is not dereferenceable. Applying it to the load/store that uses the pointer seems redundant, since they would always give UB anyway if the pointer is not dereferenceable.)

I thought the point was for code motion, which is kind of useless at the use point
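
Under the code-motion reading, the value of the flag can be sketched as follows (illustrative Python, not LLVM's actual scheduler; `can_hoist_load_above` is a hypothetical helper): a load known not to fault may be hoisted above side-effecting instructions, while an unmarked load must stay put.

```python
def can_hoist_load_above(load_dereferenceable, other_has_side_effects):
    """Whether a load may legally move above another instruction."""
    if not other_has_side_effects:
        return True  # nothing observable to reorder against
    # Hoisting a possibly-faulting load above a store/call could make a
    # fault visible before that instruction's side effect; this is only
    # safe when the load is guaranteed not to fault.
    return load_dereferenceable

assert can_hoist_load_above(True, True) is True    # dereferenceable: movable
assert can_hoist_load_above(False, True) is False  # may fault: pinned
```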

@SingleAccretion
Contributor

SingleAccretion commented Oct 9, 2025

It would be nice to resurrect this change... is the getObjectPtrOffset part alone enough for the address-mode folding?

I am currently working around this in a frontend, and it's a bit painful, since you need to 'unroll' your memset/memcpy using ptrtoint + add nuw + inttoptr (otherwise your unrolling is folded back into intrinsics, which then get suboptimally expanded).

@arsenm arsenm requested a review from efriedma-quic October 11, 2025 00:59
@dschuff
Member Author

dschuff commented Oct 11, 2025

@jayfoad / @arsenm The code motion interpretation is the only one that makes sense to me. The description in the header says that it doesn't trap, which would allow code motion of side-effecting instructions across it.

@arsenm there is one new change since I first uploaded, the one to llvm/test/CodeGen/AMDGPU/memcpy-scalar-load.ll. It looks to me like it might still be correct, but I'd appreciate if you could take a look.

@yonghong-song see my comment above about the BPF test. Does any of that ring any bells to you?

@yonghong-song
Contributor

I am getting one local test failure here, in /test/CodeGen/BPF/undef.ll: The test has a bunch of stores into an alloca, which I think are supposed to get converted to a single memset, so the test calls for

; EL: r1 = 11033905661445 ll
; CHECK: *(u64 *)(r10 - 8) = r1

(where 11033905661445 is 0xA0908070605, i.e. the stored values). With this change the output for bpfel is

	r1 = 2569
	*(u16 *)(r10 - 4) = r1
	r1 = 134678021
	*(u32 *)(r10 - 8) = r1

i.e. the 0x0A09 has been split out from the 0x8070605. I have no idea yet why this change would do that. Also, there is actually a memset on the next line, which seems to be zeroing the memory after the alloca'd pointer (which I think is UB?). Removing it doesn't seem to affect the output, but maybe something weird is going on.

I did some investigation. The difference shows up at the 'Optimized legalized selection DAG' stage, due to the new code. For example, with this patch, at that stage:

Optimized legalized selection DAG: %bb.0 'ebpf_filter:'
SelectionDAG has 72 nodes:
  t0: ch,glue = EntryToken
          t70: i64 = add nuw FrameIndex:i64<0>, Constant:i64<34>
        t105: ch = store<(dereferenceable store (s16) into %ir.6 + 28), trunc to i16> t0, Constant:i64<0>, t70, undef:i64
        t113: ch = store<(store (s32) into %ir.key, align 8), trunc to i32> t0, Constant:i64<84281096>, FrameIndex:i64<0>, undef:i64
          t91: i64 = or disjoint FrameIndex:i64<0>, Constant:i64<4>
        t94: ch = store<(store (s16) into %ir.4, align 4), trunc to i16> t0, Constant:i64<2314>, t91, undef:i64
          t75: i64 = add nuw FrameIndex:i64<0>, Constant:i64<30>
        t127: ch = store<(dereferenceable store (s16) into %ir.6 + 24), trunc to i16> t0, Constant:i64<0>, t75, undef:i64
          t183: i64 = add nuw FrameIndex:i64<0>, Constant:i64<32>
        t130: ch = store<(dereferenceable store (s16) into %ir.6 + 26), trunc to i16> t0, Constant:i64<0>, t183, undef:i64
          t80: i64 = add nuw FrameIndex:i64<0>, Constant:i64<22>
        t152: ch = store<(dereferenceable store (s16) into %ir.6 + 16), trunc to i16> t0, Constant:i64<0>, t80, undef:i64
          t170: i64 = add nuw FrameIndex:i64<0>, Constant:i64<24>
        t154: ch = store<(dereferenceable store (s16) into %ir.6 + 18), trunc to i16> t0, Constant:i64<0>, t170, undef:i64
          t186: i64 = add nuw FrameIndex:i64<0>, Constant:i64<26>
        t148: ch = store<(dereferenceable store (s16) into %ir.6 + 20), trunc to i16> t0, Constant:i64<0>, t186, undef:i64
          t173: i64 = add nuw FrameIndex:i64<0>, Constant:i64<28>
        t150: ch = store<(dereferenceable store (s16) into %ir.6 + 22), trunc to i16> t0, Constant:i64<0>, t173, undef:i64
          t85: i64 = add nuw FrameIndex:i64<0>, Constant:i64<14>
        t160: ch = store<(dereferenceable store (s16) into %ir.6 + 8), trunc to i16> t0, Constant:i64<0>, t85, undef:i64
          t165: i64 = add nuw FrameIndex:i64<0>, Constant:i64<16>
        t162: ch = store<(dereferenceable store (s16) into %ir.6 + 10), trunc to i16> t0, Constant:i64<0>, t165, undef:i64
          t189: i64 = add nuw FrameIndex:i64<0>, Constant:i64<18>
        t156: ch = store<(dereferenceable store (s16) into %ir.6 + 12), trunc to i16> t0, Constant:i64<0>, t189, undef:i64
          t168: i64 = add nuw FrameIndex:i64<0>, Constant:i64<20>
        t158: ch = store<(dereferenceable store (s16) into %ir.6 + 14), trunc to i16> t0, Constant:i64<0>, t168, undef:i64
          t89: i64 = or disjoint FrameIndex:i64<0>, Constant:i64<6>
        t142: ch = store<(dereferenceable store (s16) into %ir.6), trunc to i16> t0, Constant:i64<0>, t89, undef:i64
          t175: i64 = add FrameIndex:i64<0>, Constant:i64<8>
        t144: ch = store<(dereferenceable store (s16) into %ir.6 + 2), trunc to i16> t0, Constant:i64<0>, t175, undef:i64
          t181: i64 = add FrameIndex:i64<0>, Constant:i64<10>
        t138: ch = store<(dereferenceable store (s16) into %ir.6 + 4), trunc to i16> t0, Constant:i64<0>, t181, undef:i64
          t178: i64 = add FrameIndex:i64<0>, Constant:i64<12>
        t140: ch = store<(dereferenceable store (s16) into %ir.6 + 6), trunc to i16> t0, Constant:i64<0>, t178, undef:i64
      t190: ch = TokenFactor t105, t113, t94, t127, t130, t152, t154, t148, t150, t160, t162, t156, t158, t142, t144, t138, t140
    t51: ch,glue = callseq_start t190, TargetConstant:i64<0>, TargetConstant:i64<0>
    t137: i64 = LDIMM64 TargetGlobalAddress:i64<ptr @routing> 0
  t53: ch,glue = CopyToReg t51, Register:i64 $r1, t137
  t55: ch,glue = CopyToReg t53, Register:i64 $r2, FrameIndex:i64<0>, t53:1
  t58: ch,glue = BPFISD::CALL t55, TargetGlobalAddress:i64<ptr @bpf_map_lookup_elem> 0, Register:i64 $r1, Register:i64 $r2, RegisterMask:Untyped, t55:1
  t59: ch,glue = callseq_end t58, TargetConstant:i64<0>, TargetConstant:i64<0>, t58:1
    t61: i64,ch,glue = CopyFromReg t59, Register:i64 $r0, t59:1
  t64: ch,glue = CopyToReg t61:1, Register:i64 $r0, undef:i64
  t65: ch = BPFISD::RET_GLUE t64, Register:i64 $r0, t64:1

Without this patch, at the same stage:

Optimized legalized selection DAG: %bb.0 'ebpf_filter:'
SelectionDAG has 65 nodes:
  t0: ch,glue = EntryToken
          t70: i64 = add FrameIndex:i64<0>, Constant:i64<34>
        t105: ch = store<(store (s16) into %ir.6 + 28), trunc to i16> t0, Constant:i64<0>, t70, undef:i64
        t184: ch = store<(store (s64) into %ir.key)> t0, Constant:i64<361984551142686720>, FrameIndex:i64<0>, undef:i64
          t75: i64 = add FrameIndex:i64<0>, Constant:i64<30>
        t127: ch = store<(store (s16) into %ir.6 + 24), trunc to i16> t0, Constant:i64<0>, t75, undef:i64
          t188: i64 = add FrameIndex:i64<0>, Constant:i64<32>
        t130: ch = store<(store (s16) into %ir.6 + 26), trunc to i16> t0, Constant:i64<0>, t188, undef:i64
          t80: i64 = add FrameIndex:i64<0>, Constant:i64<22>
        t152: ch = store<(store (s16) into %ir.6 + 16), trunc to i16> t0, Constant:i64<0>, t80, undef:i64
          t173: i64 = add FrameIndex:i64<0>, Constant:i64<24>
        t154: ch = store<(store (s16) into %ir.6 + 18), trunc to i16> t0, Constant:i64<0>, t173, undef:i64
          t191: i64 = add FrameIndex:i64<0>, Constant:i64<26>
        t148: ch = store<(store (s16) into %ir.6 + 20), trunc to i16> t0, Constant:i64<0>, t191, undef:i64
          t176: i64 = add FrameIndex:i64<0>, Constant:i64<28>
        t150: ch = store<(store (s16) into %ir.6 + 22), trunc to i16> t0, Constant:i64<0>, t176, undef:i64
          t85: i64 = add FrameIndex:i64<0>, Constant:i64<14>
        t160: ch = store<(store (s16) into %ir.6 + 8), trunc to i16> t0, Constant:i64<0>, t85, undef:i64
          t168: i64 = add FrameIndex:i64<0>, Constant:i64<16>
        t162: ch = store<(store (s16) into %ir.6 + 10), trunc to i16> t0, Constant:i64<0>, t168, undef:i64
          t194: i64 = add FrameIndex:i64<0>, Constant:i64<18>
        t156: ch = store<(store (s16) into %ir.6 + 12), trunc to i16> t0, Constant:i64<0>, t194, undef:i64
          t171: i64 = add FrameIndex:i64<0>, Constant:i64<20>
        t158: ch = store<(store (s16) into %ir.6 + 14), trunc to i16> t0, Constant:i64<0>, t171, undef:i64
          t178: i64 = add FrameIndex:i64<0>, Constant:i64<8>
        t144: ch = store<(store (s16) into %ir.6 + 2), trunc to i16> t0, Constant:i64<0>, t178, undef:i64
          t186: i64 = add FrameIndex:i64<0>, Constant:i64<10>
        t138: ch = store<(store (s16) into %ir.6 + 4), trunc to i16> t0, Constant:i64<0>, t186, undef:i64
          t181: i64 = add FrameIndex:i64<0>, Constant:i64<12>
        t140: ch = store<(store (s16) into %ir.6 + 6), trunc to i16> t0, Constant:i64<0>, t181, undef:i64
      t195: ch = TokenFactor t105, t184, t127, t130, t152, t154, t148, t150, t160, t162, t156, t158, t144, t138, t140
    t51: ch,glue = callseq_start t195, TargetConstant:i64<0>, TargetConstant:i64<0>
    t137: i64 = LDIMM64 TargetGlobalAddress:i64<ptr @routing> 0
  t53: ch,glue = CopyToReg t51, Register:i64 $r1, t137
  t55: ch,glue = CopyToReg t53, Register:i64 $r2, FrameIndex:i64<0>, t53:1
  t58: ch,glue = BPFISD::CALL t55, TargetGlobalAddress:i64<ptr @bpf_map_lookup_elem> 0, Register:i64 $r1, Register:i64 $r2, RegisterMask:Untyped, t55:1
  t59: ch,glue = callseq_end t58, TargetConstant:i64<0>, TargetConstant:i64<0>, t58:1
    t61: i64,ch,glue = CopyFromReg t59, Register:i64 $r0, t59:1
  t64: ch,glue = CopyToReg t61:1, Register:i64 $r0, undef:i64
  t65: ch = BPFISD::RET_GLUE t64, Register:i64 $r0, t64:1

But the change is OK from the BPF perspective. The bpf undef.ll test diff can be updated to:

-; EL: r1 = 11033905661445 ll
-; EB: r1 = 361984551142686720 ll
-; CHECK: *(u64 *)(r10 - 8) = r1
+; EL: r1 = 2569
+; EB: r1 = 2314
+; CHECK: *(u16 *)(r10 - 4) = r1
+; EL: r1 = 134678021
+; EB: r1 = 84281096
+; CHECK: *(u32 *)(r10 - 8) = r1

The 'memset' code is 'undefined' from the Linux kernel BPF verifier's perspective, but from the LLVM compilation perspective it is okay.
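
For what it's worth, the split constants in the updated checks encode exactly the same bytes as the old single u64 store (the top two bytes of the old value were zero). A quick byte-level check in Python:

```python
import struct

# Old check: one u64 at (r10 - 8). New checks: a u32 at (r10 - 8)
# followed by a u16 at (r10 - 4); the remaining two bytes were zero.

# little-endian (EL): 11033905661445 == 0x0A0908070605
old_el = struct.pack("<Q", 11033905661445)
new_el = struct.pack("<I", 134678021) + struct.pack("<H", 2569) + b"\x00\x00"
assert old_el == new_el

# big-endian (EB): 361984551142686720 == 0x05060708090A0000
old_eb = struct.pack(">Q", 361984551142686720)
new_eb = struct.pack(">I", 84281096) + struct.pack(">H", 2314) + b"\x00\x00"
assert old_eb == new_eb
```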

@yonghong-song
Contributor

@yonghong-song see my comment above about the BPF test. Does any of that ring any bells to you?

I cannot judge your SelectionDAG change. From the BPF selftest perspective, updating the tests with the new asm output is okay with me.

@nikic
Contributor

nikic commented Oct 13, 2025

The use of MODereferenceable here is incorrect, as it implies unconditional dereferenceability. The use of getObjectPtrOffset looks fine to me.

@dschuff
Member Author

dschuff commented Oct 13, 2025

The use of MODereferenceable here is incorrect, as it implies unconditional dereferenceability. The use of getObjectPtrOffset looks fine to me.

This is sort of what I'm a bit confused about. When they are generated from memcpy, the addresses in range are in fact unconditionally dereferenced. Why is it incorrect to mark them as dereferenceable?

@nikic
Contributor

nikic commented Oct 13, 2025

The use of MODereferenceable here is incorrect, as it implies unconditional dereferenceability. The use of getObjectPtrOffset looks fine to me.

This is sort of what I'm a bit confused about. When they are generated from memcpy, the addresses in range are in fact unconditionally dereferenced. Why is it incorrect to mark them as dereferenceable?

If you have something like if (x) { memcpy(p) } then p is not (generally) known to be dereferenceable outside the if block, which is the claim this flag would be making.
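
To make that concrete with a small sketch (illustrative Python; `store` and `guarded_memset` are hypothetical stand-ins for the expanded memset): the stores are only safe behind the guard, so an unconditional dereferenceability claim on them would be wrong.

```python
memory = {0x1000 + i: 0 for i in range(16)}  # only this region is "mapped"

def store(addr, val):
    if addr not in memory:
        raise MemoryError("fault: unmapped address")
    memory[addr] = val

def guarded_memset(x, p, n=16):
    if x:  # the guard is what establishes dereferenceability of p
        for i in range(n):
            store(p + i, 0)

guarded_memset(False, 0xDEAD0000)  # fine: the guard skips the stores
guarded_memset(True, 0x1000)       # fine: p really is mapped here
try:
    store(0xDEAD0000, 0)  # a store "hoisted" above the guard: faults
    faulted = False
except MemoryError:
    faulted = True
assert faulted
```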

@dschuff
Member Author

dschuff commented Oct 13, 2025

Got it, thanks. I've backed this PR out to just use getObjectPtrOffset.

@dschuff dschuff changed the title [CodeGen] Mark mem intrinsic loads and stores as dereferenceable [CodeGen] Use getObjectPtrOffset to generate loads/stores for mem intrinsics Oct 13, 2025
@dschuff dschuff merged commit 3e22438 into llvm:main Oct 14, 2025
10 checks passed
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
…rinsics (llvm#80184)

This causes address arithmetic to be generated with the 'nuw' flag, 
allowing WebAssembly constant offset folding.

Fixes llvm#79692


Development

Successfully merging this pull request may close these issues.

[WebAssembly] Suboptimal lowering of small memsets/memcpys

7 participants