Skip to content

Conversation

azwolski
Copy link
Contributor

@azwolski azwolski commented Oct 14, 2025

This PR fixes a bug in combineX86CloadCstore where an optimization was being applied too broadly, causing incorrect code generation.

Without any assumptions about X this transformation is only valid when Y is a non zero power of two/single-bit mask.

      // res, flags2 = sub 0, (and (xor X, -1), Y)
      // cload/cstore ..., cond_ne, flag2
      // ->
      // res, flags2 = sub 0, (and X, Y)
      // cload/cstore ..., cond_e, flag2

We can restrict the optimization to most important case, so only apply when llvm::isOneConstant(Op1.getOperand(1)). It might be not trivial to find code that creates a SelectionDag with other values of Y.

Basline test: #163354

@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2025

@llvm/pr-subscribers-backend-x86

Author: None (azwolski)

Changes

This PR fixes a bug in combineX86CloadCstore where an optimization was being applied too broadly, causing incorrect code generation.

Without any assumptions about X this transformation is only valid when Y is a non zero power of two/single-bit mask.

      // res, flags2 = sub 0, (and (xor X, -1), Y)
      // cload/cstore ..., cond_ne, flag2
      // ->
      // res, flags2 = sub 0, (and X, Y)
      // cload/cstore ..., cond_e, flag2

We can restrict the optimization to most important case, so only apply when llvm::isOneConstant(Op1.getOperand(1)). It might be not trivial to find code that creates a SelectionDag with other values of Y.

Basline test: #163354


Full diff: https://github.com/llvm/llvm-project/pull/163353.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+4-3)
  • (modified) llvm/test/CodeGen/X86/apx/cf.ll (+19)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index eea84a2841764..dabd898819e26 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -58332,11 +58332,12 @@ static SDValue combineX86CloadCstore(SDNode *N, SelectionDAG &DAG) {
   } else if (Op1.getOpcode() == ISD::AND && Sub.getValue(0).use_empty()) {
     SDValue Src = Op1;
     SDValue Op10 = Op1.getOperand(0);
-    if (Op10.getOpcode() == ISD::XOR && isAllOnesConstant(Op10.getOperand(1))) {
-      // res, flags2 = sub 0, (and (xor X, -1), Y)
+    if (Op10.getOpcode() == ISD::XOR && isAllOnesConstant(Op10.getOperand(1)) &&
+        llvm::isOneConstant(Op1.getOperand(1))) {
+      // res, flags2 = sub 0, (and (xor X, -1), 1)
       // cload/cstore ..., cond_ne, flag2
       // ->
-      // res, flags2 = sub 0, (and X, Y)
+      // res, flags2 = sub 0, (and X, 1)
       // cload/cstore ..., cond_e, flag2
       Src = DAG.getNode(ISD::AND, DL, Op1.getValueType(), Op10.getOperand(0),
                         Op1.getOperand(1));
diff --git a/llvm/test/CodeGen/X86/apx/cf.ll b/llvm/test/CodeGen/X86/apx/cf.ll
index b2651e91134ee..4c689d9d0a7a9 100644
--- a/llvm/test/CodeGen/X86/apx/cf.ll
+++ b/llvm/test/CodeGen/X86/apx/cf.ll
@@ -230,6 +230,25 @@ entry:
   ret void
 }
 
+define void @and_cond(i32 %a, i1 %b) {
+; CHECK-LABEL: and_cond:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    testl %edi, %edi
+; CHECK-NEXT:    setg %al
+; CHECK-NEXT:    notb %sil
+; CHECK-NEXT:    xorl %ecx, %ecx
+; CHECK-NEXT:    testb %al, %sil
+; CHECK-NEXT:    cfcmovnel %ecx, 0
+; CHECK-NEXT:    retq
+entry:
+  %0 = icmp sgt i32 %a, 0
+  %1 = xor i1 %b, true
+  %3 = and i1 %1, %0
+  %4 = insertelement <1 x i1> zeroinitializer, i1 %3, i64 0
+  call void @llvm.masked.store.v1i32.p0(<1 x i32> zeroinitializer, ptr null, i32 1, <1 x i1> %4)
+  ret void
+}
+
 define i64 @redundant_test(i64 %num, ptr %p1, i64 %in) {
 ; CHECK-LABEL: redundant_test:
 ; CHECK:       # %bb.0:

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

RKSimon pushed a commit that referenced this pull request Oct 15, 2025
…ug (#163354)

This PR adds a baseline test that exposes a bug in the current
`combineX86CloadCstore` optimization. The generated assembly
demonstrates incorrect behavior when the optimization is applied without
proper constraints.

Without any assumptions about `X` this transformation is only valid when
`Y` is a non zero power of two/single-bit mask.
```cpp
      // res, flags2 = sub 0, (and (xor X, -1), Y)
      // cload/cstore ..., cond_ne, flag2
      // ->
      // res, flags2 = sub 0, (and X, Y)
      // cload/cstore ..., cond_e, flag2
```

In the provided test case, the value in `%al` is unknown at compile
time. If `%al` contains `0`, the optimization cannot be applied, because
`(and (xor X, -1), 0)` is not equal to `(and X, 0)`.

Fix: #163353
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 15, 2025
…imization bug (#163354)

This PR adds a baseline test that exposes a bug in the current
`combineX86CloadCstore` optimization. The generated assembly
demonstrates incorrect behavior when the optimization is applied without
proper constraints.

Without any assumptions about `X` this transformation is only valid when
`Y` is a non zero power of two/single-bit mask.
```cpp
      // res, flags2 = sub 0, (and (xor X, -1), Y)
      // cload/cstore ..., cond_ne, flag2
      // ->
      // res, flags2 = sub 0, (and X, Y)
      // cload/cstore ..., cond_e, flag2
```

In the provided test case, the value in `%al` is unknown at compile
time. If `%al` contains `0`, the optimization cannot be applied, because
`(and (xor X, -1), 0)` is not equal to `(and X, 0)`.

Fix: llvm/llvm-project#163353
@e-kud e-kud reopened this Oct 15, 2025
@RKSimon RKSimon merged commit 316c076 into llvm:main Oct 15, 2025
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants