[InstCombine] Simplify zext(sub(0, trunc(x))) -> and(sub(0, x), mask) #167101
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: V S Susi Krishna (Susikrishna)

Changes

This patch adds a new InstCombine transformation that simplifies the pattern zext(sub(0, trunc(x))) -> and(sub(0, x), mask). This canonicalization removes the redundant trunc/zext pair surrounding a negate operation and replaces it with a masked negate of the original operand. The transform helps expose rotate idioms in vector code, enabling targets such as X86 (AVX2/AVX-512) to generate more efficient vpror/vpsllvq/vpsrlvq instructions.

Fixes #165306 ([AVX-512] Look for vector bit rotates on vectors larger than 16 bytes).

Implementation details: the fold is added to visitZExt in InstCombineCasts.cpp, and a new test file rotate-trunc-zext.ll covers the scalar i6 -> i64 case and several vector widths.
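As a rough source-level illustration (hypothetical code, not taken from the linked issue): a rotate-right written with a negated-and-masked shift amount is the kind of idiom that produces this shift-amount computation; in the motivating IR shown later in the conversation, the "-n & 63" is expressed as a trunc to i6, a sub in i6, and a zext back to i64.

#include <cstdint>

// Hypothetical rotate-right idiom (illustrative only). The "-n & 63" shift
// amount is the piece that shows up in IR as the trunc/sub/zext chain this
// patch targets.
std::uint64_t rotr64(std::uint64_t x, std::uint64_t n) {
  return (x >> (n & 63)) | (x << (-n & 63));
}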
Full diff: https://github.com/llvm/llvm-project/pull/167101.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
index 614c6ebd63be6..ebe1b747e6be4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
@@ -1366,6 +1366,22 @@ Instruction *InstCombinerImpl::visitZExt(ZExtInst &Zext) {
}
}
+ {
+ Value *TruncSrc = nullptr;
+ if (match(&Zext, m_ZExt(m_Sub(m_Zero(), m_Trunc(m_Value(TruncSrc)))))) {
+ IRBuilder<> Builder(&Zext);
+ Type *Ty = TruncSrc->getType();
+ unsigned BitWidth = Ty->getScalarSizeInBits();
+ unsigned MaskVal = BitWidth - 1;
+
+ Value *Zero = ConstantInt::get(Ty, 0);
+ Value *Neg = Builder.CreateSub(Zero, TruncSrc);
+ Value *Mask = ConstantInt::get(Ty, MaskVal);
+ Value *Masked = Builder.CreateAnd(Neg, Mask);
+ return replaceInstUsesWith(Zext, Masked);
+ }
+ }
+
return nullptr;
}
diff --git a/llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll b/llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll
new file mode 100644
index 0000000000000..31c7ba4a26796
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll
@@ -0,0 +1,55 @@
+; RUN: opt -passes=instcombine -S %s | FileCheck %s
+
+; ================================================================
+; Test: Simplify zext(sub(0, trunc(x))) -> and(sub(0, x), (bitwidth-1))
+; Purpose: Check that InstCombine detects and simplifies the pattern
+; seen in rotate idioms, enabling backend rotate lowering.
+; ================================================================
+
+; === Scalar Case (i64) =========================================
+define i64 @neg_trunc_zext(i64 %a) {
+; CHECK-LABEL: @neg_trunc_zext(
+; CHECK-NEXT: %[[NEG:[0-9]+]] = sub i64 0, %a
+; CHECK-NEXT: %[[MASKED:[0-9A-Za-z_]+]] = and i64 %[[NEG]], 63
+; CHECK-NEXT: ret i64 %[[MASKED]]
+ %t = trunc i64 %a to i6
+ %n = sub i6 0, %t
+ %z = zext i6 %n to i64
+ ret i64 %z
+}
+
+; === Vector Case 1: <2 x i64> ==================================
+define <2 x i64> @foo(<2 x i64> %x, <2 x i64> %n) {
+; CHECK-LABEL: @foo(
+; CHECK: %[[NEG:[0-9A-Za-z_]+]] = sub <2 x i64> zeroinitializer, %n
+; CHECK: %[[MASK:[0-9A-Za-z_]+]] = and <2 x i64> %[[NEG]], splat (i64 63)
+; CHECK: ret <2 x i64> %[[MASK]]
+ %t = trunc <2 x i64> %n to <2 x i6>
+ %neg = sub <2 x i6> zeroinitializer, %t
+ %z = zext <2 x i6> %neg to <2 x i64>
+ ret <2 x i64> %z
+}
+
+; === Vector Case 2: <4 x i64> ==================================
+define <4 x i64> @bar(<4 x i64> %x, <4 x i64> %n) {
+; CHECK-LABEL: @bar(
+; CHECK: %[[NEG:[0-9A-Za-z_]+]] = sub <4 x i64> zeroinitializer, %n
+; CHECK: %[[MASK:[0-9A-Za-z_]+]] = and <4 x i64> %[[NEG]], splat (i64 63)
+; CHECK: ret <4 x i64> %[[MASK]]
+ %t = trunc <4 x i64> %n to <4 x i6>
+ %neg = sub <4 x i6> zeroinitializer, %t
+ %z = zext <4 x i6> %neg to <4 x i64>
+ ret <4 x i64> %z
+}
+
+; === Vector Case 3: <8 x i64> ==================================
+define <8 x i64> @baz(<8 x i64> %x, <8 x i64> %n) {
+; CHECK-LABEL: @baz(
+; CHECK: %[[NEG:[0-9A-Za-z_]+]] = sub <8 x i64> zeroinitializer, %n
+; CHECK: %[[MASK:[0-9A-Za-z_]+]] = and <8 x i64> %[[NEG]], splat (i64 63)
+; CHECK: ret <8 x i64> %[[MASK]]
+ %t = trunc <8 x i64> %n to <8 x i6>
+ %neg = sub <8 x i6> zeroinitializer, %t
+ %z = zext <8 x i6> %neg to <8 x i64>
+ ret <8 x i64> %z
+}
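For reference, the arithmetic identity that the fold and the tests rely on can be checked in isolation. A minimal standalone C++ sanity check (not part of the patch) for the i6/i64 shape used in the tests:

#include <cassert>
#include <cstdint>

// Standalone sanity check (not part of the patch): for a k-bit truncation,
// zext(0 - trunc(x)) equals (0 - x) & (2^k - 1). The tests use k = 6 with an
// i64 destination, so the mask is 63 (the bitwidth-1 value the patch emits).
int main() {
  constexpr unsigned K = 6;
  constexpr std::uint64_t Mask = (1u << K) - 1; // 63
  for (std::uint64_t X : {0ull, 1ull, 5ull, 63ull, 64ull, 0xDEADBEEFull}) {
    std::uint64_t Trunc = X & Mask;               // trunc i64 %x to i6
    std::uint64_t NegNarrow = (0 - Trunc) & Mask; // sub i6 0, %t (mod 2^6)
    std::uint64_t Widened = NegNarrow;            // zext i6 %n to i64
    std::uint64_t Folded = (0 - X) & Mask;        // and (sub i64 0, %x), 63
    assert(Widened == Folded);
  }
  return 0;
}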
dtcxzyw
left a comment
I'd expect this pattern to be handled in CanEvaluateZExtd.
Hi @dtcxzyw, I'm still learning my way around InstCombine. I tried to trace the logic, and it looks like canEvaluateZExtd is called from inside an if statement that first checks shouldChangeType(SrcTy, DestTy). In my test case the types are i6 (source) and i64 (destination), and shouldChangeType seems to return false for that pair, so canEvaluateZExtd is never actually called. I'm guessing this is because the cost model doesn't recommend promoting from a small type like i6 all the way to i64? Is that the correct approach for this kind of special pattern, or is there a better way to do this that I'm not seeing?
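For reference, the gating being described looks roughly like this; a paraphrased, self-contained sketch rather than the actual visitZExt code (the real predicates are members of InstCombinerImpl and take extra parameters, so they are passed in here as plain function pointers just so the sketch compiles on its own):

#include "llvm/IR/Instructions.h"

// Paraphrased sketch of the gating: if ShouldChangeType rejects the
// SrcTy -> DestTy pair (for example i6 -> i64 in a module whose datalayout
// lists no native 64-bit integers, or any vector type), CanEvaluateZExtd is
// never consulted at all.
static bool wouldTryWideningSketch(llvm::ZExtInst &Zext,
                                   bool (*ShouldChangeType)(llvm::Type *, llvm::Type *),
                                   bool (*CanEvaluateZExtd)(llvm::Value *, llvm::Type *)) {
  llvm::Value *Src = Zext.getOperand(0);
  llvm::Type *SrcTy = Src->getType();
  llvm::Type *DestTy = Zext.getType();
  return ShouldChangeType(SrcTy, DestTy) && CanEvaluateZExtd(Src, DestTy);
}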
That is a bit strange to me. What datalayout are you using?
IIRC i6 was used just to stop alive2 from timing out
@dtcxzyw, Sorry, I'm not sure what you mean by datalayout. Could you clarify what you're looking for?
I added the datalayout to the motivating case, and it got folded by InstCombine without your patch. So it looks like this is not a real issue?
@dtcxzyw This is the original test case; I may have oversimplified the missing transform:

define <2 x i64> @src(<2 x i64> %0, <2 x i64> %1) {
Entry:
%2 = trunc <2 x i64> %1 to <2 x i6>
%3 = sub <2 x i6> zeroinitializer, %2
%4 = zext <2 x i6> %3 to <2 x i64>
%5 = shl <2 x i64> %0, %4
%6 = and <2 x i64> %1, splat (i64 63)
%7 = lshr <2 x i64> %0, %6
%8 = or <2 x i64> %5, %7
ret <2 x i64> %8
}
@dtcxzyw You were right about the scalar case; it worked when I kept the datalayout. I think I found the reason for the difference. I was looking at the shouldChangeType function, and the Type* version has a check at the very top that returns false for all vector types, so the canEvaluateZExtd path is never reached for them (but it is reached for the scalar i6 type).
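Paraphrasing that check (a sketch of the relevant lines, not a verbatim copy of the LLVM source):

#include "llvm/IR/Type.h"

// Sketch of the early bail-out: the Type* overload of shouldChangeType only
// handles plain integer types, so every vector type is rejected before the
// width/datalayout comparison is reached.
static bool shouldChangeTypeSketch(llvm::Type *From, llvm::Type *To) {
  if (!From->isIntegerTy() || !To->isIntegerTy())
    return false; // all vector types (and other non-integers) fall out here
  // The real function then defers to a width-based overload that consults the
  // datalayout's native integer widths; that part is omitted in this sketch.
  return true;
}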
; === Vector Case 1: <2 x i64> ==================================
define <2 x i64> @foo(<2 x i64> %x, <2 x i64> %n) {
(style) better naming - neg_trunc_zext_v2i64?
; === Vector Case 2: <4 x i64> ==================================
define <4 x i64> @bar(<4 x i64> %x, <4 x i64> %n) {
Not sure if these give us any additional coverage vs the v2i64 case?
InstCombine is conservative about changing vector types due to the lack of cost modeling. Perhaps we can do some simple fold in VectorCombine instead.