Susikrishna commented Nov 8, 2025

This patch adds a new InstCombine transformation that simplifies the pattern:

zext (sub (0, trunc X))
  → and (sub (0, X), (bitwidth - 1))

This canonicalization removes redundant trunc/zext pairs surrounding a negate
operation and replaces them with a masked negate of the original operand.
The transform helps expose rotate idioms in vector code, enabling targets such
as X86 (AVX2/AVX-512) to generate more efficient vpror/vpsllvq/vpsrlvq
instructions.
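
As a sanity check on the arithmetic, the fold can be modelled on plain 64-bit integers. The sketch below is not part of the patch: `lowBits`, `negTruncZext`, `maskedNeg`, and the intermediate width `k` are hypothetical names chosen for illustration, with `k = 6` matching the i6 type used in the tests. It suggests that the two forms agree when the mask is 2^k - 1, which for i6 happens to equal 64 - 1 = 63.

```cpp
#include <cstdint>

// Hypothetical helper: a mask with the low k bits set (requires 0 < k < 64).
constexpr uint64_t lowBits(unsigned k) { return (uint64_t{1} << k) - 1; }

// Models zext(sub(0, trunc(x))) with a k-bit intermediate type:
// keep the low k bits, negate modulo 2^k, then widen with zero bits.
constexpr uint64_t negTruncZext(uint64_t x, unsigned k) {
  return (uint64_t{0} - (x & lowBits(k))) & lowBits(k);
}

// Models and(sub(0, x), mask) computed directly on the 64-bit value.
constexpr uint64_t maskedNeg(uint64_t x, uint64_t mask) {
  return (uint64_t{0} - x) & mask;
}

// i6 intermediate type, mask 63: both forms agree on these samples.
static_assert(negTruncZext(0, 6) == maskedNeg(0, 63), "x = 0");
static_assert(negTruncZext(1, 6) == 63 && maskedNeg(1, 63) == 63, "x = 1");
static_assert(negTruncZext(65, 6) == maskedNeg(65, 63), "x = 65");

// i8 intermediate type: the matching mask is 255, and 63 no longer agrees.
static_assert(negTruncZext(1, 8) == 255, "i8 negate of 1");
static_assert(maskedNeg(1, 255) == 255, "mask 2^8 - 1 agrees");
static_assert(maskedNeg(1, 63) == 63, "mask 63 differs for i8");
```

The `static_assert` lines are evaluated at compile time, so the snippet checks itself when compiled. The i8 case is included only to show how the mask depends on the truncated width in this model.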

Fixes #165306
[AVX-512] Look for vector bit rotates on vectors larger than 16 bytes

Implementation details

  • Added pattern matching logic to InstCombineCasts.cpp.
  • Added a dedicated test file rotate-trunc-zext.ll covering:
    • Scalar case (i64)
    • Vector cases (<2 x i64>, <4 x i64>, <8 x i64>)

Susikrishna requested a review from nikic as a code owner on November 8, 2025, 06:47
llvmbot added the llvm:instcombine and llvm:transforms labels on Nov 8, 2025

llvmbot (Member) commented Nov 8, 2025

@llvm/pr-subscribers-llvm-transforms

Author: V S Susi Krishna (Susikrishna)

Changes

This change is motivated by issue #165306:
[AVX-512] Look for vector bit rotates on vectors larger than 16 bytes

Full diff: https://github.com/llvm/llvm-project/pull/167101.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp (+16)
  • (added) llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll (+55)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
index 614c6ebd63be6..ebe1b747e6be4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
@@ -1366,6 +1366,22 @@ Instruction *InstCombinerImpl::visitZExt(ZExtInst &Zext) {
     }
   }
 
+  {
+    Value *TruncSrc = nullptr;
+    if (match(&Zext, m_ZExt(m_Sub(m_Zero(), m_Trunc(m_Value(TruncSrc)))))) {
+      IRBuilder<> Builder(&Zext);
+      Type *Ty = TruncSrc->getType();
+      unsigned BitWidth = Ty->getScalarSizeInBits();
+      unsigned MaskVal = BitWidth - 1;
+
+      Value *Zero = ConstantInt::get(Ty, 0);
+      Value *Neg = Builder.CreateSub(Zero, TruncSrc);
+      Value *Mask = ConstantInt::get(Ty, MaskVal);
+      Value *Masked = Builder.CreateAnd(Neg, Mask);
+      return replaceInstUsesWith(Zext, Masked);
+    }
+  }
+
   return nullptr;
 }
 
diff --git a/llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll b/llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll
new file mode 100644
index 0000000000000..31c7ba4a26796
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/rotate-trunc-zext.ll
@@ -0,0 +1,55 @@
+; RUN: opt -passes=instcombine -S %s | FileCheck %s
+
+; ================================================================
+; Test: Simplify zext(sub(0, trunc(x))) -> and(sub(0, x), (bitwidth-1))
+; Purpose: Check that InstCombine detects and simplifies the pattern
+;          seen in rotate idioms, enabling backend rotate lowering.
+; ================================================================
+
+; === Scalar Case (i64) =========================================
+define i64 @neg_trunc_zext(i64 %a) {
+; CHECK-LABEL: @neg_trunc_zext(
+; CHECK-NEXT: %[[NEG:[0-9]+]] = sub i64 0, %a
+; CHECK-NEXT: %[[MASKED:[0-9A-Za-z_]+]] = and i64 %[[NEG]], 63
+; CHECK-NEXT: ret i64 %[[MASKED]]
+  %t = trunc i64 %a to i6
+  %n = sub i6 0, %t
+  %z = zext i6 %n to i64
+  ret i64 %z
+}
+
+; === Vector Case 1: <2 x i64> ==================================
+define <2 x i64> @foo(<2 x i64> %x, <2 x i64> %n) {
+; CHECK-LABEL: @foo(
+; CHECK: %[[NEG:[0-9A-Za-z_]+]] = sub <2 x i64> zeroinitializer, %n
+; CHECK: %[[MASK:[0-9A-Za-z_]+]] = and <2 x i64> %[[NEG]], splat (i64 63)
+; CHECK: ret <2 x i64> %[[MASK]]
+  %t = trunc <2 x i64> %n to <2 x i6>
+  %neg = sub <2 x i6> zeroinitializer, %t
+  %z = zext <2 x i6> %neg to <2 x i64>
+  ret <2 x i64> %z
+}
+
+; === Vector Case 2: <4 x i64> ==================================
+define <4 x i64> @bar(<4 x i64> %x, <4 x i64> %n) {
+; CHECK-LABEL: @bar(
+; CHECK: %[[NEG:[0-9A-Za-z_]+]] = sub <4 x i64> zeroinitializer, %n
+; CHECK: %[[MASK:[0-9A-Za-z_]+]] = and <4 x i64> %[[NEG]], splat (i64 63)
+; CHECK: ret <4 x i64> %[[MASK]]
+  %t = trunc <4 x i64> %n to <4 x i6>
+  %neg = sub <4 x i6> zeroinitializer, %t
+  %z = zext <4 x i6> %neg to <4 x i64>
+  ret <4 x i64> %z
+}
+
+; === Vector Case 3: <8 x i64> ==================================
+define <8 x i64> @baz(<8 x i64> %x, <8 x i64> %n) {
+; CHECK-LABEL: @baz(
+; CHECK: %[[NEG:[0-9A-Za-z_]+]] = sub <8 x i64> zeroinitializer, %n
+; CHECK: %[[MASK:[0-9A-Za-z_]+]] = and <8 x i64> %[[NEG]], splat (i64 63)
+; CHECK: ret <8 x i64> %[[MASK]]
+  %t = trunc <8 x i64> %n to <8 x i6>
+  %neg = sub <8 x i6> zeroinitializer, %t
+  %z = zext <8 x i6> %neg to <8 x i64>
+  ret <8 x i64> %z
+}

dtcxzyw (Member) left a comment

I'd expect this pattern to be handled in canEvaluateZExtd.

dtcxzyw requested a review from RKSimon on November 8, 2025, 09:39

Susikrishna (Author) commented

> I'd expect this pattern to be handled in canEvaluateZExtd.

Hi @dtcxzyw, I'm still learning my way around InstCombine. I traced the logic, and it looks like canEvaluateZExtd is only called from inside an if statement that first checks shouldChangeType(SrcTy, DestTy).

In my test case, the types are i6 (Source) and i64 (Destination). It seems shouldChangeType returns false for this, so the canEvaluateZExtd function is never actually called.

I'm guessing this is because the cost model doesn't recommend promoting from a small type like i6 all the way to i64?

Is that the correct approach for this kind of special pattern, or is there a better way to do this that I'm not seeing?

dtcxzyw (Member) commented Nov 8, 2025

> In my test case, the types are i6 (Source) and i64 (Destination). It seems shouldChangeType returns false for this

It is a bit strange to me. What is the datalayout you use?

RKSimon (Collaborator) commented Nov 8, 2025

IIRC i6 was used just to stop alive2 from timing out

Susikrishna (Author) commented

> It is a bit strange to me. What is the datalayout you use?

@dtcxzyw, Sorry, I'm not sure what you mean by datalayout. Could you clarify what you're looking for?

dtcxzyw (Member) commented Nov 8, 2025

> @dtcxzyw, Sorry, I'm not sure what you mean by datalayout. Could you clarify what you're looking for?

I added the datalayout to the motivating case. It got folded by InstCombine without your patch:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i64 @neg_trunc_zext(i64 %a) {
  %t = trunc i64 %a to i6
  %n = sub i6 0, %t
  %z = zext i6 %n to i64
  ret i64 %z
}

; Result after InstCombine, without the patch:
define i64 @neg_trunc_zext(i64 %a) {
  %n = sub i64 0, %a
  %z = and i64 %n, 63
  ret i64 %z
}

So it looks like this is not a real issue?

RKSimon (Collaborator) commented Nov 8, 2025

@dtcxzyw This is the original test case; I may have oversimplified the missing transform:

define <2 x i64> @src(<2 x i64> %0, <2 x i64> %1) {
Entry:
  %2 = trunc <2 x i64> %1 to <2 x i6>
  %3 = sub <2 x i6> zeroinitializer, %2
  %4 = zext <2 x i6> %3 to <2 x i64>
  %5 = shl <2 x i64> %0, %4
  %6 = and <2 x i64> %1, splat (i64 63)
  %7 = lshr <2 x i64> %0, %6
  %8 = or <2 x i64> %5, %7
  ret <2 x i64> %8
}
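
For reference, each lane of the IR above computes a rotate right of %0 by %1: the left-shift amount is the negated count reduced modulo 64, and the right-shift amount is the count reduced modulo 64. A minimal scalar model of one lane, assuming 64-bit elements (the name `rotrIdiom` is hypothetical and used only for illustration):

```cpp
#include <cstdint>

// One 64-bit lane of the idiom:
//   shl  x, zext(sub(0, trunc(n)))   ->  x << ((0 - n) & 63)
//   lshr x, and(n, 63)               ->  x >> (n & 63)
//   or   of the two halves
// Both shift amounts stay in [0, 63], so neither shift is out of range.
constexpr uint64_t rotrIdiom(uint64_t x, uint64_t n) {
  return (x << ((uint64_t{0} - n) & 63)) | (x >> (n & 63));
}

// Rotating 1 right by one position moves the set bit to the top.
static_assert(rotrIdiom(1, 1) == 0x8000000000000000ULL, "rotr by 1");
// Bits 63 and 0 move to bits 59 and 60 when rotating right by 4.
static_assert(rotrIdiom(0x8000000000000001ULL, 4) == 0x1800000000000000ULL,
              "rotr by 4");
// Counts that are multiples of 64 leave the value unchanged.
static_assert(rotrIdiom(0x1234, 64) == 0x1234, "rotr by 64");
```

In this model the masked negate is what keeps the left-shift amount in range, which is why the zext/sub/trunc chain matters for recognising the rotate.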

Susikrishna (Author) commented Nov 8, 2025

@dtcxzyw You were right about the scalar case: it is folded once I keep the datalayout.
As RKSimon mentioned, when I run the test (even with the datalayout), the vector functions are still not simplified:

  • @foo (which uses <2 x i64>)
  • @bar (which uses <4 x i64>)
  • @baz (which uses <8 x i64>)

I think I found the reason for the difference. I was looking at the shouldChangeType function, and the Type* version has this check at the very top:

bool InstCombinerImpl::shouldChangeType(Type *From, Type *To) const {
  if (!From->isIntegerTy() || !To->isIntegerTy())
    return false; 
...
}

This check returns false for all vector types, so the canEvaluateZExtd path is never reached for them (but it is reached for the scalar i6 type).

Review comment (Collaborator) on rotate-trunc-zext.ll:

; === Vector Case 1: <2 x i64> ==================================
define <2 x i64> @foo(<2 x i64> %x, <2 x i64> %n) {

(style) better naming - neg_trunc_zext_v2i64?

Review comment (Collaborator) on rotate-trunc-zext.ll:

; === Vector Case 2: <4 x i64> ==================================
define <4 x i64> @bar(<4 x i64> %x, <4 x i64> %n) {

Not sure if these give us any additional coverage vs the v2i64 case?

dtcxzyw (Member) commented Nov 10, 2025

InstCombine is conservative about changing vector types due to the lack of cost modeling. Perhaps we can do some simple fold in VectorCombine instead.

RKSimon changed the title from "[InstCombine] Simplify zext(sub(0, trunc(x))) -> and(sub(0, x), mask) (Fixes #165306)" to "[InstCombine] Simplify zext(sub(0, trunc(x))) -> and(sub(0, x), mask)" on Nov 13, 2025