@lukel97 lukel97 commented Nov 25, 2025

As noted in the reproducer provided in #164762 (comment), on RISC-V after LTO we sometimes have trip counts exposed to vectorized loops. The loop vectorizer will have generated calls to @llvm.experimental.get.vector.length, but there are some properties about the intrinsic we can use to simplify it:

  • The result is always less than or equal to both Count and MaxLanes
  • If Count <= MaxLanes, then the result is Count

This teaches SCCP to handle these cases for the intrinsic, which allows loops that turn out to run a single iteration after LTO to be folded away.

#169293 is related and also simplifies the intrinsic in InstCombine via computeKnownBits, but it can't fully remove the loop since computeKnownBits only does limited reasoning on recurrences.
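
The two properties above can be sketched as a small range computation. This is only an illustrative model, not the code in the patch: `URange`, `maxLanesRange` and `getVectorLengthRange` are hypothetical names, the bounds are inclusive (unlike `ConstantRange`, which uses an exclusive upper bound and can wrap), and the products are assumed not to overflow 64 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Unsigned range with inclusive bounds [lo, hi]; a simplified stand-in for
// llvm::ConstantRange, used here for illustration only.
struct URange {
  uint64_t lo;
  uint64_t hi;
};

// Range of MaxLanes: VF itself, or VF times vscale when the scalable flag is
// set. Assumes the products do not overflow 64 bits.
URange maxLanesRange(URange vf, bool scalable, URange vscale) {
  assert(vf.lo <= vf.hi && vscale.lo <= vscale.hi);
  if (!scalable)
    return vf;
  return {vf.lo * vscale.lo, vf.hi * vscale.hi};
}

// Result range of get.vector.length derived from the two properties.
URange getVectorLengthRange(URange count, URange maxLanes) {
  assert(count.lo <= count.hi && maxLanes.lo <= maxLanes.hi);
  // Every possible Count is <= every possible MaxLanes: the result is Count.
  if (count.hi <= maxLanes.lo)
    return count;
  // Otherwise only an upper bound is known.
  return {0, std::min(count.hi, maxLanes.hi)};
}
```

Under this model, a Count of 16 with a VF of 4 and a vscale range of [4, 8] gives MaxLanes in [16, 32], so the result range collapses to the single value 16.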

llvmbot commented Nov 25, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Full diff: https://github.com/llvm/llvm-project/pull/169527.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/SCCPSolver.cpp (+26)
  • (added) llvm/test/Transforms/SCCP/get_vector_length-intrinsic.ll (+136)
diff --git a/llvm/lib/Transforms/Utils/SCCPSolver.cpp b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
index 4947d03a2dc66..7a5cc289fa5c9 100644
--- a/llvm/lib/Transforms/Utils/SCCPSolver.cpp
+++ b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
@@ -2098,6 +2098,32 @@ void SCCPInstVisitor::handleCallResult(CallBase &CB) {
       return (void)mergeInValue(ValueState[II], II,
                                 ValueLatticeElement::getRange(Result));
     }
+    if (II->getIntrinsicID() == Intrinsic::experimental_get_vector_length) {
+      unsigned BitWidth = CB.getType()->getScalarSizeInBits();
+      Value *CountArg = II->getArgOperand(0);
+      Value *VF = II->getArgOperand(1);
+      bool Scalable = cast<ConstantInt>(II->getArgOperand(2))->isOne();
+      ConstantRange Count = getValueState(CountArg)
+                                .asConstantRange(CountArg->getType(), false)
+                                .zextOrTrunc(BitWidth);
+      ConstantRange MaxLanes =
+          getValueState(VF).asConstantRange(BitWidth, false);
+      if (Scalable)
+        MaxLanes =
+            MaxLanes.multiply(getVScaleRange(II->getFunction(), BitWidth));
+
+      // The result is always less than both Count and MaxLanes.
+      ConstantRange Result(
+          APInt::getZero(BitWidth),
+          APIntOps::umin(Count.getUpper(), MaxLanes.getUpper()));
+
+      // If Count <= MaxLanes, getvectorlength(Count, MaxLanes) = Count
+      if (Count.icmp(CmpInst::ICMP_ULE, MaxLanes))
+        Result = Count;
+
+      return (void)mergeInValue(ValueState[II], II,
+                                ValueLatticeElement::getRange(Result));
+    }
 
     if (ConstantRange::isIntrinsicSupported(II->getIntrinsicID())) {
       // Compute result range for intrinsics supported by ConstantRange.
diff --git a/llvm/test/Transforms/SCCP/get_vector_length-intrinsic.ll b/llvm/test/Transforms/SCCP/get_vector_length-intrinsic.ll
new file mode 100644
index 0000000000000..3cb6154447631
--- /dev/null
+++ b/llvm/test/Transforms/SCCP/get_vector_length-intrinsic.ll
@@ -0,0 +1,136 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -p sccp -S | FileCheck %s
+
+define i1 @result_le_count() {
+; CHECK-LABEL: define i1 @result_le_count() {
+; CHECK-NEXT:    ret i1 true
+;
+  %x = call i32 @llvm.experimental.get.vector.length(i32 3, i32 4, i1 false)
+  %res = icmp ule i32 %x, 3
+  ret i1 %res
+}
+
+define i1 @result_le_max_lanes(i32 %count) {
+; CHECK-LABEL: define i1 @result_le_max_lanes(
+; CHECK-SAME: i32 [[COUNT:%.*]]) {
+; CHECK-NEXT:    [[X:%.*]] = call i32 @llvm.experimental.get.vector.length.i32(i32 [[COUNT]], i32 3, i1 false)
+; CHECK-NEXT:    ret i1 true
+;
+  %x = call i32 @llvm.experimental.get.vector.length(i32 %count, i32 3, i1 false)
+  %res = icmp ule i32 %x, 3
+  ret i1 %res
+}
+
+define i1 @result_le_max_lanes_scalable(i32 %count) vscale_range(2, 4) {
+; CHECK-LABEL: define i1 @result_le_max_lanes_scalable(
+; CHECK-SAME: i32 [[COUNT:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[X:%.*]] = call i32 @llvm.experimental.get.vector.length.i32(i32 [[COUNT]], i32 4, i1 true)
+; CHECK-NEXT:    ret i1 true
+;
+  %x = call i32 @llvm.experimental.get.vector.length(i32 %count, i32 4, i1 true)
+  %res = icmp ule i32 %x, 16
+  ret i1 %res
+}
+
+define i32 @count_le_max_lanes() {
+; CHECK-LABEL: define i32 @count_le_max_lanes() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i32 4
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [4, %entry], [%iv.next, %loop]
+  %x = call i32 @llvm.experimental.get.vector.length(i32 %iv, i32 4, i1 false)
+  %iv.next = sub i32 %iv, %x
+  %ec = icmp eq i32 %iv.next, 0
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 %x
+}
+
+; Can't simplify because %iv isn't <= max lanes.
+define i32 @count_not_le_max_lanes() {
+; CHECK-LABEL: define range(i32 0, 5) i32 @count_not_le_max_lanes() {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 6, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT:    [[X:%.*]] = call i32 @llvm.experimental.get.vector.length.i32(i32 [[IV]], i32 4, i1 false)
+; CHECK-NEXT:    [[IV_NEXT]] = sub i32 [[IV]], [[X]]
+; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 0
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i32 [[X]]
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [6, %entry], [%iv.next, %loop]
+  %x = call i32 @llvm.experimental.get.vector.length(i32 %iv, i32 4, i1 false)
+  %iv.next = sub i32 %iv, %x
+  %ec = icmp eq i32 %iv.next, 0
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 %x
+}
+
+define i32 @count_le_max_lanes_scalable_known() vscale_range(4, 8) {
+; CHECK-LABEL: define i32 @count_le_max_lanes_scalable_known(
+; CHECK-SAME: ) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i32 16
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [16, %entry], [%iv.next, %loop]
+  %x = call i32 @llvm.experimental.get.vector.length(i32 %iv, i32 4, i1 true)
+  %iv.next = sub i32 %iv, %x
+  %ec = icmp eq i32 %iv.next, 0
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 %x
+}
+
+; Can't simplify because %iv isn't guaranteed <= max lanes.
+define i32 @count_le_max_lanes_scalable_unknown() {
+; CHECK-LABEL: define range(i32 0, -1) i32 @count_le_max_lanes_scalable_unknown() {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 16, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT:    [[X:%.*]] = call i32 @llvm.experimental.get.vector.length.i32(i32 [[IV]], i32 4, i1 true)
+; CHECK-NEXT:    [[IV_NEXT]] = sub i32 [[IV]], [[X]]
+; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 0
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i32 [[X]]
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [16, %entry], [%iv.next, %loop]
+  %x = call i32 @llvm.experimental.get.vector.length(i32 %iv, i32 4, i1 true)
+  %iv.next = sub i32 %iv, %x
+  %ec = icmp eq i32 %iv.next, 0
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 %x
+}
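
The loop shape used by the `count_le_max_lanes` and `count_not_le_max_lanes` tests can be traced by hand with a small simulation. This is a sketch under a stated assumption: `modelGetVectorLength` (a hypothetical helper) returns `min(count, maxLanes)`, which is one behaviour consistent with the two properties in the description; the result when Count exceeds MaxLanes is not claimed to be exactly this on every target.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed model: min(count, maxLanes). The description only states that the
// result is <= both operands and equals count when count <= maxLanes; min is
// one behaviour consistent with that.
uint32_t modelGetVectorLength(uint32_t count, uint32_t maxLanes) {
  return std::min(count, maxLanes);
}

// Runs the test loop: iv starts at tripCount and is decreased by the vector
// length each iteration until it reaches zero. Returns the iteration count.
unsigned countIterations(uint32_t tripCount, uint32_t maxLanes) {
  assert(maxLanes > 0 && "a zero MaxLanes would never make progress");
  unsigned iterations = 0;
  uint32_t iv = tripCount;
  do {
    uint32_t x = modelGetVectorLength(iv, maxLanes);
    iv -= x;
    ++iterations;
  } while (iv != 0);
  return iterations;
}
```

With a trip count of 4 and 4 lanes the loop exits after one iteration, which is why SCCP can drop the backedge; with a trip count of 6 the first iteration leaves a remainder, so the loop has to stay.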

llvmbot commented Nov 25, 2025

@llvm/pr-subscribers-function-specialization

Contributor Author

@lukel97 lukel97 Nov 25, 2025

We can't handle vscale/get.vector.length in ConstantRange::intrinsic because we need access to the enclosing Function to query its vscale range. In a follow-up we could plumb the Function through and move the logic for the two intrinsics in there?

      bool Scalable = cast<ConstantInt>(II->getArgOperand(2))->isOne();
      ConstantRange Count = getValueState(CountArg)
                                .asConstantRange(CountArg->getType(), false)
                                .zextOrTrunc(BitWidth);
Member

i32 @llvm.experimental.get.vector.length.i64(i64 2**33, i32 4, i1 false) will be folded to zero.
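
The arithmetic behind this comment can be checked with a small sketch. The helper names are hypothetical; they compute an upper bound on the result, `min(Count, MaxLanes)`, once after truncating Count to 32 bits and once in the wider type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Upper bound on the result when Count is truncated to 32 bits before the
// comparison. The high bits of Count are silently dropped.
uint32_t upperBoundTruncFirst(uint64_t count, uint32_t maxLanes) {
  uint32_t truncated = static_cast<uint32_t>(count);
  return std::min(truncated, maxLanes);
}

// Upper bound on the result when the comparison happens in the wider type
// and only the final value is narrowed.
uint32_t upperBoundWide(uint64_t count, uint32_t maxLanes) {
  uint64_t wide = std::min<uint64_t>(count, maxLanes);
  return static_cast<uint32_t>(wide); // always fits because wide <= maxLanes
}
```

For a Count of 2**33 and a MaxLanes of 4, the truncating version yields an upper bound of 0, which forces the result to zero, while the wide version yields 4.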

Contributor Author

Thanks, changed it to use the larger of the two types in 67801c5. I think that matches how SelectionDAGBuilder.cpp expands the intrinsic if the target doesn't natively support it.

      if (Count.icmp(CmpInst::ICMP_ULE, MaxLanes))
        Result = Count;

      Result = Result.zextOrTrunc(II->getType()->getScalarSizeInBits());
Contributor

I think this trunc causes an issue in cases where:

  • Count has a type wider than i32 (e.g. i64)
  • VScale is such that VScale * MaxLanes >= 2^32
  • Count is >= 2^32

For example:

define i32 @trunc() vscale_range(4, 4) {
  %x = call i32 @llvm.experimental.get.vector.length(i64 4294967296, i32 2147483647, i1 true)
  ret i32 %x
}

after running SCCP (built from 67801c5):

define i32 @trunc() #0 {
  ret i32 0
}

Contributor Author

I think in this case the result is poison from the definition in the LangRef:

If the result value does not fit in the result type, then the result is a poison value.

MaxLanes is 0x1FFFFFFFC and Count is 0x100000000, and because Count <= MaxLanes the result is Count. But 0x100000000 does not fit in 32 bits, so the result is poison, and I think the transform is therefore valid.
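
The numbers in this reply can be verified at compile time. This is a minimal sketch with hypothetical helper names (`scalableMaxLanes`, `fitsInI32`), using the operands of the `@trunc` example and its `vscale_range(4, 4)`.

```cpp
#include <cassert>
#include <cstdint>

// MaxLanes for a scalable request: VF * vscale, computed in 64 bits.
constexpr uint64_t scalableMaxLanes(uint32_t vf, uint32_t vscale) {
  return static_cast<uint64_t>(vf) * vscale;
}

// True if a value is representable in the i32 result type.
constexpr bool fitsInI32(uint64_t v) { return v <= UINT32_MAX; }

constexpr uint64_t kCount = 4294967296ULL;                        // 0x100000000
constexpr uint64_t kMaxLanes = scalableMaxLanes(2147483647u, 4u); // 0x1FFFFFFFC

static_assert(kMaxLanes == 0x1FFFFFFFCULL, "4 * (2^31 - 1)");
static_assert(kCount <= kMaxLanes, "so the result equals Count");
static_assert(!fitsInI32(kCount), "Count needs 33 bits, so the i32 result is poison");
```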

Member

MaxLanes * VScale should not overflow.

@lukel97 lukel97 requested a review from mshockwave November 27, 2025 04:00
Member

@dtcxzyw dtcxzyw left a comment

LG


      ConstantRange Count = getValueState(CountArg)
                                .asConstantRange(CountArg->getType(), false)
                                .zextOrTrunc(BitWidth);
Member

Suggested change
-                                .zextOrTrunc(BitWidth);
+                                .zext(BitWidth);

Contributor Author

ConstantRange::zeroExtend unfortunately asserts that the new bit width is strictly larger than the old one, not merely equal to it. So I used zextOrTrunc, which returns the input unchanged when OldBitWidth == NewBitWidth.

Should we relax the assertion? It seems to date from the early days in 2004. Allowing equal bit widths would make it more consistent with IRBuilder::CreateZExt.
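
The difference between the two conversions can be sketched with a toy fixed-width value. This is only a model of the behaviour described in this reply, not LLVM's implementation: `Bits`, `strictZext` and `zextOrTruncModel` are hypothetical names, and the assertion is modelled as an exception so it can be observed.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Toy fixed-width unsigned value; a stand-in for illustration only.
struct Bits {
  uint64_t value;
  unsigned width; // 1..64
};

// Mask with the low `width` bits set.
uint64_t maskFor(unsigned width) {
  return width >= 64 ? ~0ULL : ((1ULL << width) - 1);
}

// Strict zero-extension: the destination must be strictly wider, mirroring
// the assertion described above (modelled here as an exception).
Bits strictZext(Bits b, unsigned newWidth) {
  if (newWidth <= b.width)
    throw std::invalid_argument("not a value extension");
  return {b.value, newWidth};
}

// Lenient variant: extends, truncates, or returns the input unchanged when
// the widths are equal.
Bits zextOrTruncModel(Bits b, unsigned newWidth) {
  if (newWidth > b.width)
    return strictZext(b, newWidth);
  if (newWidth < b.width)
    return {b.value & maskFor(newWidth), newWidth};
  return b;
}
```

In this model, converting a 32-bit value to 32 bits succeeds with `zextOrTruncModel` and is rejected by `strictZext`, which is the situation that arises when Count already has the chosen BitWidth.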

                                .zextOrTrunc(BitWidth);
      ConstantRange MaxLanes = getValueState(VF)
                                   .asConstantRange(VF->getType(), false)
                                   .zextOrTrunc(BitWidth);
Member

Suggested change
-                                   .zextOrTrunc(BitWidth);
+                                   .zext(BitWidth);

      if (Count.icmp(CmpInst::ICMP_ULE, MaxLanes))
        Result = Count;

      Result = Result.zextOrTrunc(II->getType()->getScalarSizeInBits());
Member

Suggested change
-      Result = Result.zextOrTrunc(II->getType()->getScalarSizeInBits());
+      Result = Result.trunc(II->getType()->getScalarSizeInBits());
