
Conversation

sebpop
Contributor

@sebpop sebpop commented Sep 16, 2025

Following C and C++ standards, generate llvm.assume statements for array
subscript bounds to provide optimization hints.

For this code:

```c
int arr[10];
int example(int i) {
  return arr[i];
}
```

clang now generates an assume(i < 10):

```llvm
define i32 @example(i32 noundef %i) local_unnamed_addr #0 {
entry:
  %idxprom = zext nneg i32 %i to i64
  %bounds.constraint = icmp ult i32 %i, 10
  tail call void @llvm.assume(i1 %bounds.constraint)
  %arrayidx = getelementptr inbounds nuw i32, ptr @arr, i64 %idxprom
  %0 = load i32, ptr %arrayidx, align 4, !tbaa !2
  ret i32 %0
}
```

@llvmbot llvmbot added the clang (Clang issues not falling into any other category) and clang:codegen (IR generation bugs: mangling, exceptions, etc.) labels Sep 16, 2025
@sebpop sebpop requested a review from nikic September 16, 2025 11:43
@llvmbot
Member

llvmbot commented Sep 16, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Sebastian Pop (sebpop)

Changes

Following C and C++ standards, generate code for semantic undefined behavior for out-of-bounds array subscripts. Generate a select between 'undef' and a valid pointer for loads and stores.

For this code:

```c
int arr[10];
int example(int i) {
  return arr[i];
}
```

We now generate:

```llvm
define i32 @example(i32 noundef %i) local_unnamed_addr #0 {
entry:
  %out.of.bounds = icmp ugt i32 %i, 9
  %idxprom = zext nneg i32 %i to i64
  %arrayidx = getelementptr inbounds nuw i32, ptr @arr, i64 %idxprom
  %bounds.ptr = select i1 %out.of.bounds, ptr undef, ptr %arrayidx
  %0 = load i32, ptr %bounds.ptr, align 4, !tbaa !2
  ret i32 %0
}
```

Full diff: https://github.com/llvm/llvm-project/pull/159046.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CGExpr.cpp (+111-1)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+4)
  • (added) clang/test/CodeGen/array-bounds-constraints.c (+56)
```diff
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index e6e4947882544..d322805b43aac 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -4559,6 +4559,92 @@ void CodeGenFunction::EmitCountedByBoundsChecking(
   }
 }
 
+/// Emit array bounds constraints using undef for out-of-bounds access. Return
+/// the bounds check condition that can be used to make the array access result
+/// 'undef' when out of bounds. Return nullptr when no checks are needed.
+///
+/// C Standard (ISO/IEC 9899:2011 - C11)
+/// Section J.2 (Undefined behavior): An array subscript is out of range, even
+/// if an object is apparently accessible with the given subscript (as in the
+/// lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
+///
+/// Section 6.5.6 (Additive operators): If both the pointer operand and the
+/// result point to elements of the same array object, or one past the last
+/// element of the array object, the evaluation shall not produce an overflow;
+/// otherwise, the behavior is undefined.
+///
+/// C++ Standard (ISO/IEC 14882 - 2017)
+/// Section 8.7 (Additive operators):
+/// 4 When an expression that has integral type is added to or subtracted from a
+///   pointer, the result has the type of the pointer operand. If the expression
+///   P points to element x[i] of an array object x with n elements,^86 the
+///   expressions P + J and J + P (where J has the value j) point to the
+///   (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the
+///   behavior is undefined. Likewise, the expression P - J points to the
+///   (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the
+///   behavior is undefined.
+/// ^86 A pointer past the last element of an array x of n elements is considered
+///     to be equivalent to a pointer to a hypothetical element x[n] for this
+///     purpose; see 6.9.2.
+llvm::Value *
+CodeGenFunction::EmitArrayBoundsConstraints(const ArraySubscriptExpr *E) {
+  // Only emit array bound constraints if we have optimization enabled and no
+  // sanitizers (to avoid conflicts with bounds checking).
+  if (CGM.getCodeGenOpts().OptimizationLevel == 0 ||
+      SanOpts.has(SanitizerKind::ArrayBounds))
+    return nullptr;
+
+  const Expr *Base = E->getBase();
+  const Expr *Idx = E->getIdx();
+  QualType BaseType = Base->getType();
+
+  if (const auto *ICE = dyn_cast<ImplicitCastExpr>(Base)) {
+    if (ICE->getCastKind() == CK_ArrayToPointerDecay) {
+      BaseType = ICE->getSubExpr()->getType();
+    }
+  }
+
+  // For now: only handle constant array types.
+  const ConstantArrayType *CAT = getContext().getAsConstantArrayType(BaseType);
+  if (!CAT)
+    return nullptr;
+
+  llvm::APInt ArraySize = CAT->getSize();
+  if (ArraySize == 0)
+    return nullptr;
+
+  QualType IdxType = Idx->getType();
+  llvm::Type *IndexType = ConvertType(IdxType);
+  llvm::Value *Zero = llvm::ConstantInt::get(IndexType, 0);
+
+  uint64_t ArraySizeValue = ArraySize.getLimitedValue();
+  llvm::Value *ArraySizeVal = llvm::ConstantInt::get(IndexType, ArraySizeValue);
+
+  llvm::Value *IndexVal = EmitScalarExpr(Idx);
+  if (!IndexVal)
+    return nullptr;
+
+  if (IndexVal->getType() != IndexType) {
+    bool IsSigned = IdxType->isSignedIntegerOrEnumerationType();
+    IndexVal = Builder.CreateIntCast(IndexVal, IndexType, IsSigned, "idx.cast");
+  }
+
+  llvm::Value *LowerBound, *UpperBound;
+  if (IdxType->isSignedIntegerOrEnumerationType()) {
+    // For signed indices add "index >= 0 && index < size".
+    LowerBound = Builder.CreateICmpSGE(IndexVal, Zero, "idx.ge.zero");
+    UpperBound = Builder.CreateICmpSLT(IndexVal, ArraySizeVal, "idx.lt.size");
+  } else {
+    // For unsigned indices ">= 0" is implicit: add "true && index < size".
+    LowerBound = Builder.getTrue();
+    UpperBound = Builder.CreateICmpULT(IndexVal, ArraySizeVal, "idx.lt.size");
+  }
+
+  return Builder.CreateOr(Builder.CreateNot(LowerBound, "oob.lower"),
+                          Builder.CreateNot(UpperBound, "oob.upper"),
+                          "out.of.bounds");
+}
+
 LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
                                                bool Accessed) {
   // The index must always be an integer, which is not an aggregate.  Emit it
@@ -4588,6 +4674,9 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
   };
   IdxPre = nullptr;
 
+  // Get array bounds constraint condition for potential undef generation.
+  llvm::Value *OutOfBoundsCondition = EmitArrayBoundsConstraints(E);
+
   // If the base is a vector type, then we are forming a vector element lvalue
   // with this subscript.
   if (E->getBase()->getType()->isSubscriptableVectorType() &&
@@ -4755,7 +4844,28 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
     }
   }
 
-  LValue LV = MakeAddrLValue(Addr, E->getType(), EltBaseInfo, EltTBAAInfo);
+  LValue LV;
+
+  // If we have a bounds check condition, modify the address to be undef when
+  // out of bounds.
+  if (OutOfBoundsCondition) {
+    // Create an undef address of the same type.
+    llvm::Value *UndefPtr =
+        llvm::UndefValue::get(Addr.emitRawPointer(*this)->getType());
+    Address UndefAddr(UndefPtr, Addr.getElementType(), Addr.getAlignment());
+
+    // Use select to conditionally use 'undef' address when out of bounds. This
+    // makes both loads and stores from/to this location 'undef' when bounds are
+    // violated.
+    llvm::Value *FinalPtr = Builder.CreateSelect(
+        OutOfBoundsCondition, UndefAddr.emitRawPointer(*this),
+        Addr.emitRawPointer(*this), "bounds.ptr");
+
+    Address FinalAddr(FinalPtr, Addr.getElementType(), Addr.getAlignment());
+    LV = MakeAddrLValue(FinalAddr, E->getType(), EltBaseInfo, EltTBAAInfo);
+  } else {
+    LV = MakeAddrLValue(Addr, E->getType(), EltBaseInfo, EltTBAAInfo);
+  }
 
   if (getLangOpts().ObjC &&
       getLangOpts().getGC() != LangOptions::NonGC) {
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 727487b46054f..493d6a3534da1 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -3341,6 +3341,10 @@ class CodeGenFunction : public CodeGenTypeCache {
                            llvm::Value *Index, QualType IndexType,
                            QualType IndexedType, bool Accessed);
 
+  /// Emit array bounds constraints using 'undef' for out-of-bounds access.
+  /// Returns nullptr if no bounds checking should be performed.
+  llvm::Value *EmitArrayBoundsConstraints(const ArraySubscriptExpr *E);
+
   /// Returns debug info, with additional annotation if
   /// CGM.getCodeGenOpts().SanitizeAnnotateDebugInfo[Ordinal] is enabled for
   /// any of the ordinals.
diff --git a/clang/test/CodeGen/array-bounds-constraints.c b/clang/test/CodeGen/array-bounds-constraints.c
new file mode 100644
index 0000000000000..734fdcd368b86
--- /dev/null
+++ b/clang/test/CodeGen/array-bounds-constraints.c
@@ -0,0 +1,56 @@
+// Test that array bounds constraints generate undef for out-of-bounds access.
+// RUN: %clang_cc1 -emit-llvm -O2 %s -o - | FileCheck %s
+
+// Run with sanitizers to verify no undef generation.
+// RUN: %clang_cc1 -emit-llvm -O2 -fsanitize=array-bounds %s -o - | FileCheck %s -check-prefix=SANITIZER
+
+// CHECK-LABEL: define {{.*}} @test_simple_array
+int test_simple_array(int i) {
+  int arr[10];
+  // CHECK: %[[OOB:.*]] = icmp ugt i32 %i, 9
+  // CHECK: %[[FINAL_PTR:.*]] = select i1 %[[OOB]], ptr undef, ptr {{.*}}
+  // CHECK: load {{.*}}, ptr %[[FINAL_PTR]]
+  return arr[i];
+}
+
+// CHECK-LABEL: define {{.*}} @test_multidimensional_array
+int test_multidimensional_array(int i, int j) {
+  int arr[5][8];
+  // CHECK: icmp ugt i32 %j, 7
+  // CHECK: icmp ugt i32 %i, 4
+  // CHECK: select i1 {{.*}}, ptr undef, ptr {{.*}}
+  // CHECK: load {{.*}}, ptr {{.*}}
+  return arr[i][j];
+}
+
+// CHECK-LABEL: define {{.*}} @test_unsigned_index
+int test_unsigned_index(unsigned int i) {
+  int arr[10];
+  // CHECK: %[[OOB:.*]] = icmp ugt i32 %i, 9
+  // CHECK: %[[FINAL_PTR:.*]] = select i1 %[[OOB]], ptr undef, ptr {{.*}}
+  // CHECK: load {{.*}}, ptr %[[FINAL_PTR]]
+  return arr[i];
+}
+
+// CHECK-LABEL: define {{.*}} @test_store_undef
+void test_store_undef(int i, int value) {
+  int arr[10];
+  // CHECK: %[[OOB:.*]] = icmp ugt i32 %i, 9
+  // CHECK: %[[FINAL_PTR:.*]] = select i1 %[[OOB]], ptr undef, ptr {{.*}}
+  // CHECK: store {{.*}}, ptr %[[FINAL_PTR]]
+  arr[i] = value;
+}
+
+// SANITIZER-LABEL: define {{.*}} @test_pointer_array
+int test_pointer_array(int *ptr) {
+  // Should not generate undef for pointer access (no known bounds)
+  // SANITIZER-NOT: select {{.*}} undef
+  return ptr[5];
+}
+
+// SANITIZER-LABEL: define {{.*}} @test_variable_length_array
+int test_variable_length_array(int n, int i) {
+  int arr[n];
+  // SANITIZER-NOT: select {{.*}} undef
+  return arr[i];
+}
```


github-actions bot commented Sep 16, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

```
git-clang-format --diff origin/main HEAD --extensions cpp,h,c -- clang/test/CodeGen/array-bounds-constraints-safety.c clang/test/CodeGen/array-bounds-constraints.c clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGExprScalar.cpp clang/lib/CodeGen/CodeGenFunction.h
```

⚠️ The reproduction instructions above might return results for more than one PR in a stack if you are using a stacked PR workflow. You can limit the results by changing origin/main to the base branch/commit you want to compare against. ⚠️

View the diff from clang-format here.
```diff
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 9e1c733b4..d55ab8411 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -4886,17 +4886,19 @@ void CodeGenFunction::EmitArrayBoundsConstraints(const ArraySubscriptExpr *E,
     llvm::Value *Zero = llvm::ConstantInt::get(IndexType, 0);
     llvm::Value *LowerBound =
         Builder.CreateICmpSGE(IndexVal, Zero, "idx.ge.zero");
-    llvm::Value *UpperBound = Accessed
-        ? Builder.CreateICmpSLT(IndexVal, ArraySizeVal, "idx.slt.size")
-        : Builder.CreateICmpSLE(IndexVal, ArraySizeVal, "idx.sle.size");
+    llvm::Value *UpperBound =
+        Accessed
+            ? Builder.CreateICmpSLT(IndexVal, ArraySizeVal, "idx.slt.size")
+            : Builder.CreateICmpSLE(IndexVal, ArraySizeVal, "idx.sle.size");
     llvm::Value *BoundsConstraint =
         Builder.CreateAnd(LowerBound, UpperBound, "bounds.constraint");
     Builder.CreateAssumption(BoundsConstraint);
   } else {
     // For unsigned indices: index [<|<=] size. (>= 0 is implicit.)
-    llvm::Value *UpperBound = Accessed
-        ? Builder.CreateICmpULT(IndexVal, ArraySizeVal, "idx.ult.size")
-        : Builder.CreateICmpULE(IndexVal, ArraySizeVal, "idx.ule.size");
+    llvm::Value *UpperBound =
+        Accessed
+            ? Builder.CreateICmpULT(IndexVal, ArraySizeVal, "idx.ult.size")
+            : Builder.CreateICmpULE(IndexVal, ArraySizeVal, "idx.ule.size");
     Builder.CreateAssumption(UpperBound);
   }
 }
```


github-actions bot commented Sep 16, 2025

✅ With the latest revision this PR passed the undef deprecator.

@sebpop
Contributor Author

sebpop commented Sep 16, 2025

@sjoerdmeijer pointed out to me that the added selects melt away under optimization:
https://godbolt.org/z/MPdz4qYvW
We will post performance numbers on how this change impacts benchmarks.

@sebpop sebpop requested a review from sjoerdmeijer September 16, 2025 13:59
@efriedma-quic
Collaborator

I suspect we want to do something with llvm.assume, or something like that, not use a select.

The thing I'm most worried about here is the compiler performance impact; we're doing extra work to preserve bounds from the source code, but in a lot of cases we can infer them in SCEV, and most of the cases where we can't infer them aren't actually performance-sensitive. But that's hard to judge without numbers.
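For concreteness, a hedged sketch of the assume-based direction (the helper name and signature are illustrative, not from the patch; the revised patch later does essentially this through CGF's Builder, as the clang-format hunk above shows):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Hedged sketch, not the actual patch: given the subscript value and a
// constant array size, emit `assume(0 <= Index && Index < Size)` so later
// passes can rely on the bound without any select in the data flow.
static void emitBoundsAssumption(IRBuilder<> &Builder, Value *Index,
                                 uint64_t Size) {
  Value *Zero = ConstantInt::get(Index->getType(), 0);
  Value *SizeVal = ConstantInt::get(Index->getType(), Size);
  Value *Lower = Builder.CreateICmpSGE(Index, Zero, "idx.ge.zero");
  Value *Upper = Builder.CreateICmpSLT(Index, SizeVal, "idx.lt.size");
  Builder.CreateAssumption(
      Builder.CreateAnd(Lower, Upper, "bounds.constraint"));
}
```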

@sebpop sebpop requested a review from Meinersbur September 18, 2025 09:17
@sebpop
Contributor Author

sebpop commented Sep 18, 2025

I suspect we want to do something with llvm.assume, or something like that, not use a select.

No.
assume is used as "optimization hint".
undef is used as "semantic undefined load".
See https://llvm.org/docs/UndefinedBehavior.html#undef-values

Undef values are deprecated and should be used only when strictly necessary. Uses of undef values should be restricted to representing loads of uninitialized memory. This is the only part of the IR semantics that cannot be replaced with alternatives yet (work is ongoing).

ISel discards undef and assume, which brings the code to the back ends with no select; see the llc pipeline in https://godbolt.org/z/MPdz4qYvW

@sebpop
Contributor Author

sebpop commented Sep 18, 2025

we're doing extra work to preserve bounds from the source code,

LLVM IR expands slightly, yes.
Generated code should have no negative impact
(modulo bugs:

  • in ISel not being able to remove the select undefs,
  • in my implementation possibly duplicating side effects in the creation of subscript constraints: A[expr] -> 0 <= expr && expr < size when expr is a call to printf() or otherwise has side effects. I need to check whether expr gets duplicated, which would be a bug.)

but in a lot of cases we can infer them in SCEV

SCEV cannot infer anything from the LLVM IR if the front-end says nothing about the out-of-bounds semantics.

@nikic
Contributor

nikic commented Sep 18, 2025

I suspect we want to do something with llvm.assume, or something like that, not use a select.

No. assume is used as "optimization hint".

Not sure if that's what you mean, but violating an assume is immediate undefined behavior. It's not a "hint" in the heuristic sense, it encodes a precondition. Of course, the purpose of assumes is to improve optimizations by providing more information, so it is an optimization hint in that sense.

undef is used as "semantic undefined load". See https://llvm.org/docs/UndefinedBehavior.html#undef-values

Undef values are deprecated and should be used only when strictly necessary. Uses of undef values should be restricted to representing loads of uninitialized memory. This is the only part of the IR semantics that cannot be replaced with alternatives yet (work is ongoing).

This is referring to the result of loading uninitialized memory being undef. What you are doing here is loading from an undef pointer, which is UB. This does encode the precondition as well, but in a way that is likely to get lost in optimization (I have a patch somewhere that removes that select in favor of a freeze, which will break this pattern) and is not understood by LLVM in the way assumes are.

Overall, I'm not entirely clear on why using this select form is better than using assume.
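To make the failure mode concrete, a hedged sketch of the kind of fold nikic alludes to (hypothetical, not his actual patch): any value is a valid refinement of undef, so the select, and with it the bounds condition, can legally vanish:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hedged sketch: `select %c, undef, %p` may be rewritten to `freeze %p`.
// The rewrite is semantically correct, but the bounds condition %c no longer
// constrains anything once the select is gone.
static Instruction *foldSelectOfUndef(SelectInst *SI) {
  if (isa<UndefValue>(SI->getTrueValue()))
    return new FreezeInst(SI->getFalseValue(), SI->getName() + ".fr");
  return nullptr;
}
```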

@sjoerdmeijer
Collaborator

Yeah, we need numbers.
I took the motivating example from the description, slightly modified it, and added a vectorisable loop:

```c
int test_simple_array(int i, int n, int * __restrict A, int * __restrict B) {
  int arr[10];
  for (int i = 0; i < n; ++i)
    arr[i] += A[i] * B[i];
  return arr[i];
}
```

This gets vectorised, see: https://godbolt.org/z/GWb5h7hMb

With this patch locally applied, it is no longer vectorised, and I am getting the following with -Rpass-analysis=loop-vectorize:

```
t.c:13:11: remark: loop not vectorized: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop
  distribution to attempt to isolate the offending operations into a separate loop
Unsafe indirect dependence. Memory location is the same as accessed at t.c:13:4 [-Rpass-analysis=loop-vectorize]
```

When I slightly modify the input example and don't let it accumulate, i.e. just have this:

```c
arr[i] = A[i] * B[i];
```

I am getting:

```
t.c:13:11: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store [-Rpass-analysis=loop-
t.c:12:3: remark: the cost-model indicates that vectorization is not beneficial [-Rpass-analysis=loop-vectorize]
```

And this version also gets vectorised with unpatched clang.

I hope I didn't make a silly mistake with this quick little exercise, but it looks like this gets in the way of vectorisation, and that doesn't bode well for the perf numbers...

@sjoerdmeijer
Collaborator

Maybe with the accumulation version it reads uninitialised values, but it looks like the point that this might get in the way still stands (with the other example).

@sebpop
Contributor Author

sebpop commented Sep 19, 2025

violating an assume is immediate undefined behavior. It's not a "hint" in the heuristic sense, it encodes a precondition. Of course, the purpose of assumes is to improve optimizations by providing more information, so it is an optimization hint in that sense.

Thank you Nikita for clarifying that part.

I'm not entirely clear on why using this select form is better than using assume.

My first version of this fix used an assume. I will follow your recommendation and amend the patch to use an assume.

Thank you for your valuable advice.

@sebpop sebpop changed the title [clang] add array bounds constraints using undef for out-of-bounds access [clang] add array out-of-bounds access constraints using llvm.assume Sep 19, 2025
@sjoerdmeijer
Collaborator

I had a little play again with this patch, the updated one. The short summary is:

  • I am a little concerned about how intrusive this is, i.e. about the impact on compile time and performance. For my little example, the number of IR instructions in the vector body is about twice as large, but the final codegen for the vector body is the same, which is a good thing and an improvement. But there are some codegen changes in the scalar loop. So my prediction is that it is not going to be compile-time friendly, and second, we might see all sorts of performance corner cases, but only numbers will tell, I guess...
  • Maybe this is getting ahead of things (i.e. numbers), but maybe we can have a little think about whether we can be more selective with emitting these intrinsics.

Here's the longer story, the code examples I played with.

Small extension of the example in the description:

```c
int arr[10];
int test_simple_array(int i, int n, int * __restrict A, int * __restrict B) {
  for (int i = 0; i < n; ++i)
    arr[i] += A[i] * B[i];
  return arr[i];
}
```

The vector body before this patch is:

```llvm
11:                                               ; preds = %11, %9
  %12 = phi i64 [ 0, %9 ], [ %29, %11 ]
  %13 = getelementptr inbounds nuw i32, ptr %2, i64 %12
  %14 = getelementptr inbounds nuw i8, ptr %13, i64 16
  %15 = load <4 x i32>, ptr %13, align 4, !tbaa !6
  %16 = load <4 x i32>, ptr %14, align 4, !tbaa !6
  %17 = getelementptr inbounds nuw i32, ptr %3, i64 %12
  %18 = getelementptr inbounds nuw i8, ptr %17, i64 16
  %19 = load <4 x i32>, ptr %17, align 4, !tbaa !6
  %20 = load <4 x i32>, ptr %18, align 4, !tbaa !6
  %21 = mul nsw <4 x i32> %19, %15
  %22 = mul nsw <4 x i32> %20, %16
  %23 = getelementptr inbounds nuw i32, ptr @arr, i64 %12
  %24 = getelementptr inbounds nuw i8, ptr %23, i64 16
  %25 = load <4 x i32>, ptr %23, align 4, !tbaa !6
  %26 = load <4 x i32>, ptr %24, align 4, !tbaa !6
  %27 = add nsw <4 x i32> %25, %21
  %28 = add nsw <4 x i32> %26, %22
  store <4 x i32> %27, ptr %23, align 4, !tbaa !6
  store <4 x i32> %28, ptr %24, align 4, !tbaa !6
  %29 = add nuw i64 %12, 8
  %30 = icmp eq i64 %29, %10
  br i1 %30, label %31, label %11, !llvm.loop !10
```

And after with this patch:

```llvm
11:                                               ; preds = %11, %9
  %12 = phi i64 [ 0, %9 ], [ %41, %11 ]
  %13 = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %9 ], [ %42, %11 ]
  %14 = add <4 x i64> %13, splat (i64 4)
  %15 = getelementptr inbounds nuw i32, ptr %2, i64 %12
  %16 = getelementptr inbounds nuw i8, ptr %15, i64 16
  %17 = load <4 x i32>, ptr %15, align 4, !tbaa !6
  %18 = load <4 x i32>, ptr %16, align 4, !tbaa !6
  %19 = getelementptr inbounds nuw i32, ptr %3, i64 %12
  %20 = getelementptr inbounds nuw i8, ptr %19, i64 16
  %21 = load <4 x i32>, ptr %19, align 4, !tbaa !6
  %22 = load <4 x i32>, ptr %20, align 4, !tbaa !6
  %23 = mul nsw <4 x i32> %21, %17
  %24 = mul nsw <4 x i32> %22, %18
  %25 = icmp ult <4 x i64> %13, splat (i64 10)
  %26 = icmp ult <4 x i64> %14, splat (i64 10)
  %27 = extractelement <4 x i1> %25, i64 0
  tail call void @llvm.assume(i1 %27)
  %28 = extractelement <4 x i1> %25, i64 1
  tail call void @llvm.assume(i1 %28)
  %29 = extractelement <4 x i1> %25, i64 2
  tail call void @llvm.assume(i1 %29)
  %30 = extractelement <4 x i1> %25, i64 3
  tail call void @llvm.assume(i1 %30)
  %31 = extractelement <4 x i1> %26, i64 0
  tail call void @llvm.assume(i1 %31)
  %32 = extractelement <4 x i1> %26, i64 1
  tail call void @llvm.assume(i1 %32)
  %33 = extractelement <4 x i1> %26, i64 2
  tail call void @llvm.assume(i1 %33)
  %34 = extractelement <4 x i1> %26, i64 3
  tail call void @llvm.assume(i1 %34)
  %35 = getelementptr inbounds nuw i32, ptr @arr, i64 %12
  %36 = getelementptr inbounds nuw i8, ptr %35, i64 16
  %37 = load <4 x i32>, ptr %35, align 4, !tbaa !6
  %38 = load <4 x i32>, ptr %36, align 4, !tbaa !6
  %39 = add nsw <4 x i32> %37, %23
  %40 = add nsw <4 x i32> %38, %24
  store <4 x i32> %39, ptr %35, align 4, !tbaa !6
  store <4 x i32> %40, ptr %36, align 4, !tbaa !6
  %41 = add nuw i64 %12, 8
  %42 = add <4 x i64> %13, splat (i64 8)
  %43 = icmp eq i64 %41, %10
  br i1 %43, label %44, label %11, !llvm.loop !10
```

As I mentioned, the good thing is that this gets optimised away, and the final codegen is the same, but it is quite an expansion.

The scalar loop before is:

```
.LBB0_7:                                // =>This Inner Loop Header: Depth=1
        ldr     w10, [x12], #4
        ldr     w15, [x13]
        ldr     w14, [x11], #4
        subs    x9, x9, #1
        madd    w10, w14, w10, w15
        str     w10, [x13], #4
        b.ne    .LBB0_7
```

And after, with this patch:

```
.LBB0_6:                                // =>This Inner Loop Header: Depth=1
        ldr     w11, [x2, x10, lsl #2]
        ldr     w12, [x3, x10, lsl #2]
        ldr     w13, [x8, x10, lsl #2]
        madd    w11, w12, w11, w13
        str     w11, [x8, x10, lsl #2]
        add     x10, x10, #1
        cmp     x9, x10
        b.ne    .LBB0_6
```

It might perform the same; I'm only pointing out that it is different, and the new version is one instruction longer because the loop is no longer counting down but counting up.

@sebpop
Contributor Author

sebpop commented Oct 1, 2025

the loop is no longer counting down but counting up.

Looks like the cost model in loop strength reduction (LSR) sees more code dependent on the main IV because of the assumes. The assumes then melt away in InstCombine, and we're left with the longer loop body counting up.

@sebpop
Contributor Author

sebpop commented Oct 1, 2025

Maybe we should teach the LSR cost model to ignore all computations leading into an assume.
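A hedged sketch of what that could look like (a hypothetical helper, not an actual LSR change): walk the transitive users of a computation and report whether they all terminate in @llvm.assume, so a cost model could treat such computations as free:

```cpp
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Hypothetical helper: true if every transitive user of I ends in a call to
// @llvm.assume, i.e. the computation exists only to feed assumptions.
static bool onlyFeedsAssumes(Instruction *I) {
  SmallPtrSet<Instruction *, 8> Visited;
  SmallVector<Instruction *, 8> Worklist(1, I);
  while (!Worklist.empty()) {
    Instruction *Cur = Worklist.pop_back_val();
    for (User *U : Cur->users()) {
      auto *UI = dyn_cast<Instruction>(U);
      if (!UI)
        return false;
      auto *II = dyn_cast<IntrinsicInst>(UI);
      if (II && II->getIntrinsicID() == Intrinsic::assume)
        continue; // this use chain ends in an assume
      if (UI->mayHaveSideEffects())
        return false;
      if (Visited.insert(UI).second)
        Worklist.push_back(UI);
    }
  }
  return true;
}
```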

Contributor

@nikic nikic left a comment


Yes, the general problem with assumes is that, while they help some optimizations by providing additional information, they can regress others due to the added instructions/uses. Hard to say in advance whether a specific assume use is more beneficial than detrimental.

Possibly LoopVectorize should be dropping assumes altogether -- generating assume of extractelement doesn't look particularly useful to me. Adding another late run of the DropUnnecessaryAssumes pass I recently added would probably also avoid that particular issue.

Member

@Meinersbur Meinersbur left a comment


There have been past discussions about whether it is legal to exploit this in C/C++. See e.g.

In short, not everybody agreed that this is allowed in every version of C or C++. At least, what I don't see addressed in this PR is that it is legal to build a pointer to the one-past-the-last element of an array. So for float A[10], A+10 is valid, but must not be dereferenced. The assume is only emitted for the subscript operator, which technically is syntactic sugar that includes the dereference. However, in practice programmers will use &A[10] to create a pointer to the one-past-the-end element, e.g.:

```c
float A[10];
n = 10;
...
for (float *p = &A[0]; p < &A[n]; ++p) { ... }
if (n != 10) abort();
```

We should be very careful to not miscompile this because we added an assume(n < 10).

There is also code around that assumes a flattened layout of multi-dimensional arrays. For instance:

```c
float A[10][10];
(&A[0][0])[99]; // or more bluntly: `A[0][99]`
```

since technically, &A[0][0] is a pointer to the first subobject of the outer array, which is an array of 10 elements.

I would be careful exploiting this kind of information; possibly protect it with a compiler switch in the tradition of -fstrict-aliasing.
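A hedged illustration of the split this requires (function names are made up): forming an address tolerates the one-past-the-end index, while an actual access does not. A later revision encodes exactly this distinction via its Accessed flag (idx.slt.size vs. idx.sle.size in the clang-format hunk above):

```cpp
// Illustrative only: the bound an emitter may assume differs per use.
int A[10];

int *addr_of(int i) {
  return &A[i]; // forming the address: one-past-the-end is legal, so 0 <= i && i <= 10
}

int load_of(int i) {
  return A[i]; // performing the access: 0 <= i && i < 10
}
```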

Collaborator

@erichkeane erichkeane left a comment


No comments other than what @nikic had, everything else looks reasonable to me.

@cor3ntin
Contributor

cor3ntin commented Oct 3, 2025

What are the implications here in terms of safety?
I wouldn't be surprised if some code out there does purposeful access to memory past the array, even though it's UB.

As @nikic said, assume is not a constraint, it's a way to make scenarios which are UB have potentially worse effects.

In recent times, I think the trend is to check the bounds rather than assume them.

How does your change interact with sanitizers?

Sanitizer interaction: assume generation is disabled when -fsanitize=array-bounds is active.

Flexible array detection: skip size-1 arrays as last struct field.
@llvmbot llvmbot added the clang:frontend (Language frontend issues, e.g. anything involving "Sema") label Oct 4, 2025
@sebpop
Contributor Author

sebpop commented Oct 4, 2025

I would be careful exploiting this kind of information; possibly protect it with a compiler switch in the tradition of -fstrict-aliasing.

702d9dd adds a flag, -fassume-array-bounds, disabled by default for now.

How does your change interact with sanitizers?

702d9dd disables assume generation when the array-bounds sanitizer is on.

I wouldn't be surprised if some code out there does purposeful access to memory past the array, even though it's UB.

702d9dd detects structs whose last field is a flexible array member.

Collaborator

@sjoerdmeijer sjoerdmeijer left a comment


It probably makes sense to start off with this under a flag -fassume-array-bounds that is off by default, and it looks like an interesting option in itself that people may want to play with, but for our use case we would like to have this on by default at some point. So, my high-level question is: what do we think the chance is to get this enabled by default?

```c
// CHECK-NOT: call void @llvm.assume
// Taking address of one-past-the-end is allowed by C standard.
// We should NOT assume anything about this access.
return &extern_array[100]; // Legal: one past the end.
```
Member


I tried

```c
extern int extern_array[100];
int *test_extern_array_val(int i) {
  return &extern_array[i];
}
```

with this PR and it generates

```llvm
  %bounds.constraint = icmp ult i32 %i, 100
  tail call void @llvm.assume(i1 %bounds.constraint)
```

If &extern_array[100] is legal, then test_extern_array_val(100) must be legal too.

Did you consider C++ references?

```cpp
int &test_extern_array_val(int i) {
  return extern_array[i];
}
```

I think a reference must always point to valid memory, so here one can apply the stricter i < 100.

```cpp
// Zero-length arrays: "struct { int len; char data[0]; }" (GCC extension
// https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html)
// Both patterns use arrays as placeholders for variable-length data.
if (CAT && (ArraySize == 0 || ArraySize == 1)) {
```
Collaborator


This should respect StrictFlexArraysLevel, probably.

@AaronBallman
Collaborator

I would be careful exploiting this kind of information; possibly protect it with a compiler switch in the tradition of -fstrict-aliasing.

+1 to this.

It probably makes sense to start off with this under a flag -fassume-array-bounds that is off by default, and it looks like an interesting option in itself that people may want to play with, but for our use case we would like to have this on by default at some point. So, my high-level question is: what do we think the chance is to get this enabled by default?

I think the answer to that requires more data. How much of a performance benefit do these assumptions get the user under a wide variety of workloads? When trying the changes on a wide corpus of test cases, how many silent behavioral changes do the assumptions cause? Are tools still able to help folks catch the bugs in their code, or do we need to invest in tooling more before we should enable this option by default?

My misgivings here boil down to the usual tension between eking out as much performance as possible and avoiding catastrophic security concerns. Personally, I don't know that I'd want to see this option on by default unless the benefits were quite significant; I think the ecosystem is strongly leaning towards making things more secure by default. I generally think something like -fbounds-safety makes more sense as a default, requiring people to opt out of it if they want to eke out more performance.

@sebpop
Contributor Author

sebpop commented Oct 9, 2025

I am closing this PR because this is not tractable compared to another solution that we already have: #156342

This current PR #159046 attaches the same information again and again to every load, store, and array address. This is harmful for the size of the IR and for a lot of optimization cost functions that are not trained to deal with the assumes.

In contrast, PR #156342 attaches the info only once, at the level of the declaration (currently the declared types of allocas and globals; in the future it can be transitioned to metadata attached to these decls). LLVM IR analyses (delinearization, SCEV, DA, etc.) then extract and instantiate the array sizes in the context of the uses (loads and stores).

@sebpop sebpop closed this Oct 9, 2025
@efriedma-quic
Collaborator

I think this patch could still be useful as a starting point for experimentation, but if you don't want to pursue it right now, that's fine.

@sebpop
Contributor Author

sebpop commented Oct 9, 2025

I think we need to generate only one assume per declaration, instead of littering the IR with assumes for every use and relying on optimizers to fold all the assumes into a single one.

Currently, the declarations of arrays also carry the dimension sizes. If these types get removed from allocas/globals (from what I understood, within the next ~2 years), a possible way to provide the array dimensions is to maintain the info in a symbol table (as metadata or so).
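A hedged sketch of that direction (the "array.dims" metadata kind is made up for illustration; it is not an existing LLVM convention):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// Hypothetical: record the dimension sizes once on the declaration, so
// analyses (delinearization, SCEV, DA) could read them back at each use
// instead of clang emitting an assume per subscript.
static void attachArrayDims(AllocaInst *AI, ArrayRef<uint64_t> Dims) {
  LLVMContext &Ctx = AI->getContext();
  Type *I64 = Type::getInt64Ty(Ctx);
  SmallVector<Metadata *, 4> Ops;
  for (uint64_t D : Dims)
    Ops.push_back(ConstantAsMetadata::get(ConstantInt::get(I64, D)));
  AI->setMetadata("array.dims", MDNode::get(Ctx, Ops));
}
```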

