-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[clang] add array out-of-bounds access constraints using llvm.assume #159046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Sebastian Pop (sebpop) ChangesFollowing C and C++ standards, generate code for semantic undefined behavior for out-of-bounds array subscripts. Generate a select between 'undef' and a valid pointer for loads and stores. For this code:
We now generate:
Full diff: https://github.com/llvm/llvm-project/pull/159046.diff 3 Files Affected:
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index e6e4947882544..d322805b43aac 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -4559,6 +4559,92 @@ void CodeGenFunction::EmitCountedByBoundsChecking(
}
}
+/// Emit array bounds constraints using undef for out-of-bounds access. Return
+/// the bounds check condition that can be used to make the array access result
+/// 'undef' when out of bounds. Return nullptr when no checks are needed.
+///
+/// C Standard (ISO/IEC 9899:2011 - C11)
+/// Section J.2 (Undefined behavior): An array subscript is out of range, even
+/// if an object is apparently accessible with the given subscript (as in the
+/// lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
+///
+/// Section 6.5.6 (Additive operators): If both the pointer operand and the
+/// result point to elements of the same array object, or one past the last
+/// element of the array object, the evaluation shall not produce an overflow;
+/// otherwise, the behavior is undefined.
+///
+/// C++ Standard (ISO/IEC 14882 - 2017)
+/// Section 8.7 (Additive operators):
+/// 4 When an expression that has integral type is added to or subtracted from a
+/// pointer, the result has the type of the pointer operand. If the expression
+/// P points to element x[i] of an array object x with n elements,^86 the
+/// expressions P + J and J + P (where J has the value j) point to the
+/// (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the
+/// behavior is undefined. Likewise, the expression P - J points to the
+/// (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the
+/// behavior is undefined.
+/// ^86 A pointer past the last element of an array x of n elements is considered
+/// to be equivalent to a pointer to a hypothetical element x[n] for this
+/// purpose; see 6.9.2.
+llvm::Value *
+CodeGenFunction::EmitArrayBoundsConstraints(const ArraySubscriptExpr *E) {
+ // Only emit array bound constraints if we have optimization enabled and no
+ // sanitizers (to avoid conflicts with bounds checking.)
+ if (CGM.getCodeGenOpts().OptimizationLevel == 0 ||
+ SanOpts.has(SanitizerKind::ArrayBounds))
+ return nullptr;
+
+ const Expr *Base = E->getBase();
+ const Expr *Idx = E->getIdx();
+ QualType BaseType = Base->getType();
+
+ if (const auto *ICE = dyn_cast<ImplicitCastExpr>(Base)) {
+ if (ICE->getCastKind() == CK_ArrayToPointerDecay) {
+ BaseType = ICE->getSubExpr()->getType();
+ }
+ }
+
+ // For now: only handle constant array types.
+ const ConstantArrayType *CAT = getContext().getAsConstantArrayType(BaseType);
+ if (!CAT)
+ return nullptr;
+
+ llvm::APInt ArraySize = CAT->getSize();
+ if (ArraySize == 0)
+ return nullptr;
+
+ QualType IdxType = Idx->getType();
+ llvm::Type *IndexType = ConvertType(IdxType);
+ llvm::Value *Zero = llvm::ConstantInt::get(IndexType, 0);
+
+ uint64_t ArraySizeValue = ArraySize.getLimitedValue();
+ llvm::Value *ArraySizeVal = llvm::ConstantInt::get(IndexType, ArraySizeValue);
+
+ llvm::Value *IndexVal = EmitScalarExpr(Idx);
+ if (!IndexVal)
+ return nullptr;
+
+ if (IndexVal->getType() != IndexType) {
+ bool IsSigned = IdxType->isSignedIntegerOrEnumerationType();
+ IndexVal = Builder.CreateIntCast(IndexVal, IndexType, IsSigned, "idx.cast");
+ }
+
+ llvm::Value *LowerBound, *UpperBound;
+ if (IdxType->isSignedIntegerOrEnumerationType()) {
+ // For signed indices add "index >= 0 && index < size".
+ LowerBound = Builder.CreateICmpSGE(IndexVal, Zero, "idx.ge.zero");
+ UpperBound = Builder.CreateICmpSLT(IndexVal, ArraySizeVal, "idx.lt.size");
+ } else {
+ // For unsigned "indices >= 0" is implicit: add "true && index < size".
+ LowerBound = Builder.getTrue();
+ UpperBound = Builder.CreateICmpULT(IndexVal, ArraySizeVal, "idx.lt.size");
+ }
+
+ return Builder.CreateOr(Builder.CreateNot(LowerBound, "oob.lower"),
+ Builder.CreateNot(UpperBound, "oob.upper"),
+ "out.of.bounds");
+}
+
LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
bool Accessed) {
// The index must always be an integer, which is not an aggregate. Emit it
@@ -4588,6 +4674,9 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
};
IdxPre = nullptr;
+ // Get array bounds constraint condition for potential undef generation.
+ llvm::Value *OutOfBoundsCondition = EmitArrayBoundsConstraints(E);
+
// If the base is a vector type, then we are forming a vector element lvalue
// with this subscript.
if (E->getBase()->getType()->isSubscriptableVectorType() &&
@@ -4755,7 +4844,28 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E,
}
}
- LValue LV = MakeAddrLValue(Addr, E->getType(), EltBaseInfo, EltTBAAInfo);
+ LValue LV;
+
+ // If we have a bounds check condition, modify the address to be undef when
+ // out of bounds.
+ if (OutOfBoundsCondition) {
+ // Create an undef address of the same type.
+ llvm::Value *UndefPtr =
+ llvm::UndefValue::get(Addr.emitRawPointer(*this)->getType());
+ Address UndefAddr(UndefPtr, Addr.getElementType(), Addr.getAlignment());
+
+ // Use select to conditionally use 'undef' address when out of bounds. This
+ // makes both loads and stores from/to this location 'undef' when bounds are
+ // violated.
+ llvm::Value *FinalPtr = Builder.CreateSelect(
+ OutOfBoundsCondition, UndefAddr.emitRawPointer(*this),
+ Addr.emitRawPointer(*this), "bounds.ptr");
+
+ Address FinalAddr(FinalPtr, Addr.getElementType(), Addr.getAlignment());
+ LV = MakeAddrLValue(FinalAddr, E->getType(), EltBaseInfo, EltTBAAInfo);
+ } else {
+ LV = MakeAddrLValue(Addr, E->getType(), EltBaseInfo, EltTBAAInfo);
+ }
if (getLangOpts().ObjC &&
getLangOpts().getGC() != LangOptions::NonGC) {
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 727487b46054f..493d6a3534da1 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -3341,6 +3341,10 @@ class CodeGenFunction : public CodeGenTypeCache {
llvm::Value *Index, QualType IndexType,
QualType IndexedType, bool Accessed);
+ /// Emit array bounds constraints using 'undef' for out-of-bounds access.
+ /// Returns nullptr if no bounds checking should be performed.
+ llvm::Value *EmitArrayBoundsConstraints(const ArraySubscriptExpr *E);
+
/// Returns debug info, with additional annotation if
/// CGM.getCodeGenOpts().SanitizeAnnotateDebugInfo[Ordinal] is enabled for
/// any of the ordinals.
diff --git a/clang/test/CodeGen/array-bounds-constraints.c b/clang/test/CodeGen/array-bounds-constraints.c
new file mode 100644
index 0000000000000..734fdcd368b86
--- /dev/null
+++ b/clang/test/CodeGen/array-bounds-constraints.c
@@ -0,0 +1,56 @@
+// Test that array bounds constraints generate undef for out-of-bounds access.
+// RUN: %clang_cc1 -emit-llvm -O2 %s -o - | FileCheck %s
+
+// Run with sanitizers to verify no undef generation.
+// RUN: %clang_cc1 -emit-llvm -O2 -fsanitize=array-bounds %s -o - | FileCheck %s -check-prefix=SANITIZER
+
+// CHECK-LABEL: define {{.*}} @test_simple_array
+int test_simple_array(int i) {
+ int arr[10];
+ // CHECK: %[[OOB:.*]] = icmp ugt i32 %i, 9
+ // CHECK: %[[FINAL_PTR:.*]] = select i1 %[[OOB]], ptr undef, ptr {{.*}}
+ // CHECK: load {{.*}}, ptr %[[FINAL_PTR]]
+ return arr[i];
+}
+
+// CHECK-LABEL: define {{.*}} @test_multidimensional_array
+int test_multidimensional_array(int i, int j) {
+ int arr[5][8];
+ // CHECK: icmp ugt i32 %j, 7
+ // CHECK: icmp ugt i32 %i, 4
+ // CHECK: select i1 {{.*}}, ptr undef, ptr {{.*}}
+ // CHECK: load {{.*}}, ptr {{.*}}
+ return arr[i][j];
+}
+
+// CHECK-LABEL: define {{.*}} @test_unsigned_index
+int test_unsigned_index(unsigned int i) {
+ int arr[10];
+ // CHECK: %[[OOB:.*]] = icmp ugt i32 %i, 9
+ // CHECK: %[[FINAL_PTR:.*]] = select i1 %[[OOB]], ptr undef, ptr {{.*}}
+ // CHECK: load {{.*}}, ptr %[[FINAL_PTR]]
+ return arr[i];
+}
+
+// CHECK-LABEL: define {{.*}} @test_store_undef
+void test_store_undef(int i, int value) {
+ int arr[10];
+ // CHECK: %[[OOB:.*]] = icmp ugt i32 %i, 9
+ // CHECK: %[[FINAL_PTR:.*]] = select i1 %[[OOB]], ptr undef, ptr {{.*}}
+ // CHECK: store {{.*}}, ptr %[[FINAL_PTR]]
+ arr[i] = value;
+}
+
+// SANITIZER-LABEL: define {{.*}} @test_pointer_array
+int test_pointer_array(int *ptr) {
+ // Should not generate undef for pointer access (no known bounds)
+ // SANITIZER-NOT: select {{.*}} undef
+ return ptr[5];
+}
+
+// SANITIZER-LABEL: define {{.*}} @test_variable_length_array
+int test_variable_length_array(int n, int i) {
+ int arr[n];
+ // SANITIZER-NOT: select {{.*}} undef
+ return arr[i];
+}
|
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions cpp,h,c -- clang/test/CodeGen/array-bounds-constraints-safety.c clang/test/CodeGen/array-bounds-constraints.c clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGExprScalar.cpp clang/lib/CodeGen/CodeGenFunction.h
View the diff from clang-format here.diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 9e1c733b4..d55ab8411 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -4886,17 +4886,19 @@ void CodeGenFunction::EmitArrayBoundsConstraints(const ArraySubscriptExpr *E,
llvm::Value *Zero = llvm::ConstantInt::get(IndexType, 0);
llvm::Value *LowerBound =
Builder.CreateICmpSGE(IndexVal, Zero, "idx.ge.zero");
- llvm::Value *UpperBound = Accessed
- ? Builder.CreateICmpSLT(IndexVal, ArraySizeVal, "idx.slt.size")
- : Builder.CreateICmpSLE(IndexVal, ArraySizeVal, "idx.sle.size");
+ llvm::Value *UpperBound =
+ Accessed
+ ? Builder.CreateICmpSLT(IndexVal, ArraySizeVal, "idx.slt.size")
+ : Builder.CreateICmpSLE(IndexVal, ArraySizeVal, "idx.sle.size");
llvm::Value *BoundsConstraint =
Builder.CreateAnd(LowerBound, UpperBound, "bounds.constraint");
Builder.CreateAssumption(BoundsConstraint);
} else {
// For unsigned indices: index [<|<=] size. (>= 0 is implicit.)
- llvm::Value *UpperBound = Accessed
- ? Builder.CreateICmpULT(IndexVal, ArraySizeVal, "idx.ult.size")
- : Builder.CreateICmpULE(IndexVal, ArraySizeVal, "idx.ule.size");
+ llvm::Value *UpperBound =
+ Accessed
+ ? Builder.CreateICmpULT(IndexVal, ArraySizeVal, "idx.ult.size")
+ : Builder.CreateICmpULE(IndexVal, ArraySizeVal, "idx.ule.size");
Builder.CreateAssumption(UpperBound);
}
}
|
✅ With the latest revision this PR passed the undef deprecator. |
@sjoerdmeijer shared with me the fact that the added selects are melting away under optimization: |
I suspect we want to do something with llvm.assume, or something like that, not use a select. The thing I'm most worried about here is the compiler performance impact; we're doing extra work to preserve bounds from the source code, but in a lot of cases we can infer them in SCEV, and most of the cases where we can't infer them aren't actually performance-sensitive. But that's hard to judge without numbers. |
No.
ISEL discards |
LLVM IR expands slightly, yes.
SCEV cannot infer anything from the LLVM IR if the front-end says nothing about the out-of-bounds semantics. |
Not sure if that's what you mean, but violating an assume is immediate undefined behavior. It's not a "hint" in the heuristic sense, it encodes a precondition. Of course, the purpose of assumes is to improve optimizations by providing more information, so it is an optimization hint in that sense.
This is referring to the result of loading uninitialized memory being undef. What you are doing here is loading from an undef pointer, which is UB. This does encode the precondition as well, but in a way that is likely to get lost in optimization (I have a patch somewhere that removes that select in favor of a freeze, which will break this pattern) and is not understood by LLVM in the way assumes are. Overall, I'm not entirely clear on why using this select form is better than using assume. |
Yeah, we need numbers.
This gets vectorised, see: https://godbolt.org/z/GWb5h7hMb With this patch locally applied, it is no longer vectorised, and I am getting the following with -Rpass-analysis=loop-vectorize:
When I slightly modify the input example, and not let it accumulate, i.e. just have this:
I am getting:
And this version gets also vectorised with unpatched clang. I hope I didn't make a silly mistake with this quick little exercise, but it looks like this gets into the way of vectorisation and that this doesn't bode very well for perf numbers.... |
Maybe with the accumulation version it reads uninitialised values, but it looks like the point that this might get in the way still stands (with the other example). |
Thank you Nikita for clarifying that part.
My first way to fix this was using an assume. I will follow your recommendation and I will amend the patch with an assume. Thank you for your valuable advice. |
035737d
to
a3d5ea6
Compare
Following C and C++ standards, generate llvm.assume statements for array subscript bounds to provide optimization hints. For this code: ``` int arr[10]; int example(int i) { return arr[i]; } ``` clang now generates an `assume(i < 10)`: ``` define i32 @example(i32 noundef %i) local_unnamed_addr #0 { entry: %idxprom = zext nneg i32 %i to i64 %bounds.constraint = icmp ult i32 %i, 10 tail call void @llvm.assume(i1 %bounds.constraint) %arrayidx = getelementptr inbounds nuw i32, ptr @arr, i64 %idxprom %0 = load i32, ptr %arrayidx, align 4, !tbaa !2 ret i32 %0 } ```
a3d5ea6
to
7fdec0a
Compare
I had a little play again with this patch, the updated one. The short summary is:
Here's the longer story, the code examples I played with. Small extension of the example in the description:
The original vector body before is:
And after with this patch:
As I mentioned, the good thing is that this gets optimised away, and the final codegen is the same, but it is quite an expansion. The scalar loop before is:
And after with this patch is:
It might perform the same, but the only thing I'm saying is that it is different and the new version is one instruction longer because the loop is no longer counting down but counting up. |
Looks like cost model in loop-strength-reduction LSR has more code dependent on the main IV in the assumes. Then the assumes melt away in inst-combine and we're left with the longer loop body counting up. |
Maybe we should teach LSR cost model to ignore all computations leading into an assume. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the general problem with assumes is that, while they help some optimizations by providing additional information, they can regress others due to the added instructions/uses. Hard to say in advance whether a specific assume use it more beneficial than detrimental.
Possibly LoopVectorize should be dropping assumes altogether -- generating assume of extractelement doesn't look particularly useful to me. Adding another late run of the DropUnnecessaryAssumes pass I recently added would probably also avoid that particular issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There have been past discussions about whether it is legal to explot this in C/C++. See e.g.
- https://discourse.llvm.org/t/delinearization-validity-checks-in-dependenceanalysis/52000/10
- https://discourse.llvm.org/t/rfc-adding-range-metadata-to-array-subscripts/57912
- https://reviews.llvm.org/D114988
In short, not everybody agreed that this is allowed in every version of C or C++. At least what I don't see in this PR that it is legal to build a pointer for one-past-the-last element of an array. So for float A[10]
, A+10
is valid, but must not be indirected. The assume is only emitted for the subscript operator which technically is syntactic sugar that includes the indirection. However, in practice programmers will use &A[10]
to create apointer to the one-past-the end, e.g.:
float A[10];
n = 10;
...
for (float *p = &A[0]; p < &A[n]; ++p) { ... }
if (n != 10) abort();
We should be very careful to not miscompile this because we added an assume(n < 10)
.
There are also codes around that assume a flattened layout of multi-dimensional arrays. For instance:
float A[10][10];
(&A[0][0])[99]; // or more bluntly: `A[0][99]`
since technically, &A[0][0]
is a pointer to the first subobject of the outer array which is an array of 10 elements.
I would be careful exploiting this kind of information, possibly protect is with a compiler switch in the tradition of -fstrict-aliasing
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No comments other than what @nikic had, everything else looks reasonable to me.
What are the implications here in terms of safety? As @nikic said, assume is not a constraint, it's a way to make scenarios which are UB have potentially worse effects. In recent times, I think the trend is to check the bounds rather than assume them. How does your change interact with sanitizers? |
Sanitizer interaction: assume generation is disabled when -fsanitize=array-bounds is active. Flexible array detection: skip size-1 arrays as last struct field.
702d9dd adds a flag -fassume-array-bounds disabled by default for now.
702d9dd disables assume generation when sanitizer array-bounds is on.
702d9dd detects struct with last field a flexible size array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably makes sense to start off with this under a flag -fassume-array-bounds that is off by default, and it looks like an interesting option on itself that people may want to play with, but for our use-case we would like to have this on by default at some point. So, my high-level question is: what do we think the chance is to get this enabled by default?
// CHECK-NOT: call void @llvm.assume | ||
// Taking address of one-past-the-end is allowed by C standard. | ||
// We should NOT assume anything about this access. | ||
return &extern_array[100]; // Legal: one past the end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried
extern int extern_array[100];
int *test_extern_array_val(int i) {
return &extern_array[i];
}
with this PR an it generates
%bounds.constraint = icmp ult i32 %i, 100
tail call void @llvm.assume(i1 %bounds.constraint)
if &extern_array[100]
is legal, so must test_extern_array_val(100)
.
Did you consider C++ references?
int &test_extern_array_val(int i) {
return extern_array[i];
}
I think a reference must always point to valid memory, so here one can apply the stricter i < 100
.
// Zero-length arrays: "struct { int len; char data[0]; }" (GCC extension | ||
// https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html) | ||
// Both patterns use arrays as placeholders for variable-length data. | ||
if (CAT && (ArraySize == 0 || ArraySize == 1)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should respect StrictFlexArraysLevel, probably.
+1 to this.
I think the answer to that requires more data. How much of a performance benefit do these assumptions get the user under a wide varieties of workloads? When trying the changes on a wide corpus of test cases, how many silent behavioral changes do the assumptions cause? Are tools still able to help folks catch the bugs with their code or do we need to invest in tooling more before we should enable this option by default? My misgivings here boil down to the usual tension between eeking out as much performance as possible and avoiding catastrophic security concerns. Personally, I don't know that I'd want to see this option on by default unless the benefits were quite significant; I think the ecosystem is strongly leaning towards making things more secure by default. I generally think something like -fbounds-safety makes more sense as a default and require people to opt out of it if they want to eek more performance out. |
I am closing this PR because this is not tractable compared to another solution that we already have: #156342 This current PR #159046 attaches the same information again and again to every load, store, and array address. This is harmful for the size of the IR and for a lot of optimization cost functions that are not trained to deal with the assumes. In contrast, PR #156342 only attaches the info once at the level of the declaration (currently alloca and globals declared types, and in the future it can be transitioned to meta-data attached to these decls.) LLVM IR analyses (delinearization, SCEV, DA, etc.) then extract and instantiate the array sizes in the context of the uses (loads and stores.) |
I think this patch could still be useful as a starting point for experimentation, but if you don't want to pursue it right now, that's fine. |
I think we need to generate only one assume per declaration instead of littering the IR with assumes for every use and rely on optimizers to fold all the assumes to a single one. The current way is the declarations of arrays also contain the dimensions sizes. If this type gets removed from alloca/globals (from what I understood in the next ~2 years) a possible way to provide the array dimensions is to maintain the info in a symbol table (as meta-data or so.) |
Following C and C++ standards, generate llvm.assume statements for array
subscript bounds to provide optimization hints.
For this code:
clang now generates an
assume(i < 10)
: