Skip to content

Conversation

@vasu-the-sharma
Copy link

@vasu-the-sharma vasu-the-sharma commented Oct 22, 2025

Description

This PR adds sanitizer checks for null pointers and misalignment during aggregate copy operations in EmitAggregateCopy.

Fixes #61466.

Current State

When copying aggregates (structs), the compiler does not validate that source and destination pointers are non-null or properly aligned when sanitizers are enabled. This allows undefined behavior from invalid pointers to go undetected at runtime.

Solution

When null or alignment sanitizers are active, this change:

  • Retrieves raw pointer addresses from both source and destination LValues
  • Performs EmitTypeCheck on both pointers to verify they are non-null and meet alignment requirements
  • Records these checks in the SanitizerSet for runtime emission

This ensures memory safety violations are caught early, preventing crashes and corruption from invalid aggregate copies.

Testing

Added comprehensive test cases covering:

  • Null source pointer detection
  • Null destination pointer detection
  • Misaligned source pointer detection
  • Misaligned destination pointer detection
  • Normal copy operations (no false positives)

Screenshots

image (4) image (5)

@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Oct 22, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-compiler-rt-sanitizer
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: VASU SHARMA (vasu-the-sharma)

Changes

This PR adds null and alignment checks for aggregates like scalars


Full diff: https://github.com/llvm/llvm-project/pull/164548.diff

1 Files Affected:

  • (modified) clang/lib/CodeGen/CGExprAgg.cpp (+15)
diff --git a/clang/lib/CodeGen/CGExprAgg.cpp b/clang/lib/CodeGen/CGExprAgg.cpp
index eee397f1f3d19..de6d80a273dbd 100644
--- a/clang/lib/CodeGen/CGExprAgg.cpp
+++ b/clang/lib/CodeGen/CGExprAgg.cpp
@@ -2249,6 +2249,21 @@ void CodeGenFunction::EmitAggregateCopy(LValue Dest, LValue Src, QualType Ty,
                                         bool isVolatile) {
   assert(!Ty->isAnyComplexType() && "Shouldn't happen for complex");
 
+  if (SanOpts.hasOneOf(SanitizerKind::Null | SanitizerKind::Alignment)) {
+    Address SrcAddr = Src.getAddress();
+    Address DestAddr = Dest.getAddress();
+
+    // Check source pointer for null and alignment violations
+    EmitTypeCheck(TCK_Load, SourceLocation(),
+                  SrcAddr.emitRawPointer(*this), Ty, SrcAddr.getAlignment(),
+                  SanitizerSet());
+
+    // Check destination pointer for null and alignment violations
+    EmitTypeCheck(TCK_Store, SourceLocation(),
+                  DestAddr.emitRawPointer(*this), Ty, DestAddr.getAlignment(),
+                  SanitizerSet());
+  }
+
   Address DestPtr = Dest.getAddress();
   Address SrcPtr = Src.getAddress();
 

@github-actions
Copy link

github-actions bot commented Oct 22, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp -- clang/test/CodeGen/ubsan-aggregate-null-align.cpp clang/lib/CodeGen/CGExprAgg.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/clang/lib/CodeGen/CGExprAgg.cpp b/clang/lib/CodeGen/CGExprAgg.cpp
index cf215da08..df15be955 100644
--- a/clang/lib/CodeGen/CGExprAgg.cpp
+++ b/clang/lib/CodeGen/CGExprAgg.cpp
@@ -2251,19 +2251,18 @@ void CodeGenFunction::EmitAggregateCopy(LValue Dest, LValue Src, QualType Ty,
 
   // Sanitizer checks to verify source and destination pointers are
   // non-null and properly aligned before copying.
-  // Without these checks, undefined behavior from invalid pointers goes undetected.
+  // Without these checks, undefined behavior from invalid pointers goes
+  // undetected.
   Address SrcAddr = Src.getAddress();
   Address DestAddr = Dest.getAddress();
 
   // Check source pointer for null and alignment violations
-  EmitTypeCheck(TCK_Load, SourceLocation(),
-                SrcAddr.emitRawPointer(*this), Ty, SrcAddr.getAlignment(),
-                SanitizerSet());
+  EmitTypeCheck(TCK_Load, SourceLocation(), SrcAddr.emitRawPointer(*this), Ty,
+                SrcAddr.getAlignment(), SanitizerSet());
 
   // Check destination pointer for null and alignment violations
-  EmitTypeCheck(TCK_Store, SourceLocation(),
-                DestAddr.emitRawPointer(*this), Ty, DestAddr.getAlignment(),
-                SanitizerSet());
+  EmitTypeCheck(TCK_Store, SourceLocation(), DestAddr.emitRawPointer(*this), Ty,
+                DestAddr.getAlignment(), SanitizerSet());
 
   Address DestPtr = Dest.getAddress();
   Address SrcPtr = Src.getAddress();

Copy link
Contributor

@tonykuttai tonykuttai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test support this change

Copy link
Collaborator

@hubert-reinterpretcast hubert-reinterpretcast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there should be a Clang CodeGen test for this.

Address DestAddr = Dest.getAddress();

// Check source pointer for null and alignment violations
EmitTypeCheck(TCK_Load, SourceLocation(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the scope should be expanded to other cases covered by EmitCheckedLValue.

Copy link
Author

@vasu-the-sharma vasu-the-sharma Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion @hubert-reinterpretcast
I've reviewed the two EmitCheckedLValue usage sites in CGExprAgg.cpp:

Line 802 (VisitCastExpr): Uses EmitCheckedLValue with TCK_Load for dynamic_cast operations
Line 1313 (VisitBinAssign): Uses EmitCheckedLValue with TCK_Store, then calls EmitCopy which delegates to EmitAggregateCopy

Both cases are already covered:
EmitCheckedLValue performs type checking on the LValue expression itself
My changes to EmitAggregateCopy add sanitizer checks at the actual copy operation (the memcpy call)

These checks are complementary rather than redundant:
EmitCheckedLValue: Validates the expression evaluation produces a valid LValue
EmitAggregateCopy: Validates the source and destination pointers during the memory copy operation

The EmitAggregateCopy checks catch cases where pointers might become invalid between LValue emission and the actual copy (like array indexing or pointer arithmetic).
Do you see other specific cases in EmitCheckedLValue usage that would benefit from additional instrumentation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EmitCheckedLvalue will consider the expression to be accessed (and therefore flag past-the-end array elements).

Consider:

#if !SCALAR
typedef struct A { int x; } A;
#else
typedef int A;
#endif
A arr[1];
A *ap = arr;

int main(void) {
  A a = { 0 };
  arr[1] = a;
}

It looks like EmitAggregateCopy may be too "low-level" for all the necessary information to be available.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changes like those from vasu-the-sharma#1.
I have not updated and checked the tests in detail though (and some additional test coverage will probably be needed).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding the following test matrix (potentially using macros to increase reuse of CHECK lines; refer to the use of --check-prefixes in the aforementioned PR).

Note that there is, at this time, a general issue to how EmitCheckedLValue works such that it does not "see through" enough AST nodes to identify the array access case in C++ mode.

Type:

  • plain
  • (for C only) atomic LHS
  • (for C only) volatile LHS

Operand form:

  • arr[idx]
  • *ap

Operation:

  • assignment
    • lhs = rhs
    • (C++ only) lhs.operator=(rhs)
    • (C++ only) ap->operator=(rhs)
  • initialization
    • A a = rhs;
    • (C++ only) A a( rhs );
    • (C++ only) A a{ rhs };
    • (C++ only) A a = { rhs };
    • (C++ only) new A( rhs )
    • (C++ only) new A{ rhs }
    • (C++ only) A( rhs )
    • (C++ only) A{ rhs }
    • (C++ only) static_cast<A>( rhs )
    • (C++ only) (A)rhs
    • A a[] = { rhs };
    • A a[] = { [0] = rhs }; (C extension in C++ mode)
    • (C++ only) struct AA { A a; }; (void)new AA{ rhs };
    • (C++ only) struct AA : virtual A { AA() : A( rhs ) { } } aa;
  • lvalue-to-rvalue conversion
    • (C only) rhs;
    • extern void f(int, ...); f(0, *ap);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ease of review, it may make sense to pre-commit the test before this PR (so that the change in this PR will simply be the addition of the sanitizer checks).

@vitalybuka vitalybuka self-requested a review October 29, 2025 06:45
@vasu-the-sharma
Copy link
Author

ping

int z;
};

// Stack-to-stack copies are optimized away (compiler knows they're valid)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs -NOT check lines to actually test that the cases are optimized away; however, I don't think this is something we need to test in this PR because there should already be tests covering the underlying internal Clang APIs that we are calling. On top of the functional verification (that various specific checks are implemented), we only need enough to demonstrate that we are probably passing the right arguments to the APIs.


// CHECK-UBSAN-LABEL: define {{.*}}void @_Z18test_normal_structv()
// CHECK-NO-UBSAN-LABEL: define {{.*}}void @_Z18test_normal_structv()
void test_normal_struct() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally the same comment as above. The testing associated with this function as written does not really cover much (and maybe the test is not helpful).

Comment on lines +73 to +79
// CHECK-UBSAN: br i1 %[[DEST_OK]], label %cont{{.*}}, label %handler.type_mismatch

// CHECK-UBSAN: handler.type_mismatch{{.*}}:
// CHECK-UBSAN: call void @__ubsan_handle_type_mismatch_v1_abort
// CHECK-UBSAN: unreachable

// CHECK-UBSAN: cont{{.*}}:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The label can also be captured and matched as a FileCheck variable.

*dest = *src;
}

// Array copies also need checks for non-constant indices
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test does not use non-constant indices though. Please modify to perform a single access with a non-constant index (instead of performing multiple accesses with constant index values).

Comment on lines +128 to +129
// CHECK-UBSAN: icmp ne ptr %{{.*}}, null
// CHECK-UBSAN: call void @__ubsan_handle_type_mismatch_v1_abort
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please expand to trace the control flow by using FileCheck variables, checking for the branch to the label for the abort, and checking for the target labels of the branch.

Comment on lines +3 to +4
// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - \
// RUN: | FileCheck %s --check-prefix=CHECK-NO-UBSAN
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Remove the cases with no sanitizer enabled. Coverage almost certainly exists already.

// Check source pointer for null and alignment violations
EmitTypeCheck(TCK_Load, SourceLocation(),
SrcAddr.emitRawPointer(*this), Ty, SrcAddr.getAlignment(),
SanitizerSet());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SanitizerSet() is default

Copy link
Collaborator

@vitalybuka vitalybuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a real SourceLocation().

I applied the patch and see:

test.so+0xbc38): runtime error store to misaligned address

but for existing alignment checks I see:
test.cc:221:39: runtime error: load of misaligned

I guess we need to forward getExprLoc from callers.

@hubert-reinterpretcast
Copy link
Collaborator

We need a real SourceLocation().

The approach from vasu-the-sharma#1 will get us real source locations (but does not cover all cases where EmitAggregateCopy is called).

@vitalybuka
Copy link
Collaborator

We need a real SourceLocation().

The approach from vasu-the-sharma#1 will get us real source locations (but does not cover all cases where EmitAggregateCopy is called).

For UBSAN we need to keep the location. It's supposed to be usable without debug info. Without location it's hard to act on such reports.

However, how hard to pass it into EmitAggregateCopy?
Maybe we can fork EmitAggregateCopy -> EmitAggregateCopyCurrentOne + EmitAggregateCopyWithSourceLocation
and incrementally transition from first to another?

@hubert-reinterpretcast
Copy link
Collaborator

We need a real SourceLocation().

The approach from vasu-the-sharma#1 will get us real source locations (but does not cover all cases where EmitAggregateCopy is called).

For UBSAN we need to keep the location. It's supposed to be usable without debug info. Without location it's hard to act on such reports.

However, how hard to pass it into EmitAggregateCopy? Maybe we can fork EmitAggregateCopy -> EmitAggregateCopyCurrentOne + EmitAggregateCopyWithSourceLocation and incrementally transition from first to another?

The source location is not the only issue. As noted in #164548 (comment), if we use EmitCheckedLValue while we still have the source expression, we get better checking for array subscript operations.

Additionally, the approach in vasu-the-sharma#1 also covers some cases where we don't (always) call EmitAggregateCopy such as in https://github.com/vasu-the-sharma/llvm-project/blob/bc7f979bcb205b82d43d99fd940889f3d8801ca4/clang/lib/CodeGen/CGExprAgg.cpp#L1331-L1341.

TL;DR: The issue is not necessarily with EmitAggregateCopy not generating UBSAN checks. The case can be made that the issue is that the LValue arguments did not come from calls to EmitCheckedLValue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category compiler-rt:sanitizer compiler-rt:ubsan Undefined behavior sanitizer compiler-rt

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UBsan misses a Null-pointer-dereference

5 participants