48 commits
c796357
[𝘀𝗽𝗿] changes to main this commit is based on
melver Sep 4, 2025
a2e11fc
[𝘀𝗽𝗿] initial version
melver Sep 4, 2025
f9a8b15
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 4, 2025
1bc3905
rebase
melver Sep 4, 2025
8da5f63
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 5, 2025
2ca9c72
fixup! Switch to fixed MD
melver Sep 5, 2025
85dc54d
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 8, 2025
465097f
fixup! fix for incomplete types
melver Sep 8, 2025
6d9fc6a
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 8, 2025
f7d3204
fixup!
melver Sep 8, 2025
9ca8ddc
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 18, 2025
4284a8a
fixup! address reviewer comments
melver Sep 18, 2025
14c47f8
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 19, 2025
fa5f672
fixup! address reviewer comments round 2
melver Sep 19, 2025
0e30e56
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 22, 2025
ce655c3
fixup! use update_test_checks.py for opt tests
melver Sep 22, 2025
5bac7bb
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 23, 2025
ce53f3d
fixup! do not strip _
melver Sep 23, 2025
63e68f7
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 26, 2025
cb42dcf
fixup! address some comments
melver Sep 26, 2025
1f4e3e2
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 26, 2025
bac3951
fixup! address more comments
melver Sep 26, 2025
636c880
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 29, 2025
81d45c4
rebase
melver Sep 29, 2025
0c933b6
[𝘀𝗽𝗿] changes introduced through rebase
melver Sep 30, 2025
20b5a41
fixup! address comments
melver Sep 30, 2025
02b014d
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 2, 2025
6f8d25b
fixup!
melver Oct 2, 2025
e34c2d9
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 2, 2025
45a77f2
fixup! switch clang tests back to manually written
melver Oct 2, 2025
e2fe2ea
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
90a394b
rebase
melver Oct 7, 2025
f8a1390
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
3fee412
rebase
melver Oct 7, 2025
75b4ea6
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
adcaf3a
rebase
melver Oct 7, 2025
a0f7dcb
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
e533ccc
rebase
melver Oct 7, 2025
a5fb2d4
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
64b8cf9
rebase
melver Oct 7, 2025
e3a1d21
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
3e49b6e
rebase
melver Oct 7, 2025
64a2d7d
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 7, 2025
e613813
rebase
melver Oct 7, 2025
6ec0253
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 8, 2025
11188d2
rebase
melver Oct 8, 2025
4be1d65
[𝘀𝗽𝗿] changes introduced through rebase
melver Oct 8, 2025
3ffb904
rebase
melver Oct 8, 2025
38 changes: 36 additions & 2 deletions clang/docs/AllocToken.rst
@@ -31,13 +31,18 @@ Token Assignment Mode

The default mode to calculate tokens is:

-* ``typehash``: This mode assigns a token ID based on the hash of the allocated
-type's name.
+* ``typehashpointersplit``: This mode assigns a token ID based on the hash of
+the allocated type's name, where the top half ID-space is reserved for types
+that contain pointers and the bottom half for types that do not contain
+pointers.

Other token ID assignment modes are supported, but they may be subject to
change or removal. These may (experimentally) be selected with ``-mllvm
-alloc-token-mode=<mode>``:

* ``typehash``: This mode assigns a token ID based on the hash of the allocated
type's name.

* ``random``: This mode assigns a statically-determined random token ID to each
allocation site.
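
As an illustration of the ``typehashpointersplit`` description above, the following is a minimal sketch of how a token could be derived. The hash function (`toyHash`, an FNV-1a variant), the helper name `tokenFor`, and the `max_tokens` parameter are assumptions made for this example; the hash LLVM actually uses is not shown in this excerpt.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in hash (FNV-1a). Not the hash used by LLVM.
std::uint64_t toyHash(const std::string &s) {
  std::uint64_t h = 14695981039346656037ull;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

// Sketch of the split: hash the type name into half of the ID space, then
// place pointer-containing types in the top half and all other types in the
// bottom half. max_tokens is assumed to be even and at least 2.
std::uint64_t tokenFor(const std::string &type_name, bool contains_pointer,
                       std::uint64_t max_tokens) {
  const std::uint64_t half = max_tokens / 2;
  const std::uint64_t id = toyHash(type_name) % half;
  return contains_pointer ? half + id : id;
}
```

With `max_tokens == 16`, every pointer-free type lands in `[0, 8)` and every pointer-containing type in `[8, 16)`, so the two classes never share a token.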

@@ -117,6 +122,35 @@ which encodes the token ID hint in the allocation function name.
This ABI provides a more efficient alternative where
``-falloc-token-max`` is small.

Instrumenting Non-Standard Allocation Functions
-----------------------------------------------

By default, AllocToken only instruments standard library allocation functions.
This simplifies adoption, as a compatible allocator only needs to provide
token-enabled variants for a well-defined set of standard functions.

To extend instrumentation to custom allocation functions, enable broader
coverage with ``-fsanitize-alloc-token-extended``. Such functions must be
marked with the `malloc
<https://clang.llvm.org/docs/AttributeReference.html#malloc>`_ or `alloc_size
<https://clang.llvm.org/docs/AttributeReference.html#alloc-size>`_ attribute
(or both).

For example:

.. code-block:: c

void *custom_malloc(size_t size) __attribute__((malloc));
void *my_malloc(size_t size) __attribute__((alloc_size(1)));

// Original:
ptr1 = custom_malloc(size);
ptr2 = my_malloc(size);

// Instrumented:
ptr1 = __alloc_token_custom_malloc(size, token_id);
ptr2 = __alloc_token_my_malloc(size, token_id);
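
On the allocator side, a token-enabled variant of a custom allocation function might look roughly like the sketch below. The partition count, the counter array, the modulo mapping, and the `std::uint64_t` token parameter type are all assumptions for illustration; only the `__alloc_token_custom_malloc(size, token_id)` call shape comes from the documentation text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical partitioning scheme: a real allocator would keep separate
// heaps per partition; here we only count allocations per partition.
constexpr std::size_t kNumPartitions = 4; // assumption for this sketch
static std::size_t g_partition_allocs[kNumPartitions];

static std::size_t partitionFor(std::uint64_t token_id) {
  return static_cast<std::size_t>(token_id % kNumPartitions);
}

// The plain custom allocator, as declared in the example above.
extern "C" void *custom_malloc(std::size_t size) { return std::malloc(size); }

// Token-enabled variant that instrumented call sites would target.
extern "C" void *__alloc_token_custom_malloc(std::size_t size,
                                             std::uint64_t token_id) {
  ++g_partition_allocs[partitionFor(token_id)];
  return custom_malloc(size);
}
```

The point of the sketch is only that the token arrives as an extra argument, so the allocator can choose a partition without inspecting the call site.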

Disabling Instrumentation
-------------------------

199 changes: 192 additions & 7 deletions clang/lib/CodeGen/CGExpr.cpp
@@ -30,6 +30,7 @@
#include "clang/AST/Attr.h"
#include "clang/AST/DeclObjC.h"
#include "clang/AST/NSAPI.h"
#include "clang/AST/ParentMapContext.h"
#include "clang/AST/StmtVisitor.h"
#include "clang/Basic/Builtins.h"
#include "clang/Basic/CodeGenOptions.h"
@@ -1272,23 +1273,196 @@ void CodeGenFunction::EmitBoundsCheckImpl(const Expr *E, llvm::Value *Bound,
EmitCheck(std::make_pair(Check, CheckKind), CheckHandler, StaticData, Index);
}

static bool
typeContainsPointer(QualType T,
llvm::SmallPtrSet<const RecordDecl *, 4> &VisitedRD,
bool &IncompleteType) {
QualType CanonicalType = T.getCanonicalType();
if (CanonicalType->isPointerType())
return true; // base case

// Look through typedef chain to check for special types.
for (QualType CurrentT = T; const auto *TT = CurrentT->getAs<TypedefType>();
CurrentT = TT->getDecl()->getUnderlyingType()) {
const IdentifierInfo *II = TT->getDecl()->getIdentifier();
// Special Case: Syntactically uintptr_t is not a pointer; semantically,
// however, very likely used as such. Therefore, classify uintptr_t as a
// pointer, too.
if (II && II->isStr("uintptr_t"))
return true;
}

// The type is an array; check the element type.
if (const ArrayType *AT = dyn_cast<ArrayType>(CanonicalType))
return typeContainsPointer(AT->getElementType(), VisitedRD, IncompleteType);
// The type is a struct, class, or union.
if (const RecordDecl *RD = CanonicalType->getAsRecordDecl()) {
if (!RD->isCompleteDefinition()) {
IncompleteType = true;
return false;
}
if (!VisitedRD.insert(RD).second)
return false; // already visited
// Check all fields.
for (const FieldDecl *Field : RD->fields()) {
if (typeContainsPointer(Field->getType(), VisitedRD, IncompleteType))
return true;
}
// For C++ classes, also check base classes.
if (const CXXRecordDecl *CXXRD = dyn_cast<CXXRecordDecl>(RD)) {
// Polymorphic types require a vptr.
if (CXXRD->isDynamicClass())
return true;
for (const CXXBaseSpecifier &Base : CXXRD->bases()) {
if (typeContainsPointer(Base.getType(), VisitedRD, IncompleteType))
return true;
}
}
}
return false;
}
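
The traversal in `typeContainsPointer` can be modelled outside of clang as a depth-first search with a visited set. The sketch below uses a hypothetical `ToyType` structure in place of `QualType`/`RecordDecl`; it omits the `uintptr_t`, array, and vptr special cases and only mirrors the pointer base case, the incomplete-type flag, and the visited-set guard.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for clang's type representation.
struct ToyType {
  bool is_pointer = false;
  bool is_complete = true;
  std::vector<const ToyType *> members; // fields and base classes
};

// Returns true if t or any transitively contained member is a pointer.
// Sets `incomplete` when an incomplete type prevented a full answer.
bool containsPointer(const ToyType &t, std::set<const ToyType *> &visited,
                     bool &incomplete) {
  if (t.is_pointer)
    return true; // base case
  if (!t.is_complete) {
    incomplete = true;
    return false;
  }
  if (!visited.insert(&t).second)
    return false; // already visited
  for (const ToyType *m : t.members)
    if (containsPointer(*m, visited, incomplete))
      return true;
  return false;
}
```

As in the diff, a `false` result combined with `incomplete == true` means "unknown" rather than "no pointers", which is why the caller in `EmitAllocToken` returns early in that case.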

void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB, QualType AllocType) {
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");

llvm::MDBuilder MDB(getLLVMContext());

// Get unique type name.
PrintingPolicy Policy(CGM.getContext().getLangOpts());
Policy.SuppressTagKeyword = true;
Policy.FullyQualifiedName = true;
SmallString<64> TypeName;
llvm::raw_svector_ostream TypeNameOS(TypeName);
AllocType.getCanonicalType().print(TypeNameOS, Policy);
-auto *TypeMDS = llvm::MDString::get(CGM.getLLVMContext(), TypeNameOS.str());
+auto *TypeNameMD = MDB.createString(TypeNameOS.str());

// Check if QualType contains a pointer. Implements a simple DFS to
// recursively check if a type contains a pointer type.
llvm::SmallPtrSet<const RecordDecl *, 4> VisitedRD;
bool IncompleteType = false;
const bool ContainsPtr =
typeContainsPointer(AllocType, VisitedRD, IncompleteType);
if (!ContainsPtr && IncompleteType)
return;
auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);

-// Format: !{<type-name>}
-auto *MDN = llvm::MDNode::get(CGM.getLLVMContext(), {TypeMDS});
+// Format: !{<type-name>, <contains-pointer>}
+auto *MDN =
+llvm::MDNode::get(CGM.getLLVMContext(), {TypeNameMD, ContainsPtrMD});
CB->setMetadata(llvm::LLVMContext::MD_alloc_token, MDN);
}

namespace {
/// Infer type from a simple sizeof expression.
QualType inferTypeFromSizeofExpr(const Expr *E) {
const Expr *Arg = E->IgnoreParenImpCasts();
if (const auto *UET = dyn_cast<UnaryExprOrTypeTraitExpr>(Arg)) {
if (UET->getKind() == UETT_SizeOf) {
if (UET->isArgumentType())
return UET->getArgumentTypeInfo()->getType();
else
return UET->getArgumentExpr()->getType();
}
}
return QualType();
}

/// Infer type from an arithmetic expression involving a sizeof. For example:
///
/// malloc(sizeof(MyType) + padding); // infers 'MyType'
/// malloc(sizeof(MyType) * 32); // infers 'MyType'
/// malloc(32 * sizeof(MyType)); // infers 'MyType'
/// malloc(sizeof(MyType) << 1); // infers 'MyType'
/// ...
///
/// More complex arithmetic expressions are supported, but are a heuristic, e.g.
/// when considering allocations for structs with flexible array members:
///
/// malloc(sizeof(HasFlexArray) + sizeof(int) * 32); // infers 'HasFlexArray'
///
QualType inferPossibleTypeFromArithSizeofExpr(const Expr *E) {
const Expr *Arg = E->IgnoreParenImpCasts();
// The argument is a lone sizeof expression.
if (QualType T = inferTypeFromSizeofExpr(Arg); !T.isNull())
return T;
if (const auto *BO = dyn_cast<BinaryOperator>(Arg)) {
// Argument is an arithmetic expression. Cover common arithmetic patterns
// involving sizeof.
switch (BO->getOpcode()) {
case BO_Add:
case BO_Div:
case BO_Mul:
case BO_Shl:
case BO_Shr:
case BO_Sub:
if (QualType T = inferPossibleTypeFromArithSizeofExpr(BO->getLHS());
!T.isNull())
return T;
if (QualType T = inferPossibleTypeFromArithSizeofExpr(BO->getRHS());
!T.isNull())
return T;
break;
default:
break;
}
}
return QualType();
}

/// If the expression E is a reference to a variable, infer the type from a
/// variable's initializer if it contains a sizeof. Beware, this is a heuristic
/// and ignores if a variable is later reassigned. For example:
///
/// size_t my_size = sizeof(MyType);
/// void *x = malloc(my_size); // infers 'MyType'
///
QualType inferPossibleTypeFromVarInitSizeofExpr(const Expr *E) {
const Expr *Arg = E->IgnoreParenImpCasts();
if (const auto *DRE = dyn_cast<DeclRefExpr>(Arg)) {
if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
if (const Expr *Init = VD->getInit())
return inferPossibleTypeFromArithSizeofExpr(Init);
}
}
return QualType();
}

/// Deduces the allocated type by checking if the allocation call's result
/// is immediately used in a cast expression. For example:
///
/// MyType *x = (MyType *)malloc(4096); // infers 'MyType'
///
QualType inferPossibleTypeFromCastExpr(const CallExpr *CallE,
const CastExpr *CastE) {
if (!CastE)
return QualType();
QualType PtrType = CastE->getType();
if (PtrType->isPointerType())
return PtrType->getPointeeType();
return QualType();
}
} // end anonymous namespace
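
The left-then-right search performed by `inferPossibleTypeFromArithSizeofExpr` can be sketched on a toy expression tree. Everything below (`ToyExpr`, `lit`, `sizeOf`, `bin`, `inferType`) is a hypothetical model and not clang API; the `Less` operator stands in for any opcode outside the accepted arithmetic set.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy AST; clang's Expr hierarchy is far richer.
struct ToyExpr {
  enum Kind { Literal, SizeOf, Binary };
  enum Op { Add, Sub, Mul, Div, Shl, Shr, Less };
  Kind kind = Literal;
  Op op = Add;                       // only meaningful for Binary
  std::string type_name;             // only meaningful for SizeOf
  std::shared_ptr<ToyExpr> lhs, rhs; // only meaningful for Binary
};
using ExprPtr = std::shared_ptr<ToyExpr>;

ExprPtr lit() { return std::make_shared<ToyExpr>(); }

ExprPtr sizeOf(std::string name) {
  auto e = std::make_shared<ToyExpr>();
  e->kind = ToyExpr::SizeOf;
  e->type_name = std::move(name);
  return e;
}

ExprPtr bin(ToyExpr::Op op, ExprPtr l, ExprPtr r) {
  auto e = std::make_shared<ToyExpr>();
  e->kind = ToyExpr::Binary;
  e->op = op;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Returns the first sizeof type found, searching the left operand before the
// right one; an empty string means no type could be inferred.
std::string inferType(const ToyExpr &e) {
  if (e.kind == ToyExpr::SizeOf)
    return e.type_name;
  if (e.kind != ToyExpr::Binary || e.op == ToyExpr::Less)
    return "";
  std::string t = inferType(*e.lhs);
  return !t.empty() ? t : inferType(*e.rhs);
}
```

Because the left operand wins, `sizeof(HasFlexArray) + sizeof(int) * 32` yields `HasFlexArray`, matching the flexible-array-member example in the comment above.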

void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB, const CallExpr *E) {
QualType AllocType;
// First check arguments.
for (const Expr *Arg : E->arguments()) {
AllocType = inferPossibleTypeFromArithSizeofExpr(Arg);
if (AllocType.isNull())
AllocType = inferPossibleTypeFromVarInitSizeofExpr(Arg);
if (!AllocType.isNull())
break;
}
// Then check later casts.
if (AllocType.isNull())
AllocType = inferPossibleTypeFromCastExpr(E, CurCast);
// Emit if we were able to infer the type.
if (!AllocType.isNull())
EmitAllocToken(CB, AllocType);
}

CodeGenFunction::ComplexPairTy CodeGenFunction::
EmitComplexPrePostIncDec(const UnaryOperator *E, LValue LV,
bool isInc, bool isPre) {
@@ -5659,6 +5833,9 @@ LValue CodeGenFunction::EmitConditionalOperatorLValue(
/// are permitted with aggregate result, including noop aggregate casts, and
/// cast from scalar to union.
LValue CodeGenFunction::EmitCastLValue(const CastExpr *E) {
auto RestoreCurCast =
llvm::make_scope_exit([this, Prev = CurCast] { CurCast = Prev; });
CurCast = E;
switch (E->getCastKind()) {
case CK_ToVoid:
case CK_BitCast:
@@ -6604,16 +6781,24 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType,
RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &LocalCallOrInvoke,
E == MustTailCall, E->getExprLoc());

-// Generate function declaration DISuprogram in order to be used
-// in debug info about call sites.
-if (CGDebugInfo *DI = getDebugInfo()) {
-if (auto *CalleeDecl = dyn_cast_or_null<FunctionDecl>(TargetDecl)) {
+if (auto *CalleeDecl = dyn_cast_or_null<FunctionDecl>(TargetDecl)) {
+// Generate function declaration DISuprogram in order to be used
+// in debug info about call sites.
+if (CGDebugInfo *DI = getDebugInfo()) {
FunctionArgList Args;
QualType ResTy = BuildFunctionArgList(CalleeDecl, Args);
DI->EmitFuncDeclForCallSite(LocalCallOrInvoke,
DI->getFunctionType(CalleeDecl, ResTy, Args),
CalleeDecl);
}
if (CalleeDecl->hasAttr<RestrictAttr>() ||
CalleeDecl->hasAttr<AllocSizeAttr>()) {
// Function has 'malloc' (aka. 'restrict') or 'alloc_size' attribute.
if (SanOpts.has(SanitizerKind::AllocToken)) {
// Set !alloc_token metadata.
EmitAllocToken(LocalCallOrInvoke, E);
}
}
}
if (CallOrInvoke)
*CallOrInvoke = LocalCallOrInvoke;
12 changes: 10 additions & 2 deletions clang/lib/CodeGen/CGExprCXX.cpp
@@ -1371,8 +1371,16 @@ RValue CodeGenFunction::EmitBuiltinNewDeleteCall(const FunctionProtoType *Type,

for (auto *Decl : Ctx.getTranslationUnitDecl()->lookup(Name))
if (auto *FD = dyn_cast<FunctionDecl>(Decl))
-if (Ctx.hasSameType(FD->getType(), QualType(Type, 0)))
-return EmitNewDeleteCall(*this, FD, Type, Args);
+if (Ctx.hasSameType(FD->getType(), QualType(Type, 0))) {
+RValue RV = EmitNewDeleteCall(*this, FD, Type, Args);
+if (auto *CB = dyn_cast_if_present<llvm::CallBase>(RV.getScalarVal())) {
+if (SanOpts.has(SanitizerKind::AllocToken)) {
+// Set !alloc_token metadata.
+EmitAllocToken(CB, TheCall);
+}
+}
+return RV;
+}
llvm_unreachable("predeclared global operator new/delete is missing");
}

5 changes: 5 additions & 0 deletions clang/lib/CodeGen/CGExprScalar.cpp
@@ -33,6 +33,7 @@
#include "clang/Basic/DiagnosticTrap.h"
#include "clang/Basic/TargetInfo.h"
#include "llvm/ADT/APFixedPoint.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"
@@ -2434,6 +2435,10 @@ static Value *EmitHLSLElementwiseCast(CodeGenFunction &CGF, LValue SrcVal,
// have to handle a more broad range of conversions than explicit casts, as they
// handle things like function to ptr-to-function decay etc.
Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
auto RestoreCurCast =
llvm::make_scope_exit([this, Prev = CGF.CurCast] { CGF.CurCast = Prev; });
CGF.CurCast = CE;

Expr *E = CE->getSubExpr();
QualType DestTy = CE->getType();
CastKind Kind = CE->getCastKind();
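
The `CurCast` handling in `VisitCastExpr` and `EmitCastLValue` is a save-and-restore pattern: remember the previous value, set the new one, and restore on scope exit so nested casts unwind correctly. The sketch below reproduces that pattern with a small hand-written RAII guard instead of `llvm::make_scope_exit`; `ToyContext`, `RestoreOnExit`, and `visitCast` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the CodeGenFunction state touched by the diff.
struct ToyContext {
  const char *cur_cast = nullptr;
};

// Restores the referenced slot to its value at construction time.
class RestoreOnExit {
public:
  explicit RestoreOnExit(const char *&slot) : slot_(slot), prev_(slot) {}
  ~RestoreOnExit() { slot_ = prev_; }
  RestoreOnExit(const RestoreOnExit &) = delete;
  RestoreOnExit &operator=(const RestoreOnExit &) = delete;

private:
  const char *&slot_;
  const char *prev_;
};

// Visits a cast named `name`; optionally visits a nested cast first. Returns
// the cast that is current at this depth just before returning.
const char *visitCast(ToyContext &ctx, const char *name, bool nested) {
  RestoreOnExit restore(ctx.cur_cast);
  ctx.cur_cast = name;
  if (nested)
    visitCast(ctx, "inner", false);
  return ctx.cur_cast;
}
```

After the nested visit returns, the outer frame sees its own cast again, and once the outermost visit returns the slot is back to `nullptr`.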
7 changes: 7 additions & 0 deletions clang/lib/CodeGen/CodeGenFunction.h
@@ -346,6 +346,10 @@ class CodeGenFunction : public CodeGenTypeCache {
QualType FnRetTy;
llvm::Function *CurFn = nullptr;

/// If a cast expression is being visited, this holds the current cast's
/// expression.
const CastExpr *CurCast = nullptr;

/// Save Parameter Decl for coroutine.
llvm::SmallVector<const ParmVarDecl *, 4> FnArgs;

@@ -3350,6 +3354,9 @@ class CodeGenFunction : public CodeGenTypeCache {

/// Emit additional metadata used by the AllocToken instrumentation.
void EmitAllocToken(llvm::CallBase *CB, QualType AllocType);
/// Emit additional metadata used by the AllocToken instrumentation,
/// inferring the type from an allocation call expression.
void EmitAllocToken(llvm::CallBase *CB, const CallExpr *E);

llvm::Value *GetCountedByFieldExprGEP(const Expr *Base, const FieldDecl *FD,
const FieldDecl *CountDecl);
14 changes: 13 additions & 1 deletion clang/test/CodeGen/alloc-token-lower.c
@@ -10,7 +10,7 @@ typedef __typeof(sizeof(int)) size_t;
void *malloc(size_t size);

// CHECK-LABEL: @test_malloc(
-// CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 4, i64 0)
+// CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 4, i64 2689373973731826898){{.*}} !alloc_token [[META_INT:![0-9]+]]
void *test_malloc() {
return malloc(sizeof(int));
}
@@ -20,3 +20,15 @@ void *test_malloc() {
void *no_sanitize_malloc(size_t size) __attribute__((no_sanitize("alloc-token"))) {
return malloc(sizeof(int));
}

// By default, we should not be touching malloc-attributed non-libcall
// functions: there might be an arbitrary number of these, and a compatible
// allocator will only implement standard allocation functions.
void *nonstandard_malloc(size_t size) __attribute__((malloc));
// CHECK-LABEL: @test_nonlibcall_malloc(
// CHECK: call{{.*}} ptr @nonstandard_malloc(i64 noundef 4){{.*}} !alloc_token [[META_INT]]
void *test_nonlibcall_malloc() {
return nonstandard_malloc(sizeof(int));
}

// CHECK: [[META_INT]] = !{!"int", i1 false}