Changes from 6 commits
43 changes: 33 additions & 10 deletions clang/docs/AllocToken.rst
@@ -49,6 +49,39 @@ change or removal. These may (experimentally) be selected with ``-mllvm
* *Increment* (mode=0): This mode assigns a simple, incrementally increasing
token ID to each allocation site.

The following command-line options affect generated token IDs:

* ``-falloc-token-max=<N>``
Configures the maximum number of tokens. No max by default (tokens bounded
by ``UINT64_MAX``).

Querying Token IDs with ``__builtin_alloc_token_infer``
=======================================================

For use cases where the token ID must be known at compile time, Clang provides
a builtin function:

.. code-block:: c

uint64_t __builtin_alloc_token_infer(<args>, ...);

This builtin returns the token ID inferred from its argument expressions, which
mirror arguments normally passed to any allocation function. The argument
expressions are **unevaluated**, so it can be used with expressions that would
have side effects without any runtime impact.
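The "unevaluated" property can be illustrated without the builtin itself. The sketch below is only an analogy: ``FAKE_TOKEN_INFER`` is a hypothetical stand-in built on ``sizeof``, which likewise never evaluates its operand. It always yields 0 and does not infer anything; ``bump`` and ``calls_after_infer`` are illustrative names as well.

```cpp
#include <cassert>

// Counts how often bump() actually runs.
static int g_calls = 0;

// A function with an observable side effect.
inline int bump() { return ++g_calls; }

// Hypothetical stand-in, NOT the real builtin: sizeof never evaluates its
// operand, which is the only property illustrated here. Always yields 0.
#define FAKE_TOKEN_INFER(expr) \
  (static_cast<unsigned long long>(sizeof(expr)) * 0ULL)

// Returns how many times bump() ran while "inferring" a token.
inline int calls_after_infer() {
  g_calls = 0;
  unsigned long long token = FAKE_TOKEN_INFER(sizeof(int) * bump());
  (void)token;
  return g_calls;
}
```

Calling ``calls_after_infer()`` reports that ``bump()`` never ran, which is the behavior the paragraph above describes for the builtin's arguments.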

For example, it can be used as follows:

.. code-block:: c

struct MyType { ... };
void *__partition_alloc(size_t size, uint64_t partition);
#define partition_alloc(...) __partition_alloc(__VA_ARGS__, __builtin_alloc_token_infer(__VA_ARGS__))

void foo(void) {
struct MyType *x = partition_alloc(sizeof(*x));
}

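A variant of the macro pattern above that compiles with any C++17 compiler might look as follows. This is a sketch under stated assumptions: ``FAKE_TOKEN_INFER`` is a hypothetical placeholder for the builtin (it always reports token 0), and ``partition_alloc_impl`` and ``g_last_partition`` are illustrative names, chosen to avoid the reserved leading double underscore.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical placeholder for the builtin: never evaluates its operands
// and always reports token 0.
#define FAKE_TOKEN_INFER(...) \
  (static_cast<std::uint64_t>(sizeof((__VA_ARGS__))) * 0u)

// Records the partition of the most recent allocation, for demonstration.
static std::uint64_t g_last_partition = ~0ull;

// Hypothetical partitioning allocator entry point.
inline void *partition_alloc_impl(std::size_t size, std::uint64_t partition) {
  g_last_partition = partition;
  return std::malloc(size);
}

// Forwards the caller's arguments and appends the inferred token.
#define partition_alloc(...) \
  partition_alloc_impl(__VA_ARGS__, FAKE_TOKEN_INFER(__VA_ARGS__))

struct MyType {
  int value;
};

// Allocates one MyType and checks which partition the macro passed along.
inline bool demo() {
  MyType *x = static_cast<MyType *>(partition_alloc(sizeof(*x)));
  bool ok = (x != nullptr) && (g_last_partition == 0);
  std::free(x);
  return ok;
}
```

The point of the design is that the call site stays unchanged: the macro expands the same argument list twice, once for the allocator and once, unevaluated, for token inference.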
Allocation Token Instrumentation
================================

@@ -70,16 +103,6 @@ example:
// Instrumented:
ptr = __alloc_token_malloc(size, token_id);

In addition, it is typically recommended to configure the following:

* ``-falloc-token-max=<N>``
Configures the maximum number of tokens. No max by default (tokens bounded
by ``UINT64_MAX``).

.. code-block:: console

% clang++ -fsanitize=alloc-token -falloc-token-max=512 example.cc

Runtime Interface
-----------------

5 changes: 4 additions & 1 deletion clang/docs/ReleaseNotes.rst
@@ -205,7 +205,10 @@ Non-comprehensive list of changes in this release

- Introduce support for allocation tokens to enable allocator-level heap
organization strategies. A feature to instrument all allocation functions
with a token ID can be enabled via the ``-fsanitize=alloc-token`` flag.
with a token ID can be enabled via the ``-fsanitize=alloc-token`` flag. A
builtin ``__builtin_alloc_token_infer(<args>, ...)`` is provided to allow
compile-time querying of allocation token IDs, where the builtin arguments
mirror those normally passed to an allocation function.

New Compiler Flags
------------------
6 changes: 6 additions & 0 deletions clang/include/clang/Basic/Builtins.td
@@ -1274,6 +1274,12 @@ def AllocaWithAlignUninitialized : Builtin {
let Prototype = "void*(size_t, _Constant size_t)";
}

def AllocTokenInfer : Builtin {
let Spellings = ["__builtin_alloc_token_infer"];
let Attributes = [NoThrow, Const, Pure, CustomTypeChecking, UnevaluatedArguments];
let Prototype = "unsigned long long int(...)";
}

def CallWithStaticChain : Builtin {
let Spellings = ["__builtin_call_with_static_chain"];
let Attributes = [NoThrow, CustomTypeChecking];
3 changes: 3 additions & 0 deletions clang/include/clang/Sema/Sema.h
@@ -2946,6 +2946,9 @@ class Sema final : public SemaBase {
/// than 8.
bool BuiltinAllocaWithAlign(CallExpr *TheCall);

/// Handle __builtin_alloc_token_infer.
bool BuiltinAllocTokenInfer(CallExpr *TheCall);

/// BuiltinArithmeticFence - Handle __arithmetic_fence.
bool BuiltinArithmeticFence(CallExpr *TheCall);

27 changes: 17 additions & 10 deletions clang/lib/CodeGen/BackendUtil.cpp
@@ -794,16 +794,6 @@ static void addSanitizers(const Triple &TargetTriple,
if (LangOpts.Sanitize.has(SanitizerKind::DataFlow)) {
MPM.addPass(DataFlowSanitizerPass(LangOpts.NoSanitizeFiles));
}

if (LangOpts.Sanitize.has(SanitizerKind::AllocToken)) {
if (Level == OptimizationLevel::O0) {
// The default pass builder only infers libcall function attrs when
// optimizing, so we insert it here because we need it for accurate
// memory allocation function detection.
MPM.addPass(InferFunctionAttrsPass());
}
MPM.addPass(AllocTokenPass(getAllocTokenOptions(CodeGenOpts)));
}
};
if (ClSanitizeOnOptimizerEarlyEP) {
PB.registerOptimizerEarlyEPCallback(
@@ -846,6 +836,22 @@ static void addSanitizers(const Triple &TargetTriple,
}
}

static void addAllocTokenPass(const Triple &TargetTriple,
Contributor

I'd rather separate the Sema changes from the CodeGen changes.

Contributor Author

Separate llvm/.. from clang/.. changes?
Or clang/lib/CodeGen from the others?

One is useless without the other, and at least this way, if there's a problem, we can atomically revert one commit instead of reverting several commits or partially reverting them.

Collaborator

The Clang part looks like it depends on LLVM, but it can be separated.
To avoid an unlikely complex revert, I'd rather just land them with a delay of a few days.

const CodeGenOptions &CodeGenOpts,
const LangOptions &LangOpts, PassBuilder &PB) {
PB.registerOptimizerLastEPCallback(
[&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) {
if (Level == OptimizationLevel::O0 &&
LangOpts.Sanitize.has(SanitizerKind::AllocToken)) {
// The default pass builder only infers libcall function attrs when
// optimizing, so we insert it here because we need it for accurate
// memory allocation function detection with -fsanitize=alloc-token.
MPM.addPass(InferFunctionAttrsPass());
}
MPM.addPass(AllocTokenPass(getAllocTokenOptions(CodeGenOpts)));
});
}

void EmitAssemblyHelper::RunOptimizationPipeline(
BackendAction Action, std::unique_ptr<raw_pwrite_stream> &OS,
std::unique_ptr<llvm::ToolOutputFile> &ThinLinkOS, BackendConsumer *BC) {
@@ -1101,6 +1107,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
if (!IsThinLTOPostLink) {
addSanitizers(TargetTriple, CodeGenOpts, LangOpts, PB);
addKCFIPass(TargetTriple, LangOpts, PB);
addAllocTokenPass(TargetTriple, CodeGenOpts, LangOpts, PB);
}

if (std::optional<GCOVOptions> Options =
8 changes: 8 additions & 0 deletions clang/lib/CodeGen/CGBuiltin.cpp
@@ -4475,6 +4475,14 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
return RValue::get(AI);
}

case Builtin::BI__builtin_alloc_token_infer: {
llvm::MDNode *MDN = EmitAllocTokenHint(E);
llvm::Value *MDV = MetadataAsValue::get(getLLVMContext(), MDN);
llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::alloc_token_id);
llvm::CallBase *TokenID = Builder.CreateCall(F, MDV);
return RValue::get(TokenID);
}

case Builtin::BIbzero:
case Builtin::BI__builtin_bzero: {
Address Dest = EmitPointerWithAlignment(E->getArg(0));
34 changes: 22 additions & 12 deletions clang/lib/CodeGen/CGExpr.cpp
@@ -1273,11 +1273,7 @@ void CodeGenFunction::EmitBoundsCheckImpl(const Expr *E, llvm::Value *Bound,
EmitCheck(std::make_pair(Check, CheckKind), CheckHandler, StaticData, Index);
}

void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
QualType AllocType) {
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");

llvm::MDNode *CodeGenFunction::EmitAllocTokenHint(QualType AllocType) {
llvm::MDBuilder MDB(getLLVMContext());

// Get unique type name.
@@ -1340,14 +1336,20 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
};
const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType);
if (!ContainsPtr && IncompleteType)
return;
return nullptr;
auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);

// Format: !{<type-name>, <contains-pointer>}
auto *MDN =
llvm::MDNode::get(CGM.getLLVMContext(), {TypeNameMD, ContainsPtrMD});
CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint, MDN);
return llvm::MDNode::get(CGM.getLLVMContext(), {TypeNameMD, ContainsPtrMD});
}

void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
QualType AllocType) {
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");
CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint,
EmitAllocTokenHint(AllocType));
}

/// Infer type from a simple sizeof expression.
@@ -1423,8 +1425,7 @@ static QualType inferTypeFromCastExpr(const CallExpr *CallE,
return QualType();
}

void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
const CallExpr *E) {
llvm::MDNode *CodeGenFunction::EmitAllocTokenHint(const CallExpr *E) {
QualType AllocType;
// First check arguments.
for (const Expr *Arg : E->arguments()) {
@@ -1439,7 +1440,16 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
AllocType = inferTypeFromCastExpr(E, CurCast);
// Emit if we were able to infer the type.
if (!AllocType.isNull())
EmitAllocTokenHint(CB, AllocType);
return EmitAllocTokenHint(AllocType);
return nullptr;
}

void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
const CallExpr *E) {
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");
if (llvm::MDNode *MDN = EmitAllocTokenHint(E))
CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint, MDN);
}
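
The refactoring in this file splits each ``EmitAllocTokenHint`` overload into a producer that returns the metadata node (or ``nullptr``) and a thin wrapper that attaches it to a call. A toy model of that split, using hypothetical stand-in types (``Hint``, ``Call``) rather than ``llvm::MDNode`` and ``llvm::CallBase``, could look like this:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Toy stand-ins for llvm::MDNode and llvm::CallBase (hypothetical types).
struct Hint {
  std::string type_name;
  bool contains_ptr;
};
struct Call {
  std::optional<Hint> hint;
};

// Producer: builds the hint or reports "nothing to emit". std::nullopt plays
// the role of the nullptr returned in the patch.
inline std::optional<Hint> make_hint(const std::string &type_name,
                                     bool contains_ptr, bool incomplete) {
  if (!contains_ptr && incomplete)
    return std::nullopt;
  return Hint{type_name, contains_ptr};
}

// Attacher: thin wrapper that only touches the call when a hint exists.
inline void attach_hint(Call &call, const std::string &type_name,
                        bool contains_ptr, bool incomplete) {
  if (auto hint = make_hint(type_name, contains_ptr, incomplete))
    call.hint = std::move(*hint);
}
```

Keeping the producer free of any call-site dependency is what lets the new builtin reuse it: the builtin has no allocation call to annotate, only a value to compute.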

CodeGenFunction::ComplexPairTy CodeGenFunction::
7 changes: 6 additions & 1 deletion clang/lib/CodeGen/CodeGenFunction.h
@@ -3352,10 +3352,15 @@ class CodeGenFunction : public CodeGenTypeCache {
SanitizerAnnotateDebugInfo(ArrayRef<SanitizerKind::SanitizerOrdinal> Ordinals,
SanitizerHandler Handler);

/// Emit additional metadata used by the AllocToken instrumentation.
/// Emit metadata used by the AllocToken instrumentation.
llvm::MDNode *EmitAllocTokenHint(QualType AllocType);
/// Emit and set additional metadata used by the AllocToken instrumentation.
void EmitAllocTokenHint(llvm::CallBase *CB, QualType AllocType);
/// Emit additional metadata used by the AllocToken instrumentation,
/// inferring the type from an allocation call expression.
llvm::MDNode *EmitAllocTokenHint(const CallExpr *E);
/// Emit and set additional metadata used by the AllocToken instrumentation,
/// inferring the type from an allocation call expression.
void EmitAllocTokenHint(llvm::CallBase *CB, const CallExpr *E);

llvm::Value *GetCountedByFieldExprGEP(const Expr *Base, const FieldDecl *FD,
22 changes: 22 additions & 0 deletions clang/lib/Sema/SemaChecking.cpp
@@ -2638,6 +2638,10 @@ Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
builtinAllocaAddrSpace(*this, TheCall);
}
break;
case Builtin::BI__builtin_alloc_token_infer:
if (BuiltinAllocTokenInfer(TheCall))
return ExprError();
break;
case Builtin::BI__arithmetic_fence:
if (BuiltinArithmeticFence(TheCall))
return ExprError();
@@ -5760,6 +5764,24 @@ bool Sema::BuiltinAllocaWithAlign(CallExpr *TheCall) {
return false;
}

bool Sema::BuiltinAllocTokenInfer(CallExpr *TheCall) {
if (checkArgCountAtLeast(TheCall, 1))
return true;

for (Expr *Arg : TheCall->arguments()) {
// If argument is dependent on a template parameter, we can't resolve now.
if (Arg->isTypeDependent() || Arg->isValueDependent())
continue;
// Reject void types.
QualType ArgTy = Arg->IgnoreParenImpCasts()->getType();
if (ArgTy->isVoidType())
return Diag(Arg->getBeginLoc(), diag::err_param_with_void_type);
}

TheCall->setType(Context.UnsignedLongLongTy);
return false;
}

bool Sema::BuiltinAssumeAligned(CallExpr *TheCall) {
if (checkArgCountRange(TheCall, 2, 3))
return true;
8 changes: 6 additions & 2 deletions clang/test/CodeGen/lto-newpm-pipeline.c
@@ -32,10 +32,12 @@
// CHECK-FULL-O0-NEXT: Running pass: AlwaysInlinerPass
// CHECK-FULL-O0-NEXT: Running analysis: ProfileSummaryAnalysis
// CHECK-FULL-O0-NEXT: Running pass: CoroConditionalWrapper
// CHECK-FULL-O0-NEXT: Running pass: AllocTokenPass
// CHECK-FULL-O0-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
// CHECK-FULL-O0-NEXT: Running analysis: TargetLibraryAnalysis
// CHECK-FULL-O0-NEXT: Running pass: CanonicalizeAliasesPass
// CHECK-FULL-O0-NEXT: Running pass: NameAnonGlobalPass
// CHECK-FULL-O0-NEXT: Running pass: AnnotationRemarksPass
// CHECK-FULL-O0-NEXT: Running analysis: TargetLibraryAnalysis
// CHECK-FULL-O0-NEXT: Running pass: VerifierPass
// CHECK-FULL-O0-NEXT: Running pass: BitcodeWriterPass

@@ -46,10 +48,12 @@
// CHECK-THIN-O0-NEXT: Running pass: AlwaysInlinerPass
// CHECK-THIN-O0-NEXT: Running analysis: ProfileSummaryAnalysis
// CHECK-THIN-O0-NEXT: Running pass: CoroConditionalWrapper
// CHECK-THIN-O0-NEXT: Running pass: AllocTokenPass
// CHECK-THIN-O0-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
// CHECK-THIN-O0-NEXT: Running analysis: TargetLibraryAnalysis
// CHECK-THIN-O0-NEXT: Running pass: CanonicalizeAliasesPass
// CHECK-THIN-O0-NEXT: Running pass: NameAnonGlobalPass
// CHECK-THIN-O0-NEXT: Running pass: AnnotationRemarksPass
// CHECK-THIN-O0-NEXT: Running analysis: TargetLibraryAnalysis
// CHECK-THIN-O0-NEXT: Running pass: VerifierPass
// CHECK-THIN-O0-NEXT: Running pass: ThinLTOBitcodeWriterPass

76 changes: 76 additions & 0 deletions clang/test/CodeGenCXX/alloc-token-builtin.cpp
@@ -0,0 +1,76 @@
// Test IR generation of the builtin without evaluating the LLVM intrinsic.
// RUN: %clang_cc1 -triple x86_64-linux-gnu -Werror -std=c++20 -emit-llvm -disable-llvm-passes %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-CODEGEN
Collaborator

The BackendUtil and CG* changes also seem unrelated.

// RUN: %clang_cc1 -triple x86_64-linux-gnu -Werror -std=c++20 -emit-llvm -falloc-token-max=2 %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-LOWER

extern "C" void *my_malloc(unsigned long, unsigned long);

struct NoPtr {
int x;
long y;
};

struct WithPtr {
int a;
char *buf;
};

int unevaluated_fn();

// CHECK-LABEL: @_Z16test_builtin_intv(
unsigned long long test_builtin_int() {
// CHECK-CODEGEN: call i64 @llvm.alloc.token.id(metadata ![[MD_INT:[0-9]+]])
// CHECK-LOWER: ret i64 0
return __builtin_alloc_token_infer(sizeof(1));
}

// CHECK-LABEL: @_Z16test_builtin_ptrv(
unsigned long long test_builtin_ptr() {
// CHECK-CODEGEN: call i64 @llvm.alloc.token.id(metadata ![[MD_PTR:[0-9]+]])
// CHECK-LOWER: ret i64 1
return __builtin_alloc_token_infer(sizeof(int *));
}

// CHECK-LABEL: @_Z25test_builtin_struct_noptrv(
unsigned long long test_builtin_struct_noptr() {
// CHECK-CODEGEN: call i64 @llvm.alloc.token.id(metadata ![[MD_NOPTR:[0-9]+]])
// CHECK-LOWER: ret i64 0
return __builtin_alloc_token_infer(sizeof(NoPtr));
}

// CHECK-LABEL: @_Z25test_builtin_struct_w_ptrv(
unsigned long long test_builtin_struct_w_ptr() {
// CHECK-CODEGEN: call i64 @llvm.alloc.token.id(metadata ![[MD_WITHPTR:[0-9]+]])
// CHECK-LOWER: ret i64 1
return __builtin_alloc_token_infer(sizeof(WithPtr), 123);
}

// CHECK-LABEL: @_Z24test_builtin_unevaluatedv(
unsigned long long test_builtin_unevaluated() {
// CHECK-NOT: call{{.*}}unevaluated_fn
// CHECK-CODEGEN: call i64 @llvm.alloc.token.id(metadata ![[MD_INT:[0-9]+]])
// CHECK-LOWER: ret i64 0
return __builtin_alloc_token_infer(sizeof(int) * unevaluated_fn());
}

// CHECK-LABEL: @_Z36test_builtin_unsequenced_unevaluatedi(
void test_builtin_unsequenced_unevaluated(int x) {
// CHECK: add nsw
// CHECK-NOT: add nsw
// CHECK-CODEGEN: %[[REG:[0-9]+]] = call i64 @llvm.alloc.token.id(metadata ![[MD_UNKNOWN:[0-9]+]])
// CHECK-CODEGEN: call{{.*}}@my_malloc({{.*}}, i64 noundef %[[REG]])
// CHECK-LOWER: call{{.*}}@my_malloc({{.*}}, i64 noundef 0)
my_malloc(++x, __builtin_alloc_token_infer(++x));
}

// CHECK-LABEL: @_Z20test_builtin_unknownv(
unsigned long long test_builtin_unknown() {
// CHECK-CODEGEN: call i64 @llvm.alloc.token.id(metadata ![[MD_UNKNOWN:[0-9]+]])
// CHECK-LOWER: ret i64 0
return __builtin_alloc_token_infer(4096);
}

// CHECK-CODEGEN: ![[MD_INT]] = !{!"int", i1 false}
// CHECK-CODEGEN: ![[MD_PTR]] = !{!"int *", i1 true}
// CHECK-CODEGEN: ![[MD_NOPTR]] = !{!"NoPtr", i1 false}
// CHECK-CODEGEN: ![[MD_WITHPTR]] = !{!"WithPtr", i1 true}
// CHECK-CODEGEN: ![[MD_UNKNOWN]] = !{}
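
The ``CHECK-LOWER`` expectations above (0 for pointer-free or unknown types, 1 for pointer-containing types, with ``-falloc-token-max=2``) are consistent with a lowering that splits the token range in half by pointer content. The sketch below models that idea only; the lowering rule, the fallback token of 0, and the use of ``std::hash`` are assumptions for illustration and are not taken from this patch. ``TypeHint`` and ``lower_token`` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Toy model of the !{<type-name>, <contains-pointer>} hint (hypothetical).
struct TypeHint {
  std::string type_name;
  bool contains_ptr;
};

// Assumed lowering (not taken from the patch): hash the type name into the
// lower half of the token range and shift pointer-containing types into the
// upper half. A missing hint falls back to token 0. Requires max_tokens >= 2.
inline std::uint64_t lower_token(const std::optional<TypeHint> &hint,
                                 std::uint64_t max_tokens) {
  if (!hint)
    return 0;
  const std::uint64_t half = max_tokens / 2;
  // std::hash is only a placeholder hash function for this sketch.
  std::uint64_t token = std::hash<std::string>{}(hint->type_name) % half;
  if (hint->contains_ptr)
    token += half;
  return token;
}
```

With ``max_tokens`` set to 2, each half holds a single token, so the result depends only on whether the type contains a pointer, which matches the values the test checks for.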
8 changes: 8 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -2852,7 +2852,15 @@ def int_ptrauth_blend :
def int_ptrauth_sign_generic :
DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty], [IntrNoMem]>;

//===----------------- AllocToken Intrinsics ------------------------------===//

// Return the token ID for the given !alloc_token_hint metadata.
def int_alloc_token_id :
DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_metadata_ty],
[IntrNoMem, NoUndef<RetIndex>]>;

//===----------------------------------------------------------------------===//

//===------- Convergence Intrinsics ---------------------------------------===//

def int_experimental_convergence_entry