Commit b689546

[𝘀𝗽𝗿] initial version
Created using spr 1.3.8-beta.1
2 parents a1bfa2f + 75bfb7a commit b689546

52 files changed: +1836 -13 lines changed

clang/docs/AllocToken.rst

Lines changed: 177 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,177 @@
1+
=================
2+
Allocation Tokens
3+
=================
4+
5+
.. contents::
6+
:local:
7+
8+
Introduction
9+
============
10+
11+
Clang provides support for allocation tokens to enable allocator-level heap
12+
organization strategies. Clang assigns mode-dependent token IDs to allocation
13+
calls; the runtime behavior depends entirely on the implementation of a
14+
compatible memory allocator.
15+
16+
Possible allocator strategies include:
17+
18+
* **Security Hardening**: Placing allocations into separate, isolated heap
19+
partitions. For example, separating pointer-containing types from raw data
20+
can mitigate exploits that rely on overflowing a primitive buffer to corrupt
21+
object metadata.
22+
23+
* **Memory Layout Optimization**: Grouping related allocations to improve data
24+
locality and cache utilization.
25+
26+
* **Custom Allocation Policies**: Applying different management strategies to
27+
different partitions.
28+
29+
Token Assignment Mode
30+
=====================
31+
32+
The default mode to calculate tokens is:
33+
34+
* *TypeHashPointerSplit* (mode=3): This mode assigns a token ID based on
35+
the hash of the allocated type's name, where the top half of the ID space is
36+
reserved for types that contain pointers and the bottom half for types that
37+
do not contain pointers.
38+
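
To make the split concrete, here is a minimal sketch in C of how a token could be derived under this mode. It is an illustration only: the FNV-1a hash, the helper name `sketch_token_for_type`, and the exact reduction into each half are assumptions, not the computation performed by the LLVM pass.

```c
#include <assert.h>
#include <stdint.h>

/* FNV-1a, used only as a stand-in hash (an assumption; the hash applied to
   the type name by the real pass is not stated in this document). */
static uint64_t fnv1a_64(const char *s) {
  uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
  for (; *s != '\0'; ++s) {
    h ^= (uint64_t)(unsigned char)*s;
    h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
  }
  return h;
}

/* Hypothetical helper: map a type name to a token in [0, max_tokens).
   Pointer-free types land in [0, half); pointer-containing types land in
   [half, max_tokens). */
uint64_t sketch_token_for_type(const char *type_name, int contains_ptr,
                               uint64_t max_tokens) {
  assert(max_tokens >= 2 && "need at least one token in each half");
  const uint64_t half = max_tokens / 2;
  const uint64_t hash = fnv1a_64(type_name);
  if (contains_ptr)
    return half + hash % (max_tokens - half);
  return hash % half;
}
```

Under this sketch, with `max_tokens` set to 512, a pointer-free type maps below 256 and a pointer-containing type maps into the range 256 to 511, so an allocator could distinguish the two classes by comparing the token against half of the configured maximum.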
39+
Other token ID assignment modes are supported, but they may be subject to
40+
change or removal. These may (experimentally) be selected with ``-mllvm
41+
-alloc-token-mode=<mode>``:
42+
43+
* *TypeHash* (mode=2): This mode assigns a token ID based on the hash of
44+
the allocated type's name.
45+
46+
* *Random* (mode=1): This mode assigns a statically-determined random token ID
47+
to each allocation site.
48+
49+
* *Increment* (mode=0): This mode assigns a simple, incrementally increasing
50+
token ID to each allocation site.
51+
52+
Allocation Token Instrumentation
53+
================================
54+
55+
To enable instrumentation of allocation functions, code can be compiled with
56+
the ``-fsanitize=alloc-token`` flag:
57+
58+
.. code-block:: console
59+
60+
% clang++ -fsanitize=alloc-token example.cc
61+
62+
The instrumentation transforms allocation calls to include a token ID. For
63+
example:
64+
65+
.. code-block:: c
66+
67+
// Original:
68+
ptr = malloc(size);
69+
70+
// Instrumented:
71+
ptr = __alloc_token_malloc(size, token_id);
72+
73+
In addition, it is typically recommended to configure the following:
74+
75+
* ``-falloc-token-max=<N>``
76+
Configures the maximum number of tokens. No max by default (tokens bounded
77+
by ``UINT64_MAX``).
78+
79+
.. code-block:: console
80+
81+
% clang++ -fsanitize=alloc-token -falloc-token-max=512 example.cc
82+
83+
Runtime Interface
84+
-----------------
85+
86+
A compatible runtime must be provided that implements the token-enabled
87+
allocation functions. The instrumentation generates calls to functions that
88+
take a final ``uint64_t token_id`` argument.
89+
90+
.. code-block:: c
91+
92+
// C standard library functions
93+
void *__alloc_token_malloc(size_t size, uint64_t token_id);
94+
void *__alloc_token_calloc(size_t count, size_t size, uint64_t token_id);
95+
void *__alloc_token_realloc(void *ptr, size_t size, uint64_t token_id);
96+
// ...
97+
98+
// C++ operators (mangled names)
99+
// operator new(size_t, uint64_t)
100+
void *__alloc_token_Znwm(size_t size, uint64_t token_id);
101+
// operator new[](size_t, uint64_t)
102+
void *__alloc_token_Znam(size_t size, uint64_t token_id);
103+
// ... other variants like nothrow, etc., are also instrumented.
104+
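
A minimal sketch of what a compatible runtime could look like follows, assuming the program is built with `-falloc-token-max=512`. The counters, the `sketch_` names, and the modulo reduction are hypothetical. A real allocator would route each token to its own heap partition instead of forwarding to `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical partition count; assumes a build with
   -falloc-token-max=512 so that tokens stay below it. */
#define SKETCH_NUM_PARTITIONS 512u

/* Per-partition allocation counters stand in for real isolated heaps. */
static uint64_t sketch_partition_allocs[SKETCH_NUM_PARTITIONS];

static size_t sketch_partition_for(uint64_t token_id) {
  /* Defensive reduction in case a token exceeds the configured maximum. */
  return (size_t)(token_id % SKETCH_NUM_PARTITIONS);
}

/* Accessor used to inspect how many allocations a partition has served. */
uint64_t sketch_partition_count(size_t partition) {
  assert(partition < SKETCH_NUM_PARTITIONS);
  return sketch_partition_allocs[partition];
}

void *__alloc_token_malloc(size_t size, uint64_t token_id) {
  void *ptr = malloc(size);
  if (ptr != NULL)
    sketch_partition_allocs[sketch_partition_for(token_id)]++;
  return ptr;
}

void *__alloc_token_calloc(size_t count, size_t size, uint64_t token_id) {
  void *ptr = calloc(count, size);
  if (ptr != NULL)
    sketch_partition_allocs[sketch_partition_for(token_id)]++;
  return ptr;
}

void *__alloc_token_realloc(void *ptr, size_t size, uint64_t token_id) {
  void *new_ptr = realloc(ptr, size);
  if (new_ptr != NULL)
    sketch_partition_allocs[sketch_partition_for(token_id)]++;
  return new_ptr;
}
```

Memory returned by these functions is released with the ordinary `free`, since the sketch forwards to the standard allocator. Only the bookkeeping is token-aware.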
105+
Fast ABI
106+
--------
107+
108+
An alternative ABI can be enabled with ``-fsanitize-alloc-token-fast-abi``,
109+
which encodes the token ID hint in the allocation function name.
110+
111+
.. code-block:: c
112+
113+
void *__alloc_token_0_malloc(size_t size);
114+
void *__alloc_token_1_malloc(size_t size);
115+
void *__alloc_token_2_malloc(size_t size);
116+
...
117+
void *__alloc_token_0_Znwm(size_t size);
118+
void *__alloc_token_1_Znwm(size_t size);
119+
void *__alloc_token_2_Znwm(size_t size);
120+
...
121+
122+
This ABI provides a more efficient alternative where
123+
``-falloc-token-max`` is small.
124+
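
One way a runtime might provide the fast ABI entry points is to stamp them out with a macro, one function per token. The sketch below assumes a build with `-falloc-token-max=4`. The macro, the `sketch_` names, and the shared helper are hypothetical and only illustrate how the token moves from an argument into the symbol name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical token limit; assumes a build with -falloc-token-max=4. */
#define SKETCH_MAX_TOKENS 4u

/* Records the token of the most recent allocation, for demonstration. */
static uint64_t sketch_last_token = UINT64_MAX;

/* Shared slow path; a real allocator would pick a heap partition here. */
static void *sketch_malloc_in_partition(size_t size, uint64_t token_id) {
  assert(token_id < SKETCH_MAX_TOKENS);
  sketch_last_token = token_id;
  return malloc(size);
}

/* Defines __alloc_token_<N>_malloc, which carries the token in its name. */
#define SKETCH_DEFINE_FAST_ABI_MALLOC(N)                      \
  void *__alloc_token_##N##_malloc(size_t size) {             \
    return sketch_malloc_in_partition(size, (uint64_t)(N));   \
  }

SKETCH_DEFINE_FAST_ABI_MALLOC(0)
SKETCH_DEFINE_FAST_ABI_MALLOC(1)
SKETCH_DEFINE_FAST_ABI_MALLOC(2)
SKETCH_DEFINE_FAST_ABI_MALLOC(3)
```

Because every token needs its own symbol for each allocation function, the number of entry points grows with the token limit, which is consistent with the note above that this ABI suits small values of `-falloc-token-max`.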
125+
Disabling Instrumentation
126+
-------------------------
127+
128+
To exclude specific functions from instrumentation, you can use the
129+
``no_sanitize("alloc-token")`` attribute:
130+
131+
.. code-block:: c
132+
133+
__attribute__((no_sanitize("alloc-token")))
134+
void* custom_allocator(size_t size) {
135+
return malloc(size); // Uses original malloc
136+
}
137+
138+
Note: Independent of any given allocator support, the instrumentation aims to
139+
remain performance neutral. As such, ``no_sanitize("alloc-token")``
140+
functions may be inlined into instrumented functions and vice-versa. If
141+
correctness is affected, such functions should explicitly be marked
142+
``noinline``.
143+
144+
``__attribute__((disable_sanitizer_instrumentation))`` is also supported to
145+
disable this and other sanitizer instrumentations.
146+
147+
Suppressions File (Ignorelist)
148+
------------------------------
149+
150+
AllocToken respects the ``src`` and ``fun`` entity types in the
151+
:doc:`SanitizerSpecialCaseList`, which can be used to omit specified source
152+
files or functions from instrumentation.
153+
154+
.. code-block:: bash
155+
156+
# Exclude specific source files
157+
src:third_party/allocator.c
158+
# Exclude function name patterns
159+
fun:*custom_malloc*
160+
fun:LowLevel::*
161+
162+
.. code-block:: console
163+
164+
% clang++ -fsanitize=alloc-token -fsanitize-ignorelist=my_ignorelist.txt example.cc
165+
166+
Conditional Compilation with ``__SANITIZE_ALLOC_TOKEN__``
167+
-----------------------------------------------------------
168+
169+
In some cases, one may need to execute different code depending on whether
170+
AllocToken instrumentation is enabled. The ``__SANITIZE_ALLOC_TOKEN__`` macro
171+
can be used for this purpose.
172+
173+
.. code-block:: c
174+
175+
#ifdef __SANITIZE_ALLOC_TOKEN__
176+
// Code specific to -fsanitize=alloc-token builds
177+
#endif

clang/docs/ReleaseNotes.rst

Lines changed: 4 additions & 0 deletions
@@ -203,11 +203,15 @@ Non-comprehensive list of changes in this release
203203
Currently, the use of ``__builtin_dedup_pack`` is limited to template arguments and base
204204
specifiers, it also must be used within a template context.
205205

206+
- Introduce support for allocation tokens to enable allocator-level heap
207+
organization strategies. A feature to instrument all allocation functions
208+
with a token ID can be enabled via the ``-fsanitize=alloc-token`` flag.
206209

207210
New Compiler Flags
208211
------------------
209212
- New option ``-fno-sanitize-debug-trap-reasons`` added to disable emitting trap reasons into the debug info when compiling with trapping UBSan (e.g. ``-fsanitize-trap=undefined``).
210213
- New option ``-fsanitize-debug-trap-reasons=`` added to control emitting trap reasons into the debug info when compiling with trapping UBSan (e.g. ``-fsanitize-trap=undefined``).
214+
- New options for enabling allocation token instrumentation: ``-fsanitize=alloc-token``, ``-falloc-token-max=``, ``-fsanitize-alloc-token-fast-abi``, ``-fsanitize-alloc-token-extended``.
211215

212216

213217
Lanai Support

clang/docs/UsersManual.rst

Lines changed: 2 additions & 0 deletions
@@ -2194,6 +2194,8 @@ are listed below.
21942194
protection against stack-based memory corruption errors.
21952195
- ``-fsanitize=realtime``: :doc:`RealtimeSanitizer`,
21962196
a real-time safety checker.
2197+
- ``-fsanitize=alloc-token``: :doc:`AllocToken`,
2198+
allocation token instrumentation (requires compatible allocator).
21972199

21982200
There are more fine-grained checks available: see
21992201
the :ref:`list <ubsan-checks>` of specific kinds of

clang/include/clang/Basic/CodeGenOptions.def

Lines changed: 2 additions & 0 deletions
@@ -306,6 +306,8 @@ CODEGENOPT(SanitizeBinaryMetadataCovered, 1, 0, Benign) ///< Emit PCs for covere
306306
CODEGENOPT(SanitizeBinaryMetadataAtomics, 1, 0, Benign) ///< Emit PCs for atomic operations.
307307
CODEGENOPT(SanitizeBinaryMetadataUAR, 1, 0, Benign) ///< Emit PCs for start of functions
308308
///< that are subject for use-after-return checking.
309+
CODEGENOPT(SanitizeAllocTokenFastABI, 1, 0, Benign) ///< Use the AllocToken fast ABI.
310+
CODEGENOPT(SanitizeAllocTokenExtended, 1, 0, Benign) ///< Extend coverage to custom allocation functions.
309311
CODEGENOPT(SanitizeStats , 1, 0, Benign) ///< Collect statistics for sanitizers.
310312
ENUM_CODEGENOPT(SanitizeDebugTrapReasons, SanitizeDebugTrapReasonKind, 2, SanitizeDebugTrapReasonKind::Detailed, Benign) ///< Control how "trap reasons" are emitted in debug info
311313
CODEGENOPT(SimplifyLibCalls , 1, 1, Benign) ///< Set when -fbuiltin is enabled.

clang/include/clang/Basic/CodeGenOptions.h

Lines changed: 3 additions & 0 deletions
@@ -447,6 +447,9 @@ class CodeGenOptions : public CodeGenOptionsBase {
447447

448448
std::optional<double> AllowRuntimeCheckSkipHotCutoff;
449449

450+
/// Maximum number of allocation tokens (0 = no max).
451+
std::optional<uint64_t> AllocTokenMax;
452+
450453
/// List of backend command-line options for -fembed-bitcode.
451454
std::vector<uint8_t> CmdArgs;
452455

clang/include/clang/Basic/Sanitizers.def

Lines changed: 3 additions & 0 deletions
@@ -195,6 +195,9 @@ SANITIZER_GROUP("bounds", Bounds, ArrayBounds | LocalBounds)
195195
// Scudo hardened allocator
196196
SANITIZER("scudo", Scudo)
197197

198+
// AllocToken
199+
SANITIZER("alloc-token", AllocToken)
200+
198201
// Magic group, containing all sanitizers. For example, "-fno-sanitize=all"
199202
// can be used to disable all the sanitizers.
200203
SANITIZER_GROUP("all", All, ~SanitizerMask())

clang/include/clang/Driver/Options.td

Lines changed: 17 additions & 0 deletions
@@ -2730,8 +2730,25 @@ def fsanitize_skip_hot_cutoff_EQ
27302730
"(0.0 [default] = skip none; 1.0 = skip all). "
27312731
"Argument format: <sanitizer1>=<value1>,<sanitizer2>=<value2>,...">;
27322732

2733+
defm sanitize_alloc_token_fast_abi : BoolOption<"f", "sanitize-alloc-token-fast-abi",
2734+
CodeGenOpts<"SanitizeAllocTokenFastABI">, DefaultFalse,
2735+
PosFlag<SetTrue, [], [ClangOption], "Use the AllocToken fast ABI">,
2736+
NegFlag<SetFalse, [], [ClangOption], "Use the default AllocToken ABI">>,
2737+
Group<f_clang_Group>;
2738+
defm sanitize_alloc_token_extended : BoolOption<"f", "sanitize-alloc-token-extended",
2739+
CodeGenOpts<"SanitizeAllocTokenExtended">, DefaultFalse,
2740+
PosFlag<SetTrue, [], [ClangOption], "Enable">,
2741+
NegFlag<SetFalse, [], [ClangOption], "Disable">,
2742+
BothFlags<[], [ClangOption], " extended coverage to custom allocation functions">>,
2743+
Group<f_clang_Group>;
2744+
27332745
} // end -f[no-]sanitize* flags
27342746

2747+
def falloc_token_max_EQ : Joined<["-"], "falloc-token-max=">,
2748+
Group<f_Group>, Visibility<[ClangOption, CC1Option, CLOption]>,
2749+
MetaVarName<"<N>">,
2750+
HelpText<"Limit to maximum N allocation tokens (0 = no max)">;
2751+
27352752
def fallow_runtime_check_skip_hot_cutoff_EQ
27362753
: Joined<["-"], "fallow-runtime-check-skip-hot-cutoff=">,
27372754
Group<f_clang_Group>,

clang/include/clang/Driver/SanitizerArgs.h

Lines changed: 3 additions & 1 deletion
@@ -13,6 +13,7 @@
1313
#include "llvm/Option/Arg.h"
1414
#include "llvm/Option/ArgList.h"
1515
#include "llvm/Transforms/Instrumentation/AddressSanitizerOptions.h"
16+
#include <optional>
1617
#include <string>
1718
#include <vector>
1819

@@ -73,8 +74,9 @@ class SanitizerArgs {
7374
bool HwasanUseAliases = false;
7475
llvm::AsanDetectStackUseAfterReturnMode AsanUseAfterReturn =
7576
llvm::AsanDetectStackUseAfterReturnMode::Invalid;
76-
7777
std::string MemtagMode;
78+
bool AllocTokenFastABI = false;
79+
bool AllocTokenExtended = false;
7880

7981
public:
8082
/// Parses the sanitizer arguments from an argument list.

clang/lib/CodeGen/BackendUtil.cpp

Lines changed: 20 additions & 0 deletions
@@ -59,11 +59,13 @@
5959
#include "llvm/TargetParser/Triple.h"
6060
#include "llvm/Transforms/HipStdPar/HipStdPar.h"
6161
#include "llvm/Transforms/IPO/EmbedBitcodePass.h"
62+
#include "llvm/Transforms/IPO/InferFunctionAttrs.h"
6263
#include "llvm/Transforms/IPO/LowerTypeTests.h"
6364
#include "llvm/Transforms/IPO/ThinLTOBitcodeWriter.h"
6465
#include "llvm/Transforms/InstCombine/InstCombine.h"
6566
#include "llvm/Transforms/Instrumentation/AddressSanitizer.h"
6667
#include "llvm/Transforms/Instrumentation/AddressSanitizerOptions.h"
68+
#include "llvm/Transforms/Instrumentation/AllocToken.h"
6769
#include "llvm/Transforms/Instrumentation/BoundsChecking.h"
6870
#include "llvm/Transforms/Instrumentation/DataFlowSanitizer.h"
6971
#include "llvm/Transforms/Instrumentation/GCOVProfiler.h"
@@ -231,6 +233,14 @@ class EmitAssemblyHelper {
231233
};
232234
} // namespace
233235

236+
static AllocTokenOptions getAllocTokenOptions(const CodeGenOptions &CGOpts) {
237+
AllocTokenOptions Opts;
238+
Opts.MaxTokens = CGOpts.AllocTokenMax;
239+
Opts.Extended = CGOpts.SanitizeAllocTokenExtended;
240+
Opts.FastABI = CGOpts.SanitizeAllocTokenFastABI;
241+
return Opts;
242+
}
243+
234244
static SanitizerCoverageOptions
235245
getSancovOptsFromCGOpts(const CodeGenOptions &CGOpts) {
236246
SanitizerCoverageOptions Opts;
@@ -784,6 +794,16 @@ static void addSanitizers(const Triple &TargetTriple,
784794
if (LangOpts.Sanitize.has(SanitizerKind::DataFlow)) {
785795
MPM.addPass(DataFlowSanitizerPass(LangOpts.NoSanitizeFiles));
786796
}
797+
798+
if (LangOpts.Sanitize.has(SanitizerKind::AllocToken)) {
799+
if (Level == OptimizationLevel::O0) {
800+
// The default pass builder only infers libcall function attrs when
801+
// optimizing, so we insert it here because we need it for accurate
802+
// memory allocation function detection.
803+
MPM.addPass(InferFunctionAttrsPass());
804+
}
805+
MPM.addPass(AllocTokenPass(getAllocTokenOptions(CodeGenOpts)));
806+
}
787807
};
788808
if (ClSanitizeOnOptimizerEarlyEP) {
789809
PB.registerOptimizerEarlyEPCallback(

clang/lib/CodeGen/CGExpr.cpp

Lines changed: 70 additions & 0 deletions
@@ -1272,6 +1272,76 @@ void CodeGenFunction::EmitBoundsCheckImpl(const Expr *E, llvm::Value *Bound,
12721272
EmitCheck(std::make_pair(Check, CheckKind), CheckHandler, StaticData, Index);
12731273
}
12741274

1275+
void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
1276+
QualType AllocType) {
1277+
assert(SanOpts.has(SanitizerKind::AllocToken) &&
1278+
"Only needed with -fsanitize=alloc-token");
1279+
1280+
llvm::MDBuilder MDB(getLLVMContext());
1281+
1282+
// Get unique type name.
1283+
PrintingPolicy Policy(CGM.getContext().getLangOpts());
1284+
Policy.SuppressTagKeyword = true;
1285+
Policy.FullyQualifiedName = true;
1286+
std::string TypeName = AllocType.getCanonicalType().getAsString(Policy);
1287+
auto *TypeNameMD = MDB.createString(TypeName);
1288+
1289+
// Check if QualType contains a pointer. Implements a simple DFS to
1290+
// recursively check if a type contains a pointer type.
1291+
llvm::SmallPtrSet<const RecordDecl *, 4> VisitedRD;
1292+
auto TypeContainsPtr = [&](auto &&self, QualType T) -> bool {
1293+
QualType CanonicalType = T.getCanonicalType();
1294+
if (CanonicalType->isPointerType())
1295+
return true; // base case
1296+
1297+
// Look through typedef chain to check for special types.
1298+
for (QualType CurrentT = T; const auto *TT = CurrentT->getAs<TypedefType>();
1299+
CurrentT = TT->getDecl()->getUnderlyingType()) {
1300+
const IdentifierInfo *II = TT->getDecl()->getIdentifier();
1301+
if (!II)
1302+
continue;
1303+
// Special Case: Syntactically uintptr_t is not a pointer; semantically,
1304+
// however, very likely used as such. Therefore, classify uintptr_t as a
1305+
// pointer, too.
1306+
if (II->isStr("uintptr_t"))
1307+
return true;
1308+
}
1309+
1310+
// The type is an array; check the element type.
1311+
if (const ArrayType *AT = CanonicalType->getAsArrayTypeUnsafe())
1312+
return self(self, AT->getElementType());
1313+
// The type is a struct, class, or union.
1314+
if (const RecordDecl *RD = CanonicalType->getAsRecordDecl()) {
1315+
if (!VisitedRD.insert(RD).second)
1316+
return false; // already visited
1317+
// Check all fields.
1318+
for (const FieldDecl *Field : RD->fields()) {
1319+
if (self(self, Field->getType()))
1320+
return true;
1321+
}
1322+
// For C++ classes, also check base classes.
1323+
if (const CXXRecordDecl *CXXRD = dyn_cast<CXXRecordDecl>(RD)) {
1324+
// Polymorphic types require a vptr.
1325+
if (CXXRD->isPolymorphic())
1326+
return true;
1327+
for (const CXXBaseSpecifier &Base : CXXRD->bases()) {
1328+
if (self(self, Base.getType()))
1329+
return true;
1330+
}
1331+
}
1332+
}
1333+
return false;
1334+
};
1335+
const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType);
1336+
auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
1337+
auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);
1338+
1339+
// Format: !{<type-name>, <contains-pointer>}
1340+
auto *MDN =
1341+
llvm::MDNode::get(CGM.getLLVMContext(), {TypeNameMD, ContainsPtrMD});
1342+
CB->setMetadata("alloc_token_hint", MDN);
1343+
}
1344+
12751345
CodeGenFunction::ComplexPairTy CodeGenFunction::
12761346
EmitComplexPrePostIncDec(const UnaryOperator *E, LValue LV,
12771347
bool isInc, bool isPre) {
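
The recursion in `TypeContainsPtr` from the `EmitAllocTokenHint` hunk above can be illustrated outside of Clang with a toy type model. The following C sketch is not Clang code: `struct sketch_type` and every `sketch_` name are hypothetical, base classes are modeled as ordinary fields, and the `uintptr_t` special case is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_kind { SK_SCALAR, SK_POINTER, SK_ARRAY, SK_RECORD };

/* Toy stand-in for a canonical type. */
struct sketch_type {
  enum sketch_kind kind;
  const struct sketch_type *element;       /* SK_ARRAY: element type */
  const struct sketch_type *const *fields; /* SK_RECORD: field types */
  size_t num_fields;                       /* SK_RECORD: field count */
  bool polymorphic;                        /* SK_RECORD: has a vptr */
};

#define SKETCH_MAX_VISITED 64

/* Small visited set that keeps recursive record types from looping. */
struct sketch_visited {
  const struct sketch_type *seen[SKETCH_MAX_VISITED];
  size_t count;
};

/* Returns true if the record was newly inserted, false if already seen. */
static bool sketch_visit(struct sketch_visited *v,
                         const struct sketch_type *t) {
  for (size_t i = 0; i < v->count; ++i)
    if (v->seen[i] == t)
      return false;
  assert(v->count < SKETCH_MAX_VISITED);
  v->seen[v->count++] = t;
  return true;
}

static bool sketch_contains_ptr_impl(const struct sketch_type *t,
                                     struct sketch_visited *v) {
  switch (t->kind) {
  case SK_POINTER:
    return true; /* base case */
  case SK_ARRAY:
    return sketch_contains_ptr_impl(t->element, v);
  case SK_RECORD:
    if (!sketch_visit(v, t))
      return false; /* already visited */
    for (size_t i = 0; i < t->num_fields; ++i)
      if (sketch_contains_ptr_impl(t->fields[i], v))
        return true;
    return t->polymorphic; /* polymorphic records carry a vptr */
  case SK_SCALAR:
    return false;
  }
  return false;
}

/* Depth-first check: does the type transitively contain a pointer? */
bool sketch_contains_ptr(const struct sketch_type *t) {
  struct sketch_visited v = {{0}, 0};
  return sketch_contains_ptr_impl(t, &v);
}
```

As in the hunk, revisiting a record returns false instead of recursing again, so a self-referential type terminates and its result is decided by the first pointer found on some other path.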
