-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AllocToken, Clang] Implement TypeHashPointerSplit mode #156840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: users/melver/spr/main.alloctoken-clang-implement-typehashpointersplit-mode
Are you sure you want to change the base?
Conversation
Created using spr 1.3.8-beta.1
Created using spr 1.3.8-beta.1
Created using spr 1.3.8-beta.1
Created using spr 1.3.8-beta.1
@llvm/pr-subscribers-llvm-transforms Author: Marco Elver (melver) ChangesImplement the TypeHashPointerSplit mode: This mode assigns a token ID This mode with max tokens of 2 ( Make it the new default mode. Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 This change is part of the following series:
Full diff: https://github.com/llvm/llvm-project/pull/156840.diff 7 Files Affected:
diff --git a/clang/docs/AllocToken.rst b/clang/docs/AllocToken.rst
index a7bb8877f371b..fb354d6738ea3 100644
--- a/clang/docs/AllocToken.rst
+++ b/clang/docs/AllocToken.rst
@@ -31,13 +31,18 @@ Token Assignment Mode
The default mode to calculate tokens is:
-* *TypeHash* (mode=2): This mode assigns a token ID based on the hash of
- the allocated type's name.
+* *TypeHashPointerSplit* (mode=3): This mode assigns a token ID based on
+ the hash of the allocated type's name, where the top half ID-space is
+ reserved for types that contain pointers and the bottom half for types that
+ do not contain pointers.
Other token ID assignment modes are supported, but they may be subject to
change or removal. These may (experimentally) be selected with ``-mllvm
-alloc-token-mode=<mode>``:
+* *TypeHash* (mode=2): This mode assigns a token ID based on the hash of
+ the allocated type's name.
+
* *Random* (mode=1): This mode assigns a statically-determined random token ID
to each allocation site.
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 14e4d648428ef..e7a0e7696e204 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1277,14 +1277,75 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");
+ llvm::MDBuilder MDB(getLLVMContext());
+
+ // Get unique type name.
PrintingPolicy Policy(CGM.getContext().getLangOpts());
Policy.SuppressTagKeyword = true;
Policy.FullyQualifiedName = true;
std::string TypeName = AllocType.getCanonicalType().getAsString(Policy);
- auto *TypeMDS = llvm::MDString::get(CGM.getLLVMContext(), TypeName);
+ auto *TypeNameMD = MDB.createString(TypeName);
+
+ // Check if QualType contains a pointer. Implements a simple DFS to
+ // recursively check if a type contains a pointer type.
+ llvm::SmallPtrSet<const RecordDecl *, 4> VisitedRD;
+ bool IncompleteType = false;
+ auto TypeContainsPtr = [&](auto &&self, QualType T) -> bool {
+ QualType CanonicalType = T.getCanonicalType();
+ if (CanonicalType->isPointerType())
+ return true; // base case
+
+ // Look through typedef chain to check for special types.
+ for (QualType CurrentT = T; const auto *TT = CurrentT->getAs<TypedefType>();
+ CurrentT = TT->getDecl()->getUnderlyingType()) {
+ const IdentifierInfo *II = TT->getDecl()->getIdentifier();
+ if (!II)
+ continue;
+ // Special Case: Syntactically uintptr_t is not a pointer; semantically,
+ // however, very likely used as such. Therefore, classify uintptr_t as a
+ // pointer, too.
+ if (II->isStr("uintptr_t"))
+ return true;
+ }
+
+ // The type is an array; check the element type.
+ if (const ArrayType *AT = CanonicalType->getAsArrayTypeUnsafe())
+ return self(self, AT->getElementType());
+ // The type is a struct, class, or union.
+ if (const RecordDecl *RD = CanonicalType->getAsRecordDecl()) {
+ if (!RD->isCompleteDefinition()) {
+ IncompleteType = true;
+ return false;
+ }
+ if (!VisitedRD.insert(RD).second)
+ return false; // already visited
+ // Check all fields.
+ for (const FieldDecl *Field : RD->fields()) {
+ if (self(self, Field->getType()))
+ return true;
+ }
+ // For C++ classes, also check base classes.
+ if (const CXXRecordDecl *CXXRD = dyn_cast<CXXRecordDecl>(RD)) {
+ // Polymorphic types require a vptr.
+ if (CXXRD->isPolymorphic())
+ return true;
+ for (const CXXBaseSpecifier &Base : CXXRD->bases()) {
+ if (self(self, Base.getType()))
+ return true;
+ }
+ }
+ }
+ return false;
+ };
+ const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType);
+ if (!ContainsPtr && IncompleteType)
+ return;
+ auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
+ auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);
- // Format: !{<type-name>}
- auto *MDN = llvm::MDNode::get(CGM.getLLVMContext(), {TypeMDS});
+ // Format: !{<type-name>, <contains-pointer>}
+ auto *MDN =
+ llvm::MDNode::get(CGM.getLLVMContext(), {TypeNameMD, ContainsPtrMD});
CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint, MDN);
}
diff --git a/clang/test/CodeGenCXX/alloc-token-pointer.cpp b/clang/test/CodeGenCXX/alloc-token-pointer.cpp
new file mode 100644
index 0000000000000..7adb75c7afebb
--- /dev/null
+++ b/clang/test/CodeGenCXX/alloc-token-pointer.cpp
@@ -0,0 +1,177 @@
+// Check -fsanitize=alloc-token TypeHashPointerSplit mode with only 2
+// tokens so we effectively only test the contains-pointer logic.
+//
+// RUN: %clang_cc1 -fsanitize=alloc-token -falloc-token-max=2 -triple x86_64-linux-gnu -std=c++20 -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -O -fsanitize=alloc-token -falloc-token-max=2 -triple x86_64-linux-gnu -std=c++20 -emit-llvm %s -o - | FileCheck %s
+
+#include "../Analysis/Inputs/system-header-simulator-cxx.h"
+
+typedef __UINTPTR_TYPE__ uintptr_t;
+
+extern "C" {
+void *malloc(size_t size);
+}
+
+// CHECK-LABEL: @_Z15test_malloc_intv(
+void *test_malloc_int() {
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 4, i64 0)
+ int *a = (int *)malloc(sizeof(int));
+ *a = 42;
+ return a;
+}
+
+// CHECK-LABEL: @_Z15test_malloc_ptrv(
+int **test_malloc_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 8, i64 0)
+ int **a = (int **)malloc(sizeof(int*));
+ *a = nullptr;
+ return a;
+}
+
+// CHECK-LABEL: @_Z12test_new_intv(
+int *test_new_int() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 4, i64 0){{.*}} !alloc_token_hint
+ return new int;
+}
+
+// CHECK-LABEL: @_Z20test_new_ulong_arrayv(
+unsigned long *test_new_ulong_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 80, i64 0){{.*}} !alloc_token_hint
+ return new unsigned long[10];
+}
+
+// CHECK-LABEL: @_Z12test_new_ptrv(
+int **test_new_ptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 8, i64 1){{.*}} !alloc_token_hint
+ return new int*;
+}
+
+// CHECK-LABEL: @_Z18test_new_ptr_arrayv(
+int **test_new_ptr_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 80, i64 1){{.*}} !alloc_token_hint
+ return new int*[10];
+}
+
+struct ContainsPtr {
+ int a;
+ char *buf;
+};
+
+// CHECK-LABEL: @_Z27test_malloc_struct_with_ptrv(
+ContainsPtr *test_malloc_struct_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 16, i64 0)
+ ContainsPtr *c = (ContainsPtr *)malloc(sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z33test_malloc_struct_array_with_ptrv(
+ContainsPtr *test_malloc_struct_array_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 160, i64 0)
+ ContainsPtr *c = (ContainsPtr *)malloc(10 * sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z32test_operatornew_struct_with_ptrv(
+ContainsPtr *test_operatornew_struct_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z38test_operatornew_struct_array_with_ptrv(
+ContainsPtr *test_operatornew_struct_array_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 160, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(10 * sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z33test_operatornew_struct_with_ptr2v(
+ContainsPtr *test_operatornew_struct_with_ptr2() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(sizeof(*c));
+ return c;
+}
+
+// CHECK-LABEL: @_Z39test_operatornew_struct_array_with_ptr2v(
+ContainsPtr *test_operatornew_struct_array_with_ptr2() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 160, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(10 * sizeof(*c));
+ return c;
+}
+
+// CHECK-LABEL: @_Z24test_new_struct_with_ptrv(
+ContainsPtr *test_new_struct_with_ptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 1){{.*}} !alloc_token_hint
+ return new ContainsPtr;
+}
+
+// CHECK-LABEL: @_Z30test_new_struct_array_with_ptrv(
+ContainsPtr *test_new_struct_array_with_ptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 160, i64 1){{.*}} !alloc_token_hint
+ return new ContainsPtr[10];
+}
+
+class TestClass {
+public:
+ void Foo();
+ ~TestClass();
+ int data[16];
+};
+
+// CHECK-LABEL: @_Z14test_new_classv(
+TestClass *test_new_class() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 64, i64 0){{.*}} !alloc_token_hint
+ return new TestClass();
+}
+
+// CHECK-LABEL: @_Z20test_new_class_arrayv(
+TestClass *test_new_class_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 648, i64 0){{.*}} !alloc_token_hint
+ return new TestClass[10];
+}
+
+// Test that we detect that virtual classes have implicit vtable pointer.
+class VirtualTestClass {
+public:
+ virtual void Foo();
+ virtual ~VirtualTestClass();
+ int data[16];
+};
+
+// CHECK-LABEL: @_Z22test_new_virtual_classv(
+VirtualTestClass *test_new_virtual_class() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 72, i64 1){{.*}} !alloc_token_hint
+ return new VirtualTestClass();
+}
+
+// CHECK-LABEL: @_Z28test_new_virtual_class_arrayv(
+VirtualTestClass *test_new_virtual_class_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 728, i64 1){{.*}} !alloc_token_hint
+ return new VirtualTestClass[10];
+}
+
+// uintptr_t is treated as a pointer.
+struct MyStructUintptr {
+ int a;
+ uintptr_t ptr;
+};
+
+// CHECK-LABEL: @_Z18test_uintptr_isptrv(
+MyStructUintptr *test_uintptr_isptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 1)
+ return new MyStructUintptr;
+}
+
+using uptr = uintptr_t;
+// CHECK-LABEL: @_Z19test_uintptr_isptr2v(
+uptr *test_uintptr_isptr2() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 8, i64 1)
+ return new uptr;
+}
diff --git a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
index 4ea68470ca684..74cda227d50a7 100644
--- a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
+++ b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
@@ -70,6 +70,11 @@ enum class TokenMode : unsigned {
/// Token ID based on allocated type hash.
TypeHash = 2,
+ /// Token ID based on allocated type hash, where the top half ID-space is
+ /// reserved for types that contain pointers and the bottom half for types
+ /// that do not contain pointers.
+ TypeHashPointerSplit = 3,
+
// Mode count - keep last
ModeCount
};
@@ -89,7 +94,7 @@ struct ModeParser : public cl::parser<unsigned> {
cl::opt<unsigned, false, ModeParser>
ClMode("alloc-token-mode", cl::desc("Token assignment mode"), cl::Hidden,
- cl::init(static_cast<unsigned>(TokenMode::TypeHash)));
+ cl::init(static_cast<unsigned>(TokenMode::TypeHashPointerSplit)));
cl::opt<std::string> ClFuncPrefix("alloc-token-prefix",
cl::desc("The allocation function prefix"),
@@ -142,16 +147,23 @@ STATISTIC(NumAllocations, "Allocations found");
/// Returns the !alloc_token_hint metadata if available.
///
-/// Expected format is: !{<type-name>}
+/// Expected format is: !{<type-name>, <contains-pointer>}
MDNode *getAllocTokenHintMetadata(const CallBase &CB) {
MDNode *Ret = CB.getMetadata(LLVMContext::MD_alloc_token_hint);
if (!Ret)
return nullptr;
- assert(Ret->getNumOperands() == 1 && "bad !alloc_token_hint");
+ assert(Ret->getNumOperands() == 2 && "bad !alloc_token_hint");
assert(isa<MDString>(Ret->getOperand(0)));
+ assert(isa<ConstantAsMetadata>(Ret->getOperand(1)));
return Ret;
}
+bool containsPointer(const MDNode *MD) {
+ ConstantAsMetadata *C = cast<ConstantAsMetadata>(MD->getOperand(1));
+ auto *CI = cast<ConstantInt>(C->getValue());
+ return CI->getValue().getBoolValue();
+}
+
class ModeBase {
public:
explicit ModeBase(uint64_t MaxTokens) : MaxTokens(MaxTokens) {}
@@ -198,12 +210,20 @@ class TypeHashMode : public ModeBase {
using ModeBase::ModeBase;
uint64_t operator()(const CallBase &CB, OptimizationRemarkEmitter &ORE) {
+ const auto [N, H] = getHash(CB, ORE);
+ return N ? boundedToken(H) : H;
+ }
+
+protected:
+ std::pair<MDNode *, uint64_t> getHash(const CallBase &CB,
+ OptimizationRemarkEmitter &ORE) {
if (MDNode *N = getAllocTokenHintMetadata(CB)) {
MDString *S = cast<MDString>(N->getOperand(0));
- return boundedToken(xxHash64(S->getString()));
+ return {N, xxHash64(S->getString())};
}
+ // Fallback.
remarkNoHint(CB, ORE);
- return ClFallbackToken;
+ return {nullptr, ClFallbackToken};
}
/// Remark that there was no precise type information.
@@ -219,6 +239,26 @@ class TypeHashMode : public ModeBase {
}
};
+/// Implementation for TokenMode::TypeHashPointerSplit.
+class TypeHashPointerSplitMode : public TypeHashMode {
+public:
+ using TypeHashMode::TypeHashMode;
+
+ uint64_t operator()(const CallBase &CB, OptimizationRemarkEmitter &ORE) {
+ if (MaxTokens == 1)
+ return 0;
+ const uint64_t HalfTokens =
+ (MaxTokens ? MaxTokens : std::numeric_limits<uint64_t>::max()) / 2;
+ const auto [N, H] = getHash(CB, ORE);
+ if (!N)
+ return H; // fallback token
+ uint64_t Hash = H % HalfTokens; // base hash
+ if (containsPointer(N))
+ Hash += HalfTokens;
+ return Hash;
+ }
+};
+
// Apply opt overrides.
AllocTokenOptions &&transformOptionsFromCl(AllocTokenOptions &&Opts) {
if (!Opts.MaxTokens.has_value())
@@ -244,6 +284,9 @@ class AllocToken {
case TokenMode::TypeHash:
Mode.emplace<TypeHashMode>(*Options.MaxTokens);
break;
+ case TokenMode::TypeHashPointerSplit:
+ Mode.emplace<TypeHashPointerSplitMode>(*Options.MaxTokens);
+ break;
case TokenMode::ModeCount:
llvm_unreachable("");
break;
@@ -281,7 +324,9 @@ class AllocToken {
// Cache for replacement functions.
DenseMap<std::pair<LibFunc, uint64_t>, FunctionCallee> TokenAllocFunctions;
// Selected mode.
- std::variant<IncrementMode, RandomMode, TypeHashMode> Mode;
+ std::variant<IncrementMode, RandomMode, TypeHashMode,
+ TypeHashPointerSplitMode>
+ Mode;
};
bool AllocToken::instrumentFunction(Function &F) {
diff --git a/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll b/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll
index 37c6d6463ffd2..94fc12b20caff 100644
--- a/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll
+++ b/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll
@@ -29,4 +29,4 @@ entry:
ret ptr %ptr1
}
-!0 = !{!"int"}
+!0 = !{!"int", i1 0}
diff --git a/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll b/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll
index be99e5ef14b16..b933c0bd352e9 100644
--- a/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll
+++ b/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll
@@ -60,4 +60,4 @@ entry:
ret ptr %ptr1
}
-!0 = !{!"int"}
+!0 = !{!"int", i1 0}
diff --git a/llvm/test/Instrumentation/AllocToken/remark.ll b/llvm/test/Instrumentation/AllocToken/remark.ll
index d5ecfc41bace5..d2bc273060403 100644
--- a/llvm/test/Instrumentation/AllocToken/remark.ll
+++ b/llvm/test/Instrumentation/AllocToken/remark.ll
@@ -24,4 +24,4 @@ entry:
ret ptr %ptr1
}
-!0 = !{!"int"}
+!0 = !{!"int", i1 0}
|
@llvm/pr-subscribers-clang-codegen Author: Marco Elver (melver) ChangesImplement the TypeHashPointerSplit mode: This mode assigns a token ID This mode with max tokens of 2 ( Make it the new default mode. Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 This change is part of the following series:
Full diff: https://github.com/llvm/llvm-project/pull/156840.diff 7 Files Affected:
diff --git a/clang/docs/AllocToken.rst b/clang/docs/AllocToken.rst
index a7bb8877f371b..fb354d6738ea3 100644
--- a/clang/docs/AllocToken.rst
+++ b/clang/docs/AllocToken.rst
@@ -31,13 +31,18 @@ Token Assignment Mode
The default mode to calculate tokens is:
-* *TypeHash* (mode=2): This mode assigns a token ID based on the hash of
- the allocated type's name.
+* *TypeHashPointerSplit* (mode=3): This mode assigns a token ID based on
+ the hash of the allocated type's name, where the top half ID-space is
+ reserved for types that contain pointers and the bottom half for types that
+ do not contain pointers.
Other token ID assignment modes are supported, but they may be subject to
change or removal. These may (experimentally) be selected with ``-mllvm
-alloc-token-mode=<mode>``:
+* *TypeHash* (mode=2): This mode assigns a token ID based on the hash of
+ the allocated type's name.
+
* *Random* (mode=1): This mode assigns a statically-determined random token ID
to each allocation site.
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 14e4d648428ef..e7a0e7696e204 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1277,14 +1277,75 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");
+ llvm::MDBuilder MDB(getLLVMContext());
+
+ // Get unique type name.
PrintingPolicy Policy(CGM.getContext().getLangOpts());
Policy.SuppressTagKeyword = true;
Policy.FullyQualifiedName = true;
std::string TypeName = AllocType.getCanonicalType().getAsString(Policy);
- auto *TypeMDS = llvm::MDString::get(CGM.getLLVMContext(), TypeName);
+ auto *TypeNameMD = MDB.createString(TypeName);
+
+ // Check if QualType contains a pointer. Implements a simple DFS to
+ // recursively check if a type contains a pointer type.
+ llvm::SmallPtrSet<const RecordDecl *, 4> VisitedRD;
+ bool IncompleteType = false;
+ auto TypeContainsPtr = [&](auto &&self, QualType T) -> bool {
+ QualType CanonicalType = T.getCanonicalType();
+ if (CanonicalType->isPointerType())
+ return true; // base case
+
+ // Look through typedef chain to check for special types.
+ for (QualType CurrentT = T; const auto *TT = CurrentT->getAs<TypedefType>();
+ CurrentT = TT->getDecl()->getUnderlyingType()) {
+ const IdentifierInfo *II = TT->getDecl()->getIdentifier();
+ if (!II)
+ continue;
+ // Special Case: Syntactically uintptr_t is not a pointer; semantically,
+ // however, very likely used as such. Therefore, classify uintptr_t as a
+ // pointer, too.
+ if (II->isStr("uintptr_t"))
+ return true;
+ }
+
+ // The type is an array; check the element type.
+ if (const ArrayType *AT = CanonicalType->getAsArrayTypeUnsafe())
+ return self(self, AT->getElementType());
+ // The type is a struct, class, or union.
+ if (const RecordDecl *RD = CanonicalType->getAsRecordDecl()) {
+ if (!RD->isCompleteDefinition()) {
+ IncompleteType = true;
+ return false;
+ }
+ if (!VisitedRD.insert(RD).second)
+ return false; // already visited
+ // Check all fields.
+ for (const FieldDecl *Field : RD->fields()) {
+ if (self(self, Field->getType()))
+ return true;
+ }
+ // For C++ classes, also check base classes.
+ if (const CXXRecordDecl *CXXRD = dyn_cast<CXXRecordDecl>(RD)) {
+ // Polymorphic types require a vptr.
+ if (CXXRD->isPolymorphic())
+ return true;
+ for (const CXXBaseSpecifier &Base : CXXRD->bases()) {
+ if (self(self, Base.getType()))
+ return true;
+ }
+ }
+ }
+ return false;
+ };
+ const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType);
+ if (!ContainsPtr && IncompleteType)
+ return;
+ auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
+ auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);
- // Format: !{<type-name>}
- auto *MDN = llvm::MDNode::get(CGM.getLLVMContext(), {TypeMDS});
+ // Format: !{<type-name>, <contains-pointer>}
+ auto *MDN =
+ llvm::MDNode::get(CGM.getLLVMContext(), {TypeNameMD, ContainsPtrMD});
CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint, MDN);
}
diff --git a/clang/test/CodeGenCXX/alloc-token-pointer.cpp b/clang/test/CodeGenCXX/alloc-token-pointer.cpp
new file mode 100644
index 0000000000000..7adb75c7afebb
--- /dev/null
+++ b/clang/test/CodeGenCXX/alloc-token-pointer.cpp
@@ -0,0 +1,177 @@
+// Check -fsanitize=alloc-token TypeHashPointerSplit mode with only 2
+// tokens so we effectively only test the contains-pointer logic.
+//
+// RUN: %clang_cc1 -fsanitize=alloc-token -falloc-token-max=2 -triple x86_64-linux-gnu -std=c++20 -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -O -fsanitize=alloc-token -falloc-token-max=2 -triple x86_64-linux-gnu -std=c++20 -emit-llvm %s -o - | FileCheck %s
+
+#include "../Analysis/Inputs/system-header-simulator-cxx.h"
+
+typedef __UINTPTR_TYPE__ uintptr_t;
+
+extern "C" {
+void *malloc(size_t size);
+}
+
+// CHECK-LABEL: @_Z15test_malloc_intv(
+void *test_malloc_int() {
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 4, i64 0)
+ int *a = (int *)malloc(sizeof(int));
+ *a = 42;
+ return a;
+}
+
+// CHECK-LABEL: @_Z15test_malloc_ptrv(
+int **test_malloc_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 8, i64 0)
+ int **a = (int **)malloc(sizeof(int*));
+ *a = nullptr;
+ return a;
+}
+
+// CHECK-LABEL: @_Z12test_new_intv(
+int *test_new_int() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 4, i64 0){{.*}} !alloc_token_hint
+ return new int;
+}
+
+// CHECK-LABEL: @_Z20test_new_ulong_arrayv(
+unsigned long *test_new_ulong_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 80, i64 0){{.*}} !alloc_token_hint
+ return new unsigned long[10];
+}
+
+// CHECK-LABEL: @_Z12test_new_ptrv(
+int **test_new_ptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 8, i64 1){{.*}} !alloc_token_hint
+ return new int*;
+}
+
+// CHECK-LABEL: @_Z18test_new_ptr_arrayv(
+int **test_new_ptr_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 80, i64 1){{.*}} !alloc_token_hint
+ return new int*[10];
+}
+
+struct ContainsPtr {
+ int a;
+ char *buf;
+};
+
+// CHECK-LABEL: @_Z27test_malloc_struct_with_ptrv(
+ContainsPtr *test_malloc_struct_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 16, i64 0)
+ ContainsPtr *c = (ContainsPtr *)malloc(sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z33test_malloc_struct_array_with_ptrv(
+ContainsPtr *test_malloc_struct_array_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 160, i64 0)
+ ContainsPtr *c = (ContainsPtr *)malloc(10 * sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z32test_operatornew_struct_with_ptrv(
+ContainsPtr *test_operatornew_struct_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z38test_operatornew_struct_array_with_ptrv(
+ContainsPtr *test_operatornew_struct_array_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 160, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(10 * sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z33test_operatornew_struct_with_ptr2v(
+ContainsPtr *test_operatornew_struct_with_ptr2() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(sizeof(*c));
+ return c;
+}
+
+// CHECK-LABEL: @_Z39test_operatornew_struct_array_with_ptr2v(
+ContainsPtr *test_operatornew_struct_array_with_ptr2() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 160, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(10 * sizeof(*c));
+ return c;
+}
+
+// CHECK-LABEL: @_Z24test_new_struct_with_ptrv(
+ContainsPtr *test_new_struct_with_ptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 1){{.*}} !alloc_token_hint
+ return new ContainsPtr;
+}
+
+// CHECK-LABEL: @_Z30test_new_struct_array_with_ptrv(
+ContainsPtr *test_new_struct_array_with_ptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 160, i64 1){{.*}} !alloc_token_hint
+ return new ContainsPtr[10];
+}
+
+class TestClass {
+public:
+ void Foo();
+ ~TestClass();
+ int data[16];
+};
+
+// CHECK-LABEL: @_Z14test_new_classv(
+TestClass *test_new_class() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 64, i64 0){{.*}} !alloc_token_hint
+ return new TestClass();
+}
+
+// CHECK-LABEL: @_Z20test_new_class_arrayv(
+TestClass *test_new_class_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 648, i64 0){{.*}} !alloc_token_hint
+ return new TestClass[10];
+}
+
+// Test that we detect that virtual classes have implicit vtable pointer.
+class VirtualTestClass {
+public:
+ virtual void Foo();
+ virtual ~VirtualTestClass();
+ int data[16];
+};
+
+// CHECK-LABEL: @_Z22test_new_virtual_classv(
+VirtualTestClass *test_new_virtual_class() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 72, i64 1){{.*}} !alloc_token_hint
+ return new VirtualTestClass();
+}
+
+// CHECK-LABEL: @_Z28test_new_virtual_class_arrayv(
+VirtualTestClass *test_new_virtual_class_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 728, i64 1){{.*}} !alloc_token_hint
+ return new VirtualTestClass[10];
+}
+
+// uintptr_t is treated as a pointer.
+struct MyStructUintptr {
+ int a;
+ uintptr_t ptr;
+};
+
+// CHECK-LABEL: @_Z18test_uintptr_isptrv(
+MyStructUintptr *test_uintptr_isptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 1)
+ return new MyStructUintptr;
+}
+
+using uptr = uintptr_t;
+// CHECK-LABEL: @_Z19test_uintptr_isptr2v(
+uptr *test_uintptr_isptr2() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 8, i64 1)
+ return new uptr;
+}
diff --git a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
index 4ea68470ca684..74cda227d50a7 100644
--- a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
+++ b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
@@ -70,6 +70,11 @@ enum class TokenMode : unsigned {
/// Token ID based on allocated type hash.
TypeHash = 2,
+ /// Token ID based on allocated type hash, where the top half ID-space is
+ /// reserved for types that contain pointers and the bottom half for types
+ /// that do not contain pointers.
+ TypeHashPointerSplit = 3,
+
// Mode count - keep last
ModeCount
};
@@ -89,7 +94,7 @@ struct ModeParser : public cl::parser<unsigned> {
cl::opt<unsigned, false, ModeParser>
ClMode("alloc-token-mode", cl::desc("Token assignment mode"), cl::Hidden,
- cl::init(static_cast<unsigned>(TokenMode::TypeHash)));
+ cl::init(static_cast<unsigned>(TokenMode::TypeHashPointerSplit)));
cl::opt<std::string> ClFuncPrefix("alloc-token-prefix",
cl::desc("The allocation function prefix"),
@@ -142,16 +147,23 @@ STATISTIC(NumAllocations, "Allocations found");
/// Returns the !alloc_token_hint metadata if available.
///
-/// Expected format is: !{<type-name>}
+/// Expected format is: !{<type-name>, <contains-pointer>}
MDNode *getAllocTokenHintMetadata(const CallBase &CB) {
MDNode *Ret = CB.getMetadata(LLVMContext::MD_alloc_token_hint);
if (!Ret)
return nullptr;
- assert(Ret->getNumOperands() == 1 && "bad !alloc_token_hint");
+ assert(Ret->getNumOperands() == 2 && "bad !alloc_token_hint");
assert(isa<MDString>(Ret->getOperand(0)));
+ assert(isa<ConstantAsMetadata>(Ret->getOperand(1)));
return Ret;
}
+bool containsPointer(const MDNode *MD) {
+ ConstantAsMetadata *C = cast<ConstantAsMetadata>(MD->getOperand(1));
+ auto *CI = cast<ConstantInt>(C->getValue());
+ return CI->getValue().getBoolValue();
+}
+
class ModeBase {
public:
explicit ModeBase(uint64_t MaxTokens) : MaxTokens(MaxTokens) {}
@@ -198,12 +210,20 @@ class TypeHashMode : public ModeBase {
using ModeBase::ModeBase;
uint64_t operator()(const CallBase &CB, OptimizationRemarkEmitter &ORE) {
+ const auto [N, H] = getHash(CB, ORE);
+ return N ? boundedToken(H) : H;
+ }
+
+protected:
+ std::pair<MDNode *, uint64_t> getHash(const CallBase &CB,
+ OptimizationRemarkEmitter &ORE) {
if (MDNode *N = getAllocTokenHintMetadata(CB)) {
MDString *S = cast<MDString>(N->getOperand(0));
- return boundedToken(xxHash64(S->getString()));
+ return {N, xxHash64(S->getString())};
}
+ // Fallback.
remarkNoHint(CB, ORE);
- return ClFallbackToken;
+ return {nullptr, ClFallbackToken};
}
/// Remark that there was no precise type information.
@@ -219,6 +239,26 @@ class TypeHashMode : public ModeBase {
}
};
+/// Implementation for TokenMode::TypeHashPointerSplit.
+class TypeHashPointerSplitMode : public TypeHashMode {
+public:
+ using TypeHashMode::TypeHashMode;
+
+ uint64_t operator()(const CallBase &CB, OptimizationRemarkEmitter &ORE) {
+ if (MaxTokens == 1)
+ return 0;
+ const uint64_t HalfTokens =
+ (MaxTokens ? MaxTokens : std::numeric_limits<uint64_t>::max()) / 2;
+ const auto [N, H] = getHash(CB, ORE);
+ if (!N)
+ return H; // fallback token
+ uint64_t Hash = H % HalfTokens; // base hash
+ if (containsPointer(N))
+ Hash += HalfTokens;
+ return Hash;
+ }
+};
+
// Apply opt overrides.
AllocTokenOptions &&transformOptionsFromCl(AllocTokenOptions &&Opts) {
if (!Opts.MaxTokens.has_value())
@@ -244,6 +284,9 @@ class AllocToken {
case TokenMode::TypeHash:
Mode.emplace<TypeHashMode>(*Options.MaxTokens);
break;
+ case TokenMode::TypeHashPointerSplit:
+ Mode.emplace<TypeHashPointerSplitMode>(*Options.MaxTokens);
+ break;
case TokenMode::ModeCount:
llvm_unreachable("");
break;
@@ -281,7 +324,9 @@ class AllocToken {
// Cache for replacement functions.
DenseMap<std::pair<LibFunc, uint64_t>, FunctionCallee> TokenAllocFunctions;
// Selected mode.
- std::variant<IncrementMode, RandomMode, TypeHashMode> Mode;
+ std::variant<IncrementMode, RandomMode, TypeHashMode,
+ TypeHashPointerSplitMode>
+ Mode;
};
bool AllocToken::instrumentFunction(Function &F) {
diff --git a/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll b/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll
index 37c6d6463ffd2..94fc12b20caff 100644
--- a/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll
+++ b/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll
@@ -29,4 +29,4 @@ entry:
ret ptr %ptr1
}
-!0 = !{!"int"}
+!0 = !{!"int", i1 0}
diff --git a/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll b/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll
index be99e5ef14b16..b933c0bd352e9 100644
--- a/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll
+++ b/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll
@@ -60,4 +60,4 @@ entry:
ret ptr %ptr1
}
-!0 = !{!"int"}
+!0 = !{!"int", i1 0}
diff --git a/llvm/test/Instrumentation/AllocToken/remark.ll b/llvm/test/Instrumentation/AllocToken/remark.ll
index d5ecfc41bace5..d2bc273060403 100644
--- a/llvm/test/Instrumentation/AllocToken/remark.ll
+++ b/llvm/test/Instrumentation/AllocToken/remark.ll
@@ -24,4 +24,4 @@ entry:
ret ptr %ptr1
}
-!0 = !{!"int"}
+!0 = !{!"int", i1 0}
|
@llvm/pr-subscribers-clang Author: Marco Elver (melver) ChangesImplement the TypeHashPointerSplit mode: This mode assigns a token ID This mode with max tokens of 2 ( Make it the new default mode. Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 This change is part of the following series:
Full diff: https://github.com/llvm/llvm-project/pull/156840.diff 7 Files Affected:
diff --git a/clang/docs/AllocToken.rst b/clang/docs/AllocToken.rst
index a7bb8877f371b..fb354d6738ea3 100644
--- a/clang/docs/AllocToken.rst
+++ b/clang/docs/AllocToken.rst
@@ -31,13 +31,18 @@ Token Assignment Mode
The default mode to calculate tokens is:
-* *TypeHash* (mode=2): This mode assigns a token ID based on the hash of
- the allocated type's name.
+* *TypeHashPointerSplit* (mode=3): This mode assigns a token ID based on
+ the hash of the allocated type's name, where the top half ID-space is
+ reserved for types that contain pointers and the bottom half for types that
+ do not contain pointers.
Other token ID assignment modes are supported, but they may be subject to
change or removal. These may (experimentally) be selected with ``-mllvm
-alloc-token-mode=<mode>``:
+* *TypeHash* (mode=2): This mode assigns a token ID based on the hash of
+ the allocated type's name.
+
* *Random* (mode=1): This mode assigns a statically-determined random token ID
to each allocation site.
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 14e4d648428ef..e7a0e7696e204 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1277,14 +1277,75 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
assert(SanOpts.has(SanitizerKind::AllocToken) &&
"Only needed with -fsanitize=alloc-token");
+ llvm::MDBuilder MDB(getLLVMContext());
+
+ // Get unique type name.
PrintingPolicy Policy(CGM.getContext().getLangOpts());
Policy.SuppressTagKeyword = true;
Policy.FullyQualifiedName = true;
std::string TypeName = AllocType.getCanonicalType().getAsString(Policy);
- auto *TypeMDS = llvm::MDString::get(CGM.getLLVMContext(), TypeName);
+ auto *TypeNameMD = MDB.createString(TypeName);
+
+ // Check if QualType contains a pointer. Implements a simple DFS to
+ // recursively check if a type contains a pointer type.
+ llvm::SmallPtrSet<const RecordDecl *, 4> VisitedRD;
+ bool IncompleteType = false;
+ auto TypeContainsPtr = [&](auto &&self, QualType T) -> bool {
+ QualType CanonicalType = T.getCanonicalType();
+ if (CanonicalType->isPointerType())
+ return true; // base case
+
+ // Look through typedef chain to check for special types.
+ for (QualType CurrentT = T; const auto *TT = CurrentT->getAs<TypedefType>();
+ CurrentT = TT->getDecl()->getUnderlyingType()) {
+ const IdentifierInfo *II = TT->getDecl()->getIdentifier();
+ if (!II)
+ continue;
+ // Special Case: Syntactically uintptr_t is not a pointer; semantically,
+ // however, very likely used as such. Therefore, classify uintptr_t as a
+ // pointer, too.
+ if (II->isStr("uintptr_t"))
+ return true;
+ }
+
+ // The type is an array; check the element type.
+ if (const ArrayType *AT = CanonicalType->getAsArrayTypeUnsafe())
+ return self(self, AT->getElementType());
+ // The type is a struct, class, or union.
+ if (const RecordDecl *RD = CanonicalType->getAsRecordDecl()) {
+ if (!RD->isCompleteDefinition()) {
+ IncompleteType = true;
+ return false;
+ }
+ if (!VisitedRD.insert(RD).second)
+ return false; // already visited
+ // Check all fields.
+ for (const FieldDecl *Field : RD->fields()) {
+ if (self(self, Field->getType()))
+ return true;
+ }
+ // For C++ classes, also check base classes.
+ if (const CXXRecordDecl *CXXRD = dyn_cast<CXXRecordDecl>(RD)) {
+ // Polymorphic types require a vptr.
+ if (CXXRD->isPolymorphic())
+ return true;
+ for (const CXXBaseSpecifier &Base : CXXRD->bases()) {
+ if (self(self, Base.getType()))
+ return true;
+ }
+ }
+ }
+ return false;
+ };
+ const bool ContainsPtr = TypeContainsPtr(TypeContainsPtr, AllocType);
+ if (!ContainsPtr && IncompleteType)
+ return;
+ auto *ContainsPtrC = Builder.getInt1(ContainsPtr);
+ auto *ContainsPtrMD = MDB.createConstant(ContainsPtrC);
- // Format: !{<type-name>}
- auto *MDN = llvm::MDNode::get(CGM.getLLVMContext(), {TypeMDS});
+ // Format: !{<type-name>, <contains-pointer>}
+ auto *MDN =
+ llvm::MDNode::get(CGM.getLLVMContext(), {TypeNameMD, ContainsPtrMD});
CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint, MDN);
}
diff --git a/clang/test/CodeGenCXX/alloc-token-pointer.cpp b/clang/test/CodeGenCXX/alloc-token-pointer.cpp
new file mode 100644
index 0000000000000..7adb75c7afebb
--- /dev/null
+++ b/clang/test/CodeGenCXX/alloc-token-pointer.cpp
@@ -0,0 +1,177 @@
+// Check -fsanitize=alloc-token TypeHashPointerSplit mode with only 2
+// tokens so we effectively only test the contains-pointer logic.
+//
+// RUN: %clang_cc1 -fsanitize=alloc-token -falloc-token-max=2 -triple x86_64-linux-gnu -std=c++20 -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -O -fsanitize=alloc-token -falloc-token-max=2 -triple x86_64-linux-gnu -std=c++20 -emit-llvm %s -o - | FileCheck %s
+
+#include "../Analysis/Inputs/system-header-simulator-cxx.h"
+
+typedef __UINTPTR_TYPE__ uintptr_t;
+
+extern "C" {
+void *malloc(size_t size);
+}
+
+// CHECK-LABEL: @_Z15test_malloc_intv(
+void *test_malloc_int() {
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 4, i64 0)
+ int *a = (int *)malloc(sizeof(int));
+ *a = 42;
+ return a;
+}
+
+// CHECK-LABEL: @_Z15test_malloc_ptrv(
+int **test_malloc_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 8, i64 0)
+ int **a = (int **)malloc(sizeof(int*));
+ *a = nullptr;
+ return a;
+}
+
+// CHECK-LABEL: @_Z12test_new_intv(
+int *test_new_int() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 4, i64 0){{.*}} !alloc_token_hint
+ return new int;
+}
+
+// CHECK-LABEL: @_Z20test_new_ulong_arrayv(
+unsigned long *test_new_ulong_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 80, i64 0){{.*}} !alloc_token_hint
+ return new unsigned long[10];
+}
+
+// CHECK-LABEL: @_Z12test_new_ptrv(
+int **test_new_ptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 8, i64 1){{.*}} !alloc_token_hint
+ return new int*;
+}
+
+// CHECK-LABEL: @_Z18test_new_ptr_arrayv(
+int **test_new_ptr_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 80, i64 1){{.*}} !alloc_token_hint
+ return new int*[10];
+}
+
+struct ContainsPtr {
+ int a;
+ char *buf;
+};
+
+// CHECK-LABEL: @_Z27test_malloc_struct_with_ptrv(
+ContainsPtr *test_malloc_struct_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 16, i64 0)
+ ContainsPtr *c = (ContainsPtr *)malloc(sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z33test_malloc_struct_array_with_ptrv(
+ContainsPtr *test_malloc_struct_array_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call{{.*}} ptr @__alloc_token_malloc(i64 noundef 160, i64 0)
+ ContainsPtr *c = (ContainsPtr *)malloc(10 * sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z32test_operatornew_struct_with_ptrv(
+ContainsPtr *test_operatornew_struct_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z38test_operatornew_struct_array_with_ptrv(
+ContainsPtr *test_operatornew_struct_array_with_ptr() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 160, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(10 * sizeof(ContainsPtr));
+ return c;
+}
+
+// CHECK-LABEL: @_Z33test_operatornew_struct_with_ptr2v(
+ContainsPtr *test_operatornew_struct_with_ptr2() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(sizeof(*c));
+ return c;
+}
+
+// CHECK-LABEL: @_Z39test_operatornew_struct_array_with_ptr2v(
+ContainsPtr *test_operatornew_struct_array_with_ptr2() {
+ // FIXME: This should not be token ID 0!
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 160, i64 0)
+ ContainsPtr *c = (ContainsPtr *)__builtin_operator_new(10 * sizeof(*c));
+ return c;
+}
+
+// CHECK-LABEL: @_Z24test_new_struct_with_ptrv(
+ContainsPtr *test_new_struct_with_ptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 1){{.*}} !alloc_token_hint
+ return new ContainsPtr;
+}
+
+// CHECK-LABEL: @_Z30test_new_struct_array_with_ptrv(
+ContainsPtr *test_new_struct_array_with_ptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 160, i64 1){{.*}} !alloc_token_hint
+ return new ContainsPtr[10];
+}
+
+class TestClass {
+public:
+ void Foo();
+ ~TestClass();
+ int data[16];
+};
+
+// CHECK-LABEL: @_Z14test_new_classv(
+TestClass *test_new_class() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 64, i64 0){{.*}} !alloc_token_hint
+ return new TestClass();
+}
+
+// CHECK-LABEL: @_Z20test_new_class_arrayv(
+TestClass *test_new_class_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 648, i64 0){{.*}} !alloc_token_hint
+ return new TestClass[10];
+}
+
+// Test that we detect that virtual classes have implicit vtable pointer.
+class VirtualTestClass {
+public:
+ virtual void Foo();
+ virtual ~VirtualTestClass();
+ int data[16];
+};
+
+// CHECK-LABEL: @_Z22test_new_virtual_classv(
+VirtualTestClass *test_new_virtual_class() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 72, i64 1){{.*}} !alloc_token_hint
+ return new VirtualTestClass();
+}
+
+// CHECK-LABEL: @_Z28test_new_virtual_class_arrayv(
+VirtualTestClass *test_new_virtual_class_array() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znam(i64 noundef 728, i64 1){{.*}} !alloc_token_hint
+ return new VirtualTestClass[10];
+}
+
+// uintptr_t is treated as a pointer.
+struct MyStructUintptr {
+ int a;
+ uintptr_t ptr;
+};
+
+// CHECK-LABEL: @_Z18test_uintptr_isptrv(
+MyStructUintptr *test_uintptr_isptr() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 16, i64 1)
+ return new MyStructUintptr;
+}
+
+using uptr = uintptr_t;
+// CHECK-LABEL: @_Z19test_uintptr_isptr2v(
+uptr *test_uintptr_isptr2() {
+ // CHECK: call {{.*}} ptr @__alloc_token_Znwm(i64 noundef 8, i64 1)
+ return new uptr;
+}
diff --git a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
index 4ea68470ca684..74cda227d50a7 100644
--- a/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
+++ b/llvm/lib/Transforms/Instrumentation/AllocToken.cpp
@@ -70,6 +70,11 @@ enum class TokenMode : unsigned {
/// Token ID based on allocated type hash.
TypeHash = 2,
+ /// Token ID based on allocated type hash, where the top half ID-space is
+ /// reserved for types that contain pointers and the bottom half for types
+ /// that do not contain pointers.
+ TypeHashPointerSplit = 3,
+
// Mode count - keep last
ModeCount
};
@@ -89,7 +94,7 @@ struct ModeParser : public cl::parser<unsigned> {
cl::opt<unsigned, false, ModeParser>
ClMode("alloc-token-mode", cl::desc("Token assignment mode"), cl::Hidden,
- cl::init(static_cast<unsigned>(TokenMode::TypeHash)));
+ cl::init(static_cast<unsigned>(TokenMode::TypeHashPointerSplit)));
cl::opt<std::string> ClFuncPrefix("alloc-token-prefix",
cl::desc("The allocation function prefix"),
@@ -142,16 +147,23 @@ STATISTIC(NumAllocations, "Allocations found");
/// Returns the !alloc_token_hint metadata if available.
///
-/// Expected format is: !{<type-name>}
+/// Expected format is: !{<type-name>, <contains-pointer>}
MDNode *getAllocTokenHintMetadata(const CallBase &CB) {
MDNode *Ret = CB.getMetadata(LLVMContext::MD_alloc_token_hint);
if (!Ret)
return nullptr;
- assert(Ret->getNumOperands() == 1 && "bad !alloc_token_hint");
+ assert(Ret->getNumOperands() == 2 && "bad !alloc_token_hint");
assert(isa<MDString>(Ret->getOperand(0)));
+ assert(isa<ConstantAsMetadata>(Ret->getOperand(1)));
return Ret;
}
+bool containsPointer(const MDNode *MD) {
+ ConstantAsMetadata *C = cast<ConstantAsMetadata>(MD->getOperand(1));
+ auto *CI = cast<ConstantInt>(C->getValue());
+ return CI->getValue().getBoolValue();
+}
+
class ModeBase {
public:
explicit ModeBase(uint64_t MaxTokens) : MaxTokens(MaxTokens) {}
@@ -198,12 +210,20 @@ class TypeHashMode : public ModeBase {
using ModeBase::ModeBase;
uint64_t operator()(const CallBase &CB, OptimizationRemarkEmitter &ORE) {
+ const auto [N, H] = getHash(CB, ORE);
+ return N ? boundedToken(H) : H;
+ }
+
+protected:
+ std::pair<MDNode *, uint64_t> getHash(const CallBase &CB,
+ OptimizationRemarkEmitter &ORE) {
if (MDNode *N = getAllocTokenHintMetadata(CB)) {
MDString *S = cast<MDString>(N->getOperand(0));
- return boundedToken(xxHash64(S->getString()));
+ return {N, xxHash64(S->getString())};
}
+ // Fallback.
remarkNoHint(CB, ORE);
- return ClFallbackToken;
+ return {nullptr, ClFallbackToken};
}
/// Remark that there was no precise type information.
@@ -219,6 +239,26 @@ class TypeHashMode : public ModeBase {
}
};
+/// Implementation for TokenMode::TypeHashPointerSplit.
+class TypeHashPointerSplitMode : public TypeHashMode {
+public:
+ using TypeHashMode::TypeHashMode;
+
+ uint64_t operator()(const CallBase &CB, OptimizationRemarkEmitter &ORE) {
+ if (MaxTokens == 1)
+ return 0;
+ const uint64_t HalfTokens =
+ (MaxTokens ? MaxTokens : std::numeric_limits<uint64_t>::max()) / 2;
+ const auto [N, H] = getHash(CB, ORE);
+ if (!N)
+ return H; // fallback token
+ uint64_t Hash = H % HalfTokens; // base hash
+ if (containsPointer(N))
+ Hash += HalfTokens;
+ return Hash;
+ }
+};
+
// Apply opt overrides.
AllocTokenOptions &&transformOptionsFromCl(AllocTokenOptions &&Opts) {
if (!Opts.MaxTokens.has_value())
@@ -244,6 +284,9 @@ class AllocToken {
case TokenMode::TypeHash:
Mode.emplace<TypeHashMode>(*Options.MaxTokens);
break;
+ case TokenMode::TypeHashPointerSplit:
+ Mode.emplace<TypeHashPointerSplitMode>(*Options.MaxTokens);
+ break;
case TokenMode::ModeCount:
llvm_unreachable("");
break;
@@ -281,7 +324,9 @@ class AllocToken {
// Cache for replacement functions.
DenseMap<std::pair<LibFunc, uint64_t>, FunctionCallee> TokenAllocFunctions;
// Selected mode.
- std::variant<IncrementMode, RandomMode, TypeHashMode> Mode;
+ std::variant<IncrementMode, RandomMode, TypeHashMode,
+ TypeHashPointerSplitMode>
+ Mode;
};
bool AllocToken::instrumentFunction(Function &F) {
diff --git a/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll b/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll
index 37c6d6463ffd2..94fc12b20caff 100644
--- a/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll
+++ b/llvm/test/Instrumentation/AllocToken/extralibfuncs.ll
@@ -29,4 +29,4 @@ entry:
ret ptr %ptr1
}
-!0 = !{!"int"}
+!0 = !{!"int", i1 0}
diff --git a/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll b/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll
index be99e5ef14b16..b933c0bd352e9 100644
--- a/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll
+++ b/llvm/test/Instrumentation/AllocToken/nonlibcalls.ll
@@ -60,4 +60,4 @@ entry:
ret ptr %ptr1
}
-!0 = !{!"int"}
+!0 = !{!"int", i1 0}
diff --git a/llvm/test/Instrumentation/AllocToken/remark.ll b/llvm/test/Instrumentation/AllocToken/remark.ll
index d5ecfc41bace5..d2bc273060403 100644
--- a/llvm/test/Instrumentation/AllocToken/remark.ll
+++ b/llvm/test/Instrumentation/AllocToken/remark.ll
@@ -24,4 +24,4 @@ entry:
ret ptr %ptr1
}
-!0 = !{!"int"}
+!0 = !{!"int", i1 0}
|
Created using spr 1.3.8-beta.1
Implement the TypeHashPointerSplit mode: This mode assigns a token ID based on the hash of the allocated type's name, where the top half ID-space is reserved for types that contain pointers and the bottom half for types that do not contain pointers. This mode with max tokens of 2 (`-falloc-token-max=2`) may also be valuable for heap hardening strategies that simply separate pointer types from non-pointer types. Make it the new default mode. Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 Pull Request: llvm#156840
Created using spr 1.3.8-beta.1
|
||
cl::opt<TokenMode> | ||
ClMode("alloc-token-mode", cl::Hidden, cl::desc("Token assignment mode"), | ||
cl::init(TokenMode::TypeHash), | ||
cl::values(clEnumValN(TokenMode::Increment, "increment", | ||
"Incrementally increasing token ID"), | ||
clEnumValN(TokenMode::Random, "random", | ||
"Statically-assigned random token ID"), | ||
clEnumValN(TokenMode::TypeHash, "typehash", | ||
"Token ID based on allocated type hash"))); | ||
cl::opt<TokenMode> ClMode( | ||
"alloc-token-mode", cl::Hidden, cl::desc("Token assignment mode"), | ||
cl::init(TokenMode::TypeHashPointerSplit), | ||
cl::values( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless it's some experimental parameters to make them FUNCTION_PASS_WITH_PARAMS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are experimental. The default one is the current supported and recommended mode. Anything else is experimental (as also mentioned in the documentation).
I recall that @ojhunt might want to make the "default mode" dependent on the target triple, so that on Apple targets it'll switch to what they are currently already shipping with TMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer this patch LLVM only,
and clang goes into the next patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Implement the TypeHashPointerSplit mode: This mode assigns a token ID based on the hash of the allocated type's name, where the top half ID-space is reserved for types that contain pointers and the bottom half for types that do not contain pointers. This mode with max tokens of 2 (`-falloc-token-max=2`) may also be valuable for heap hardening strategies that simply separate pointer types from non-pointer types. Make it the new default mode. Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 Pull Request: llvm#156840
Implement the TypeHashPointerSplit mode: This mode assigns a token ID based on the hash of the allocated type's name, where the top half ID-space is reserved for types that contain pointers and the bottom half for types that do not contain pointers. This mode with max tokens of 2 (`-falloc-token-max=2`) may also be valuable for heap hardening strategies that simply separate pointer types from non-pointer types. Make it the new default mode. Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 Pull Request: llvm#156840
…alloc_token metadata (#160131) In preparation of adding the "AllocToken" pass, add the pre-requisite `sanitize_alloc_token` function attribute and `alloc_token` metadata. --- This change is part of the following series: 1. llvm/llvm-project#160131 2. llvm/llvm-project#156838 3. llvm/llvm-project#162098 4. llvm/llvm-project#162099 5. llvm/llvm-project#156839 6. llvm/llvm-project#156840 7. llvm/llvm-project#156841 8. llvm/llvm-project#156842
Introduce `AllocToken`, an instrumentation pass designed to provide tokens to memory allocators enabling various heap organization strategies, such as heap partitioning. Initially, the pass instruments functions marked with a new attribute `sanitize_alloc_token` by rewriting allocation calls to include a token ID, appended as a function argument with the default ABI. The design aims to provide a flexible framework for implementing different token generation schemes. It currently supports the following token modes: - TypeHash (default): token IDs based on a hash of the allocated type - Random: statically-assigned pseudo-random token IDs - Increment: incrementing token IDs per TU For the `TypeHash` mode introduce support for `!alloc_token` metadata: the metadata can be attached to allocation calls to provide richer semantic information to be consumed by the AllocToken pass. Optimization remarks can be enabled to show where no metadata was available. An alternative "fast ABI" is provided, where instead of passing the token ID as an argument (e.g., `__alloc_token_malloc(size, id)`), the token ID is directly encoded into the name of the called function (e.g., `__alloc_token_0_malloc(size)`). Where the maximum tokens is small, this offers more efficient instrumentation by avoiding the overhead of passing an additional argument at each allocation site. Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 [1] --- This change is part of the following series: 1. #160131 2. #156838 3. #162098 4. #162099 5. #156839 6. #156840 7. #156841 8. #156842
…56838) Introduce `AllocToken`, an instrumentation pass designed to provide tokens to memory allocators enabling various heap organization strategies, such as heap partitioning. Initially, the pass instruments functions marked with a new attribute `sanitize_alloc_token` by rewriting allocation calls to include a token ID, appended as a function argument with the default ABI. The design aims to provide a flexible framework for implementing different token generation schemes. It currently supports the following token modes: - TypeHash (default): token IDs based on a hash of the allocated type - Random: statically-assigned pseudo-random token IDs - Increment: incrementing token IDs per TU For the `TypeHash` mode introduce support for `!alloc_token` metadata: the metadata can be attached to allocation calls to provide richer semantic information to be consumed by the AllocToken pass. Optimization remarks can be enabled to show where no metadata was available. An alternative "fast ABI" is provided, where instead of passing the token ID as an argument (e.g., `__alloc_token_malloc(size, id)`), the token ID is directly encoded into the name of the called function (e.g., `__alloc_token_0_malloc(size)`). Where the maximum tokens is small, this offers more efficient instrumentation by avoiding the overhead of passing an additional argument at each allocation site. Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 [1] --- This change is part of the following series: 1. llvm/llvm-project#160131 2. llvm/llvm-project#156838 3. llvm/llvm-project#162098 4. llvm/llvm-project#162099 5. llvm/llvm-project#156839 6. llvm/llvm-project#156840 7. llvm/llvm-project#156841 8. llvm/llvm-project#156842
Introduce the "alloc-token" sanitizer kind, in preparation of wiring it up. Currently this is a no-op, and any attempt to enable it will result in failure: clang: error: unsupported option '-fsanitize=alloc-token' for target 'x86_64-unknown-linux-gnu' In this step we can already wire up the `sanitize_alloc_token` IR attribute where the instrumentation is enabled. Subsequent changes will complete wiring up the AllocToken pass. --- This change is part of the following series: 1. #160131 2. #156838 3. #162098 4. #162099 5. #156839 6. #156840 7. #156841 8. #156842
…162098) Introduce the "alloc-token" sanitizer kind, in preparation of wiring it up. Currently this is a no-op, and any attempt to enable it will result in failure: clang: error: unsupported option '-fsanitize=alloc-token' for target 'x86_64-unknown-linux-gnu' In this step we can already wire up the `sanitize_alloc_token` IR attribute where the instrumentation is enabled. Subsequent changes will complete wiring up the AllocToken pass. --- This change is part of the following series: 1. llvm/llvm-project#160131 2. llvm/llvm-project#156838 3. llvm/llvm-project#162098 4. llvm/llvm-project#162099 5. llvm/llvm-project#156839 6. llvm/llvm-project#156840 7. llvm/llvm-project#156841 8. llvm/llvm-project#156842
For new expressions, the allocated type is syntactically known and we can trivially emit the !alloc_token metadata. A subsequent change will wire up the AllocToken pass and introduce appropriate tests. --- This change is part of the following series: 1. #160131 2. #156838 3. #162098 4. #162099 5. #156839 6. #156840 7. #156841 8. #156842
…62099) For new expressions, the allocated type is syntactically known and we can trivially emit the !alloc_token metadata. A subsequent change will wire up the AllocToken pass and introduce appropriate tests. --- This change is part of the following series: 1. llvm/llvm-project#160131 2. llvm/llvm-project#156838 3. llvm/llvm-project#162098 4. llvm/llvm-project#162099 5. llvm/llvm-project#156839 6. llvm/llvm-project#156840 7. llvm/llvm-project#156841 8. llvm/llvm-project#156842
Implement the TypeHashPointerSplit mode: This mode assigns a token ID
based on the hash of the allocated type's name, where the top half
ID-space is reserved for types that contain pointers and the bottom half
for types that do not contain pointers.
This mode with max tokens of 2 (
-falloc-token-max=2
) may alsobe valuable for heap hardening strategies that simply separate pointer
types from non-pointer types.
Make it the new default mode.
Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434
This change is part of the following series: