Skip to content

Conversation

@qinkunbao
Copy link
Member

@qinkunbao qinkunbao commented May 13, 2025

It is a draft implementation for "src:*=sanitize". It should be applied to all sanitizers.

Any srcs assigned to the sanitize category will have their sanitizer instrumentation remained ignored by "src:". For example,

src:*
src:*/test1.c=sanitize

test1.c will still have the UBSan instrumented.

However

type:int
src:*/test1.c=sanitize

test1.c does not have the int type check.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 13, 2025
@llvmbot
Copy link
Member

llvmbot commented May 13, 2025

@llvm/pr-subscribers-clang

Author: Qinkun Bao (qinkunbao)

Changes

It is a draft implementation for "src:*=sanitize". It should be applied to all sanitizers.

Any srcs assigned to the sanitize category will have their sanitizer instrumentation remained ignored by "src:". For example,

src:*
src:*/test1.c=sanitize

test1.c will still have the UBSan instrumented.

However

type:int
src:*/test1.c=sanitize

test1.c does not have the int type check.


Full diff: https://github.com/llvm/llvm-project/pull/139772.diff

2 Files Affected:

  • (modified) clang/lib/Basic/NoSanitizeList.cpp (+2-1)
  • (added) clang/test/CodeGen/ubsan-src-ignorelist-category.test (+23)
diff --git a/clang/lib/Basic/NoSanitizeList.cpp b/clang/lib/Basic/NoSanitizeList.cpp
index e7e63c1f419e6..c4db5a5a211d2 100644
--- a/clang/lib/Basic/NoSanitizeList.cpp
+++ b/clang/lib/Basic/NoSanitizeList.cpp
@@ -44,7 +44,8 @@ bool NoSanitizeList::containsFunction(SanitizerMask Mask,
 
 bool NoSanitizeList::containsFile(SanitizerMask Mask, StringRef FileName,
                                   StringRef Category) const {
-  return SSCL->inSection(Mask, "src", FileName, Category);
+  bool allowList = SSCL->inSection(Mask, "src", FileName, "sanitize");
+  return SSCL->inSection(Mask, "src", FileName, Category) && !allowList;
 }
 
 bool NoSanitizeList::containsMainFile(SanitizerMask Mask, StringRef FileName,
diff --git a/clang/test/CodeGen/ubsan-src-ignorelist-category.test b/clang/test/CodeGen/ubsan-src-ignorelist-category.test
new file mode 100644
index 0000000000000..5242c10bdeec7
--- /dev/null
+++ b/clang/test/CodeGen/ubsan-src-ignorelist-category.test
@@ -0,0 +1,23 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist -emit-llvm %t/test1.c -o - | FileCheck %s -check-prefix=CHECK-ALLOWLIST
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist -emit-llvm %t/test2.c -o - | FileCheck %s -check-prefix=CHECK-IGNORELIST
+
+
+// Verify ubsan only emits checks for files in the allowlist
+
+//--- src.ignorelist
+src:*
+src:*/test1.c=sanitize
+
+//--- test1.c
+int add1(int a, int b) {
+// CHECK-ALLOWLIST: llvm.sadd.with.overflow.i32
+    return a+b;
+}
+
+//--- test2.c
+int add2(int a, int b) {
+// CHECK-IGNORELIST-NOT: llvm.sadd.with.overflow.i32
+    return a+b;
+}

@qinkunbao qinkunbao requested a review from vitalybuka May 13, 2025 18:09
@vitalybuka
Copy link
Collaborator

Reminder to @thurstond as well :)

Please figure out https://llvm.org/docs/GitHub.html#stacked-pull-requests

// Verify ubsan only emits checks for files in the allowlist

//--- src.ignorelist
src:*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have a problem with:

src:*
src:*/mylib/*=sanitize
src:*/mylib/test.cc

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have some ideas how to fix this:

  1. Right now order does not matter, if we say that it's matter now, we don't break existing stuff
  2. Let's make =sanitize reset all above and =
  3. So inSection need to return linenum in file which will compare with linenum of =sanitize. There is a version of function for that

Problem: SanitizerSpecialCaseList tracks linenum, but if you check parsing code, it can represent multiple files!

I suggest to start with updating SpecialCaseList to track (fileindex, linenum).

But, SpecialCaseList::Sections for whatever reasons is StringMap,
so it's ordered by name, and it's not used anyway.

Let's first PR to make SpecialCaseList::Sections -> vector

similar SanitizerSpecialCaseList?

Copy link
Member Author

@qinkunbao qinkunbao May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have a problem with:

src:*
src:*/mylib/*=sanitize
src:*/mylib/test.cc

For this example, I think /mylib/test.cc will be instrumented because the file /mylib/test.cc will fall into two categories (= and =sanitize) with this PR. And it should follow the =sanitizer category. I think it should match the current doc. (I have updated the test to cover the situation. )

I was a little hesitated about the change during implementing the feature. For example, if we have a ignore list file

type:int
src:test.cc=sanitize

If a file test.cc has the type int, shall we still instrument the type int variable in the file test.cc?

On the other hand,

src:test.cc
type:int=sanitize

Shall we still instrument the type int variable in the file test.cc?

I think the both answers are yes. Let me know if it makes sense to you.

I will create a new PR with using user branches in llvm/llvm-project to update SpecialCaseList::Sections. Thank you for the feedback.

Copy link
Member Author

@qinkunbao qinkunbao May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, SpecialCaseList::Sections for whatever reasons is StringMap,
so it's ordered by name, and it's not used anyway.

Oh, I find the key of the StringMap Sections is actually used. Here is the Link. Here are some options. Let me know which option is better.

  1. How about creating a vector<Section*> to track the order?
StringMap<Section> Sections;
vector<Section*> SectionsVec;

  1. We can use the key from Globs
    auto [It, DidEmplace] = Globs.try_emplace(Pattern);

qinkunbao added a commit that referenced this pull request May 16, 2025
… vector.

As discussed in #139772, SpecialCaseList::Sections can keep the order of Sections when parsing the case list.

Reviewers: thurstond, vitalybuka

Reviewed By: vitalybuka

Pull Request: #140127
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 16, 2025
…tringMap to vector.

As discussed in llvm/llvm-project#139772, SpecialCaseList::Sections can keep the order of Sections when parsing the case list.

Reviewers: thurstond, vitalybuka

Reviewed By: vitalybuka

Pull Request: llvm/llvm-project#140127
qinkunbao added a commit that referenced this pull request May 24, 2025
…140964)

As discussed in #139772 and
#140529, Matcher::Globs can
keep the order when parsing the case list.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 24, 2025
… vector. (#140964)

As discussed in llvm/llvm-project#139772 and
llvm/llvm-project#140529, Matcher::Globs can
keep the order when parsing the case list.
@qinkunbao
Copy link
Member Author

Close this PR as the implementation goes to other PRs.

@qinkunbao qinkunbao closed this May 28, 2025
@vitalybuka vitalybuka reopened this May 28, 2025
@vitalybuka vitalybuka closed this May 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants