Skip to content

Conversation

vitalybuka
Copy link
Collaborator

@vitalybuka vitalybuka commented Oct 13, 2025

On average it saves half positive of Glob matching.

However, in real build most SpecialCaseList unmatched,
this change should not affect this case.

To be able to do so without breaking behavior, we
need to re-order matches according precedence.

Usually it's LineNo, and it's already ordered,
but Diagnostic requires reordering by rule length.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" llvm:support labels Oct 13, 2025
@vitalybuka vitalybuka requested a review from fmayer October 13, 2025 22:04
@llvmbot
Copy link
Member

llvmbot commented Oct 13, 2025

@llvm/pr-subscribers-clang

Author: Vitaly Buka (vitalybuka)

Changes

On average it saves half of Glob matching.

To be able to do so without breaking behavior, we
need to re-order matches according precedence.

Usually it's LineNo, and it's already ordered,
but Diagnostic requires reordering by rule length.


Full diff: https://github.com/llvm/llvm-project/pull/163279.diff

3 Files Affected:

  • (modified) clang/lib/Basic/Diagnostic.cpp (+2-1)
  • (modified) llvm/include/llvm/Support/SpecialCaseList.h (+10-2)
  • (modified) llvm/lib/Support/SpecialCaseList.cpp (+42-11)
diff --git a/clang/lib/Basic/Diagnostic.cpp b/clang/lib/Basic/Diagnostic.cpp
index 8ecbd3c5a2cc4..2dec26ecacf26 100644
--- a/clang/lib/Basic/Diagnostic.cpp
+++ b/clang/lib/Basic/Diagnostic.cpp
@@ -525,7 +525,8 @@ std::unique_ptr<WarningsSpecialCaseList>
 WarningsSpecialCaseList::create(const llvm::MemoryBuffer &Input,
                                 std::string &Err) {
   auto WarningSuppressionList = std::make_unique<WarningsSpecialCaseList>();
-  if (!WarningSuppressionList->createInternal(&Input, Err))
+  if (!WarningSuppressionList->createInternal(&Input, Err,
+                                              /*OrderBySize=*/true))
     return nullptr;
   return WarningSuppressionList;
 }
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index 477987c576965..ead765562504d 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -115,7 +115,8 @@ class SpecialCaseList {
   // classes.
   LLVM_ABI bool createInternal(const std::vector<std::string> &Paths,
                                vfs::FileSystem &VFS, std::string &Error);
-  LLVM_ABI bool createInternal(const MemoryBuffer *MB, std::string &Error);
+  LLVM_ABI bool createInternal(const MemoryBuffer *MB, std::string &Error,
+                               bool OrderBySize = false);
 
   SpecialCaseList() = default;
   SpecialCaseList(SpecialCaseList const &) = delete;
@@ -126,6 +127,8 @@ class SpecialCaseList {
   class RegexMatcher {
   public:
     LLVM_ABI Error insert(StringRef Pattern, unsigned LineNumber);
+    LLVM_ABI void preprocess(bool BySize);
+
     LLVM_ABI void
     match(StringRef Query,
           llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const;
@@ -144,6 +147,8 @@ class SpecialCaseList {
   class GlobMatcher {
   public:
     LLVM_ABI Error insert(StringRef Pattern, unsigned LineNumber);
+    LLVM_ABI void preprocess(bool BySize);
+
     LLVM_ABI void
     match(StringRef Query,
           llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const;
@@ -165,6 +170,7 @@ class SpecialCaseList {
     LLVM_ABI Matcher(bool UseGlobs, bool RemoveDotSlash);
 
     LLVM_ABI Error insert(StringRef Pattern, unsigned LineNumber);
+    LLVM_ABI void preprocess(bool BySize);
 
     LLVM_ABI void
     match(StringRef Query,
@@ -206,6 +212,8 @@ class SpecialCaseList {
                                        StringRef Category) const;
 
   private:
+    friend class SpecialCaseList;
+    LLVM_ABI void preprocess(bool OrderBySize);
     LLVM_ABI const SpecialCaseList::Matcher *
     findMatcher(StringRef Prefix, StringRef Category) const;
   };
@@ -222,7 +230,7 @@ class SpecialCaseList {
 
   /// Parses just-constructed SpecialCaseList entries from a memory buffer.
   LLVM_ABI bool parse(unsigned FileIdx, const MemoryBuffer *MB,
-                      std::string &Error);
+                      std::string &Error, bool OrderBySize);
 };
 
 } // namespace llvm
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 80fd48558fe67..4f360da6d5b25 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -55,12 +55,20 @@ Error SpecialCaseList::RegexMatcher::insert(StringRef Pattern,
   return Error::success();
 }
 
+void SpecialCaseList::RegexMatcher::preprocess(bool BySize) {
+  if (BySize) {
+    llvm::sort(RegExes, [](const Reg &A, const Reg &B) {
+      return A.Name.size() < B.Name.size();
+    });
+  }
+}
+
 void SpecialCaseList::RegexMatcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
   for (const auto &R : reverse(RegExes))
     if (R.Rg.match(Query))
-      Cb(R.Name, R.LineNo);
+      return Cb(R.Name, R.LineNo);
 }
 
 Error SpecialCaseList::GlobMatcher::insert(StringRef Pattern,
@@ -75,12 +83,20 @@ Error SpecialCaseList::GlobMatcher::insert(StringRef Pattern,
   return Error::success();
 }
 
+void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
+  if (BySize) {
+    llvm::sort(Globs, [](const Glob &A, const Glob &B) {
+      return A.Name.size() < B.Name.size();
+    });
+  }
+}
+
 void SpecialCaseList::GlobMatcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
   for (const auto &G : reverse(Globs))
     if (G.Pattern.match(Query))
-      Cb(G.Name, G.LineNo);
+      return Cb(G.Name, G.LineNo);
 }
 
 SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
@@ -91,6 +107,14 @@ SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
     M.emplace<RegexMatcher>();
 }
 
+Error SpecialCaseList::Matcher::insert(StringRef Pattern, unsigned LineNumber) {
+  return std::visit([&](auto &V) { return V.insert(Pattern, LineNumber); }, M);
+}
+
+LLVM_ABI void SpecialCaseList::Matcher::preprocess(bool BySize) {
+  return std::visit([&](auto &V) { return V.preprocess(BySize); }, M);
+}
+
 void SpecialCaseList::Matcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
@@ -99,10 +123,6 @@ void SpecialCaseList::Matcher::match(
   return std::visit([&](auto &V) { return V.match(Query, Cb); }, M);
 }
 
-Error SpecialCaseList::Matcher::insert(StringRef Pattern, unsigned LineNumber) {
-  return std::visit([&](auto &V) { return V.insert(Pattern, LineNumber); }, M);
-}
-
 // TODO: Refactor this to return Expected<...>
 std::unique_ptr<SpecialCaseList>
 SpecialCaseList::create(const std::vector<std::string> &Paths,
@@ -141,7 +161,7 @@ bool SpecialCaseList::createInternal(const std::vector<std::string> &Paths,
       return false;
     }
     std::string ParseError;
-    if (!parse(i, FileOrErr.get().get(), ParseError)) {
+    if (!parse(i, FileOrErr.get().get(), ParseError, /*OrderBySize=*/false)) {
       Error = (Twine("error parsing file '") + Path + "': " + ParseError).str();
       return false;
     }
@@ -149,9 +169,9 @@ bool SpecialCaseList::createInternal(const std::vector<std::string> &Paths,
   return true;
 }
 
-bool SpecialCaseList::createInternal(const MemoryBuffer *MB,
-                                     std::string &Error) {
-  if (!parse(0, MB, Error))
+bool SpecialCaseList::createInternal(const MemoryBuffer *MB, std::string &Error,
+                                     bool OrderBySize) {
+  if (!parse(0, MB, Error, OrderBySize))
     return false;
   return true;
 }
@@ -174,7 +194,7 @@ SpecialCaseList::addSection(StringRef SectionStr, unsigned FileNo,
 }
 
 bool SpecialCaseList::parse(unsigned FileIdx, const MemoryBuffer *MB,
-                            std::string &Error) {
+                            std::string &Error, bool OrderBySize) {
   unsigned long long Version = 2;
 
   StringRef Header = MB->getBuffer();
@@ -246,6 +266,10 @@ bool SpecialCaseList::parse(unsigned FileIdx, const MemoryBuffer *MB,
       return false;
     }
   }
+
+  for (Section &S : Sections)
+    S.preprocess(OrderBySize);
+
   return true;
 }
 
@@ -283,6 +307,13 @@ SpecialCaseList::Section::findMatcher(StringRef Prefix,
   return &II->second;
 }
 
+LLVM_ABI void SpecialCaseList::Section::preprocess(bool OrderBySize) {
+  SectionMatcher.preprocess(false);
+  for (auto &[K1, E] : Entries)
+    for (auto &[K2, M] : E)
+      M.preprocess(OrderBySize);
+}
+
 unsigned SpecialCaseList::Section::getLastMatch(StringRef Prefix,
                                                 StringRef Query,
                                                 StringRef Category) const {

@llvmbot
Copy link
Member

llvmbot commented Oct 13, 2025

@llvm/pr-subscribers-llvm-support

Author: Vitaly Buka (vitalybuka)

Changes

On average it saves half of Glob matching.

To be able to do so without breaking behavior, we
need to re-order matches according precedence.

Usually it's LineNo, and it's already ordered,
but Diagnostic requires reordering by rule length.


Full diff: https://github.com/llvm/llvm-project/pull/163279.diff

3 Files Affected:

  • (modified) clang/lib/Basic/Diagnostic.cpp (+2-1)
  • (modified) llvm/include/llvm/Support/SpecialCaseList.h (+10-2)
  • (modified) llvm/lib/Support/SpecialCaseList.cpp (+42-11)
diff --git a/clang/lib/Basic/Diagnostic.cpp b/clang/lib/Basic/Diagnostic.cpp
index 8ecbd3c5a2cc4..2dec26ecacf26 100644
--- a/clang/lib/Basic/Diagnostic.cpp
+++ b/clang/lib/Basic/Diagnostic.cpp
@@ -525,7 +525,8 @@ std::unique_ptr<WarningsSpecialCaseList>
 WarningsSpecialCaseList::create(const llvm::MemoryBuffer &Input,
                                 std::string &Err) {
   auto WarningSuppressionList = std::make_unique<WarningsSpecialCaseList>();
-  if (!WarningSuppressionList->createInternal(&Input, Err))
+  if (!WarningSuppressionList->createInternal(&Input, Err,
+                                              /*OrderBySize=*/true))
     return nullptr;
   return WarningSuppressionList;
 }
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index 477987c576965..ead765562504d 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -115,7 +115,8 @@ class SpecialCaseList {
   // classes.
   LLVM_ABI bool createInternal(const std::vector<std::string> &Paths,
                                vfs::FileSystem &VFS, std::string &Error);
-  LLVM_ABI bool createInternal(const MemoryBuffer *MB, std::string &Error);
+  LLVM_ABI bool createInternal(const MemoryBuffer *MB, std::string &Error,
+                               bool OrderBySize = false);
 
   SpecialCaseList() = default;
   SpecialCaseList(SpecialCaseList const &) = delete;
@@ -126,6 +127,8 @@ class SpecialCaseList {
   class RegexMatcher {
   public:
     LLVM_ABI Error insert(StringRef Pattern, unsigned LineNumber);
+    LLVM_ABI void preprocess(bool BySize);
+
     LLVM_ABI void
     match(StringRef Query,
           llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const;
@@ -144,6 +147,8 @@ class SpecialCaseList {
   class GlobMatcher {
   public:
     LLVM_ABI Error insert(StringRef Pattern, unsigned LineNumber);
+    LLVM_ABI void preprocess(bool BySize);
+
     LLVM_ABI void
     match(StringRef Query,
           llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const;
@@ -165,6 +170,7 @@ class SpecialCaseList {
     LLVM_ABI Matcher(bool UseGlobs, bool RemoveDotSlash);
 
     LLVM_ABI Error insert(StringRef Pattern, unsigned LineNumber);
+    LLVM_ABI void preprocess(bool BySize);
 
     LLVM_ABI void
     match(StringRef Query,
@@ -206,6 +212,8 @@ class SpecialCaseList {
                                        StringRef Category) const;
 
   private:
+    friend class SpecialCaseList;
+    LLVM_ABI void preprocess(bool OrderBySize);
     LLVM_ABI const SpecialCaseList::Matcher *
     findMatcher(StringRef Prefix, StringRef Category) const;
   };
@@ -222,7 +230,7 @@ class SpecialCaseList {
 
   /// Parses just-constructed SpecialCaseList entries from a memory buffer.
   LLVM_ABI bool parse(unsigned FileIdx, const MemoryBuffer *MB,
-                      std::string &Error);
+                      std::string &Error, bool OrderBySize);
 };
 
 } // namespace llvm
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 80fd48558fe67..4f360da6d5b25 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -55,12 +55,20 @@ Error SpecialCaseList::RegexMatcher::insert(StringRef Pattern,
   return Error::success();
 }
 
+void SpecialCaseList::RegexMatcher::preprocess(bool BySize) {
+  if (BySize) {
+    llvm::sort(RegExes, [](const Reg &A, const Reg &B) {
+      return A.Name.size() < B.Name.size();
+    });
+  }
+}
+
 void SpecialCaseList::RegexMatcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
   for (const auto &R : reverse(RegExes))
     if (R.Rg.match(Query))
-      Cb(R.Name, R.LineNo);
+      return Cb(R.Name, R.LineNo);
 }
 
 Error SpecialCaseList::GlobMatcher::insert(StringRef Pattern,
@@ -75,12 +83,20 @@ Error SpecialCaseList::GlobMatcher::insert(StringRef Pattern,
   return Error::success();
 }
 
+void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
+  if (BySize) {
+    llvm::sort(Globs, [](const Glob &A, const Glob &B) {
+      return A.Name.size() < B.Name.size();
+    });
+  }
+}
+
 void SpecialCaseList::GlobMatcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
   for (const auto &G : reverse(Globs))
     if (G.Pattern.match(Query))
-      Cb(G.Name, G.LineNo);
+      return Cb(G.Name, G.LineNo);
 }
 
 SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
@@ -91,6 +107,14 @@ SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
     M.emplace<RegexMatcher>();
 }
 
+Error SpecialCaseList::Matcher::insert(StringRef Pattern, unsigned LineNumber) {
+  return std::visit([&](auto &V) { return V.insert(Pattern, LineNumber); }, M);
+}
+
+LLVM_ABI void SpecialCaseList::Matcher::preprocess(bool BySize) {
+  return std::visit([&](auto &V) { return V.preprocess(BySize); }, M);
+}
+
 void SpecialCaseList::Matcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
@@ -99,10 +123,6 @@ void SpecialCaseList::Matcher::match(
   return std::visit([&](auto &V) { return V.match(Query, Cb); }, M);
 }
 
-Error SpecialCaseList::Matcher::insert(StringRef Pattern, unsigned LineNumber) {
-  return std::visit([&](auto &V) { return V.insert(Pattern, LineNumber); }, M);
-}
-
 // TODO: Refactor this to return Expected<...>
 std::unique_ptr<SpecialCaseList>
 SpecialCaseList::create(const std::vector<std::string> &Paths,
@@ -141,7 +161,7 @@ bool SpecialCaseList::createInternal(const std::vector<std::string> &Paths,
       return false;
     }
     std::string ParseError;
-    if (!parse(i, FileOrErr.get().get(), ParseError)) {
+    if (!parse(i, FileOrErr.get().get(), ParseError, /*OrderBySize=*/false)) {
       Error = (Twine("error parsing file '") + Path + "': " + ParseError).str();
       return false;
     }
@@ -149,9 +169,9 @@ bool SpecialCaseList::createInternal(const std::vector<std::string> &Paths,
   return true;
 }
 
-bool SpecialCaseList::createInternal(const MemoryBuffer *MB,
-                                     std::string &Error) {
-  if (!parse(0, MB, Error))
+bool SpecialCaseList::createInternal(const MemoryBuffer *MB, std::string &Error,
+                                     bool OrderBySize) {
+  if (!parse(0, MB, Error, OrderBySize))
     return false;
   return true;
 }
@@ -174,7 +194,7 @@ SpecialCaseList::addSection(StringRef SectionStr, unsigned FileNo,
 }
 
 bool SpecialCaseList::parse(unsigned FileIdx, const MemoryBuffer *MB,
-                            std::string &Error) {
+                            std::string &Error, bool OrderBySize) {
   unsigned long long Version = 2;
 
   StringRef Header = MB->getBuffer();
@@ -246,6 +266,10 @@ bool SpecialCaseList::parse(unsigned FileIdx, const MemoryBuffer *MB,
       return false;
     }
   }
+
+  for (Section &S : Sections)
+    S.preprocess(OrderBySize);
+
   return true;
 }
 
@@ -283,6 +307,13 @@ SpecialCaseList::Section::findMatcher(StringRef Prefix,
   return &II->second;
 }
 
+LLVM_ABI void SpecialCaseList::Section::preprocess(bool OrderBySize) {
+  SectionMatcher.preprocess(false);
+  for (auto &[K1, E] : Entries)
+    for (auto &[K2, M] : E)
+      M.preprocess(OrderBySize);
+}
+
 unsigned SpecialCaseList::Section::getLastMatch(StringRef Prefix,
                                                 StringRef Query,
                                                 StringRef Category) const {

jurahul and others added 2 commits October 13, 2025 16:44
Created using spr 1.3.6

[skip ci]
Created using spr 1.3.6
@vitalybuka vitalybuka requested a review from fmayer October 13, 2025 23:50
@vitalybuka vitalybuka changed the base branch from users/vitalybuka/spr/main.specialcaselist-support-early-return-from-matching to main October 14, 2025 00:01
@vitalybuka vitalybuka enabled auto-merge (squash) October 14, 2025 00:02
@vitalybuka vitalybuka merged commit 301d008 into main Oct 14, 2025
12 of 17 checks passed
@vitalybuka vitalybuka deleted the users/vitalybuka/spr/specialcaselist-support-early-return-from-matching branch October 14, 2025 00:19
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
On average it saves half positive of Glob matching.

However, in real build most SpecialCaseList unmatched,
this change should not affect this case. 

To be able to do so without breaking behavior, we
need to re-order matches according precedence.

Usually it's LineNo, and it's already ordered,
but Diagnostic requires reordering by rule length.

Co-authored-by: Rahul Joshi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:support

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants