Skip to content

Conversation

@jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Jul 16, 2025

This makes CodeGenRegisterClass::contains fast. Use this to simplify
inferMatchingSuperRegClass.

This makes CodeGenRegisterClass::contains fast. Use this to simplify
inferMatchingSuperRegClass.
@llvmbot
Copy link
Member

llvmbot commented Jul 16, 2025

@llvm/pr-subscribers-tablegen

Author: Jay Foad (jayfoad)

Changes

This makes CodeGenRegisterClass::contains fast. Use this to simplify
inferMatchingSuperRegClass.


Full diff: https://github.com/llvm/llvm-project/pull/149122.diff

3 Files Affected:

  • (modified) llvm/utils/TableGen/Common/CodeGenRegisters.cpp (+11-16)
  • (modified) llvm/utils/TableGen/Common/CodeGenRegisters.h (+3-1)
  • (modified) llvm/utils/TableGen/RegisterInfoEmitter.cpp (+1-1)
diff --git a/llvm/utils/TableGen/Common/CodeGenRegisters.cpp b/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
index c43cc9afe1e3c..c348d156fcce7 100644
--- a/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+++ b/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
@@ -701,11 +701,13 @@ CodeGenRegisterClass::CodeGenRegisterClass(CodeGenRegBank &RegBank,
   Orders.resize(1 + AltOrders->size());
 
   // Default allocation order always contains all registers.
+  MemberBV.resize(RegBank.getRegisters().size());
   Artificial = true;
   for (const Record *Element : *Elements) {
     Orders[0].push_back(Element);
     const CodeGenRegister *Reg = RegBank.getReg(Element);
     Members.push_back(Reg);
+    MemberBV.set(CodeGenRegBank::getRegIndex(Reg));
     Artificial &= Reg->Artificial;
     if (!Reg->getSuperRegs().empty())
       RegsWithSuperRegsTopoSigs.set(Reg->getTopoSig());
@@ -767,9 +769,11 @@ CodeGenRegisterClass::CodeGenRegisterClass(CodeGenRegBank &RegBank,
       RegsWithSuperRegsTopoSigs(RegBank.getNumTopoSigs()), EnumValue(-1),
       RSI(Props.RSI), CopyCost(0), Allocatable(true), AllocationPriority(0),
       GlobalPriority(false), TSFlags(0) {
+  MemberBV.resize(RegBank.getRegisters().size());
   Artificial = true;
   GeneratePressureSet = false;
   for (const auto R : Members) {
+    MemberBV.set(CodeGenRegBank::getRegIndex(R));
     if (!R->getSuperRegs().empty())
       RegsWithSuperRegsTopoSigs.set(R->getTopoSig());
     Artificial &= R->Artificial;
@@ -833,7 +837,7 @@ bool CodeGenRegisterClass::hasType(const ValueTypeByHwMode &VT) const {
 }
 
 bool CodeGenRegisterClass::contains(const CodeGenRegister *Reg) const {
-  return llvm::binary_search(Members, Reg, deref<std::less<>>());
+  return MemberBV.test(CodeGenRegBank::getRegIndex(Reg));
 }
 
 unsigned CodeGenRegisterClass::getWeight(const CodeGenRegBank &RegBank) const {
@@ -2332,8 +2336,7 @@ void CodeGenRegBank::inferMatchingSuperRegClass(
     CodeGenRegisterClass *RC,
     std::list<CodeGenRegisterClass>::iterator FirstSubRegRC) {
   DenseSet<const CodeGenSubRegIndex *> ImpliedSubRegIndices;
-  std::vector<std::pair<const CodeGenRegister *, const CodeGenRegister *>>
-      SubToSuperRegs;
+  std::vector<const CodeGenRegister *> SubRegs;
   BitVector TopoSigs(getNumTopoSigs());
 
   // Iterate subregister indices in topological order to visit larger indices
@@ -2351,15 +2354,14 @@ void CodeGenRegBank::inferMatchingSuperRegClass(
 
     // Build list of (Sub, Super) pairs for this SubIdx, sorted by Sub. Note
     // that the list may contain entries with the same Sub but different Supers.
-    SubToSuperRegs.clear();
+    SubRegs.clear();
     TopoSigs.reset();
     for (const CodeGenRegister *Super : RC->getMembers()) {
       const CodeGenRegister *Sub = Super->getSubRegs().find(SubIdx)->second;
       assert(Sub && "Missing sub-register");
-      SubToSuperRegs.emplace_back(Sub, Super);
+      SubRegs.push_back(Sub);
       TopoSigs.set(Sub->getTopoSig());
     }
-    sort(SubToSuperRegs, on_first<deref<std::less<>>>());
 
     // Iterate over sub-register class candidates.  Ignore classes created by
     // this loop. They will never be useful.
@@ -2374,16 +2376,10 @@ void CodeGenRegBank::inferMatchingSuperRegClass(
       // Topological shortcut: SubRC members have the wrong shape.
       if (!TopoSigs.anyCommon(SubRC.getRegsWithSuperRegsTopoSigs()))
         continue;
-      // Compute the subset of RC that maps into SubRC with a single linear scan
-      // through SubToSuperRegs and the members of SubRC.
+      // Compute the subset of RC that maps into SubRC.
       CodeGenRegister::Vec SubSetVec;
-      auto SubI = SubRC.getMembers().begin(), SubE = SubRC.getMembers().end();
-      for (auto &[Sub, Super] : SubToSuperRegs) {
-        while (SubI != SubE && **SubI < *Sub)
-          ++SubI;
-        if (SubI == SubE)
-          break;
-        if (**SubI == *Sub)
+      for (const auto &[Sub, Super] : zip_equal(SubRegs, RC->getMembers())) {
+        if (SubRC.contains(Sub))
           SubSetVec.push_back(Super);
       }
 
@@ -2391,7 +2387,6 @@ void CodeGenRegBank::inferMatchingSuperRegClass(
         continue;
 
       // RC injects completely into SubRC.
-      sortAndUniqueRegisters(SubSetVec);
       if (SubSetVec.size() == RC->getMembers().size()) {
         SubRC.addSuperRegClass(SubIdx, RC);
 
diff --git a/llvm/utils/TableGen/Common/CodeGenRegisters.h b/llvm/utils/TableGen/Common/CodeGenRegisters.h
index bbcd44ce2cc5b..5e6fff0f775ea 100644
--- a/llvm/utils/TableGen/Common/CodeGenRegisters.h
+++ b/llvm/utils/TableGen/Common/CodeGenRegisters.h
@@ -315,6 +315,8 @@ inline bool operator==(const CodeGenRegister &A, const CodeGenRegister &B) {
 
 class CodeGenRegisterClass {
   CodeGenRegister::Vec Members;
+  // Bit mask of members, indexed by getRegIndex.
+  BitVector MemberBV;
   // Allocation orders. Order[0] always contains all registers in Members.
   std::vector<SmallVector<const Record *, 16>> Orders;
   // Bit mask of sub-classes including this, indexed by their EnumValue.
@@ -752,7 +754,7 @@ class CodeGenRegBank {
   CodeGenRegister *getReg(const Record *);
 
   // Get a Register's index into the Registers array.
-  unsigned getRegIndex(const CodeGenRegister *Reg) const {
+  static unsigned getRegIndex(const CodeGenRegister *Reg) {
     return Reg->EnumValue - 1;
   }
 
diff --git a/llvm/utils/TableGen/RegisterInfoEmitter.cpp b/llvm/utils/TableGen/RegisterInfoEmitter.cpp
index 7d24c0f80cddb..2a311b7ff96b8 100644
--- a/llvm/utils/TableGen/RegisterInfoEmitter.cpp
+++ b/llvm/utils/TableGen/RegisterInfoEmitter.cpp
@@ -1644,7 +1644,7 @@ void RegisterInfoEmitter::runTargetDesc(raw_ostream &OS) {
       for (const CodeGenRegister &Reg : Regs) {
         const CodeGenRegisterClass *BaseRC = nullptr;
         for (const CodeGenRegisterClass *RC : BaseClasses) {
-          if (is_contained(RC->getMembers(), &Reg)) {
+          if (RC->contains(&Reg)) {
             BaseRC = RC;
             break;
           }

@jayfoad
Copy link
Contributor Author

jayfoad commented Jul 16, 2025

I tried measuring peak memory usage using:

/usr/bin/time -v llvm-tblgen -gen-register-info -I include -I lib/Target/AMDGPU lib/Target/AMDGPU/AMDGPU.td

I couldn't measure any increase from this patch. The results fluctuate by around +/- 0.1% anyway so I guess it is lost in the noise.

@jayfoad
Copy link
Contributor Author

jayfoad commented Jul 16, 2025

I have ideas for follow-on optimizations but I want to keep this patch smallish.

@nvjle
Copy link
Contributor

nvjle commented Jul 16, 2025

I tried measuring peak memory usage using:

/usr/bin/time -v llvm-tblgen -gen-register-info -I include -I lib/Target/AMDGPU lib/Target/AMDGPU/AMDGPU.td

I couldn't measure any increase from this patch. The results fluctuate by around +/- 0.1% anyway so I guess it is lost in the noise.

The new patch appears obviously faster (and definitely cleaner), but you might also try using TGTimer to measure any compile-time difference for this area of code. FWIW, I've also applied the patch locally to a downstream back-end with a register hierarchy that is similarly complicated to AMDGPU (or worse), and all is well.

LGTM.

@jayfoad
Copy link
Contributor Author

jayfoad commented Jul 16, 2025

you might also try using TGTimer to measure any compile-time difference for this area of code.

Yes, I did try that locally. I should clean it up for submission.

FWIW, I've also applied the patch locally to a downstream back-end with a register hierarchy that is similarly complicated to AMDGPU (or worse), and all is well.

Awesome, thanks!

@jayfoad jayfoad merged commit 4355356 into llvm:main Jul 16, 2025
11 checks passed
@jayfoad jayfoad deleted the add-memberbv branch July 16, 2025 19:31
@jayfoad
Copy link
Contributor Author

jayfoad commented Jul 17, 2025

you might also try using TGTimer to measure any compile-time difference for this area of code.

Yes, I did try that locally. I should clean it up for submission.

#149309

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants