[LLVM][MC][CodeEmitterGen] Reduce various `InstBits` table sizes #156213

jurahul · 2025-08-30T23:07:29Z

Change various InstBits tables have an entry only for non-pseudo target instructions and adjust the indexing into these tables accordingly.

Some minor refactoring related to this:

Use early return after handling variable length encodings
Reduce the scope of anonymous namespace to just the class declaration.

Example reductions in these table sizes for some targets:

Target      FirstSupportedOpcode         Reduction in size
AMDGPU                     10813         10813 * 16 = 168KB
RISCV                      12051         12051 * 8 = 94KB

llvmbot · 2025-08-31T00:11:58Z

@llvm/pr-subscribers-tablegen

Author: Rahul Joshi (jurahul)

Changes

Change various InstBits tables have an entry only for non-pseudo target instructions and adjust the indexing into these tables accordingly.

Some minor refactoring related to this:

Use early return after handling variable length encodings
Extract code to generate error reporting code into a common ReportError function.

Example reductions in these table sizes for some targets:

Target      FirstSupportedOpcode         Reduction in size
AMDGPU                     10813         10813 * 16 = 168KB
RISCV                      12051         12051 * 8 = 94KB

Full diff: https://github.com/llvm/llvm-project/pull/156213.diff

3 Files Affected:

(modified) llvm/test/TableGen/HwModeEncodeAPInt.td (+26-20)
(modified) llvm/test/TableGen/HwModeEncodeDecode3.td (+22-18)
(modified) llvm/utils/TableGen/CodeEmitterGen.cpp (+126-129)

diff --git a/llvm/test/TableGen/HwModeEncodeAPInt.td b/llvm/test/TableGen/HwModeEncodeAPInt.td
index 43ca5edd952a6..e71950e2b01f3 100644
--- a/llvm/test/TableGen/HwModeEncodeAPInt.td
+++ b/llvm/test/TableGen/HwModeEncodeAPInt.td
@@ -114,32 +114,38 @@ def unrelated: Instruction {
 // For 'baz' we only assigned ModeB for it, so it will be presented
 // as '0' in the tables of ModeA, ModeC and Default Mode.
 // ENCODER-LABEL:   static const uint64_t InstBits[] = {
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // bar
-// ENCODER:         UINT64_C(0), UINT64_C(0),       // baz
-// ENCODER:         UINT64_C(8), UINT64_C(0),       // foo
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // bar
+// ENCODER-NEXT:    UINT64_C(0), UINT64_C(0),       // baz
+// ENCODER-NEXT:    UINT64_C(8), UINT64_C(0),       // foo
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeA[] = {
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // bar
-// ENCODER:         UINT64_C(0), UINT64_C(0),       // baz
-// ENCODER:         UINT64_C(12), UINT64_C(0),      // foo
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // bar
+// ENCODER-NEXT:    UINT64_C(0), UINT64_C(0),       // baz
+// ENCODER-NEXT:    UINT64_C(12), UINT64_C(0),      // foo
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeB[] = {
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // bar
-// ENCODER:         UINT64_C(12), UINT64_C(0),      // baz
-// ENCODER:         UINT64_C(0), UINT64_C(211106232532992),  // foo
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // bar
+// ENCODER-NEXT:    UINT64_C(12), UINT64_C(0),      // baz
+// ENCODER-NEXT:    UINT64_C(0), UINT64_C(211106232532992),  // foo
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeC[] = {
-// ENCODER:         UINT64_C(2), UINT64_C(0),      // bar
-// ENCODER:         UINT64_C(0), UINT64_C(0),      // baz
-// ENCODER:         UINT64_C(12582915),  UINT64_C(0),  // foo
-// ENCODER:         UINT64_C(2),  UINT64_C(0),     // unrelated
-
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),      // bar
+// ENCODER-NEXT:    UINT64_C(0), UINT64_C(0),      // baz
+// ENCODER-NEXT:    UINT64_C(12582915),  UINT64_C(0),  // foo
+// ENCODER-NEXT:    UINT64_C(2),  UINT64_C(0),     // unrelated
+// ENCODER-NEXT:    };
 
 // ENCODER: const uint64_t *InstBitsByHw;
+// ENCODER: constexpr unsigned FirstSupportedOpcode
 // ENCODER: const unsigned opcode = MI.getOpcode();
+// ENCODER: if (opcode < FirstSupportedOpcode)
+// ENCODER: unsigned TableIndex = opcode - FirstSupportedOpcode
 // ENCODER: if (Scratch.getBitWidth() != 128)
 // ENCODER:   Scratch = Scratch.zext(128);
-// ENCODER: Inst = APInt(128, ArrayRef(InstBits + opcode * 2, 2));
+// ENCODER: Inst = APInt(128, ArrayRef(InstBits + TableIndex * 2, 2));
 // ENCODER: APInt &Value = Inst;
 // ENCODER: APInt &op = Scratch;
 // ENCODER: switch (opcode) {
@@ -155,7 +161,7 @@ def unrelated: Instruction {
 // ENCODER: case 2: InstBitsByHw = InstBits_ModeB; break;
 // ENCODER: case 3: InstBitsByHw = InstBits_ModeC; break;
 // ENCODER: };
-// ENCODER: Inst = APInt(128, ArrayRef(InstBitsByHw + opcode * 2, 2));
+// ENCODER: Inst = APInt(128, ArrayRef(InstBitsByHw + TableIndex * 2, 2));
 // ENCODER: Value = Inst;
 // ENCODER: switch (HwMode) {
 // ENCODER: default: llvm_unreachable("Unhandled HwMode");
@@ -189,7 +195,7 @@ def unrelated: Instruction {
 // ENCODER: default: llvm_unreachable("Unknown hardware mode!"); break;
 // ENCODER: case 2: InstBitsByHw = InstBits_ModeB; break;
 // ENCODER: };
-// ENCODER: Inst = APInt(128, ArrayRef(InstBitsByHw + opcode * 2, 2));
+// ENCODER: Inst = APInt(128, ArrayRef(InstBitsByHw + TableIndex * 2, 2));
 // ENCODER: Value = Inst;
 // ENCODER: switch (HwMode) {
 // ENCODER: default: llvm_unreachable("Unhandled HwMode");
diff --git a/llvm/test/TableGen/HwModeEncodeDecode3.td b/llvm/test/TableGen/HwModeEncodeDecode3.td
index dbbf866f057e5..dc2b850cd1096 100644
--- a/llvm/test/TableGen/HwModeEncodeDecode3.td
+++ b/llvm/test/TableGen/HwModeEncodeDecode3.td
@@ -189,25 +189,29 @@ def unrelated: Instruction {
 // For 'baz' we only assigned ModeB for it, so it will be presented
 // as '0' in the tables of ModeA, ModeC and Default Mode.
 // ENCODER-LABEL:   static const uint64_t InstBits[] = {
-// ENCODER:         UINT64_C(2),        // bar
-// ENCODER:         UINT64_C(0),        // baz
-// ENCODER:         UINT64_C(8),        // foo
-// ENCODER:         UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    UINT64_C(2),        // bar
+// ENCODER-NEXT:    UINT64_C(0),        // baz
+// ENCODER-NEXT:    UINT64_C(8),        // foo
+// ENCODER-NEXT:    UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeA[] = {
-// ENCODER:         UINT64_C(2),        // bar
-// ENCODER:         UINT64_C(0),        // baz
-// ENCODER:         UINT64_C(12),       // foo
-// ENCODER:         UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    UINT64_C(2),        // bar
+// ENCODER-NEXT:    UINT64_C(0),        // baz
+// ENCODER-NEXT:    UINT64_C(12),       // foo
+// ENCODER-NEXT:    UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeB[] = {
-// ENCODER:         UINT64_C(2),        // bar
-// ENCODER:         UINT64_C(12),       // baz
-// ENCODER:         UINT64_C(3),        // foo
-// ENCODER:         UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    UINT64_C(2),        // bar
+// ENCODER-NEXT:    UINT64_C(12),       // baz
+// ENCODER-NEXT:    UINT64_C(3),        // foo
+// ENCODER-NEXT:    UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeC[] = {
-// ENCODER:         UINT64_C(2),        // bar
-// ENCODER:         UINT64_C(0),        // baz
-// ENCODER:         UINT64_C(12582915), // foo
-// ENCODER:         UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    UINT64_C(2),        // bar
+// ENCODER-NEXT:    UINT64_C(0),        // baz
+// ENCODER-NEXT:    UINT64_C(12582915), // foo
+// ENCODER-NEXT:    UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    };
 
 // ENCODER-LABEL: case ::bar:
 // ENCODER-LABEL: case ::unrelated:
@@ -221,7 +225,7 @@ def unrelated: Instruction {
 // ENCODER: case 2: InstBitsByHw = InstBits_ModeB; break;
 // ENCODER: case 3: InstBitsByHw = InstBits_ModeC; break;
 // ENCODER: };
-// ENCODER: Value = InstBitsByHw[opcode];
+// ENCODER: Value = InstBitsByHw[TableIndex];
 // ENCODER: switch (HwMode) {
 // ENCODER: default: llvm_unreachable("Unhandled HwMode");
 // ENCODER: case 0: {
@@ -256,7 +260,7 @@ def unrelated: Instruction {
 // ENCODER: default: llvm_unreachable("Unknown hardware mode!"); break;
 // ENCODER: case 2: InstBitsByHw = InstBits_ModeB; break;
 // ENCODER: };
-// ENCODER: Value = InstBitsByHw[opcode];
+// ENCODER: Value = InstBitsByHw[TableIndex];
 // ENCODER: switch (HwMode) {
 // ENCODER: default: llvm_unreachable("Unhandled HwMode");
 // ENCODER: case 2: {
diff --git a/llvm/utils/TableGen/CodeEmitterGen.cpp b/llvm/utils/TableGen/CodeEmitterGen.cpp
index 252c2521c0694..128da2ed49661 100644
--- a/llvm/utils/TableGen/CodeEmitterGen.cpp
+++ b/llvm/utils/TableGen/CodeEmitterGen.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TableGen/Error.h"
 #include "llvm/TableGen/Record.h"
@@ -78,6 +79,8 @@ class CodeEmitterGen {
   bool UseAPInt = false;
 };
 
+} // end anonymous namespace
+
 // If the VarBitInit at position 'bit' matches the specified variable then
 // return the variable bit position.  Otherwise return -1.
 int CodeEmitterGen::getVariableBit(const std::string &VarName,
@@ -311,12 +314,12 @@ CodeEmitterGen::getInstructionCases(const Record *R,
     if (UseAPInt) {
       int NumWords = APInt::getNumWords(BitWidth);
       Case += "      Inst = APInt(" + itostr(BitWidth);
-      Case += ", ArrayRef(InstBitsByHw + opcode * " + itostr(NumWords) + ", " +
-              itostr(NumWords);
+      Case += ", ArrayRef(InstBitsByHw + TableIndex * " + itostr(NumWords) +
+              ", " + itostr(NumWords);
       Case += "));\n";
       Case += "      Value = Inst;\n";
     } else {
-      Case += "      Value = InstBitsByHw[opcode];\n";
+      Case += "      Value = InstBitsByHw[TableIndex];\n";
     }
 
     Append("      switch (HwMode) {\n");
@@ -396,15 +399,6 @@ void CodeEmitterGen::emitInstructionBaseValues(
 
   for (const CodeGenInstruction *CGI : NumberedInstructions) {
     const Record *R = CGI->TheDef;
-
-    if (R->getValueAsString("Namespace") == "TargetOpcode" ||
-        R->getValueAsBit("isPseudo")) {
-      O << "    ";
-      emitInstBits(O, APInt(BitWidth, 0));
-      O << ",\n";
-      continue;
-    }
-
     const Record *EncodingDef = R;
     if (const Record *RV = R->getValueAsOptionalDef("EncodingInfos")) {
       EncodingInfoByHwMode EBM(RV, HWM);
@@ -453,6 +447,16 @@ void CodeEmitterGen::emitCaseMap(
   }
 }
 
+static void ReportError(raw_ostream &O, indent Indent) {
+  static constexpr char ReportErrorCode[] = R"(
+{0}std::string msg;
+{0}raw_string_ostream Msg(msg);
+{0}Msg << "Unsupported instruction: " << MI;
+{0}report_fatal_error(Msg.str().c_str());
+)";
+  O << formatv(ReportErrorCode, Indent);
+}
+
 void CodeEmitterGen::run(raw_ostream &O) {
   emitSourceFileHeader("Machine Code Emitter", O);
 
@@ -461,138 +465,131 @@ void CodeEmitterGen::run(raw_ostream &O) {
   // For little-endian instruction bit encodings, reverse the bit order
   Target.reverseBitsForLittleEndianEncoding();
 
-  ArrayRef<const CodeGenInstruction *> NumberedInstructions =
-      Target.getInstructions();
+  ArrayRef<const CodeGenInstruction *> EncodedInstructions =
+      Target.getTargetNonPseudoInstructions();
 
   if (Target.hasVariableLengthEncodings()) {
     emitVarLenCodeEmitter(Records, O);
-  } else {
-    const CodeGenHwModes &HWM = Target.getHwModes();
-    // The set of HwModes used by instruction encodings.
-    std::set<unsigned> HwModes;
-    BitWidth = 0;
-    for (const CodeGenInstruction *CGI : NumberedInstructions) {
-      const Record *R = CGI->TheDef;
-      if (R->getValueAsString("Namespace") == "TargetOpcode" ||
-          R->getValueAsBit("isPseudo"))
-        continue;
-
-      if (const Record *RV = R->getValueAsOptionalDef("EncodingInfos")) {
-        EncodingInfoByHwMode EBM(RV, HWM);
-        for (const auto &[Key, Value] : EBM) {
-          const BitsInit *BI = Value->getValueAsBitsInit("Inst");
-          BitWidth = std::max(BitWidth, BI->getNumBits());
-          HwModes.insert(Key);
-        }
-        continue;
+    return;
+  }
+  const CodeGenHwModes &HWM = Target.getHwModes();
+  // The set of HwModes used by instruction encodings.
+  std::set<unsigned> HwModes;
+  BitWidth = 0;
+  for (const CodeGenInstruction *CGI : EncodedInstructions) {
+    const Record *R = CGI->TheDef;
+    if (const Record *RV = R->getValueAsOptionalDef("EncodingInfos")) {
+      EncodingInfoByHwMode EBM(RV, HWM);
+      for (const auto &[Key, Value] : EBM) {
+        const BitsInit *BI = Value->getValueAsBitsInit("Inst");
+        BitWidth = std::max(BitWidth, BI->getNumBits());
+        HwModes.insert(Key);
       }
-      const BitsInit *BI = R->getValueAsBitsInit("Inst");
-      BitWidth = std::max(BitWidth, BI->getNumBits());
+      continue;
     }
-    UseAPInt = BitWidth > 64;
+    const BitsInit *BI = R->getValueAsBitsInit("Inst");
+    BitWidth = std::max(BitWidth, BI->getNumBits());
+  }
+  UseAPInt = BitWidth > 64;
+
+  // Emit function declaration
+  if (UseAPInt) {
+    O << "void " << Target.getName()
+      << "MCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,\n"
+      << "    SmallVectorImpl<MCFixup> &Fixups,\n"
+      << "    APInt &Inst,\n"
+      << "    APInt &Scratch,\n"
+      << "    const MCSubtargetInfo &STI) const {\n";
+  } else {
+    O << "uint64_t " << Target.getName();
+    O << "MCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,\n"
+      << "    SmallVectorImpl<MCFixup> &Fixups,\n"
+      << "    const MCSubtargetInfo &STI) const {\n";
+  }
 
-    // Emit function declaration
-    if (UseAPInt) {
-      O << "void " << Target.getName()
-        << "MCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,\n"
-        << "    SmallVectorImpl<MCFixup> &Fixups,\n"
-        << "    APInt &Inst,\n"
-        << "    APInt &Scratch,\n"
-        << "    const MCSubtargetInfo &STI) const {\n";
-    } else {
-      O << "uint64_t " << Target.getName();
-      O << "MCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,\n"
-        << "    SmallVectorImpl<MCFixup> &Fixups,\n"
-        << "    const MCSubtargetInfo &STI) const {\n";
+  // Emit instruction base values
+  emitInstructionBaseValues(O, EncodedInstructions, Target, DefaultMode);
+  if (!HwModes.empty()) {
+    // Emit table for instrs whose encodings are controlled by HwModes.
+    for (unsigned HwMode : HwModes) {
+      if (HwMode == DefaultMode)
+        continue;
+      emitInstructionBaseValues(O, EncodedInstructions, Target, HwMode);
     }
 
-    // Emit instruction base values
-    emitInstructionBaseValues(O, NumberedInstructions, Target, DefaultMode);
-    if (!HwModes.empty()) {
-      // Emit table for instrs whose encodings are controlled by HwModes.
-      for (unsigned HwMode : HwModes) {
-        if (HwMode == DefaultMode)
-          continue;
-        emitInstructionBaseValues(O, NumberedInstructions, Target, HwMode);
-      }
-
-      // This pointer will be assigned to the HwMode table later.
-      O << "  const uint64_t *InstBitsByHw;\n";
-    }
+    // This pointer will be assigned to the HwMode table later.
+    O << "  const uint64_t *InstBitsByHw;\n";
+  }
 
-    // Map to accumulate all the cases.
-    std::map<std::string, std::vector<std::string>> CaseMap;
-    std::map<std::string, std::vector<std::string>> BitOffsetCaseMap;
+  // Map to accumulate all the cases.
+  std::map<std::string, std::vector<std::string>> CaseMap;
+  std::map<std::string, std::vector<std::string>> BitOffsetCaseMap;
 
-    // Construct all cases statement for each opcode
-    for (const Record *R : Records.getAllDerivedDefinitions("Instruction")) {
-      if (R->getValueAsString("Namespace") == "TargetOpcode" ||
-          R->getValueAsBit("isPseudo"))
-        continue;
-      std::string InstName =
-          (R->getValueAsString("Namespace") + "::" + R->getName()).str();
-      std::string Case, BitOffsetCase;
-      std::tie(Case, BitOffsetCase) = getInstructionCases(R, Target);
+  // Construct all cases statement for each opcode
+  for (const CodeGenInstruction *CGI : EncodedInstructions) {
+    const Record *R = CGI->TheDef;
+    std::string InstName =
+        (R->getValueAsString("Namespace") + "::" + R->getName()).str();
+    std::string Case, BitOffsetCase;
+    std::tie(Case, BitOffsetCase) = getInstructionCases(R, Target);
 
-      CaseMap[Case].push_back(InstName);
-      BitOffsetCaseMap[BitOffsetCase].push_back(std::move(InstName));
-    }
+    CaseMap[Case].push_back(InstName);
+    BitOffsetCaseMap[BitOffsetCase].push_back(std::move(InstName));
+  }
 
-    // Emit initial function code
-    if (UseAPInt) {
-      int NumWords = APInt::getNumWords(BitWidth);
-      O << "  const unsigned opcode = MI.getOpcode();\n"
-        << "  if (Scratch.getBitWidth() != " << BitWidth << ")\n"
-        << "    Scratch = Scratch.zext(" << BitWidth << ");\n"
-        << "  Inst = APInt(" << BitWidth << ", ArrayRef(InstBits + opcode * "
-        << NumWords << ", " << NumWords << "));\n"
-        << "  APInt &Value = Inst;\n"
-        << "  APInt &op = Scratch;\n"
-        << "  switch (opcode) {\n";
-    } else {
-      O << "  const unsigned opcode = MI.getOpcode();\n"
-        << "  uint64_t Value = InstBits[opcode];\n"
-        << "  uint64_t op = 0;\n"
-        << "  (void)op;  // suppress warning\n"
-        << "  switch (opcode) {\n";
-    }
+  unsigned FirstSupportedOpcode = EncodedInstructions.front()->EnumVal;
+  O << "  constexpr unsigned FirstSupportedOpcode = " << FirstSupportedOpcode
+    << ";\n";
+
+  O << "  const unsigned opcode = MI.getOpcode();\n";
+  O << "  if (opcode < FirstSupportedOpcode) {";
+  ReportError(O, indent(4));
+  O << "  }\n";
+  O << "  unsigned TableIndex = opcode - FirstSupportedOpcode;\n";
+
+  // Emit initial function code
+  if (UseAPInt) {
+    int NumWords = APInt::getNumWords(BitWidth);
+    O << "  if (Scratch.getBitWidth() != " << BitWidth << ")\n"
+      << "    Scratch = Scratch.zext(" << BitWidth << ");\n"
+      << "  Inst = APInt(" << BitWidth << ", ArrayRef(InstBits + TableIndex * "
+      << NumWords << ", " << NumWords << "));\n"
+      << "  APInt &Value = Inst;\n"
+      << "  APInt &op = Scratch;\n"
+      << "  switch (opcode) {\n";
+  } else {
+    O << "  uint64_t Value = InstBits[TableIndex];\n"
+      << "  uint64_t op = 0;\n"
+      << "  (void)op;  // suppress warning\n"
+      << "  switch (opcode) {\n";
+  }
 
-    // Emit each case statement
-    emitCaseMap(O, CaseMap);
+  // Emit each case statement
+  emitCaseMap(O, CaseMap);
 
-    // Default case: unhandled opcode
-    O << "  default:\n"
-      << "    std::string msg;\n"
-      << "    raw_string_ostream Msg(msg);\n"
-      << "    Msg << \"Not supported instr: \" << MI;\n"
-      << "    report_fatal_error(Msg.str().c_str());\n"
-      << "  }\n";
-    if (UseAPInt)
-      O << "  Inst = Value;\n";
-    else
-      O << "  return Value;\n";
-    O << "}\n\n";
-
-    O << "#ifdef GET_OPERAND_BIT_OFFSET\n"
-      << "#undef GET_OPERAND_BIT_OFFSET\n\n"
-      << "uint32_t " << Target.getName()
-      << "MCCodeEmitter::getOperandBitOffset(const MCInst &MI,\n"
-      << "    unsigned OpNum,\n"
-      << "    const MCSubtargetInfo &STI) const {\n"
-      << "  switch (MI.getOpcode()) {\n";
-    emitCaseMap(O, BitOffsetCaseMap);
-    O << "  }\n"
-      << "  std::string msg;\n"
-      << "  raw_string_ostream Msg(msg);\n"
-      << "  Msg << \"Not supported instr[opcode]: \" << MI << \"[\" << OpNum "
-         "<< \"]\";\n"
-      << "  report_fatal_error(Msg.str().c_str());\n"
-      << "}\n\n"
-      << "#endif // GET_OPERAND_BIT_OFFSET\n\n";
-  }
+  // Default case: unhandled opcode
+  O << "  default:";
+  ReportError(O, indent(4));
+  O << "  }\n";
+  if (UseAPInt)
+    O << "  Inst = Value;\n";
+  else
+    O << "  return Value;\n";
+  O << "}\n\n";
+
+  O << "#ifdef GET_OPERAND_BIT_OFFSET\n"
+    << "#undef GET_OPERAND_BIT_OFFSET\n\n"
+    << "uint32_t " << Target.getName()
+    << "MCCodeEmitter::getOperandBitOffset(const MCInst &MI,\n"
+    << "    unsigned OpNum,\n"
+    << "    const MCSubtargetInfo &STI) const {\n"
+    << "  switch (MI.getOpcode()) {\n";
+  emitCaseMap(O, BitOffsetCaseMap);
+  O << "  }\n";
+  ReportError(O, indent(2));
+  O << "}\n\n"
+    << "#endif // GET_OPERAND_BIT_OFFSET\n\n";
 }
 
-} // end anonymous namespace
-
 static TableGen::Emitter::OptClass<CodeEmitterGen>
     X("gen-emitter", "Generate machine code emitter");

jurahul · 2025-08-31T00:16:56Z

Diff might be easier to understand if you hide whitespace diffs

llvm/utils/TableGen/CodeEmitterGen.cpp

Change various `InstBits` tables have an entry only for non-pseudo target instructions and adjust the indexing into these tables accordingly. So minor refactoring related to this: - Use early return after handling variable length encodings - Extract code to generate error reporting code into a common `ReportError` function.

jurahul · 2025-09-08T14:11:23Z

@topperc @lenary @s-barannikov @mshockwave gentle ping.

lenary

This seems good to me, given that RISC-V has lots of pseudos.

topperc

LGTM

jurahul marked this pull request as ready for review August 31, 2025 00:11

llvmbot added the tablegen label Aug 31, 2025

jurahul requested review from s-barannikov and topperc August 31, 2025 00:11

jurahul requested a review from mshockwave August 31, 2025 00:13

jurahul requested a review from lenary September 2, 2025 16:14

s-barannikov reviewed Sep 3, 2025

View reviewed changes

llvm/utils/TableGen/CodeEmitterGen.cpp Outdated Show resolved Hide resolved

jurahul requested a review from s-barannikov September 3, 2025 22:35

jurahul added 2 commits September 3, 2025 17:37

Revert ReportError refactor

a85d4ce

jurahul force-pushed the cgmc_emitter_reduce_instbits_size branch from 83e6dff to a85d4ce Compare September 4, 2025 00:38

lenary approved these changes Sep 9, 2025

View reviewed changes

topperc approved these changes Sep 9, 2025

View reviewed changes

jurahul merged commit acea1f5 into llvm:main Sep 10, 2025
9 checks passed

jurahul deleted the cgmc_emitter_reduce_instbits_size branch September 10, 2025 12:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LLVM][MC][CodeEmitterGen] Reduce various `InstBits` table sizes #156213

[LLVM][MC][CodeEmitterGen] Reduce various `InstBits` table sizes #156213

Uh oh!

jurahul commented Aug 30, 2025 •

edited

Loading

Uh oh!

llvmbot commented Aug 31, 2025

Uh oh!

jurahul commented Aug 31, 2025

Uh oh!

Uh oh!

jurahul commented Sep 8, 2025

Uh oh!

lenary left a comment

Uh oh!

topperc left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[LLVM][MC][CodeEmitterGen] Reduce various InstBits table sizes #156213

[LLVM][MC][CodeEmitterGen] Reduce various InstBits table sizes #156213

Uh oh!

Conversation

jurahul commented Aug 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 31, 2025

Uh oh!

jurahul commented Aug 31, 2025

Uh oh!

Uh oh!

jurahul commented Sep 8, 2025

Uh oh!

lenary left a comment

Choose a reason for hiding this comment

Uh oh!

topperc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[LLVM][MC][CodeEmitterGen] Reduce various `InstBits` table sizes #156213

[LLVM][MC][CodeEmitterGen] Reduce various `InstBits` table sizes #156213

jurahul commented Aug 30, 2025 •

edited

Loading