Skip to content

Conversation

@jurahul
Copy link
Contributor

@jurahul jurahul commented Aug 30, 2025

Change various InstBits tables have an entry only for non-pseudo target instructions and adjust the indexing into these tables accordingly.

Some minor refactoring related to this:

  • Use early return after handling variable length encodings
  • Reduce the scope of anonymous namespace to just the class declaration.

Example reductions in these table sizes for some targets:

Target      FirstSupportedOpcode         Reduction in size
AMDGPU                     10813         10813 * 16 = 168KB
RISCV                      12051         12051 * 8 = 94KB

@jurahul jurahul marked this pull request as ready for review August 31, 2025 00:11
@llvmbot
Copy link
Member

llvmbot commented Aug 31, 2025

@llvm/pr-subscribers-tablegen

Author: Rahul Joshi (jurahul)

Changes

Change various InstBits tables have an entry only for non-pseudo target instructions and adjust the indexing into these tables accordingly.

Some minor refactoring related to this:

  • Use early return after handling variable length encodings
  • Extract code to generate error reporting code into a common ReportError function.

Example reductions in these table sizes for some targets:

Target      FirstSupportedOpcode         Reduction in size
AMDGPU                     10813         10813 * 16 = 168KB
RISCV                      12051         12051 * 8 = 94KB

Full diff: https://github.com/llvm/llvm-project/pull/156213.diff

3 Files Affected:

  • (modified) llvm/test/TableGen/HwModeEncodeAPInt.td (+26-20)
  • (modified) llvm/test/TableGen/HwModeEncodeDecode3.td (+22-18)
  • (modified) llvm/utils/TableGen/CodeEmitterGen.cpp (+126-129)
diff --git a/llvm/test/TableGen/HwModeEncodeAPInt.td b/llvm/test/TableGen/HwModeEncodeAPInt.td
index 43ca5edd952a6..e71950e2b01f3 100644
--- a/llvm/test/TableGen/HwModeEncodeAPInt.td
+++ b/llvm/test/TableGen/HwModeEncodeAPInt.td
@@ -114,32 +114,38 @@ def unrelated: Instruction {
 // For 'baz' we only assigned ModeB for it, so it will be presented
 // as '0' in the tables of ModeA, ModeC and Default Mode.
 // ENCODER-LABEL:   static const uint64_t InstBits[] = {
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // bar
-// ENCODER:         UINT64_C(0), UINT64_C(0),       // baz
-// ENCODER:         UINT64_C(8), UINT64_C(0),       // foo
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // bar
+// ENCODER-NEXT:    UINT64_C(0), UINT64_C(0),       // baz
+// ENCODER-NEXT:    UINT64_C(8), UINT64_C(0),       // foo
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeA[] = {
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // bar
-// ENCODER:         UINT64_C(0), UINT64_C(0),       // baz
-// ENCODER:         UINT64_C(12), UINT64_C(0),      // foo
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // bar
+// ENCODER-NEXT:    UINT64_C(0), UINT64_C(0),       // baz
+// ENCODER-NEXT:    UINT64_C(12), UINT64_C(0),      // foo
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeB[] = {
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // bar
-// ENCODER:         UINT64_C(12), UINT64_C(0),      // baz
-// ENCODER:         UINT64_C(0), UINT64_C(211106232532992),  // foo
-// ENCODER:         UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // bar
+// ENCODER-NEXT:    UINT64_C(12), UINT64_C(0),      // baz
+// ENCODER-NEXT:    UINT64_C(0), UINT64_C(211106232532992),  // foo
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),       // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeC[] = {
-// ENCODER:         UINT64_C(2), UINT64_C(0),      // bar
-// ENCODER:         UINT64_C(0), UINT64_C(0),      // baz
-// ENCODER:         UINT64_C(12582915),  UINT64_C(0),  // foo
-// ENCODER:         UINT64_C(2),  UINT64_C(0),     // unrelated
-
+// ENCODER-NEXT:    UINT64_C(2), UINT64_C(0),      // bar
+// ENCODER-NEXT:    UINT64_C(0), UINT64_C(0),      // baz
+// ENCODER-NEXT:    UINT64_C(12582915),  UINT64_C(0),  // foo
+// ENCODER-NEXT:    UINT64_C(2),  UINT64_C(0),     // unrelated
+// ENCODER-NEXT:    };
 
 // ENCODER: const uint64_t *InstBitsByHw;
+// ENCODER: constexpr unsigned FirstSupportedOpcode
 // ENCODER: const unsigned opcode = MI.getOpcode();
+// ENCODER: if (opcode < FirstSupportedOpcode)
+// ENCODER: unsigned TableIndex = opcode - FirstSupportedOpcode
 // ENCODER: if (Scratch.getBitWidth() != 128)
 // ENCODER:   Scratch = Scratch.zext(128);
-// ENCODER: Inst = APInt(128, ArrayRef(InstBits + opcode * 2, 2));
+// ENCODER: Inst = APInt(128, ArrayRef(InstBits + TableIndex * 2, 2));
 // ENCODER: APInt &Value = Inst;
 // ENCODER: APInt &op = Scratch;
 // ENCODER: switch (opcode) {
@@ -155,7 +161,7 @@ def unrelated: Instruction {
 // ENCODER: case 2: InstBitsByHw = InstBits_ModeB; break;
 // ENCODER: case 3: InstBitsByHw = InstBits_ModeC; break;
 // ENCODER: };
-// ENCODER: Inst = APInt(128, ArrayRef(InstBitsByHw + opcode * 2, 2));
+// ENCODER: Inst = APInt(128, ArrayRef(InstBitsByHw + TableIndex * 2, 2));
 // ENCODER: Value = Inst;
 // ENCODER: switch (HwMode) {
 // ENCODER: default: llvm_unreachable("Unhandled HwMode");
@@ -189,7 +195,7 @@ def unrelated: Instruction {
 // ENCODER: default: llvm_unreachable("Unknown hardware mode!"); break;
 // ENCODER: case 2: InstBitsByHw = InstBits_ModeB; break;
 // ENCODER: };
-// ENCODER: Inst = APInt(128, ArrayRef(InstBitsByHw + opcode * 2, 2));
+// ENCODER: Inst = APInt(128, ArrayRef(InstBitsByHw + TableIndex * 2, 2));
 // ENCODER: Value = Inst;
 // ENCODER: switch (HwMode) {
 // ENCODER: default: llvm_unreachable("Unhandled HwMode");
diff --git a/llvm/test/TableGen/HwModeEncodeDecode3.td b/llvm/test/TableGen/HwModeEncodeDecode3.td
index dbbf866f057e5..dc2b850cd1096 100644
--- a/llvm/test/TableGen/HwModeEncodeDecode3.td
+++ b/llvm/test/TableGen/HwModeEncodeDecode3.td
@@ -189,25 +189,29 @@ def unrelated: Instruction {
 // For 'baz' we only assigned ModeB for it, so it will be presented
 // as '0' in the tables of ModeA, ModeC and Default Mode.
 // ENCODER-LABEL:   static const uint64_t InstBits[] = {
-// ENCODER:         UINT64_C(2),        // bar
-// ENCODER:         UINT64_C(0),        // baz
-// ENCODER:         UINT64_C(8),        // foo
-// ENCODER:         UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    UINT64_C(2),        // bar
+// ENCODER-NEXT:    UINT64_C(0),        // baz
+// ENCODER-NEXT:    UINT64_C(8),        // foo
+// ENCODER-NEXT:    UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeA[] = {
-// ENCODER:         UINT64_C(2),        // bar
-// ENCODER:         UINT64_C(0),        // baz
-// ENCODER:         UINT64_C(12),       // foo
-// ENCODER:         UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    UINT64_C(2),        // bar
+// ENCODER-NEXT:    UINT64_C(0),        // baz
+// ENCODER-NEXT:    UINT64_C(12),       // foo
+// ENCODER-NEXT:    UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeB[] = {
-// ENCODER:         UINT64_C(2),        // bar
-// ENCODER:         UINT64_C(12),       // baz
-// ENCODER:         UINT64_C(3),        // foo
-// ENCODER:         UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    UINT64_C(2),        // bar
+// ENCODER-NEXT:    UINT64_C(12),       // baz
+// ENCODER-NEXT:    UINT64_C(3),        // foo
+// ENCODER-NEXT:    UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    };
 // ENCODER-LABEL:   static const uint64_t InstBits_ModeC[] = {
-// ENCODER:         UINT64_C(2),        // bar
-// ENCODER:         UINT64_C(0),        // baz
-// ENCODER:         UINT64_C(12582915), // foo
-// ENCODER:         UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    UINT64_C(2),        // bar
+// ENCODER-NEXT:    UINT64_C(0),        // baz
+// ENCODER-NEXT:    UINT64_C(12582915), // foo
+// ENCODER-NEXT:    UINT64_C(2),        // unrelated
+// ENCODER-NEXT:    };
 
 // ENCODER-LABEL: case ::bar:
 // ENCODER-LABEL: case ::unrelated:
@@ -221,7 +225,7 @@ def unrelated: Instruction {
 // ENCODER: case 2: InstBitsByHw = InstBits_ModeB; break;
 // ENCODER: case 3: InstBitsByHw = InstBits_ModeC; break;
 // ENCODER: };
-// ENCODER: Value = InstBitsByHw[opcode];
+// ENCODER: Value = InstBitsByHw[TableIndex];
 // ENCODER: switch (HwMode) {
 // ENCODER: default: llvm_unreachable("Unhandled HwMode");
 // ENCODER: case 0: {
@@ -256,7 +260,7 @@ def unrelated: Instruction {
 // ENCODER: default: llvm_unreachable("Unknown hardware mode!"); break;
 // ENCODER: case 2: InstBitsByHw = InstBits_ModeB; break;
 // ENCODER: };
-// ENCODER: Value = InstBitsByHw[opcode];
+// ENCODER: Value = InstBitsByHw[TableIndex];
 // ENCODER: switch (HwMode) {
 // ENCODER: default: llvm_unreachable("Unhandled HwMode");
 // ENCODER: case 2: {
diff --git a/llvm/utils/TableGen/CodeEmitterGen.cpp b/llvm/utils/TableGen/CodeEmitterGen.cpp
index 252c2521c0694..128da2ed49661 100644
--- a/llvm/utils/TableGen/CodeEmitterGen.cpp
+++ b/llvm/utils/TableGen/CodeEmitterGen.cpp
@@ -31,6 +31,7 @@
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TableGen/Error.h"
 #include "llvm/TableGen/Record.h"
@@ -78,6 +79,8 @@ class CodeEmitterGen {
   bool UseAPInt = false;
 };
 
+} // end anonymous namespace
+
 // If the VarBitInit at position 'bit' matches the specified variable then
 // return the variable bit position.  Otherwise return -1.
 int CodeEmitterGen::getVariableBit(const std::string &VarName,
@@ -311,12 +314,12 @@ CodeEmitterGen::getInstructionCases(const Record *R,
     if (UseAPInt) {
       int NumWords = APInt::getNumWords(BitWidth);
       Case += "      Inst = APInt(" + itostr(BitWidth);
-      Case += ", ArrayRef(InstBitsByHw + opcode * " + itostr(NumWords) + ", " +
-              itostr(NumWords);
+      Case += ", ArrayRef(InstBitsByHw + TableIndex * " + itostr(NumWords) +
+              ", " + itostr(NumWords);
       Case += "));\n";
       Case += "      Value = Inst;\n";
     } else {
-      Case += "      Value = InstBitsByHw[opcode];\n";
+      Case += "      Value = InstBitsByHw[TableIndex];\n";
     }
 
     Append("      switch (HwMode) {\n");
@@ -396,15 +399,6 @@ void CodeEmitterGen::emitInstructionBaseValues(
 
   for (const CodeGenInstruction *CGI : NumberedInstructions) {
     const Record *R = CGI->TheDef;
-
-    if (R->getValueAsString("Namespace") == "TargetOpcode" ||
-        R->getValueAsBit("isPseudo")) {
-      O << "    ";
-      emitInstBits(O, APInt(BitWidth, 0));
-      O << ",\n";
-      continue;
-    }
-
     const Record *EncodingDef = R;
     if (const Record *RV = R->getValueAsOptionalDef("EncodingInfos")) {
       EncodingInfoByHwMode EBM(RV, HWM);
@@ -453,6 +447,16 @@ void CodeEmitterGen::emitCaseMap(
   }
 }
 
+static void ReportError(raw_ostream &O, indent Indent) {
+  static constexpr char ReportErrorCode[] = R"(
+{0}std::string msg;
+{0}raw_string_ostream Msg(msg);
+{0}Msg << "Unsupported instruction: " << MI;
+{0}report_fatal_error(Msg.str().c_str());
+)";
+  O << formatv(ReportErrorCode, Indent);
+}
+
 void CodeEmitterGen::run(raw_ostream &O) {
   emitSourceFileHeader("Machine Code Emitter", O);
 
@@ -461,138 +465,131 @@ void CodeEmitterGen::run(raw_ostream &O) {
   // For little-endian instruction bit encodings, reverse the bit order
   Target.reverseBitsForLittleEndianEncoding();
 
-  ArrayRef<const CodeGenInstruction *> NumberedInstructions =
-      Target.getInstructions();
+  ArrayRef<const CodeGenInstruction *> EncodedInstructions =
+      Target.getTargetNonPseudoInstructions();
 
   if (Target.hasVariableLengthEncodings()) {
     emitVarLenCodeEmitter(Records, O);
-  } else {
-    const CodeGenHwModes &HWM = Target.getHwModes();
-    // The set of HwModes used by instruction encodings.
-    std::set<unsigned> HwModes;
-    BitWidth = 0;
-    for (const CodeGenInstruction *CGI : NumberedInstructions) {
-      const Record *R = CGI->TheDef;
-      if (R->getValueAsString("Namespace") == "TargetOpcode" ||
-          R->getValueAsBit("isPseudo"))
-        continue;
-
-      if (const Record *RV = R->getValueAsOptionalDef("EncodingInfos")) {
-        EncodingInfoByHwMode EBM(RV, HWM);
-        for (const auto &[Key, Value] : EBM) {
-          const BitsInit *BI = Value->getValueAsBitsInit("Inst");
-          BitWidth = std::max(BitWidth, BI->getNumBits());
-          HwModes.insert(Key);
-        }
-        continue;
+    return;
+  }
+  const CodeGenHwModes &HWM = Target.getHwModes();
+  // The set of HwModes used by instruction encodings.
+  std::set<unsigned> HwModes;
+  BitWidth = 0;
+  for (const CodeGenInstruction *CGI : EncodedInstructions) {
+    const Record *R = CGI->TheDef;
+    if (const Record *RV = R->getValueAsOptionalDef("EncodingInfos")) {
+      EncodingInfoByHwMode EBM(RV, HWM);
+      for (const auto &[Key, Value] : EBM) {
+        const BitsInit *BI = Value->getValueAsBitsInit("Inst");
+        BitWidth = std::max(BitWidth, BI->getNumBits());
+        HwModes.insert(Key);
       }
-      const BitsInit *BI = R->getValueAsBitsInit("Inst");
-      BitWidth = std::max(BitWidth, BI->getNumBits());
+      continue;
     }
-    UseAPInt = BitWidth > 64;
+    const BitsInit *BI = R->getValueAsBitsInit("Inst");
+    BitWidth = std::max(BitWidth, BI->getNumBits());
+  }
+  UseAPInt = BitWidth > 64;
+
+  // Emit function declaration
+  if (UseAPInt) {
+    O << "void " << Target.getName()
+      << "MCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,\n"
+      << "    SmallVectorImpl<MCFixup> &Fixups,\n"
+      << "    APInt &Inst,\n"
+      << "    APInt &Scratch,\n"
+      << "    const MCSubtargetInfo &STI) const {\n";
+  } else {
+    O << "uint64_t " << Target.getName();
+    O << "MCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,\n"
+      << "    SmallVectorImpl<MCFixup> &Fixups,\n"
+      << "    const MCSubtargetInfo &STI) const {\n";
+  }
 
-    // Emit function declaration
-    if (UseAPInt) {
-      O << "void " << Target.getName()
-        << "MCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,\n"
-        << "    SmallVectorImpl<MCFixup> &Fixups,\n"
-        << "    APInt &Inst,\n"
-        << "    APInt &Scratch,\n"
-        << "    const MCSubtargetInfo &STI) const {\n";
-    } else {
-      O << "uint64_t " << Target.getName();
-      O << "MCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,\n"
-        << "    SmallVectorImpl<MCFixup> &Fixups,\n"
-        << "    const MCSubtargetInfo &STI) const {\n";
+  // Emit instruction base values
+  emitInstructionBaseValues(O, EncodedInstructions, Target, DefaultMode);
+  if (!HwModes.empty()) {
+    // Emit table for instrs whose encodings are controlled by HwModes.
+    for (unsigned HwMode : HwModes) {
+      if (HwMode == DefaultMode)
+        continue;
+      emitInstructionBaseValues(O, EncodedInstructions, Target, HwMode);
     }
 
-    // Emit instruction base values
-    emitInstructionBaseValues(O, NumberedInstructions, Target, DefaultMode);
-    if (!HwModes.empty()) {
-      // Emit table for instrs whose encodings are controlled by HwModes.
-      for (unsigned HwMode : HwModes) {
-        if (HwMode == DefaultMode)
-          continue;
-        emitInstructionBaseValues(O, NumberedInstructions, Target, HwMode);
-      }
-
-      // This pointer will be assigned to the HwMode table later.
-      O << "  const uint64_t *InstBitsByHw;\n";
-    }
+    // This pointer will be assigned to the HwMode table later.
+    O << "  const uint64_t *InstBitsByHw;\n";
+  }
 
-    // Map to accumulate all the cases.
-    std::map<std::string, std::vector<std::string>> CaseMap;
-    std::map<std::string, std::vector<std::string>> BitOffsetCaseMap;
+  // Map to accumulate all the cases.
+  std::map<std::string, std::vector<std::string>> CaseMap;
+  std::map<std::string, std::vector<std::string>> BitOffsetCaseMap;
 
-    // Construct all cases statement for each opcode
-    for (const Record *R : Records.getAllDerivedDefinitions("Instruction")) {
-      if (R->getValueAsString("Namespace") == "TargetOpcode" ||
-          R->getValueAsBit("isPseudo"))
-        continue;
-      std::string InstName =
-          (R->getValueAsString("Namespace") + "::" + R->getName()).str();
-      std::string Case, BitOffsetCase;
-      std::tie(Case, BitOffsetCase) = getInstructionCases(R, Target);
+  // Construct all cases statement for each opcode
+  for (const CodeGenInstruction *CGI : EncodedInstructions) {
+    const Record *R = CGI->TheDef;
+    std::string InstName =
+        (R->getValueAsString("Namespace") + "::" + R->getName()).str();
+    std::string Case, BitOffsetCase;
+    std::tie(Case, BitOffsetCase) = getInstructionCases(R, Target);
 
-      CaseMap[Case].push_back(InstName);
-      BitOffsetCaseMap[BitOffsetCase].push_back(std::move(InstName));
-    }
+    CaseMap[Case].push_back(InstName);
+    BitOffsetCaseMap[BitOffsetCase].push_back(std::move(InstName));
+  }
 
-    // Emit initial function code
-    if (UseAPInt) {
-      int NumWords = APInt::getNumWords(BitWidth);
-      O << "  const unsigned opcode = MI.getOpcode();\n"
-        << "  if (Scratch.getBitWidth() != " << BitWidth << ")\n"
-        << "    Scratch = Scratch.zext(" << BitWidth << ");\n"
-        << "  Inst = APInt(" << BitWidth << ", ArrayRef(InstBits + opcode * "
-        << NumWords << ", " << NumWords << "));\n"
-        << "  APInt &Value = Inst;\n"
-        << "  APInt &op = Scratch;\n"
-        << "  switch (opcode) {\n";
-    } else {
-      O << "  const unsigned opcode = MI.getOpcode();\n"
-        << "  uint64_t Value = InstBits[opcode];\n"
-        << "  uint64_t op = 0;\n"
-        << "  (void)op;  // suppress warning\n"
-        << "  switch (opcode) {\n";
-    }
+  unsigned FirstSupportedOpcode = EncodedInstructions.front()->EnumVal;
+  O << "  constexpr unsigned FirstSupportedOpcode = " << FirstSupportedOpcode
+    << ";\n";
+
+  O << "  const unsigned opcode = MI.getOpcode();\n";
+  O << "  if (opcode < FirstSupportedOpcode) {";
+  ReportError(O, indent(4));
+  O << "  }\n";
+  O << "  unsigned TableIndex = opcode - FirstSupportedOpcode;\n";
+
+  // Emit initial function code
+  if (UseAPInt) {
+    int NumWords = APInt::getNumWords(BitWidth);
+    O << "  if (Scratch.getBitWidth() != " << BitWidth << ")\n"
+      << "    Scratch = Scratch.zext(" << BitWidth << ");\n"
+      << "  Inst = APInt(" << BitWidth << ", ArrayRef(InstBits + TableIndex * "
+      << NumWords << ", " << NumWords << "));\n"
+      << "  APInt &Value = Inst;\n"
+      << "  APInt &op = Scratch;\n"
+      << "  switch (opcode) {\n";
+  } else {
+    O << "  uint64_t Value = InstBits[TableIndex];\n"
+      << "  uint64_t op = 0;\n"
+      << "  (void)op;  // suppress warning\n"
+      << "  switch (opcode) {\n";
+  }
 
-    // Emit each case statement
-    emitCaseMap(O, CaseMap);
+  // Emit each case statement
+  emitCaseMap(O, CaseMap);
 
-    // Default case: unhandled opcode
-    O << "  default:\n"
-      << "    std::string msg;\n"
-      << "    raw_string_ostream Msg(msg);\n"
-      << "    Msg << \"Not supported instr: \" << MI;\n"
-      << "    report_fatal_error(Msg.str().c_str());\n"
-      << "  }\n";
-    if (UseAPInt)
-      O << "  Inst = Value;\n";
-    else
-      O << "  return Value;\n";
-    O << "}\n\n";
-
-    O << "#ifdef GET_OPERAND_BIT_OFFSET\n"
-      << "#undef GET_OPERAND_BIT_OFFSET\n\n"
-      << "uint32_t " << Target.getName()
-      << "MCCodeEmitter::getOperandBitOffset(const MCInst &MI,\n"
-      << "    unsigned OpNum,\n"
-      << "    const MCSubtargetInfo &STI) const {\n"
-      << "  switch (MI.getOpcode()) {\n";
-    emitCaseMap(O, BitOffsetCaseMap);
-    O << "  }\n"
-      << "  std::string msg;\n"
-      << "  raw_string_ostream Msg(msg);\n"
-      << "  Msg << \"Not supported instr[opcode]: \" << MI << \"[\" << OpNum "
-         "<< \"]\";\n"
-      << "  report_fatal_error(Msg.str().c_str());\n"
-      << "}\n\n"
-      << "#endif // GET_OPERAND_BIT_OFFSET\n\n";
-  }
+  // Default case: unhandled opcode
+  O << "  default:";
+  ReportError(O, indent(4));
+  O << "  }\n";
+  if (UseAPInt)
+    O << "  Inst = Value;\n";
+  else
+    O << "  return Value;\n";
+  O << "}\n\n";
+
+  O << "#ifdef GET_OPERAND_BIT_OFFSET\n"
+    << "#undef GET_OPERAND_BIT_OFFSET\n\n"
+    << "uint32_t " << Target.getName()
+    << "MCCodeEmitter::getOperandBitOffset(const MCInst &MI,\n"
+    << "    unsigned OpNum,\n"
+    << "    const MCSubtargetInfo &STI) const {\n"
+    << "  switch (MI.getOpcode()) {\n";
+  emitCaseMap(O, BitOffsetCaseMap);
+  O << "  }\n";
+  ReportError(O, indent(2));
+  O << "}\n\n"
+    << "#endif // GET_OPERAND_BIT_OFFSET\n\n";
 }
 
-} // end anonymous namespace
-
 static TableGen::Emitter::OptClass<CodeEmitterGen>
     X("gen-emitter", "Generate machine code emitter");

@jurahul jurahul requested a review from mshockwave August 31, 2025 00:13
@jurahul
Copy link
Contributor Author

jurahul commented Aug 31, 2025

Diff might be easier to understand if you hide whitespace diffs

@jurahul jurahul requested a review from lenary September 2, 2025 16:14
@jurahul jurahul requested a review from s-barannikov September 3, 2025 22:35
Change various `InstBits` tables have an entry only for non-pseudo
target instructions and adjust the indexing into these tables
accordingly.

So minor refactoring related to this:
- Use early return after handling variable length encodings
- Extract code to generate error reporting code into a common
  `ReportError` function.
@jurahul jurahul force-pushed the cgmc_emitter_reduce_instbits_size branch from 83e6dff to a85d4ce Compare September 4, 2025 00:38
@jurahul
Copy link
Contributor Author

jurahul commented Sep 8, 2025

@topperc @lenary @s-barannikov @mshockwave gentle ping.

Copy link
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me, given that RISC-V has lots of pseudos.

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jurahul jurahul merged commit acea1f5 into llvm:main Sep 10, 2025
9 checks passed
@jurahul jurahul deleted the cgmc_emitter_reduce_instbits_size branch September 10, 2025 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants