Commit dafffe2

[LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth (#154865)
This change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code.

**Current state**: The code generated by the decoder emitter consists of two key functions: `decodeInstruction`, which is the entry point into the generated code, and `decodeToMCInst`, which is invoked when a decode op is reached while traversing the decoder table. Both functions are templated on `InsnType`, the type of the raw instruction bits supplied to `decodeInstruction`.

Several backends call `decodeInstruction` with different `InsnType` types, leading to several template instantiations of these functions in the final code. As an example, AMDGPU instantiates this function with `DecoderUInt128` for decoding 96/128-bit instructions, `uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding 32-bit instructions. Since there is just one `decodeToMCInst` in the generated code, it contains code that handles decoding for *all* instruction sizes. However, the decoders emitted for different instruction sizes rarely have any intersection with each other. That means, in the AMDGPU case, the instantiation with `InsnType == DecoderUInt128` has decoder code for 32/64-bit instructions that is *never exercised*. Conversely, the instantiation with `InsnType == uint64_t` has decoder code for 128/96/32-bit instructions that is never exercised. This leads to unnecessary dead code in the generated disassembler binary (that the compiler cannot eliminate by itself).

**New state**: With this change, we introduce an option `specialize-decoders-per-bitwidth`. Under this mode, the DecoderEmitter generates several versions of the `decodeToMCInst` function, one for each bitwidth. The code is still templated, but backends must specify, for each `InsnType` used, the bitwidth of the instructions that the type represents, using a type trait `InsnBitWidth`. This enables the templated code to choose the right variant of `decodeToMCInst`. Under this mode, a particular instantiation only instantiates a single generated variant of `decodeToMCInst`, and that variant includes only the decoders applicable to a single bitwidth. This eliminates the code duplication caused by instantiation and reduces code size.

Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller and consume fewer bytes in their ULEB128 encoding. As a result, the generated decoder tables can also shrink.

Adopt this feature for the AMDGPU and RISCV backends. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For RISCV, which today uses a single `uint64_t` type, this results in a 3.7% increase in code size (expected, as we now instantiate the code 3 times).

Actual measured sizes are as follows:

```
Baseline commit: 72c04bb
Configuration: Ubuntu clang version 18.1.3, release build with asserts disabled.

AMDGPU        Before      After       Change
======================================================
.text         612327      275607      55% reduction
.rodata       369728      351336      5% reduction

RISCV:
======================================================
.text         47407       49187       3.7% increase
.rodata       35768       35839       0.1% increase
```
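The claim that smaller decoder indices take fewer bytes follows from how ULEB128 works: each encoded byte carries 7 payload bits, so an index below 128 fits in one byte and an index below 16384 fits in two. A minimal sketch of that size calculation is shown here; `uleb128Size` is a hypothetical standalone helper written for illustration and is not taken from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes needed to encode Value as ULEB128. Each byte holds 7 bits
// of payload; the loop counts how many 7-bit groups the value occupies.
// Hypothetical helper for illustration only.
inline unsigned uleb128Size(uint64_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7;
    ++Size;
  } while (Value != 0);
  return Size;
}
```

Under this sketch, a table whose decoder indices all stay below 128 after per-bitwidth uniquing would spend one byte per index, where a shared index space that grows past 127 would need two.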
1 parent 33d5a3b commit dafffe2

13 files changed: +362 -125 lines changed

llvm/include/llvm/MC/MCDecoder.h

Lines changed: 17 additions & 0 deletions
@@ -12,6 +12,7 @@

 #include "llvm/MC/MCDisassembler/MCDisassembler.h"
 #include "llvm/Support/MathExtras.h"
+#include <bitset>
 #include <cassert>

 namespace llvm::MCD {
@@ -48,6 +49,15 @@ fieldFromInstruction(const InsnType &Insn, unsigned StartBit,
   return Insn.extractBitsAsZExtValue(NumBits, StartBit);
 }

+template <size_t N>
+uint64_t fieldFromInstruction(const std::bitset<N> &Insn, unsigned StartBit,
+                              unsigned NumBits) {
+  assert(StartBit + NumBits <= N && "Instruction field out of bounds!");
+  assert(NumBits <= 64 && "Cannot support >64-bit extractions!");
+  const std::bitset<N> Mask(maskTrailingOnes<uint64_t>(NumBits));
+  return ((Insn >> StartBit) & Mask).to_ullong();
+}
+
 // Helper function for inserting bits extracted from an encoded instruction into
 // an integer-typed field.
 template <typename IntType>
@@ -62,6 +72,13 @@ insertBits(IntType &field, IntType bits, unsigned startBit, unsigned numBits) {
   field |= bits << startBit;
 }

+// InsnBitWidth is essentially a type trait used by the decoder emitter to query
+// the supported bitwidth for a given type. But default, the value is 0, making
+// it an invalid type for use as `InsnType` when instantiating the decoder.
+// Individual targets are expected to provide specializations for these based
+// on their usage.
+template <typename T> static constexpr uint32_t InsnBitWidth = 0;
+
 } // namespace llvm::MCD

 #endif // LLVM_MC_MCDECODER_H
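
The commit message says the templated code uses `InsnBitWidth` to pick the right variant of `decodeToMCInst`, but the generated code itself is not shown on this page. The following is a minimal sketch of how such a compile-time selection could look, assuming C++17. Everything in the `sketch` namespace is hypothetical: the per-width functions `decodeToMCInst16` and `decodeToMCInst32`, their `int` return values, and the use of `if constexpr` are illustration only and are not the emitter's actual output.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Primary variable template: 0 marks a type that no target has registered.
template <typename T> inline constexpr uint32_t InsnBitWidth = 0;

// Target-provided specializations (illustrative values).
template <> inline constexpr uint32_t InsnBitWidth<uint16_t> = 16;
template <> inline constexpr uint32_t InsnBitWidth<uint32_t> = 32;

// Hypothetical per-bitwidth decoder bodies. Each would contain only the
// decoders for its own width; here they just return the width they serve.
template <typename InsnType> int decodeToMCInst16(InsnType) { return 16; }
template <typename InsnType> int decodeToMCInst32(InsnType) { return 32; }

// Dispatcher: the width is known at compile time, so an instantiation for a
// given InsnType only instantiates the one variant it needs.
template <typename InsnType> int decodeToMCInst(InsnType Insn) {
  constexpr uint32_t Width = InsnBitWidth<InsnType>;
  static_assert(Width == 16 || Width == 32,
                "InsnType has no supported InsnBitWidth specialization");
  if constexpr (Width == 16)
    return decodeToMCInst16(Insn);
  else
    return decodeToMCInst32(Insn);
}

} // namespace sketch
```

In this sketch, calling `sketch::decodeToMCInst` with a `uint16_t` never instantiates the 32-bit variant, which is the effect the commit message describes. A type without a specialization keeps the default width of 0 and is rejected at compile time.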

llvm/lib/Target/AMDGPU/CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,8 @@ tablegen(LLVM AMDGPUGenAsmMatcher.inc -gen-asm-matcher)
 tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
 tablegen(LLVM AMDGPUGenCallingConv.inc -gen-callingconv)
 tablegen(LLVM AMDGPUGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler
+         --specialize-decoders-per-bitwidth)
 tablegen(LLVM AMDGPUGenInstrInfo.inc -gen-instr-info)
 tablegen(LLVM AMDGPUGenMCCodeEmitter.inc -gen-emitter)
 tablegen(LLVM AMDGPUGenMCPseudoLowering.inc -gen-pseudo-lowering)

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp

Lines changed: 22 additions & 15 deletions
@@ -38,6 +38,7 @@
 #include "llvm/Support/Compiler.h"

 using namespace llvm;
+using namespace llvm::MCD;

 #define DEBUG_TYPE "amdgpu-disassembler"

@@ -446,6 +447,14 @@ static DecodeStatus decodeVersionImm(MCInst &Inst, unsigned Imm,

 #include "AMDGPUGenDisassemblerTables.inc"

+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 64;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<96>> = 96;
+template <>
+static constexpr uint32_t llvm::MCD::InsnBitWidth<std::bitset<128>> = 128;
+
 //===----------------------------------------------------------------------===//
 //
 //===----------------------------------------------------------------------===//
@@ -498,26 +507,24 @@ template <typename T> static inline T eatBytes(ArrayRef<uint8_t>& Bytes) {
   return Res;
 }

-static inline DecoderUInt128 eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<96> eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+  using namespace llvm::support::endian;
   assert(Bytes.size() >= 12);
-  uint64_t Lo =
-      support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+  std::bitset<96> Lo(read<uint64_t, endianness::little>(Bytes.data()));
   Bytes = Bytes.slice(8);
-  uint64_t Hi =
-      support::endian::read<uint32_t, llvm::endianness::little>(Bytes.data());
+  std::bitset<96> Hi(read<uint32_t, endianness::little>(Bytes.data()));
   Bytes = Bytes.slice(4);
-  return DecoderUInt128(Lo, Hi);
+  return (Hi << 64) | Lo;
 }

-static inline DecoderUInt128 eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<128> eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+  using namespace llvm::support::endian;
   assert(Bytes.size() >= 16);
-  uint64_t Lo =
-      support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+  std::bitset<128> Lo(read<uint64_t, endianness::little>(Bytes.data()));
   Bytes = Bytes.slice(8);
-  uint64_t Hi =
-      support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+  std::bitset<128> Hi(read<uint64_t, endianness::little>(Bytes.data()));
   Bytes = Bytes.slice(8);
-  return DecoderUInt128(Lo, Hi);
+  return (Hi << 64) | Lo;
 }

 void AMDGPUDisassembler::decodeImmOperands(MCInst &MI,
@@ -600,14 +607,14 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
     // Try to decode DPP and SDWA first to solve conflict with VOP1 and VOP2
     // encodings
     if (isGFX1250() && Bytes.size() >= 16) {
-      DecoderUInt128 DecW = eat16Bytes(Bytes);
+      std::bitset<128> DecW = eat16Bytes(Bytes);
       if (tryDecodeInst(DecoderTableGFX1250128, MI, DecW, Address, CS))
         break;
       Bytes = Bytes_.slice(0, MaxInstBytesNum);
     }

     if (isGFX11Plus() && Bytes.size() >= 12) {
-      DecoderUInt128 DecW = eat12Bytes(Bytes);
+      std::bitset<96> DecW = eat12Bytes(Bytes);

       if (isGFX11() &&
           tryDecodeInst(DecoderTableGFX1196, DecoderTableGFX11_FAKE1696, MI,
@@ -642,7 +649,7 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,

     } else if (Bytes.size() >= 16 &&
                STI.hasFeature(AMDGPU::FeatureGFX950Insts)) {
-      DecoderUInt128 DecW = eat16Bytes(Bytes);
+      std::bitset<128> DecW = eat16Bytes(Bytes);
       if (tryDecodeInst(DecoderTableGFX940128, MI, DecW, Address, CS))
         break;
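
The new `eat12Bytes` and `eat16Bytes` build a `std::bitset` from little-endian reads, and the generated decoder then pulls fields out of it with the `std::bitset` overload of `fieldFromInstruction`. The sketch below reproduces that flow without any LLVM headers, so it can be compiled on its own. `readLE` and `extractField` are hypothetical stand-ins written for illustration; they are not the functions from the commit, and the mask is computed by hand instead of with `maskTrailingOnes`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read N/8 little-endian bytes into a std::bitset<N>, so that byte 0 ends up
// in bits 0..7. Hypothetical stand-in for the endian reads in the commit.
template <size_t N> std::bitset<N> readLE(const uint8_t *Bytes) {
  static_assert(N % 8 == 0, "width must be a whole number of bytes");
  std::bitset<N> Result;
  for (size_t I = N / 8; I-- > 0;) {
    Result <<= 8;
    Result |= std::bitset<N>(Bytes[I]);
  }
  return Result;
}

// Extract NumBits bits starting at StartBit, zero-extended to 64 bits, by
// shifting the field down to bit 0 and masking off everything above it.
template <size_t N>
uint64_t extractField(const std::bitset<N> &Insn, unsigned StartBit,
                      unsigned NumBits) {
  assert(StartBit + NumBits <= N && "field out of bounds");
  assert(NumBits <= 64 && "cannot extract more than 64 bits");
  const uint64_t Ones =
      NumBits == 64 ? ~uint64_t(0) : ((uint64_t(1) << NumBits) - 1);
  const std::bitset<N> Mask(Ones);
  return ((Insn >> StartBit) & Mask).to_ullong();
}
```

With a 12-byte buffer holding `0x01` through `0x0C`, `extractField(readLE<96>(Buf), 64, 32)` yields the upper 32-bit word, and a field that straddles bit 64 is handled by the same shift-and-mask path, with no special case for the word boundary.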

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h

Lines changed: 0 additions & 38 deletions
@@ -32,44 +32,6 @@ class MCOperand;
 class MCSubtargetInfo;
 class Twine;

-// Exposes an interface expected by autogenerated code in
-// FixedLenDecoderEmitter
-class DecoderUInt128 {
-private:
-  uint64_t Lo = 0;
-  uint64_t Hi = 0;
-
-public:
-  DecoderUInt128() = default;
-  DecoderUInt128(uint64_t Lo, uint64_t Hi = 0) : Lo(Lo), Hi(Hi) {}
-  operator bool() const { return Lo || Hi; }
-  uint64_t extractBitsAsZExtValue(unsigned NumBits,
-                                  unsigned BitPosition) const {
-    assert(NumBits && NumBits <= 64);
-    assert(BitPosition < 128);
-    uint64_t Val;
-    if (BitPosition < 64)
-      Val = Lo >> BitPosition | Hi << 1 << (63 - BitPosition);
-    else
-      Val = Hi >> (BitPosition - 64);
-    return Val & ((uint64_t(2) << (NumBits - 1)) - 1);
-  }
-  DecoderUInt128 operator&(const DecoderUInt128 &RHS) const {
-    return DecoderUInt128(Lo & RHS.Lo, Hi & RHS.Hi);
-  }
-  DecoderUInt128 operator&(const uint64_t &RHS) const {
-    return *this & DecoderUInt128(RHS);
-  }
-  DecoderUInt128 operator~() const { return DecoderUInt128(~Lo, ~Hi); }
-  bool operator==(const DecoderUInt128 &RHS) {
-    return Lo == RHS.Lo && Hi == RHS.Hi;
-  }
-  bool operator!=(const DecoderUInt128 &RHS) {
-    return Lo != RHS.Lo || Hi != RHS.Hi;
-  }
-  bool operator!=(const int &RHS) { return *this != DecoderUInt128(RHS); }
-};
-
 //===----------------------------------------------------------------------===//
 // AMDGPUDisassembler
 //===----------------------------------------------------------------------===//
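
Since `DecoderUInt128` is removed in favor of `std::bitset<96>` and `std::bitset<128>`, it can be useful to convince oneself that the two field-extraction approaches compute the same value. The sketch below is a standalone check, not part of the commit: `extractOld` copies the arithmetic of the removed `extractBitsAsZExtValue` into a free function, `extractNew` applies the shift-and-mask approach to a `std::bitset<128>`, and `agreeEverywhere` is a hypothetical helper that compares them over every in-bounds field.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Field extraction as written in the removed DecoderUInt128 class, with the
// two 64-bit halves passed explicitly instead of stored as members.
inline uint64_t extractOld(uint64_t Lo, uint64_t Hi, unsigned NumBits,
                           unsigned BitPosition) {
  assert(NumBits && NumBits <= 64);
  assert(BitPosition < 128);
  uint64_t Val;
  if (BitPosition < 64)
    Val = Lo >> BitPosition | Hi << 1 << (63 - BitPosition);
  else
    Val = Hi >> (BitPosition - 64);
  return Val & ((uint64_t(2) << (NumBits - 1)) - 1);
}

// The same extraction done by shifting and masking a std::bitset<128>.
inline uint64_t extractNew(uint64_t Lo, uint64_t Hi, unsigned NumBits,
                           unsigned StartBit) {
  const std::bitset<128> Insn =
      (std::bitset<128>(Hi) << 64) | std::bitset<128>(Lo);
  const std::bitset<128> Mask((uint64_t(2) << (NumBits - 1)) - 1);
  return ((Insn >> StartBit) & Mask).to_ullong();
}

// True if both versions agree for every (StartBit, NumBits) pair that stays
// within the 128-bit value. Hypothetical helper for this check only.
inline bool agreeEverywhere(uint64_t Lo, uint64_t Hi) {
  for (unsigned StartBit = 0; StartBit < 128; ++StartBit)
    for (unsigned NumBits = 1; NumBits <= 64 && StartBit + NumBits <= 128;
         ++NumBits)
      if (extractOld(Lo, Hi, NumBits, StartBit) !=
          extractNew(Lo, Hi, NumBits, StartBit))
        return false;
  return true;
}
```

This only exercises the extraction path. The other operators of the removed class (`&`, `~`, `==`, `!=`) have direct `std::bitset` counterparts and are not covered by the check.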

llvm/lib/Target/RISCV/CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,8 @@ tablegen(LLVM RISCVGenAsmWriter.inc -gen-asm-writer)
 tablegen(LLVM RISCVGenCompressInstEmitter.inc -gen-compress-inst-emitter)
 tablegen(LLVM RISCVGenMacroFusion.inc -gen-macro-fusion-pred)
 tablegen(LLVM RISCVGenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler
+         --specialize-decoders-per-bitwidth)
 tablegen(LLVM RISCVGenInstrInfo.inc -gen-instr-info)
 tablegen(LLVM RISCVGenMCCodeEmitter.inc -gen-emitter)
 tablegen(LLVM RISCVGenMCPseudoLowering.inc -gen-pseudo-lowering)

llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp

Lines changed: 9 additions & 7 deletions
@@ -558,7 +558,7 @@ static DecodeStatus decodeXqccmpRlistS0(MCInst &Inst, uint32_t Imm,
   return decodeZcmpRlist(Inst, Imm, Address, Decoder);
 }

-static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint32_t Insn,
+static DecodeStatus decodeCSSPushPopchk(MCInst &Inst, uint16_t Insn,
                                         uint64_t Address,
                                         const MCDisassembler *Decoder) {
   uint32_t Rs1 = fieldFromInstruction(Insn, 7, 5);
@@ -701,6 +701,12 @@ static constexpr DecoderListEntry DecoderList32[]{
     {DecoderTableZdinxRV32Only32, {}, "RV32-only Zdinx (Double in Integer)"},
 };

+// Define bitwidths for various types used to instantiate the decoder.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint16_t> = 16;
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint32_t> = 32;
+// Use uint64_t to represent 48 bit instructions.
+template <> static constexpr uint32_t llvm::MCD::InsnBitWidth<uint64_t> = 48;
+
 DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
                                                  ArrayRef<uint8_t> Bytes,
                                                  uint64_t Address,
@@ -711,9 +717,7 @@ DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
   }
   Size = 4;

-  // Use uint64_t to match getInstruction48. decodeInstruction is templated
-  // on the Insn type.
-  uint64_t Insn = support::endian::read32le(Bytes.data());
+  uint32_t Insn = support::endian::read32le(Bytes.data());

   for (const DecoderListEntry &Entry : DecoderList32) {
     if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
@@ -759,9 +763,7 @@ DecodeStatus RISCVDisassembler::getInstruction16(MCInst &MI, uint64_t &Size,
   }
   Size = 2;

-  // Use uint64_t to match getInstruction48. decodeInstruction is templated
-  // on the Insn type.
-  uint64_t Insn = support::endian::read16le(Bytes.data());
+  uint16_t Insn = support::endian::read16le(Bytes.data());

   for (const DecoderListEntry &Entry : DecoderList16) {
     if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
