Commit eb6da94

[lldb] Improve disassembly of unknown instructions (#145793)
LLDB uses the LLVM disassembler to determine the size of instructions and to do the actual disassembly. Currently, if the LLVM disassembler can't disassemble an instruction, LLDB ignores the instruction size and assumes the instruction is the minimum size for the architecture. It prints no useful opcode and prints nothing for the instruction.

This patch separates the instruction size from the "can't disassemble" condition. If the LLVM disassembler knows the size but can't disassemble the instruction, LLDB uses that size. It prints the opcode and prints "<unknown>" for the instruction. This is much more useful to both users and scripts.

The impetus behind this change is to clean up RISC-V disassembly when the LLVM disassembler doesn't understand all of the instructions. RISC-V supports proprietary extensions whose instructions are not described in the TableGen (.td) files, so the disassembler can't disassemble them. Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the disassembly output through a filter program. This patch makes LLDB's disassembly output look more like llvm-objdump's. It also includes an example Python script that adds a command, "fdis", which disassembles and then pipes the output through a specified filter program. This has been tested with crustfilt, a sample filter (https://github.com/quic/crustfilt).

Changes in this PR:
- Decouple "can't disassemble" from "instruction size". DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool indicating whether disassembly succeeded, and returns the size through an out parameter. The size is used even if the disassembly is invalid; the instruction is disassembled only if the disassembly is valid.
- Always print the opcode when -b is specified. Previously the opcode was not printed if the instruction couldn't be disassembled.
- Print RISC-V opcodes the way llvm-objdump does. Code for the new Opcode type eType16_32Tuples by Jason Molenda.
- Print <unknown> for instructions that can't be disassembled, matching llvm-objdump, instead of printing nothing.
- Update the maximum riscv32 and riscv64 instruction size to 8 bytes.
- Add the example "fdis" command script.
- Add a disassembly byte test for x86 with known and unknown instructions.
- Add a disassembly byte test for riscv32 with known and unknown instructions, with and without filtering.
- Add a test from Jason Molenda to the RISC-V disassembly unit tests.
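The size/validity split described above can be illustrated with a small Python model. This is a hypothetical sketch, not LLDB code. `walk` and `toy_decoder` are illustrative names. `toy_decoder` stands in for `GetMCInst` and returns `(valid, size, text)`, so a non-zero size can accompany a failed decode. Falling back to the minimum opcode size when the size is 0 is an assumption made only for this sketch.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the new GetMCInst contract:
# (valid, size, text), where size may be non-zero even when valid is False.
Decoder = Callable[[bytes], Tuple[bool, int, str]]


def walk(data: bytes, decode: Decoder, min_size: int) -> List[Tuple[int, str, str]]:
    """Return (offset, raw bytes in memory order as hex, text) per instruction."""
    rows = []
    offset = 0
    while offset < len(data):
        valid, size, text = decode(data[offset:])
        if size == 0:
            # Size unknown (assumption): fall back to the minimum opcode size.
            size = min_size
        raw = data[offset:offset + size]
        # Use the decoder's size even when decoding failed; print <unknown>.
        rows.append((offset, raw.hex(), text if valid else "<unknown>"))
        offset += size
    return rows


def toy_decoder(buf: bytes) -> Tuple[bool, int, str]:
    # Toy decoder: knows only the 4-byte RISC-V nop (0x00000013, little-endian).
    # Any other 4-byte chunk has a known size but cannot be decoded.
    if len(buf) < 4:
        return (False, 0, "")
    if buf[:4] == bytes([0x13, 0x00, 0x00, 0x00]):
        return (True, 4, "nop")
    return (False, 4, "")
```

For example, `walk(bytes([0x13, 0, 0, 0, 0xff, 0xff, 0xff, 0xff]), toy_decoder, 2)` yields one `nop` row and one `<unknown>` row. The `<unknown>` row still consumes all four bytes, instead of becoming a 2-byte row with no text.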
1 parent 91b3dbe commit eb6da94

11 files changed (+315, -41 lines)

lldb/examples/python/filter_disasm.py

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
```python
"""
Defines a command, fdis, that does filtered disassembly. The command does the
lldb disassemble command with -b and any other arguments passed in, and
pipes that through a provided filter program.

The intention is to support disassembly of RISC-V proprietary instructions.
This is handled with llvm-objdump by piping the output of llvm-objdump through
a filter program. This script is intended to mimic that workflow.
"""

import lldb
import subprocess

filter_program = "crustfilt"


def __lldb_init_module(debugger, dict):
    debugger.HandleCommand("command script add -f filter_disasm.fdis fdis")
    print("Disassembly filter command (fdis) loaded")
    print("Filter program set to %s" % filter_program)


def fdis(debugger, args, exe_ctx, result, dict):
    """
    Call the built in disassembler, then pass its output to a filter program
    to add in disassembly for hidden opcodes.
    Except for get and set, use the fdis command like the disassemble command.
    By default, the filter program is crustfilt, from
    https://github.com/quic/crustfilt . This can be changed by changing
    the global variable filter_program.

    Usage:
      fdis [[get] [set <program>] [<disassembly options>]]

      Choose one of the following:
        get
            Gets the current filter program

        set <program>
            Sets the current filter program. This can be an executable, which
            will be found on PATH, or an absolute path.

        <disassembly options>
            If the first argument is not get or set, the args will be passed
            to the disassemble command as is.

    """

    global filter_program
    args_list = args.split(" ")
    result.Clear()

    if len(args_list) == 1 and args_list[0] == "get":
        result.PutCString(filter_program)
        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
        return

    if len(args_list) == 2 and args_list[0] == "set":
        filter_program = args_list[1]
        result.PutCString("Filter program set to %s" % filter_program)
        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
        return

    res = lldb.SBCommandReturnObject()
    debugger.GetCommandInterpreter().HandleCommand("disassemble -b " + args, exe_ctx, res)
    if len(res.GetError()) > 0:
        result.SetError(res.GetError())
        result.SetStatus(lldb.eReturnStatusFailed)
        return
    output = res.GetOutput()

    try:
        proc = subprocess.run([filter_program], capture_output=True, text=True, input=output)
    except (subprocess.SubprocessError, OSError) as e:
        result.PutCString("Error occurred. Original disassembly:\n\n" + output)
        result.SetError(str(e))
        result.SetStatus(lldb.eReturnStatusFailed)
        return

    if proc.returncode:
        result.PutCString("warning: {} returned non-zero value {}".format(filter_program, proc.returncode))

    result.PutCString(proc.stdout)
    result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
```
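A filter program can be exercised outside LLDB before it is wired into fdis. The helper below is a hypothetical harness; `run_filter` is not part of this commit. It reuses the same `subprocess.run` call and launch-failure fallback as `fdis`. One difference: it reports a non-zero exit status as not-ok, while `fdis` only prints a warning.

```python
import subprocess
import sys
from typing import List, Tuple


def run_filter(cmd: List[str], text: str) -> Tuple[bool, str]:
    """Pipe text through cmd the way fdis does; return (ok, output).

    If the program cannot be launched, the original text is returned
    unchanged, mirroring fdis's "Original disassembly" fallback.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, input=text)
    except (subprocess.SubprocessError, OSError):
        return (False, text)
    # fdis only warns on a non-zero exit status; this harness reports it.
    return (proc.returncode == 0, proc.stdout)


if __name__ == "__main__":
    # Example filter: upper-case everything, run with the current interpreter.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_filter(upper, "0940003f 00200020 <unknown>\n"))
```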

lldb/include/lldb/Core/Opcode.h

Lines changed: 32 additions & 7 deletions
```diff
@@ -32,7 +32,10 @@ class Opcode {
     eTypeInvalid,
     eType8,
     eType16,
-    eType16_2, // a 32-bit Thumb instruction, made up of two words
+    eType16_2,        // a 32-bit Thumb instruction, made up of two words
+    eType16_32Tuples, // RISC-V that can have 2, 4, 6, 8 etc byte long
+                      // instructions which will be printed in combinations of
+                      // 16 & 32-bit words.
     eType32,
     eType64,
     eTypeBytes
@@ -60,9 +63,9 @@ class Opcode {
     m_data.inst64 = inst;
   }

-  Opcode(uint8_t *bytes, size_t length)
-      : m_byte_order(lldb::eByteOrderInvalid) {
-    SetOpcodeBytes(bytes, length);
+  Opcode(uint8_t *bytes, size_t length, Opcode::Type type,
+         lldb::ByteOrder order) {
+    DoSetOpcodeBytes(bytes, length, type, order);
   }

   void Clear() {
@@ -82,6 +85,8 @@ class Opcode {
       break;
     case Opcode::eType16_2:
       break;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType32:
       break;
     case Opcode::eType64:
@@ -103,6 +108,8 @@ class Opcode {
                              : m_data.inst16;
     case Opcode::eType16_2:
       break;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType32:
       break;
     case Opcode::eType64:
@@ -122,6 +129,8 @@ class Opcode {
     case Opcode::eType16:
       return GetEndianSwap() ? llvm::byteswap<uint16_t>(m_data.inst16)
                              : m_data.inst16;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return GetEndianSwap() ? llvm::byteswap<uint32_t>(m_data.inst32)
@@ -143,6 +152,8 @@ class Opcode {
     case Opcode::eType16:
       return GetEndianSwap() ? llvm::byteswap<uint16_t>(m_data.inst16)
                              : m_data.inst16;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return GetEndianSwap() ? llvm::byteswap<uint32_t>(m_data.inst32)
@@ -186,20 +197,30 @@ class Opcode {
     m_byte_order = order;
   }

+  void SetOpcode16_32TupleBytes(const void *bytes, size_t length,
+                                lldb::ByteOrder order) {
+    DoSetOpcodeBytes(bytes, length, eType16_32Tuples, order);
+  }
+
   void SetOpcodeBytes(const void *bytes, size_t length) {
+    DoSetOpcodeBytes(bytes, length, eTypeBytes, lldb::eByteOrderInvalid);
+  }
+
+  void DoSetOpcodeBytes(const void *bytes, size_t length, Opcode::Type type,
+                        lldb::ByteOrder order) {
     if (bytes != nullptr && length > 0) {
-      m_type = eTypeBytes;
+      m_type = type;
       m_data.inst.length = length;
       assert(length < sizeof(m_data.inst.bytes));
       memcpy(m_data.inst.bytes, bytes, length);
-      m_byte_order = lldb::eByteOrderInvalid;
+      m_byte_order = order;
     } else {
       m_type = eTypeInvalid;
       m_data.inst.length = 0;
     }
   }

-  int Dump(Stream *s, uint32_t min_byte_width);
+  int Dump(Stream *s, uint32_t min_byte_width) const;

   const void *GetOpcodeBytes() const {
     return ((m_type == Opcode::eTypeBytes) ? m_data.inst.bytes : nullptr);
@@ -213,6 +234,8 @@ class Opcode {
       return sizeof(m_data.inst8);
     case Opcode::eType16:
       return sizeof(m_data.inst16);
+    case Opcode::eType16_32Tuples:
+      return m_data.inst.length;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return sizeof(m_data.inst32);
@@ -238,6 +261,8 @@ class Opcode {
       return &m_data.inst8;
     case Opcode::eType16:
       return &m_data.inst16;
+    case Opcode::eType16_32Tuples:
+      return m_data.inst.bytes;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return &m_data.inst32;
```

lldb/source/Core/Disassembler.cpp

Lines changed: 4 additions & 2 deletions
```diff
@@ -685,10 +685,12 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
     }
   }
   const size_t opcode_pos = ss.GetSizeOfLastLine();
-  const std::string &opcode_name =
-      show_color ? m_markup_opcode_name : m_opcode_name;
+  std::string &opcode_name = show_color ? m_markup_opcode_name : m_opcode_name;
   const std::string &mnemonics = show_color ? m_markup_mnemonics : m_mnemonics;

+  if (opcode_name.empty())
+    opcode_name = "<unknown>";
+
   // The default opcode size of 7 characters is plenty for most architectures
   // but some like arm can pull out the occasional vqrshrun.s16. We won't get
   // consistent column spacing in these cases, unfortunately. Also note that we
```

lldb/source/Core/Opcode.cpp

Lines changed: 26 additions & 1 deletion
```diff
@@ -21,7 +21,7 @@
 using namespace lldb;
 using namespace lldb_private;

-int Opcode::Dump(Stream *s, uint32_t min_byte_width) {
+int Opcode::Dump(Stream *s, uint32_t min_byte_width) const {
   const uint32_t previous_bytes = s->GetWrittenBytes();
   switch (m_type) {
   case Opcode::eTypeInvalid:
@@ -38,6 +38,27 @@ int Opcode::Dump(Stream *s, uint32_t min_byte_width) {
     s->Printf("0x%8.8x", m_data.inst32);
     break;

+  case Opcode::eType16_32Tuples: {
+    const bool format_as_words = (m_data.inst.length % 4) == 0;
+    uint32_t i = 0;
+    while (i < m_data.inst.length) {
+      if (i > 0)
+        s->PutChar(' ');
+      if (format_as_words) {
+        // Format as words; print 1 or more UInt32 values.
+        s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
+                  m_data.inst.bytes[i + 2], m_data.inst.bytes[i + 1],
+                  m_data.inst.bytes[i + 0]);
+        i += 4;
+      } else {
+        // Format as halfwords; print 1 or more UInt16 values.
+        s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
+                  m_data.inst.bytes[i + 0]);
+        i += 2;
+      }
+    }
+  } break;
+
   case Opcode::eType64:
     s->Printf("0x%16.16" PRIx64, m_data.inst64);
     break;
@@ -69,6 +90,7 @@ lldb::ByteOrder Opcode::GetDataByteOrder() const {
   case Opcode::eType8:
   case Opcode::eType16:
   case Opcode::eType16_2:
+  case Opcode::eType16_32Tuples:
   case Opcode::eType32:
   case Opcode::eType64:
     return endian::InlHostByteOrder();
@@ -113,6 +135,9 @@ uint32_t Opcode::GetData(DataExtractor &data) const {
         swap_buf[3] = m_data.inst.bytes[2];
         buf = swap_buf;
         break;
+      case Opcode::eType16_32Tuples:
+        buf = GetOpcodeDataBytes();
+        break;
       case Opcode::eType32:
         *(uint32_t *)swap_buf = llvm::byteswap<uint32_t>(m_data.inst32);
         buf = swap_buf;
```
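The `eType16_32Tuples` case in `Opcode::Dump` can be modeled in Python to show which text a given byte sequence produces. This is a sketch that mirrors the C++ loop above. `format_16_32_tuples` is a hypothetical name. Like the C++ code, it reads the stored bytes in little-endian memory order, as on RISC-V, without consulting a byte-order field.

```python
def format_16_32_tuples(raw: bytes) -> str:
    """Model of Opcode::Dump for eType16_32Tuples.

    If the length is a multiple of 4, print little-endian 32-bit words;
    otherwise print little-endian 16-bit halfwords. Groups are separated
    by single spaces.
    """
    step = 4 if len(raw) % 4 == 0 else 2
    groups = []
    for i in range(0, len(raw), step):
        # Reverse each group so its most significant byte is printed first.
        groups.append(raw[i:i + step][::-1].hex())
    return " ".join(groups)


# An 8-byte instruction prints as two 32-bit words.
print(format_16_32_tuples(bytes([0x3f, 0x00, 0x40, 0x09, 0x20, 0x00, 0x20, 0x00])))
```

With the 8-byte input above this prints `0940003f 00200020`. That is the opcode text the test filter script in this commit looks for. A 6-byte instruction is not a multiple of 4, so it is printed as three halfwords instead.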

lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp

Lines changed: 35 additions & 23 deletions
```diff
@@ -61,6 +61,8 @@ class DisassemblerLLVMC::MCDisasmInstance {

   uint64_t GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
                      lldb::addr_t pc, llvm::MCInst &mc_inst) const;
+  bool GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
+                 lldb::addr_t pc, llvm::MCInst &mc_inst, size_t &size) const;
   void PrintMCInst(llvm::MCInst &mc_inst, lldb::addr_t pc,
                    std::string &inst_string, std::string &comments_string);
   void SetStyle(bool use_hex_immed, HexImmediateStyle hex_style);
@@ -486,8 +488,13 @@ class InstructionLLVMC : public lldb_private::Instruction {
           break;

         default:
-          m_opcode.SetOpcodeBytes(data.PeekData(data_offset, min_op_byte_size),
-                                  min_op_byte_size);
+          if (arch.GetTriple().isRISCV())
+            m_opcode.SetOpcode16_32TupleBytes(
+                data.PeekData(data_offset, min_op_byte_size), min_op_byte_size,
+                byte_order);
+          else
+            m_opcode.SetOpcodeBytes(
+                data.PeekData(data_offset, min_op_byte_size), min_op_byte_size);
           got_op = true;
           break;
         }
@@ -524,13 +531,16 @@ class InstructionLLVMC : public lldb_private::Instruction {
           const addr_t pc = m_address.GetFileAddress();
           llvm::MCInst inst;

-          const size_t inst_size =
-              mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-          if (inst_size == 0)
-            m_opcode.Clear();
-          else {
-            m_opcode.SetOpcodeBytes(opcode_data, inst_size);
-            m_is_valid = true;
+          size_t inst_size = 0;
+          m_is_valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+          m_opcode.Clear();
+          if (inst_size != 0) {
+            if (arch.GetTriple().isRISCV())
+              m_opcode.SetOpcode16_32TupleBytes(opcode_data, inst_size,
+                                                byte_order);
+            else
+              m_opcode.SetOpcodeBytes(opcode_data, inst_size);
           }
         }
       }
@@ -604,10 +614,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
       const uint8_t *opcode_data = data.GetDataStart();
       const size_t opcode_data_len = data.GetByteSize();
       llvm::MCInst inst;
-      size_t inst_size =
-          mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
+      size_t inst_size = 0;
+      bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc,
+                                            inst, inst_size);

-      if (inst_size > 0) {
+      if (valid && inst_size > 0) {
         mc_disasm_ptr->SetStyle(use_hex_immediates, hex_style);

         const bool saved_use_color = mc_disasm_ptr->GetUseColor();
@@ -1206,9 +1217,10 @@ class InstructionLLVMC : public lldb_private::Instruction {
     const uint8_t *opcode_data = data.GetDataStart();
     const size_t opcode_data_len = data.GetByteSize();
     llvm::MCInst inst;
-    const size_t inst_size =
-        mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-    if (inst_size == 0)
+    size_t inst_size = 0;
+    const bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+    if (!valid)
       return;

     m_has_visited_instruction = true;
@@ -1337,19 +1349,19 @@ DisassemblerLLVMC::MCDisasmInstance::MCDisasmInstance(
       m_asm_info_up && m_context_up && m_disasm_up && m_instr_printer_up);
 }

-uint64_t DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
-    const uint8_t *opcode_data, size_t opcode_data_len, lldb::addr_t pc,
-    llvm::MCInst &mc_inst) const {
+bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(const uint8_t *opcode_data,
+                                                    size_t opcode_data_len,
+                                                    lldb::addr_t pc,
+                                                    llvm::MCInst &mc_inst,
+                                                    size_t &size) const {
   llvm::ArrayRef<uint8_t> data(opcode_data, opcode_data_len);
   llvm::MCDisassembler::DecodeStatus status;

-  uint64_t new_inst_size;
-  status = m_disasm_up->getInstruction(mc_inst, new_inst_size, data, pc,
-                                       llvm::nulls());
+  status = m_disasm_up->getInstruction(mc_inst, size, data, pc, llvm::nulls());
   if (status == llvm::MCDisassembler::Success)
-    return new_inst_size;
+    return true;
   else
-    return 0;
+    return false;
 }

 void DisassemblerLLVMC::MCDisasmInstance::PrintMCInst(
```

lldb/source/Utility/ArchSpec.cpp

Lines changed: 2 additions & 2 deletions
```diff
@@ -228,9 +228,9 @@ static const CoreDefinition g_core_definitions[] = {
     {eByteOrderLittle, 4, 4, 4, llvm::Triple::hexagon,
      ArchSpec::eCore_hexagon_hexagonv5, "hexagonv5"},

-    {eByteOrderLittle, 4, 2, 4, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
+    {eByteOrderLittle, 4, 2, 8, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
      "riscv32"},
-    {eByteOrderLittle, 8, 2, 4, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
+    {eByteOrderLittle, 8, 2, 8, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
      "riscv64"},

     {eByteOrderLittle, 4, 4, 4, llvm::Triple::loongarch32,
```
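The change above raises the maximum RISC-V opcode size from 4 to 8 bytes. One way a disassembler can know an instruction's size without decoding it is the base instruction-length encoding from the RISC-V unprivileged ISA manual. That encoding is background added here, not something the commit states. The sketch below applies it to the first 16-bit parcel. `riscv_insn_length` is a hypothetical helper, and it is only an assumption that LLVM's RISC-V disassembler reports sizes this way.

```python
from typing import Optional


def riscv_insn_length(first_parcel: int) -> Optional[int]:
    """Instruction length in bytes, from the first (lowest) 16-bit parcel.

    Follows the RISC-V base instruction-length encoding. Encodings longer
    than 64 bits are not handled and return None.
    """
    if (first_parcel & 0b11) != 0b11:
        return 2  # compressed: low two bits are not 11
    if (first_parcel & 0b11100) != 0b11100:
        return 4  # 32-bit: bits [4:2] are not 111
    if (first_parcel & 0b111111) == 0b011111:
        return 6  # 48-bit: bits [5:0] are 011111
    if (first_parcel & 0b1111111) == 0b0111111:
        return 8  # 64-bit: bits [6:0] are 0111111
    return None
```

For `0x003f`, the first parcel of the 8-byte instruction `0940003f 00200020` used in this commit's tests, this returns 8. That size only fits once the maximum opcode size is 8.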
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
```python
#! /usr/bin/env python3

import sys

for line in sys.stdin:
    if "0940003f 00200020" in line and "<unknown>" in line:
        line = line.replace("<unknown>", "Fake64")
    print(line, end="")
```
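The test filter above handles a single hard-coded encoding. A filter for several proprietary instructions can follow the same stdin/stdout contract with a lookup table. This is a hypothetical sketch: the `KNOWN` table and `filter_line` are illustrative, and only the `Fake64` entry comes from the test script. Like the test filter, it matches on substrings of the line rather than parsing LLDB's column layout.

```python
#! /usr/bin/env python3
import sys

# Hypothetical table: opcode text as printed by `disassemble -b` -> mnemonic.
KNOWN = {
    "0940003f 00200020": "Fake64",
}


def filter_line(line: str) -> str:
    """Replace <unknown> with a mnemonic when the line's opcode is in KNOWN."""
    if "<unknown>" not in line:
        return line
    for opcode, mnemonic in KNOWN.items():
        if opcode in line:
            return line.replace("<unknown>", mnemonic)
    return line


if __name__ == "__main__":
    for line in sys.stdin:
        print(filter_line(line), end="")
```

Lines whose opcode is not in the table pass through unchanged, so the filter is safe to apply to whole disassembly listings.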
