[x86][MC] Fail to decode some long multi-byte NOPs #117304

@venkyqz

Description

Work environment

OS/arch/bits: x86_64 Ubuntu 20.04
Architecture: x86_64
Source of Capstone: git clone, default on master branch
Version/git commit: llvm-20git, f08278

Minimal PoC disassembler

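#include <llvm-c/Disassembler.h>
#include <llvm-c/Target.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_OUTPUT_LENGTH 256  // assumed buffer size; the original value is not shown

int main(int argc, char *argv[]){
    /*
       some input sanity check of hex string from argv;
       it fills raw_bytes (uint8_t *) and bytes_len
    */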
    // Initialize LLVM after input validation
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllDisassemblers();

    LLVMDisasmContextRef disasm = LLVMCreateDisasm("x86_64", NULL, 0, NULL, NULL);
    if (!disasm) {
        errx(1, "Error: LLVMCreateDisasm() failed.");
    }

    // Set disassembler options: print immediates as hex, use Intel syntax
    if (!LLVMSetDisasmOptions(disasm, LLVMDisassembler_Option_PrintImmHex |
                                        LLVMDisassembler_Option_AsmPrinterVariant)) {
        errx(1, "Error: LLVMSetDisasmOptions() failed.");
    }

    char output_string[MAX_OUTPUT_LENGTH];
    uint64_t address = 0;
    size_t instr_len = LLVMDisasmInstruction(disasm, raw_bytes, bytes_len, address,
                                             output_string, sizeof(output_string));

    if (instr_len > 0) {
        printf("%s\n", output_string);
    } else {
        printf("Error: Unable to disassemble the input bytes.\n");
    }

    // Release the disassembler context before exiting
    LLVMDisasmDispose(disasm);
    return 0;
}

Instruction bytes giving faulty results

0f 1a de

Expected results

It should be:

nop esi, ebx

Actual results

$./min_llvm_disassembler "0f1ade"
Error: Unable to disassemble the input bytes.

Other cases seem to work

$./min_llvm_disassembler "0f1f00"
nop     dword ptr [rax]

Additional Logs, screenshots, source code, configuration dump, ...

Instructions with opcodes ranging from 0f 18 to 0f 1f are defined as multi-byte NOP (No Operation) instructions in the x86 ISA. Please refer to the StackOverflow post for more details. The bytes should be decoded as follows:

  • "0x0f 0x1a" is the two-byte (extended) opcode.
  • The ModR/M byte 0xde is 11011110 in binary.
    • Bits 7-6 (Mod): 11 (binary) = 3 (decimal)
      Indicates register-direct addressing mode.
    • Bits 5-3 (Reg): 011 (binary) = 3 (decimal)
      Corresponds to the EBX (or RBX in 64-bit mode) register.
    • Bits 2-0 (R/M): 110 (binary) = 6 (decimal)
      Corresponds to the ESI (or RSI in 64-bit mode) register.
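The field split above can be checked with a few shifts and masks. The following is only a minimal sketch: `modrm_fields`, `decode_modrm`, and `reg32_name` are hypothetical helper names made up for illustration, not part of LLVM or XED, and the register table assumes 32-bit operand size with no REX prefix.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 ModR/M byte (no REX extension applied). */
typedef struct {
    uint8_t mod; /* bits 7-6: addressing mode */
    uint8_t reg; /* bits 5-3: register operand */
    uint8_t rm;  /* bits 2-0: register or memory operand */
} modrm_fields;

/* Split a ModR/M byte into its three fields. */
modrm_fields decode_modrm(uint8_t byte) {
    modrm_fields f;
    f.mod = (uint8_t)((byte >> 6) & 0x3);
    f.reg = (uint8_t)((byte >> 3) & 0x7);
    f.rm  = (uint8_t)(byte & 0x7);
    return f;
}

/* Map a 3-bit register index to its 32-bit register name. */
const char *reg32_name(uint8_t index) {
    static const char *const names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };
    assert(index < 8);
    return names[index];
}
```

For the byte 0xde, `decode_modrm` gives mod = 3, reg = 3 and rm = 6, and `reg32_name` maps those indices to ebx and esi, matching the operands in the expected "nop esi, ebx".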

XED also translates "0f 1a de" into "nop esi, ebx".


Labels: backend:X86, llvm:mc, question
