[MC][x86-64] Clang silently mis-assembles instructions with high-byte registers and EVEX/VEX prefixes in release mode and crashes with assertions enabled #158585

@venkyqz

Description

Clang's integrated assembler (LLVM-MC) mishandles any instruction that both requires an EVEX or VEX prefix and uses a high-byte register (ah, ch, dh, or bh). The two encoding requirements are fundamentally incompatible, so no valid encoding exists, yet release builds of Clang silently mis-assemble the instruction instead of rejecting it.

This results in a dangerous discrepancy between different build modes:

  • In debug/assertion builds: The assembler's internal checks correctly identify the invalid instruction and abort with an informative message: "LLVM ERROR: Cannot encode high byte register in VEX/EVEX-prefixed instruction." Although this is a crash, it is useful during development because it clearly signals a fatal error.

  • In release builds: The internal checks are disabled, so the assembler proceeds with the invalid instruction and emits a defective encoding without any warning or error. The resulting binary's behavior is unpredictable, potentially leading to security vulnerabilities or silent data corruption.
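The incompatibility can be illustrated with the way x86-64 names 8-bit registers. The following is a minimal sketch, not LLVM code: the table and function names are hypothetical. It shows the well-known REX rule, where register field values 4 to 7 select ah/ch/dh/bh only when no REX prefix is present. The report's error message indicates that VEX/EVEX-prefixed encodings likewise offer no way to select the high-byte registers.

```python
# Illustrative only: hypothetical names, not part of LLVM or any assembler.
# Index = 3-bit register field value for an 8-bit operand.
LEGACY_BYTE_REGS = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]   # no REX prefix
REX_BYTE_REGS = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"]  # any REX prefix present


def decode_byte_reg(field: int, has_rex: bool) -> str:
    """Return the 8-bit register selected by a 3-bit register field."""
    if not 0 <= field <= 7:
        raise ValueError("register field must be in the range 0..7")
    table = REX_BYTE_REGS if has_rex else LEGACY_BYTE_REGS
    return table[field]


if __name__ == "__main__":
    for field in range(4, 8):
        # The same bits name a different register once a REX prefix is present.
        print(field, decode_byte_reg(field, False), decode_byte_reg(field, True))
```

Because the same field value names a different register depending on the prefix, an assembler that emits the bits for ah under an incompatible prefix produces an instruction that operates on a different register than the one written in the source.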

Impact and Scope

  • This bug is not limited to a single instruction; it affects a broad range of opcodes. Any instruction that the assembler attempts to encode with an EVEX/VEX prefix exhibits the issue if a high-byte register is used. This includes:

  • Instructions that natively use EVEX/VEX: The entire ccmp* and ctest* families of instructions from Intel APX are directly affected.

  • Standard instructions that can be implicitly encoded with EVEX/VEX: Many traditional opcodes (add, sub, and, or, etc.) are affected when the assembler selects one of these prefixes because of a specific operand combination or an enabled CPU feature.

  • The list of potentially affected opcodes is extensive, including but not limited to:

    • adc, add, and
    • All ccmp* and ctest* instructions
    • dec, inc, neg, not
    • or, sbb, shl, shr
    • All setzu* instructions
    • sub, xor

Expected Behavior

An assembler should be robust and predictable. The correct behavior is to fail consistently with an explicit error message when given an unencodable instruction, regardless of the build mode. GNU as (binutils) does this, rejecting the instruction with the clear error: "Error: can't encode register 'ah' in an instruction requiring EVEX prefix." Silent mis-assembly in release mode is unacceptable: the output is effectively undefined.
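The expected behavior amounts to an unconditional operand check. The sketch below is a hypothetical illustration in Python, not the LLVM implementation: the names `check_encodable` and `EncodingError` and the `needs_vex_or_evex` flag are assumptions made for this example. It raises an explicit error instead of using `assert`, because Python strips `assert` statements under `python -O`, which loosely mirrors how assertion-based checks disappear from release builds.

```python
# Hypothetical sketch: these names do not exist in LLVM or binutils.
HIGH_BYTE_REGS = frozenset({"ah", "ch", "dh", "bh"})


class EncodingError(ValueError):
    """Raised when an instruction has no valid encoding."""


def check_encodable(mnemonic: str, operands: list[str], needs_vex_or_evex: bool) -> None:
    """Reject high-byte registers in VEX/EVEX-prefixed instructions.

    The check is an explicit raise, so it behaves the same whether or not
    assertions are enabled.
    """
    if not needs_vex_or_evex:
        return
    for operand in operands:
        reg = operand.lstrip("%").lower()  # accept AT&T-style "%ah" as well as "ah"
        if reg in HIGH_BYTE_REGS:
            raise EncodingError(
                f"{mnemonic}: can't encode register '{reg}' "
                "in an instruction requiring a VEX/EVEX prefix"
            )


if __name__ == "__main__":
    check_encodable("add", ["%ah", "%bl"], needs_vex_or_evex=False)  # accepted
    try:
        check_encodable("add", ["%ah", "%bl"], needs_vex_or_evex=True)
    except EncodingError as err:
        print("error:", err)
```

The design point is that validation of user input belongs in a diagnostic path that is always compiled in, while assertions are reserved for internal invariants.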

Ways to Reproduce

Godbolt Link: https://godbolt.org/z/jhP3o4v8q

Observation

  • Clang 17.0.1 is the last release that does not exhibit the problem.
  • Trunk binutils GAS behaves correctly.
