Description
When using Clang's integrated assembler (LLVM-MC), an issue arises when assembling an instruction that both requires an EVEX or VEX prefix for encoding and uses a high-byte register (ah, ch, dh, or bh). These two encoding requirements are fundamentally incompatible, and release builds of Clang silently misassemble such instructions instead of rejecting them.
This results in a dangerous discrepancy between different build modes:
- In debug/assertion builds: The assembler's internal checks correctly identify the invalid instruction and abort with an informative error message: "LLVM ERROR: Cannot encode high byte register in VEX/EVEX-prefixed instruction." Although this is a crash, it is useful during development because it clearly signals a fatal error.
- In release builds: The internal checks are disabled, so the assembler silently mis-assembles the invalid instruction. This is dangerous because it produces a defective executable without any warning or error. The resulting binary's behavior is unpredictable, potentially leading to security vulnerabilities or silent data corruption.
Impact and Scope
This bug is not limited to a single instruction; it affects a broad range of opcodes. Any instruction that the assembler attempts to encode with an EVEX/VEX prefix exhibits the issue if a high-byte register is used. This includes:
- Instructions that natively use EVEX/VEX: The entire ccmp* and ctest* family of instructions from Intel APX is directly affected.
- Standard instructions that can be implicitly encoded with EVEX/VEX: Many traditional opcodes (add, sub, and, or, etc.) can be affected when the assembler chooses these prefixes for specific operand combinations or modern CPU features.
The list of potentially affected opcodes is extensive, including but not limited to:
- adc, add, and
- All ccmp* and ctest* instructions
- dec, inc, neg, not
- or, sbb, shl, shr
- All setzu* instructions
- sub, xor
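All of the opcodes above run into the same encoding conflict. As a rough illustration (a sketch, not LLVM's code): in x86-64, the 8-bit register numbers 4 to 7 name ah, ch, dh and bh only when the instruction carries no REX prefix; with a REX prefix the same numbers name spl, bpl, sil and dil. The sketch below assumes that VEX/EVEX-prefixed encodings follow the same REX-style interpretation. The function name `byteRegisterName` and the parameter `hasRexStylePrefix` are hypothetical and exist only for this example.

```cpp
#include <string>

// Hypothetical helper: returns the name of the 8-bit register selected by a
// 3-bit register number (0-7, without any extension bits).
//
// Assumption of this sketch: `hasRexStylePrefix` is true for REX-prefixed
// encodings and also for VEX/EVEX-prefixed encodings.
std::string byteRegisterName(unsigned regNum, bool hasRexStylePrefix) {
  // Interpretation without a prefix: numbers 4-7 are the high-byte registers.
  static const char *const kLegacy[8] = {"al", "cl", "dl", "bl",
                                         "ah", "ch", "dh", "bh"};
  // Interpretation with a prefix: numbers 4-7 are the low bytes of
  // rsp, rbp, rsi and rdi, so the high-byte registers have no encoding.
  static const char *const kPrefixed[8] = {"al",  "cl",  "dl",  "bl",
                                           "spl", "bpl", "sil", "dil"};
  return hasRexStylePrefix ? kPrefixed[regNum & 7] : kLegacy[regNum & 7];
}
```

Under these assumptions, `byteRegisterName(4, false)` yields ah while `byteRegisterName(4, true)` yields spl, which is why a prefixed instruction has no bit pattern left that could mean ah.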
Expected Behavior
An assembler should be robust and predictable: given an instruction that cannot be encoded, it should consistently fail with an explicit error message, regardless of the build mode. GAS (binutils) demonstrates this by rejecting the instruction with the clear error: "Error: can't encode register 'ah' in an instruction requiring EVEX prefix." The silent mis-assembly in release mode is an unacceptable form of undefined behavior.
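A minimal sketch of what "fail consistently, regardless of build mode" could look like is given below. It is an illustration only, not LLVM's or GAS's implementation: the `InstructionModel` struct, the helper names and the diagnostic wording are all hypothetical. The point is that the check is an ordinary runtime condition that returns a diagnostic, rather than a check that is compiled out of release builds.

```cpp
#include <optional>
#include <string>

// Hypothetical, heavily simplified view of an instruction being assembled.
struct InstructionModel {
  bool requiresVexOrEvex;    // the chosen encoding needs a VEX or EVEX prefix
  std::string byteRegister;  // 8-bit register operand, empty if there is none
};

// True for the four high-byte registers named in this report.
bool isHighByteRegister(const std::string &reg) {
  return reg == "ah" || reg == "ch" || reg == "dh" || reg == "bh";
}

// Returns a diagnostic if the instruction cannot be encoded, or std::nullopt
// if this particular check passes. Because this is a plain runtime check
// (not an assert), it behaves the same in debug and release builds.
std::optional<std::string> validateEncoding(const InstructionModel &inst) {
  if (inst.requiresVexOrEvex && isHighByteRegister(inst.byteRegister)) {
    return "cannot encode register '" + inst.byteRegister +
           "' in a VEX/EVEX-prefixed instruction";
  }
  return std::nullopt;
}
```

With this shape, a caller can turn the returned string into a regular assembler error: `validateEncoding({true, "ah"})` produces a diagnostic, while `validateEncoding({true, "al"})` and `validateEncoding({false, "ah"})` do not.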
Ways to Reproduce
Godbolt Link: https://godbolt.org/z/jhP3o4v8q
Observation
- Clang 17.0.1 is the latest version that does not have that problem.
- The behaviour of trunk binutils GAS is correct.