Generated by ChatGPT 4.5 Deep Research


CPUID Feature Bit Hierarchy for Instruction Support

CPUID (CPU Identification) is an instruction that allows software to query what features a processor supports. Modern x86 CPUs report hundreds of feature bits via CPUID, each indicating support for a specific instruction set extension or capability (3). These instruction-set features have evolved in a hierarchical manner – newer extensions often build on older ones (for example, 512-bit vector instructions build on prior 256-bit and 128-bit SIMD support). Below we verify and extend the hierarchy of CPUID feature bits for user-mode (CPL3) instruction support, incorporating the latest architectures and noting important dependencies and vendor-specific caveats.

Foundational x86 Architecture (Base Features)

  • Baseline x86 ISA: All modern x86 processors implement the 8086-compatible base instruction set (16/32-bit modes) plus the x87 floating-point unit (FPU). This is the implicit foundation for all extensions – CPUID itself was introduced in the Pentium era and its availability indicates an x86 CPU (1). There is no CPUID flag for “basic x86 support” – it’s assumed if the CPU runs x86 code. Notably, 64-bit CPUs (AMD64/EM64T) mandate certain base features like on-chip FPU and at least SSE2 support, as part of the x86-64 specification.

  • Example Base Features (CPUID EAX=1, EDX): Early CPUID feature bits in the EDX register cover fundamental capabilities. For instance, bit 0 = FPU present, bit 4 = Time Stamp Counter (TSC), bit 8 = CMPXCHG8B (64-bit atomic compare-and-swap), bit 11 = SYSENTER/SYSEXIT, bit 23 = MMX, etc (1 & 1). Many of these became standard requirements over time (e.g. all x86-64 chips have FPU, TSC, CMPXCHG8B, etc.). The focus below, however, is on the major instruction set extensions for general software.
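As a minimal sketch of how these EDX bits are read, the snippet below decodes the flags named above from a leaf 1 EDX value. It assumes the register value has already been obtained by executing CPUID elsewhere (for example in a small C helper); the names `LEAF1_EDX_BITS`, `has_bit`, and `decode_leaf1_edx` are illustrative and do not come from any library.

```python
# Bit positions in CPUID.1:EDX, as named in the text above.
LEAF1_EDX_BITS = {
    "fpu": 0,   # on-chip x87 FPU
    "tsc": 4,   # time stamp counter
    "cx8": 8,   # CMPXCHG8B
    "sep": 11,  # SYSENTER/SYSEXIT
    "mmx": 23,  # MMX
}


def has_bit(value, bit):
    """Return True if the given bit is set in a 32-bit register value."""
    return (value >> bit) & 1 == 1


def decode_leaf1_edx(edx):
    """Map each known flag name to True/False for the given EDX value."""
    return {name: has_bit(edx, bit) for name, bit in LEAF1_EDX_BITS.items()}


# Hypothetical EDX value: FPU, TSC, CMPXCHG8B and MMX set, SEP clear.
example_edx = (1 << 0) | (1 << 4) | (1 << 8) | (1 << 23)
print(decode_leaf1_edx(example_edx))
```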

Basic Instruction Set Extensions (CPUID EAX=1, EDX/ECX bits)

These are the earlier extensions that broadened the original ISA. They are reported in CPUID leaf 1 (EAX=1), primarily in the EDX and ECX registers. Over time, the ECX bits were used for newer features as the EDX bits filled up. Key extensions in this category include the introduction of SIMD (single-instruction multiple-data) instructions and other core enhancements:

  • MMX (Multimedia Extensions): CPUID.1.EDX bit 23. Introduced in 1997, MMX added eight 64-bit vector registers (aliased onto the x87 FPU registers) for parallel integer operations to accelerate media and graphics. It was the first x86 SIMD extension. MMX support is indicated by the CPUID “MMX” bit (1). While foundational for later SIMD, MMX had limitations: it offered no floating-point operations, and because it reused the FPU registers, code could not freely mix MMX and x87 instructions. Subsequent extensions (the SSE family) extended and superseded MMX for most uses.

  • SSE (Streaming SIMD Extensions): CPUID.1.EDX bit 25. Introduced in 1999 (Pentium III), SSE added eight 128-bit XMM registers and instructions for parallel single-precision floating-point operations. This vastly improved floating-point SIMD performance over x87 and MMX. SSE is often considered a prerequisite for later SIMD extensions. (All 64-bit x86 CPUs support SSE and SSE2 by specification, ensuring SSE is ubiquitous on modern systems.)

  • SSE2: CPUID.1.EDX bit 26. Introduced in 2000 (Pentium 4), SSE2 extended the XMM instruction set to support double-precision floating-point and additional integer operations. SSE2 effectively replaced most MMX functionality (since XMM registers could now handle integer SIMD) and became a baseline for modern software. In fact, SSE2 is required by many operating systems and 64-bit environments. (All x86-64 chips from AMD/Intel implement SSE2 (2).) SSE2’s presence can be considered a fundamental requirement for any newer vector extensions.

  • SSE3, SSSE3, SSE4.1, SSE4.2: CPUID.1.ECX bits 0, 9, 19, 20 respectively. These incremental upgrades (from 2004 through 2008) added new instructions (e.g. horizontal adds, byte shuffles, dot product, and string/text processing instructions) on the same 128-bit XMM registers. They build on the SSE/SSE2 foundation and are indicated in the ECX bits of CPUID leaf 1 (since they were introduced later). Software can check these bits to see which additional SIMD operations are available. For example, SSE4.2 introduced the PCMPxSTRx string compare instructions. The POPCNT instruction (population count) arrived in the same Intel generation but has its own CPUID flag (ECX bit 23), which should be checked separately (2 & 2). (Caveat: AMD did not implement SSE4.1/4.2 until Bulldozer in 2011; before that it offered SSE4a and other instructions instead. See the AMD-specific extensions below.)

  • AES-NI (Advanced Encryption Standard New Instructions): CPUID.1.ECX bit 25. This is a set of instructions (introduced by Intel in 2010) to accelerate AES encryption and decryption. If this bit is 1, the CPU has hardware AES support (AESENC/AESDEC, etc.). These instructions operate on the 128-bit XMM registers and greatly speed up cryptography. AES-NI has its own flag and is not implied by any SSE level, but because it uses the XMM registers it appears only on CPUs that also support SSE/SSE2 (which is practically all modern CPUs). Along with AES, Intel introduced the PCLMULQDQ instruction (carry-less multiply, used for the Galois-field multiplication in AES-GCM). PCLMULQDQ is indicated by CPUID.1.ECX bit 1 on supporting processors (1). AES-NI and PCLMULQDQ often go hand-in-hand for crypto workloads (Intel’s Westmere was the first to support both).

  • AVX (Advanced Vector Extensions): CPUID.1.ECX bit 28. AVX, introduced in 2011 (Intel Sandy Bridge), expands the SIMD registers to 256 bits (YMM0–YMM15 in 64-bit mode) and enables a new encoding (VEX prefix) for instructions. AVX improves floating-point throughput (8 floats or 4 doubles per vector, versus 4/2 in SSE) and introduces non-destructive three-operand instructions (so operations no longer overwrite one source operand). Dependencies: AVX requires prior SSE/SSE2 support and also requires OS support for the extended registers. The AVX bit itself reports only hardware capability. The OS must additionally save and restore the YMM state via XSAVE: it sets CR4.OSXSAVE, which CPUID reflects as the OSXSAVE bit, and enables the SSE and AVX state components in XCR0 (4). In other words, software must check CPUID.1:ECX.AVX=1 and CPUID.1:ECX.OSXSAVE=1, and then read XCR0 with XGETBV and confirm that bits 1 (SSE) and 2 (AVX) are both set, before it can safely use AVX instructions (4). All AVX-capable CPUs also support SSE4.2 and earlier SIMD features (AVX generally implies that SSE2, SSE3, and so on through SSE4.x are present) (2). AVX is a foundation for many further extensions.

  • FMA (Fused Multiply-Add): CPUID.1.ECX bit 12. FMA provides a single instruction that multiplies and adds (i.e. a = a + b*c in one step with a single rounding). Intel’s FMA3 (3-operand FMA, not to be confused with AMD’s earlier 4-operand FMA4) was introduced together with AVX2 on Intel Haswell (2013). FMA instructions use the AVX (VEX) encoding and the 256-bit YMM registers, so AVX support is a prerequisite. The CPUID.1.ECX bit 12 indicates FMA3 support. This greatly accelerates many numeric algorithms (dense linear algebra, DSP, etc.) by combining operations. (AMD had introduced FMA4 earlier, in 2011 with their Bulldozer CPUs, but later also adopted FMA3 for compatibility; see vendor caveats below.) AVX does not imply FMA: Sandy Bridge and Ivy Bridge support AVX without FMA, so the FMA bit must be checked individually.

  • Other Notable CPUID.1 ECX features: There are a few additional bits that indicate useful user-mode instructions:

    • PCLMULQDQ (Carry-less Multiply): CPUID.1.ECX bit 1. (Mentioned above alongside AES-NI.) Provides the 64-bit carry-less multiply that serves as the building block for GF(2^128) multiplication in cryptography (1).
    • CMPXCHG16B: CPUID.1.ECX bit 13. Indicates support for the 16-byte compare-and-exchange instruction (CMPXCHG16B), which is important for atomic operations on 64-bit systems.
    • MOVBE: CPUID.1.ECX bit 22. Indicates MOVBE (move with byte-swap) support, an instruction useful for endian conversions.
    • XSAVE: CPUID.1.ECX bit 26, and OSXSAVE: bit 27. Indicate, respectively, that the CPU supports the XSAVE/XRSTOR extended state management and that the OS has enabled it. These are critical for AVX and later context management (4).
    • RDRAND: CPUID.1.ECX bit 30. Introduced with Intel Ivy Bridge (2012), this provides a hardware random number generator (RDRAND instruction). If set, user-mode software can retrieve random numbers from the on-chip RNG. (AMD supports it in newer architectures as well.)
    • RDTSCP: CPUID.80000001h.EDX bit 27. (This one is in an extended leaf.) It indicates the RDTSCP instruction, a variant of the time-stamp counter read that waits for all earlier instructions to execute (it is not a fully serializing instruction) and also returns the IA32_TSC_AUX value, which operating systems typically use to identify the current processor. This makes it useful for timing in multi-threaded code.
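The leaf 1 bits above can be walked as a ladder. The sketch below reports the highest SSE level for which every lower level is also present; it assumes the ECX and EDX values of CPUID leaf 1 were read elsewhere, and `SSE_LADDER` and `highest_sse_level` are hypothetical names chosen for illustration.

```python
# (name, register, bit) for the SSE family in CPUID leaf 1, lowest level first.
SSE_LADDER = [
    ("sse", "edx", 25),
    ("sse2", "edx", 26),
    ("sse3", "ecx", 0),
    ("ssse3", "ecx", 9),
    ("sse4_1", "ecx", 19),
    ("sse4_2", "ecx", 20),
]


def highest_sse_level(ecx, edx):
    """Return the highest SSE level whose flag and all lower flags are set.

    Returns None if even the SSE bit is clear. Stopping at the first missing
    rung is a conservative choice for this sketch, not an architectural rule.
    """
    regs = {"ecx": ecx, "edx": edx}
    level = None
    for name, reg, bit in SSE_LADDER:
        if not (regs[reg] >> bit) & 1:
            break
        level = name
    return level


# Hypothetical register values: SSE and SSE2 in EDX; SSE3 and SSSE3 in ECX.
edx = (1 << 25) | (1 << 26)
ecx = (1 << 0) | (1 << 9)
print(highest_sse_level(ecx, edx))
```

Stopping at the first gap means an unusual combination (say, SSE4.2 reported without SSE4.1) is treated as the lower level, which is the safe direction for dispatch code.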

(5) Evolution of SIMD register width and instruction sets in Intel CPUs: SSE introduced 128-bit XMM registers (1999), AVX expanded them to 256-bit YMM (2011), and AVX-512 uses 512-bit ZMM registers (2016). Each generation builds on prior ones (all AVX-capable CPUs also support SSE), and newer 512-bit units can execute older 256/128-bit instructions by using a portion of the register width (5 & 5).

Extended Instruction Set Extensions (CPUID EAX=7, sub-leaf 0)

As the set of available features grew, Intel defined CPUID leaf 7 (EAX=7, ECX=0) to report “extended” feature flags beyond the basic leaf 1. Leaf7 is where many modern extensions (post-2010) are indicated, using the EBX, ECX, and EDX registers as bitfields (1). This includes larger SIMD extensions, cryptography, bit-manipulation, security features, etc. These often have hierarchical dependencies among themselves and with older features:

  • AVX2 (Advanced Vector Extensions 2): CPUID.7.EBX bit 5. Introduced in 2013 (Haswell), AVX2 extends AVX by adding 256-bit integer SIMD operations and other capabilities. Notably, AVX2 widens most integer SSE instructions to the 256-bit YMM registers and adds gather (vectorized indexed loads), per-element variable shifts, and cross-lane permutes. Requires: AVX (and by extension SSE/SSE2) must be present. In practice, any AVX2-capable CPU sets the AVX bit as well. AVX2 is a key building block for the subsequent AVX-512 family.

  • BMI1 and BMI2 (Bit Manipulation Instruction Sets): CPUID.7.EBX bits 3 and 8. On Intel, both sets were introduced with Haswell (2013). They provide efficient bit-level operations such as trailing-zero counts, bit-field extraction, and parallel bit deposit/extract. BMI1 includes ANDN, BEXTR, BLSI, BLSMSK, BLSR, and TZCNT; BMI2 adds BZHI, MULX, PDEP, PEXT, RORX, and the flag-less shifts SARX/SHLX/SHRX. Dependency: the two flags are enumerated independently, so check each one you use, although CPUs with BMI2 have in practice also had BMI1. These are integer extensions independent of SIMD, but often used alongside vector code or for implementing algorithms like big integer arithmetic, bitfields, etc. (LZCNT is often grouped with BMI1 but has its own flag, CPUID.80000001h.ECX bit 5, which AMD documents as “ABM”; see the AMD section.)

  • TBM (Trailing Bit Manipulation): CPUID.80000001h.ECX bit 21 (AMD). This was an AMD-specific bit-manipulation extension (introduced with AMD “Piledriver”, 2012) providing instructions that operate on the trailing bits of a value, such as BLCFILL, BLSFILL, BLCMSK, TZMSK, and T1MSKC (1). It was essentially AMD’s counterpart to BMI, but it never appeared on Intel and was dropped in later AMD designs (Zen removed TBM). Software targeting AMD specifically might check this, but generally BMI1/2 have become the standard set for bit operations.

  • ADX (Multi-Precision Add-Carry): CPUID.7.EBX bit 19. Introduced by Intel (Broadwell, 2014), ADX provides the instructions ADCX/ADOX, which allow faster arbitrary-precision arithmetic by maintaining two independent carry chains (ADCX uses the carry flag, ADOX the overflow flag). It is useful in big integer math (crypto, bignum libraries) to add numbers larger than 64 bits without as much overhead. ADX doesn’t depend on other new instructions (beyond baseline CPUID.1 features) but in practice comes alongside AVX2/BMI2 in the same CPU generations.

  • SHA (Secure Hash Algorithm Extensions): CPUID.7.EBX bit 29. A set of instructions to accelerate SHA-1 and SHA-256 hashing, introduced by Intel with Goldmont (Atom, 2016) and reaching mainstream cores with Ice Lake (2019). The CPUID bit indicates support for instructions like SHA1RNDS4, SHA256MSG2, etc. These are useful for cryptographic applications. (AMD has supported them since the first Zen generation, 2017.) The SHA extensions are standalone and only require baseline SSE2 support (they use XMM registers). They often appear alongside AES-NI on modern CPUs.

  • FMA3 (Fused Multiply-Add): see the FMA entry in the leaf 1 section above; Intel’s FMA3 is indicated in CPUID.1.ECX. FMA4 (AMD): CPUID.80000001h.ECX bit 16. AMD introduced its own 4-operand FMA instructions with the Bulldozer architecture (2011), indicated by this bit (1). FMA4 allows a distinct destination register (w = x + y*z) in addition to the three source operands. However, Intel never supported FMA4, and AMD eventually deprecated it in favor of the 3-operand FMA that Intel uses. Modern compilers and software generally use FMA3 on both Intel and AMD, so FMA4 is mostly a historical footnote. (If targeting Bulldozer/Piledriver specifically, one might check this CPUID bit, but code using it won’t run on Intel or newer AMD CPUs.)

  • AVX-512 Family: CPUID.7.EBX/ECX/EDX various bits (and sub-leaf 1). AVX-512 is an umbrella for many 512-bit extensions, first introduced in Xeon Phi (Knights Landing, 2016) and later in Skylake-X (2017) and onward. The foundational feature is AVX-512F (Foundation), indicated by CPUID.7.EBX bit 16. AVX-512F builds on AVX and AVX2. A CPU that supports AVX-512F has 32 ZMM 512-bit vector registers (all 32 only in 64-bit mode; other modes can use ZMM0–ZMM7) and eight opmask registers, k0–k7 (for per-lane masking). AVX-512F alone provides basic 512-bit integer and floating-point operations on 32/64-bit elements and introduces the merge- and zero-masking capabilities. On top of AVX-512F, there are numerous sub-features each with their own CPUID bit (AVX-512 support on various processors - Intel Communities):

    • Examples of AVX-512 sub-extensions:
      • AVX-512 VL (Vector Length, bit 31 EBX) – allows the new AVX-512 instructions to operate on 128 or 256-bit sizes (making them applicable to XMM/YMM registers).
      • AVX-512 BW/DQ (Byte/Word and Doubleword/Quadword, bits 30 and 17 EBX) – adds integer operations on 8/16-bit and 32/64-bit data types respectively (AVX-512F by itself primarily covers 32/64-bit float/int).
      • AVX-512 CD (Conflict Detection, bit 28 EBX) – assists in vectorizing loops with indirect memory accesses by providing conflict detection instructions.
      • AVX-512 ER/PF (Exponential/Reciprocal and Prefetch, bits 27 and 26 EBX) – specialized sets for exponential and reciprocal approximation and for memory prefetch hint instructions (these were mainly on Knights Landing Xeon Phi).
      • AVX-512 VNNI (Vector Neural Network Instructions, CPUID.7.ECX bit 11) – introduced in Cascade Lake (2019), adds dot-product-accumulate instructions on 8-bit and 16-bit integers for deep learning (each 32-bit lane accumulates the sum of products of its packed narrow elements in one instruction) (1).
      • AVX-512 BF16 (Bfloat16 support, CPUID.7 sub-leaf 1, EAX bit 5) – introduced in Cooper Lake (2020) and later Sapphire Rapids, supports conversion to and dot products on the 16-bit bfloat16 format for AI workloads (1) (1).
      • AVX-512 IFMA (Integer FMA 52-bit, EBX bit 21) – integer fused multiply-add on 52-bit numbers (for big-integer math, introduced on Cannon Lake/Ice Lake) (1).
      • AVX-512 VBMI/VBMI2 (Vector Byte Manipulation, ECX bits 1 and 6) – VBMI adds arbitrary byte permutes across the vector; VBMI2 adds byte/word compress and expand plus concatenated (funnel) shifts (Ice Lake) (1).
      • AVX-512 BITALG (Bit Algorithms, ECX bit 12) – bit counting and bit manipulation instructions (like vectorized popcount) on 512-bit vectors (1).
      • AVX-512 VPOPCNTDQ (Vector Popcount, ECX bit 14) – population count for 64-bit and 32-bit elements in a vector.
      • AVX-512 VPCLMULQDQ and VAES (ECX bits 10 and 9) – carry-less multiply and AES instructions that operate on 512-bit registers (essentially vectorized PCLMUL and AES-NI) (1).
      • (And others: GFNI (Galois Field New Instructions, bit 8 ECX) for binary field arithmetic, VP2INTERSECT, etc. (1))

    Each of these AVX-512 extensions is indicated by a separate CPUID flag, so a CPU might support some and not others (AVX-512 support on various processors - Intel Communities) (instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency ... - GitHub). Dependencies: The AVX-512 extensions depend on AVX-512F (the CPU won’t report them without the foundation). The exceptions are GFNI, VAES, and VPCLMULQDQ, which also have narrower SSE/VEX-encoded forms and are reported on some CPUs without AVX-512 (for example Alder Lake); only their 512-bit forms need AVX-512F. Some sub-features usually ship together (e.g., AVX-512 BW and DQ are paired, and VL comes with them in mainstream CPUs). Practically, Intel grouped AVX-512 sets by microarchitecture: e.g. Skylake-SP had (F, CD, BW, DQ, VL) (AMD64 Architecture Programmer's Manual - Zen4 - AMD Community), Cascade Lake added VNNI, Ice Lake added (VBMI, VBMI2, BITALG, VPOPCNTDQ, IFMA, GFNI, VAES, VPCLMULQDQ) (twest820/AVX-512: AVX-512 documentation beyond what Intel provides - GitHub), Tiger Lake added VP2INTERSECT, Cooper Lake and Sapphire Rapids added BF16, and so on. Software should check the specific bits it needs. (As of 2022, AMD Zen 4 has also implemented AVX-512 support, largely compatible with Intel’s bit flags (twest820/AVX-512: AVX-512 documentation beyond what Intel provides - GitHub) (AMD Zen 4 Adds AVX-512 - TechInsights). Prior AMD CPUs did not, so for cross-vendor code one may need to detect AVX-512 support carefully.)

  • MPX (Memory Protection Extensions): CPUID.7.EBX bit 14. Intel MPX (Skylake, 2015) introduced bounds checking registers and instructions (BNDMOV, BNDCL, etc.) to help catch buffer overruns in software (1). The CPUID bit indicates the presence of MPX. It requires OS support to be useful (because the new registers/state need context-switch saving). MPX had limited adoption and Intel removed it in later generations, so it’s a short-lived feature flag. (No AMD support for MPX.)

  • Intel TSX (Transactional Synchronization Extensions): CPUID.7.EBX bits 4 (HLE) and 11 (RTM). TSX provides hardware transactional memory for concurrency control. HLE (Hardware Lock Elision) is an instruction prefix scheme (XACQUIRE/XRELEASE) to hint at transactional execution, and RTM is an explicit set of instructions (XBEGIN, XEND, etc.) for transactional regions. CPUID reports HLE and RTM separately (1 & 1). These were introduced in Haswell (2013). Transactional memory allows optimistic concurrency: code can execute in a transaction and either commit (if no conflicts) or abort (as if it never ran). TSX has had errata (some early implementations had it disabled via microcode). No AMD equivalent exists as of today, and some newer Intel hybrid CPUs dropped TSX. It’s a niche feature: developers should check CPUID, always provide a non-transactional fallback path, and be prepared for TSX to be disabled by a microcode or MSR update even on parts that nominally support it.

  • Persistent Memory & Cache Control: Several CPUID.7 bits relate to specialized instructions for cache management and non-volatile memory:

    • CLFLUSHOPT (bit 23 EBX) and CLWB (bit 24 EBX) – cache line flush and write-back optimizations (1). These are enhancements to the older CLFLUSH instruction (CPUID.1.EDX bit 19). CLFLUSHOPT flushes cache lines with weaker ordering and therefore less overhead; CLWB writes a modified cache line back to memory without necessarily evicting it from the cache (useful for persistent memory).
    • PCOMMIT (deprecated, EBX bit 22) – an old persistent commit instruction, now removed (1). (ECX bit 22 is a different flag, RDPID.)
    • PT (Processor Trace) – CPUID.7.EBX bit 25 indicates Intel PT, a feature for tracing program execution (generating detailed branch logs) (1).
    • MOVDIRI/MOVDIR64B – CPUID.7.ECX bits 27 and 28 for direct-store instructions (MOVDIRI stores 4 or 8 bytes and MOVDIR64B stores 64 bytes directly to memory, bypassing the caches) (1). Introduced with Intel’s Tremont and Tiger Lake (2020) for fast writes (e.g., to NIC or SSD buffers).
    • CLDEMOTE – CPUID.7.ECX bit 25, an instruction to hint that a cache line should be moved from a cache close to the core to a more distant cache level (1) (introduced with Tremont and later Sapphire Rapids).
  • Security and OS Support Features: Modern CPUs have added flags to indicate features that improve security or require OS enabling:

    • FSGSBASE: CPUID.7.EBX bit 0. Allows user-mode read/write of the FS/GS segment base registers (RDFSBASE/WRFSBASE etc.), which is handy in user-level threading and fast context switching (1). Introduced in Intel Ivy Bridge (2012) and also supported by AMD, including Zen. The OS must enable it via CR4.FSGSBASE before user code can execute these instructions.
    • SGX (Software Guard Extensions): CPUID.7.EBX bit 2 (with additional leaf 0x12 for enclave details). Indicates support for secure enclaves where user code can run in a protected memory region (1). Requires specific BIOS/OS support to enable. (Intel-only, as of the client Skylake era; Intel is phasing it out on newer client CPUs. AMD’s analogous technology is SEV/TEE at a system level, not enumerated in CPUID in the same way.)
    • SMEP (Supervisor Mode Exec Prevention): CPUID.7.EBX bit 7. If 1, the CPU can prevent the kernel from executing code in user pages (mitigating certain attacks) (1). This is mainly an OS security feature (first in Ivy Bridge, 2012).
    • SMAP (Supervisor Mode Access Prevention): CPUID.7.EBX bit 20. If 1, the CPU can prevent the kernel from accessing user-space memory once the OS enables the feature (except through an explicit override with STAC/CLAC) (1 & 1). Further hardens user/kernel memory isolation (Intel Broadwell+, 2014; AMD from Zen).
    • PKE/PKU (Protection Keys for Userspace): CPUID.7.ECX bit 3 (PKU) and bit 4 (OSPKE). Allows memory pages to be tagged with “protection keys” and fast user-mode changes to permission domains (without a system call) (1). Requires OS support to manage keys; OSPKE reports that the OS has enabled it. Introduced in Skylake-SP server processors (2017) for finer-grained memory protection.
    • CET – Shadow Stack and Indirect Branch Tracking: CPUID.7.ECX bit 7 (CET_SS) and CPUID.7.EDX bit 20 (CET_IBT). Control-Flow Enforcement Technology provides a shadow call stack (to prevent return-oriented programming) and indirect branch protections (1) (1). First implemented in Intel Tiger Lake (2020) and supported by Windows 11 / Linux for enhanced security. CPUID flags indicate if these features are present so software/OS can enable them.
    • LA57 (5-Level Paging): CPUID.7.ECX bit 16. Not an instruction, but a mode capability (57-bit virtual addressing) for very large memory support (1). (Introduced with Intel Ice Lake and later supported by AMD.)
    • SERIALIZE: CPUID.7.EDX bit 14. The SERIALIZE instruction (introduced with Intel Alder Lake, 2021) is a dedicated serializing instruction: it ensures all prior instructions complete and their memory effects are visible before the next instruction is fetched, without the register side effects of using CPUID for that purpose (1). The CPUID bit indicates its availability.
    • TSXLDTRK: CPUID.7.EDX bit 16. Indicates Intel’s TSX Suspend Load Address Tracking extension (XSUSLDTRK/XRESLDTRK, introduced in Sapphire Rapids) (1), a niche extension that lets a transaction exclude selected loads from its read set to reduce aborts.
    • AMX (Advanced Matrix Extensions): CPUID.7.EDX bits 22, 24, 25. These indicate support for Intel’s tile matrix multiply unit (tile load/store plus matrix multiply on bfloat16 and int8 data) (1 & 1). AMX introduces a set of 2D register tiles and is present on processors like Sapphire Rapids (2023). It requires OS support to enable the state (similar to AVX-512). CPUID bits: 22 = AMX-BF16, 24 = AMX-TILE, 25 = AMX-INT8.
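To illustrate the AVX-512 rule described earlier in this section (sub-features count only when AVX-512F is present), here is a small sketch that decodes the leaf 7, sub-leaf 0 flags. The register values are assumed to come from a CPUID call made elsewhere, and `AVX512_FLAGS` and `avx512_subsets` are made-up names for this example.

```python
# AVX-512 flags in CPUID leaf 7, sub-leaf 0: name -> (register, bit).
# Note that IFMA lives in EBX (bit 21), next to F, DQ, CD, BW and VL.
AVX512_FLAGS = {
    "avx512f": ("ebx", 16),
    "avx512dq": ("ebx", 17),
    "avx512ifma": ("ebx", 21),
    "avx512pf": ("ebx", 26),
    "avx512er": ("ebx", 27),
    "avx512cd": ("ebx", 28),
    "avx512bw": ("ebx", 30),
    "avx512vl": ("ebx", 31),
    "avx512vbmi": ("ecx", 1),
    "avx512vbmi2": ("ecx", 6),
    "avx512vnni": ("ecx", 11),
    "avx512bitalg": ("ecx", 12),
    "avx512vpopcntdq": ("ecx", 14),
}


def avx512_subsets(ebx, ecx):
    """Return the set of AVX-512 subsets reported, or an empty set without F."""
    regs = {"ebx": ebx, "ecx": ecx}
    found = {
        name
        for name, (reg, bit) in AVX512_FLAGS.items()
        if (regs[reg] >> bit) & 1
    }
    # Treat sub-features as unusable when the foundation flag is missing.
    return found if "avx512f" in found else set()


# Hypothetical Skylake-SP-like EBX value: F, DQ, CD, BW, VL.
skx_ebx = (1 << 16) | (1 << 17) | (1 << 28) | (1 << 30) | (1 << 31)
print(sorted(avx512_subsets(skx_ebx, 0)))
```

This covers only the CPUID side; the OS must also have enabled the opmask and ZMM state in XCR0 before any of these subsets can be used.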

(The above list is not exhaustive: CPUID leaf 7 has many bits, including those for virtualization and debugging that are beyond the scope of user-mode software. For a complete reference of CPUID features, see the official Intel/AMD documentation or the x86-cpuid.org database, which enumerates 800+ CPUID bit fields (3).)
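Because each flag is identified by a leaf, a sub-leaf, a register, and a bit, a table-driven lookup keeps detection code compact. The sketch below is one possible shape under stated assumptions: `cpuid_results` is a hypothetical dictionary mapping `(leaf, subleaf)` to an `(eax, ebx, ecx, edx)` tuple collected elsewhere, and `FEATURES`, `has_feature`, and `pku_usable` are illustrative names.

```python
# Feature name -> (leaf, sub-leaf, register, bit), for a few flags listed above.
FEATURES = {
    "fsgsbase": (7, 0, "ebx", 0),
    "smep": (7, 0, "ebx", 7),
    "smap": (7, 0, "ebx", 20),
    "pku": (7, 0, "ecx", 3),
    "ospke": (7, 0, "ecx", 4),
    "cet_ss": (7, 0, "ecx", 7),
    "serialize": (7, 0, "edx", 14),
    "cet_ibt": (7, 0, "edx", 20),
    "amx_tile": (7, 0, "edx", 24),
    "avx512_bf16": (7, 1, "eax", 5),
}
REG_INDEX = {"eax": 0, "ebx": 1, "ecx": 2, "edx": 3}


def has_feature(name, cpuid_results):
    """Look up one flag; leaves that were not collected count as all zeros."""
    leaf, subleaf, reg, bit = FEATURES[name]
    regs = cpuid_results.get((leaf, subleaf), (0, 0, 0, 0))
    return bool((regs[REG_INDEX[reg]] >> bit) & 1)


def pku_usable(cpuid_results):
    """Protection keys need both the hardware flag and the OS-enabled flag."""
    return has_feature("pku", cpuid_results) and has_feature("ospke", cpuid_results)


# Hypothetical results: FSGSBASE and SMAP in EBX, PKU (but not OSPKE) in ECX.
sample = {(7, 0): (1, (1 << 0) | (1 << 20), (1 << 3), 0)}
print(has_feature("smap", sample), pku_usable(sample))
```

A real collector would also read EAX of leaf 7, sub-leaf 0 to learn the highest valid sub-leaf before querying sub-leaf 1; the sketch sidesteps that by treating missing entries as zeros.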

Extended AMD-Specific Features (CPUID EAX=0x80000001)

In addition to the common CPUID leaves, AMD uses the high 0x80000000+ leaves to report some features and extensions (Intel reports some things here too, like Intel 64-bit support). Notable bits in CPUID.80000001h (Extended Processor Info and Features) include:

  • AMD64 (Intel 64) & NX: EDX bit 29 = LM (Long Mode) indicates 64-bit capability (AMD’s original x86-64 feature, which Intel later adopted) (1). EDX bit 20 = NX (No-eXecute) indicates support for the execute-disable bit for memory pages (1) – this was introduced by AMD (as the “NX bit”) and later by Intel (“XD bit”) for buffer overflow protection. Virtually all modern CPUs have NX/XD support (1).

  • SSE4a: ECX bit 6 (80000001h) – Not to be confused with SSE4.1/4.2, SSE4a is an AMD-specific minor extension (added in AMD K10, 2007) consisting of the bit-field instructions EXTRQ and INSERTQ and the scalar non-temporal stores MOVNTSS and MOVNTSD (1). It’s only on AMD CPUs; Intel CPUs will show 0 for this bit (since Intel instead had SSE4.1/4.2). SSE4a doesn’t depend on SSE4.1/4.2 (indeed it predates them) but does require SSE2 as a base.

  • ABM – LZCNT/POPCNT: ECX bit 5 (80000001h) – Stands for “Advanced Bit Manipulation,” which on AMD indicates the presence of the LZCNT (Leading Zero Count) and POPCNT (Population count) instructions (1). AMD introduced both with K10 (2007). Intel added POPCNT with Nehalem (alongside SSE4.2, but with its own flag, CPUID.1.ECX bit 23) and LZCNT with Haswell, where it is reported through the same CPUID.80000001h.ECX bit 5. In short, both vendors ended up supporting POPCNT and LZCNT, and the safest test is per instruction: CPUID.1.ECX bit 23 for POPCNT and CPUID.80000001h.ECX bit 5 for LZCNT. Neither the SSE4.2 flag nor the BMI1 flag guarantees LZCNT. This matters because the LZCNT encoding executes as BSR (bit scan reverse) on CPUs that lack it, silently producing a different result instead of faulting.

  • XOP (Extended Operations): ECX bit 11 (80000001h). XOP was part of AMD’s proposed SSE5 (2009) and implemented in Bulldozer (2011). It added several vector integer instructions (such as integer multiply-accumulate, per-element rotates and shifts, and byte permutes) using the 128-bit SIMD registers (1). XOP can be seen as AMD’s equivalent of SSE4-era extensions. It was never supported by Intel, and AMD dropped it in Zen. So XOP and FMA4 (and TBM mentioned above) are collectively remnants of AMD’s short-lived SSE5 initiative. If developing for older AMD CPUs, check these bits; otherwise, they are no longer relevant on current CPUs.

  • SKINIT, SVMe, and other AMD bits: AMD has bits for its security and virtualization features here too (e.g., ECX bit 2 = SVM for Secure Virtual Machine virtualization, ECX bit 12 = SKINIT/STGI for secure kernel init, etc. (1)). These are largely system-level and not used in user-mode applications directly. Another bit is ECX bit 7 = Misaligned SSE, which indicates AMD CPUs allow more efficient misaligned SSE accesses (a minor nuance of SSE handling) (1).

  • 3DNow! and MMX extensions: In EDX, AMD formerly indicated support for 3DNow! instructions (bits 31 and 30 for 3DNow! and “3DNow! extensions”) (1). 3DNow! was AMD’s 1998 SIMD extension (floats in MMX registers) which is now obsolete; AMD removed most 3DNow! instructions in modern chips (leaving only PREFETCH/PREFETCHW). It’s largely historical; the CPUID bits may still be defined but modern compilers don’t use 3DNow!. AMD also had an “MMX extensions” bit (EDX bit 22) to indicate AMD’s extended MMX instructions, essentially the integer additions from SSE that operate on the MMX registers (1).
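The extended-leaf bits in this section can be decoded the same way as the standard leaves. The following is a rough sketch, assuming the ECX and EDX values of CPUID.80000001h were read elsewhere; the dictionaries and the `decode_extended` function are hypothetical names for illustration.

```python
# CPUID.80000001h flags mentioned in this section: name -> bit position.
EXT_ECX_BITS = {
    "svm": 2,
    "abm_lzcnt": 5,
    "sse4a": 6,
    "misalign_sse": 7,
    "xop": 11,
    "skinit": 12,
    "fma4": 16,
    "tbm": 21,
}
EXT_EDX_BITS = {
    "syscall": 11,
    "nx": 20,
    "mmxext": 22,
    "rdtscp": 27,
    "lm": 29,
    "3dnowext": 30,
    "3dnow": 31,
}


def decode_extended(ecx, edx):
    """Return a sorted list of the extended-leaf flags that are set."""
    names = [n for n, bit in EXT_ECX_BITS.items() if (ecx >> bit) & 1]
    names += [n for n, bit in EXT_EDX_BITS.items() if (edx >> bit) & 1]
    return sorted(names)


# Hypothetical values: LZCNT and SSE4a in ECX; NX and long mode in EDX.
ext_ecx = (1 << 5) | (1 << 6)
ext_edx = (1 << 20) | (1 << 29)
print(decode_extended(ext_ecx, ext_edx))
```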

Hierarchy and Dependencies Summary

To visualize the dependency tree of major instruction set extensions:

x86 base (implicit)  
├── MMX (EDX 23)  
└── SSE (EDX 25)  
    └── SSE2 (EDX 26)  
        ├── SSE3/SSSE3/SSE4.1/SSE4.2 (ECX 0/9/19/20)  
        ├── AES-NI & PCLMULQDQ (ECX 25, ECX 1) [parallel crypto branch]  
        └── AVX (ECX 28)  – requires OS support (XSAVE) and SSE/SSE2  
            ├── FMA3 (ECX 12)  
            ├── F16C (ECX 29)  
            ├── AVX2 (Leaf7 EBX 5)  
            │   ├── BMI1 (Leaf7 EBX 3) → BMI2 (Leaf7 EBX 8)  
            │   ├── ADX (Leaf7 EBX 19)  
            │   ├── MPX (Leaf7 EBX 14)  
            │   ├── AVX-512 Foundation (Leaf7 EBX 16)  
            │   │   ├── AVX-512 VL, BW, DQ, CD... (Leaf7 EBX & others)  
            │   │   ├── AVX-512 VNNI, BITALG, etc. (Leaf7 ECX bits)  
            │   │   ├── AVX-512 VPOPCNTDQ, BF16, etc. (Leaf7 ECX; sub-leaf 1 EAX)  
            │   │   └── ... (many AVX-512 subsets, all require AVX-512F)  
            │   └── SHA (Leaf7 EBX 29)  
            └── TSX (Leaf7 EBX 4,11 for HLE/RTM)  

(Note: Above is a simplified hierarchy – not every dependency is strict but reflects typical implementation. For example, AES-NI doesn’t strictly require SSE2, but in practice any CPU with AES-NI also has SSE2. Likewise, AVX2 implies AVX which implies SSE2 in all real CPUs (2). Conversely, some branches are independent (e.g. BMI/ADX don’t technically need AVX2), but they tend to appear together in the same chip generations. Always check the individual CPUID bits in your software because out-of-order combinations, while unlikely, are theoretically possible.)
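The tree can also be written down as data and queried. The sketch below encodes a simplified subset of the edges as a dictionary and computes the transitive prerequisites of a feature. The edges reflect typical implementations as described above, not strict architectural rules, and `REQUIRES`, `prerequisites`, and `missing_prerequisites` are names invented for this example.

```python
# Simplified "typically requires" edges, following the hierarchy above.
REQUIRES = {
    "sse2": {"sse"},
    "sse3": {"sse2"},
    "ssse3": {"sse3"},
    "sse4_1": {"ssse3"},
    "sse4_2": {"sse4_1"},
    "aesni": {"sse2"},
    "avx": {"sse4_2"},
    "fma": {"avx"},
    "f16c": {"avx"},
    "avx2": {"avx"},
    "avx512f": {"avx2"},
    "avx512bw": {"avx512f"},
    "avx512vl": {"avx512f"},
}


def prerequisites(feature):
    """Return every feature reachable by following REQUIRES edges."""
    seen = set()
    stack = [feature]
    while stack:
        for dep in REQUIRES.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def missing_prerequisites(feature, present):
    """Return the prerequisites of `feature` that are absent from `present`."""
    return prerequisites(feature) - set(present)


print(sorted(prerequisites("avx2")))
```

A non-empty result from `missing_prerequisites` is a signal to distrust the reported flags (for instance under a hypervisor that masks bits) rather than proof of a hardware fault.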

Key Caveats and Exceptions

  • Vendor Differences: Intel and AMD generally implement the same major extensions now (SSE, AVX, etc.), but there have been differences historically. AMD’s SSE4a vs Intel’s SSE4.1/4.2 is one example – these share a name but are different features (1). AMD introduced XOP, FMA4, TBM which Intel never supported (1); those are now deprecated. Intel’s TSX and MPX were not implemented by AMD. Always consider the vendor when reading CPUID: some bits in leaf7 or 0x80000001h are vendor-specific (e.g., an Intel CPU will not set AMD’s SVM bit, and an AMD CPU before Zen4 will not set AVX-512 bits even if CPUID.7 is available). The x86-cpuid database is useful for filtering features by vendor (3).

  • 64-bit Mode Requirements: The move to x86-64 (AMD64) made some features mandatory (like SSE2) and introduced new ones like SYSCALL (CPUID.80000001h:EDX bit 11) for fast system calls (1). Some features are only relevant in 64-bit mode (e.g. LA57 5-level paging, or the upper AVX-512 registers ZMM16–ZMM31, which are accessible only in 64-bit mode). When targeting 32-bit versus 64-bit code, CPUID bits may need interpretation. For instance, CMPXCHG16B can only be executed in 64-bit mode, and it is not part of the original AMD64 baseline: the earliest AMD64 processors lacked it, so CPUID.1.ECX bit 13 should still be checked even on x86-64.

  • OS Support and Enablement: Just because CPUID indicates a feature doesn’t always mean you can use it directly; the operating system may need to enable certain features first. This is notably true for XSAVE-managed features like AVX, AVX-512, and AMX. The OS must set CR4.OSXSAVE (which CPUID reports back as CPUID.1.ECX.OSXSAVE) and must set the corresponding XCR0 bits to allow the extended registers to be used (4 & 4). If not, executing (for example) an AVX instruction on a supporting CPU will still raise #UD (the invalid-opcode exception). Most modern OSes handle this automatically, but low-level software should be aware of this requirement. Similarly, features like SGX, PT, TSX, or CET may be gated by OS/kernel support (or even BIOS enablement).

  • Deprecated/Removed Features: A few CPUID features have been removed or repurposed over time. For example, the PCOMMIT instruction was initially flagged via CPUID (leaf 7 EBX bit 22) but later deprecated and is not found on newer CPUs (1). TSX had bugs leading to it being turned off on some steppings (CPUID might still show it, but TSX always aborts or is disabled via microcode; later CPUs added bits like RTM_ALWAYS_ABORT to indicate this (1)). Always consult the latest CPUID documentation for notes on specific bits that might be operational or not. The CPUID leaf for “Extended Features” (EAX=7, ECX=1) has flags like X86S (for Intel’s canceled 32-bit deprecation plan) which might never be applicable but still have bit assignments (1).

  • Checking Dependencies in Software: Generally, if you use an instruction set, you should check for its own CPUID flag. The hierarchy means you usually don’t need to manually check every prerequisite bit: for instance, if you check AVX2 (7:EBX.5) and it’s present, you can reasonably assume AVX and SSE2 are present. However, for safety, some programmers do verify key prerequisites, especially where OS enablement is involved. For example, before using AVX, one should ensure the OSXSAVE and AVX bits are set and then execute the XGETBV instruction to read XCR0 and confirm that the OS has enabled the SSE and YMM state, bits 1 and 2 (4 & 4). (XSETBV, which writes XCR0, is privileged and is executed by the OS, not by applications.) When using AVX-512, also ensure the OS has enabled the opmask and ZMM state bits (XCR0 bits 5/6/7) (4).

  • Tooling: Many libraries and OS utilities (like Linux’s /proc/cpuinfo flags or the Windows IsProcessorFeaturePresent) abstract these CPUID bits into human-readable names. For instance, /proc/cpuinfo might list “avx” or “sse4_2” which correspond to the CPUID bits (3). It’s useful to cross-reference these names with CPUID bits (the x86-cpuid.org database maps hundreds of such names to CPUID bits (3)). For software that needs to dispatch based on CPU capabilities, using provided intrinsics or libraries (like Intel’s CPU dispatch or tools like __cpuid on Windows) can simplify querying these bits (__cpuid, __cpuidex | Microsoft Learn).
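The OS-enablement checks described in this list combine CPUID flags with the XCR0 value. The sketch below shows that logic as pure functions; it assumes the caller has already obtained the CPUID register values and the XCR0 value (via XGETBV, which should only be executed when OSXSAVE is set), and the constant and function names are illustrative.

```python
# XCR0 state-component bits relevant to AVX and AVX-512.
XCR0_SSE = 1 << 1        # XMM state
XCR0_AVX = 1 << 2        # upper halves of YMM registers
XCR0_OPMASK = 1 << 5     # k0-k7 opmask registers
XCR0_ZMM_HI256 = 1 << 6  # upper halves of ZMM0-ZMM15
XCR0_HI16_ZMM = 1 << 7   # ZMM16-ZMM31

AVX_STATE = XCR0_SSE | XCR0_AVX
AVX512_STATE = AVX_STATE | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM


def avx_usable(leaf1_ecx, xcr0):
    """AVX needs the AVX flag, OSXSAVE, and SSE+AVX state enabled in XCR0."""
    osxsave = (leaf1_ecx >> 27) & 1
    avx = (leaf1_ecx >> 28) & 1
    return bool(osxsave and avx and (xcr0 & AVX_STATE) == AVX_STATE)


def avx512f_usable(leaf1_ecx, leaf7_ebx, xcr0):
    """AVX-512F additionally needs opmask and both ZMM components enabled."""
    osxsave = (leaf1_ecx >> 27) & 1
    avx512f = (leaf7_ebx >> 16) & 1
    return bool(osxsave and avx512f and (xcr0 & AVX512_STATE) == AVX512_STATE)


# Hypothetical values: CPU reports AVX and AVX-512F, OS enabled only AVX state.
ecx1 = (1 << 27) | (1 << 28)
ebx7 = 1 << 16
print(avx_usable(ecx1, 0x7), avx512f_usable(ecx1, ebx7, 0x7))
```

With an XCR0 of 0x7 (x87, SSE, and AVX state), the AVX path is usable but the AVX-512 path is not, even though the hardware flag is set; an XCR0 of 0xE7 would enable both.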

Conclusion

Modern x86 CPUs include a rich set of instruction extensions reported via CPUID. Understanding the hierarchy (which features depend on or imply others) helps in making sense of the multitude of CPUID flags. In summary, the SIMD/vector path (SSE → AVX → AVX2 → AVX-512 → beyond) is the most hierarchically dependent chain – each step building on increased register width and new capabilities (5 & 5). Other extensions like bit-manipulation (BMI1/2), crypto (AES/SHA), and concurrency (TSX) branch off to address specific domains, sometimes in parallel to the main SIMD line.

When writing user-mode software, you typically enable code paths based on these CPUID bits – e.g. use SSE2 if available, fall back to x87 if not; use AVX2 for a performance boost if present, etc. The “leaf 7” features represent the newer wave of instructions (post-2010) that a developer might target for optimization, while the “leaf 0x80000001” features remind us of legacy and vendor-specific instructions. It’s always important to consult official references (like Intel/AMD manuals or community-maintained lists) for the exact semantics of each CPUID bit (1 & 1).
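A dispatch step of this kind might look like the following sketch. It assumes a set of feature names that have already been verified as usable (CPUID flag plus any OS enablement), and the path names and the `select_code_path` function are purely illustrative.

```python
# Code paths in order of preference, each with the features it needs.
PREFERENCE = [
    ("avx512", {"avx512f"}),
    ("avx2", {"avx2"}),
    ("sse2", {"sse2"}),
]


def select_code_path(usable):
    """Pick the most capable path whose required features are all usable."""
    usable = set(usable)
    for name, needed in PREFERENCE:
        if needed <= usable:
            return name
    return "x87"  # baseline fallback when no SIMD path qualifies


print(select_code_path({"sse2", "avx2"}))
```

Checking each path's own requirement, rather than inferring it from a higher flag, follows the advice above to test the individual bits a code path depends on.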

By keeping the hierarchical relationships and caveats in mind, developers can make informed decisions about which instruction sets to use and ensure their software checks the right CPUID flags for compatibility. In practice, thanks to this hierarchy, enabling a high-level option (like targeting AVX2 in a compiler) will implicitly assume all earlier necessary features (SSE/AVX) are present (2 & 2) – a testament to how the evolution of the x86 ISA has been additive and (mostly) backward-compatible.

Sources: The CPUID feature bits and instruction set details are based on information from the Wikipedia “CPUID” article (1 & 1), the x86-cpuid.org project database (3 & 3), Intel and AMD architecture manuals, and other references as cited above. These sources provide comprehensive bit definitions and notes on each extension for further reading.