SIMD auto-vectorization: emit SSE4/AVX2 instructions from vector IR types #86

@yudongusa

Background

LLVM IR has first-class vector types (`<4 x i32>`, `<8 x float>`, etc.) and vector instructions (`shufflevector`, `insertelement`, `extractelement`). The parser and IR types already support these, but the x86 codegen backend currently has no lowering for them — any vector instruction panics or emits a NOP.

Vectorization is one of the highest-leverage optimizations in a compiler. A single 256-bit AVX instruction operates on 8 single-precision floats at once, giving up to 8× the scalar throughput on compute-bound numerical workloads.

Goals

Phase 1 — Scalar lowering of vector IR (baseline)

Scalarize vector operations into per-lane scalar loops so they at least produce correct (if slow) code. This is the fallback when the target doesn't support SIMD.
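
As a rough illustration of what Phase 1 scalarization could look like, the sketch below expands one vector add into per-lane extract, add, and insert steps. The `Instr` enum, its variants, and the temporary-naming scheme are hypothetical and exist only for this example; the project's real `InstrKind` will differ.

```rust
// Hypothetical miniature IR, for illustration only; not the project's InstrKind.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    /// dst = lhs + rhs, where both operands are `lanes`-wide i32 vectors.
    VecAdd { dst: String, lhs: String, rhs: String, lanes: usize },
    /// dst = extractelement src, lane
    Extract { dst: String, src: String, lane: usize },
    /// dst = lhs + rhs on scalar i32 values.
    Add { dst: String, lhs: String, rhs: String },
    /// dst = insertelement src, elem, lane
    Insert { dst: String, src: String, elem: String, lane: usize },
}

/// Expand a vector add into scalar operations, one lane at a time.
/// Any other instruction is passed through unchanged.
fn scalarize(instr: &Instr) -> Vec<Instr> {
    match instr {
        Instr::VecAdd { dst, lhs, rhs, lanes } => {
            let mut out = Vec::new();
            // The result vector is built up lane by lane, starting from undef.
            let mut acc = String::from("undef");
            for lane in 0..*lanes {
                let a = format!("{dst}.a{lane}");
                let b = format!("{dst}.b{lane}");
                let s = format!("{dst}.s{lane}");
                out.push(Instr::Extract { dst: a.clone(), src: lhs.clone(), lane });
                out.push(Instr::Extract { dst: b.clone(), src: rhs.clone(), lane });
                out.push(Instr::Add { dst: s.clone(), lhs: a, rhs: b });
                // The final insert defines the original destination name.
                let next = if lane + 1 == *lanes {
                    dst.clone()
                } else {
                    format!("{dst}.v{lane}")
                };
                out.push(Instr::Insert { dst: next.clone(), src: acc, elem: s, lane });
                acc = next;
            }
            out
        }
        other => vec![other.clone()],
    }
}
```

A 4-lane add expands to 16 scalar instructions this way, which is slow but needs no SIMD support from the target.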

Phase 2 — SSE4.2 codegen

Lower common vector patterns to SSE4.2 instructions:

  • <4 x i32> arithmetic → PADDD, PSUBD, PMULLD
  • <4 x float> arithmetic → ADDPS, MULPS, DIVPS
  • <2 x double> arithmetic → ADDPD, MULPD
  • Loads/stores → MOVDQU / MOVUPS (unaligned), MOVDQA / MOVAPS (only when 16-byte alignment is known)
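
A minimal sketch of the register-to-register encodings for the arithmetic instructions listed above, assuming only xmm0 through xmm7 are used so that no REX prefix is required. The `SseOp` enum and the function names are hypothetical; the opcode bytes follow the Intel instruction set reference.

```rust
/// Packed arithmetic operations from the Phase 2 list (hypothetical enum).
#[derive(Debug, Clone, Copy)]
enum SseOp { Paddd, Psubd, Pmulld, Addps, Mulps, Divps, Addpd, Mulpd }

/// Mandatory prefix (if any) plus opcode bytes for each operation.
fn opcode_bytes(op: SseOp) -> &'static [u8] {
    match op {
        SseOp::Paddd => &[0x66, 0x0F, 0xFE],
        SseOp::Psubd => &[0x66, 0x0F, 0xFA],
        SseOp::Pmulld => &[0x66, 0x0F, 0x38, 0x40], // introduced in SSE4.1
        SseOp::Addps => &[0x0F, 0x58],
        SseOp::Mulps => &[0x0F, 0x59],
        SseOp::Divps => &[0x0F, 0x5E],
        SseOp::Addpd => &[0x66, 0x0F, 0x58],
        SseOp::Mulpd => &[0x66, 0x0F, 0x59],
    }
}

/// Encode `op xmm<dst>, xmm<src>` (dst = dst op src).
/// Limited to xmm0..xmm7; xmm8 and above would need a REX prefix.
fn encode_sse_rr(op: SseOp, dst: u8, src: u8) -> Vec<u8> {
    assert!(dst < 8 && src < 8, "xmm8+ needs a REX prefix, not handled here");
    let mut bytes = opcode_bytes(op).to_vec();
    // ModRM with mod = 11 (register direct), reg = dst, rm = src.
    bytes.push(0xC0 | (dst << 3) | src);
    bytes
}
```

For example, `encode_sse_rr(SseOp::Paddd, 0, 1)` yields `66 0F FE C1`, the encoding of `paddd xmm0, xmm1`. Byte sequences like this are the kind of thing the per-pattern encoding tests could compare against.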

Phase 3 — AVX2 codegen

Extend to 256-bit registers for up to 2× the throughput of the 128-bit SSE path on modern CPUs:

  • <8 x i32>, <8 x float>, <4 x double>
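  • VPADDD, VMULPS, etc. (256-bit integer forms such as VPADDD require AVX2; 256-bit float forms such as VMULPS need only AVX)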
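
The 256-bit forms use a VEX prefix instead of the legacy SSE prefixes. Below is a hedged sketch of the 2-byte VEX encoding for three-operand register forms, assuming all registers are ymm0 through ymm7 (so the 2-byte prefix suffices). The function names are hypothetical; the bit layout follows the Intel instruction set reference.

```rust
/// Encode `op ymm<dst>, ymm<src1>, ymm<src2>` with the 2-byte VEX prefix (0xC5).
/// `pp` is the implied legacy prefix: 0b00 = none, 0b01 = 0x66.
/// Only valid for opcodes in the 0F map and registers ymm0..ymm7.
fn encode_vex256_rrr(pp: u8, opcode: u8, dst: u8, src1: u8, src2: u8) -> [u8; 4] {
    assert!(pp < 4 && dst < 8 && src1 < 8 && src2 < 8);
    let r_inv = 1u8 << 7;               // inverted REX.R; 1 because dst < 8
    let vvvv_inv = (!src1 & 0x0F) << 3; // first source, stored inverted
    let l = 1u8 << 2;                   // L = 1 selects 256-bit operands
    let vex = r_inv | vvvv_inv | l | pp;
    // ModRM with mod = 11 (register direct), reg = dst, rm = src2.
    let modrm = 0xC0 | (dst << 3) | src2;
    [0xC5, vex, opcode, modrm]
}

/// VPADDD ymm, ymm, ymm (VEX.256.66.0F FE /r, requires AVX2).
fn vpaddd_ymm(dst: u8, src1: u8, src2: u8) -> [u8; 4] {
    encode_vex256_rrr(0b01, 0xFE, dst, src1, src2)
}

/// VMULPS ymm, ymm, ymm (VEX.256.0F 59 /r, requires AVX).
fn vmulps_ymm(dst: u8, src1: u8, src2: u8) -> [u8; 4] {
    encode_vex256_rrr(0b00, 0x59, dst, src1, src2)
}
```

Here `vpaddd_ymm(0, 1, 2)` produces `C5 F5 FE C2`. Registers ymm8 and above, memory operands, and opcodes outside the 0F map would need the 3-byte VEX form, which this sketch leaves out.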

Phase 4 — Auto-vectorization pass

A middle-end pass that detects scalar loops over arrays and transforms them into vector IR, enabling the backend to emit SIMD instructions even for scalar source programs.
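
To make the intended transformation concrete, here is a sketch written as ordinary Rust rather than IR: a scalar element-wise loop, and the shape the pass would aim for, with a main loop that handles 8 elements per iteration plus a scalar tail for the remainder. The function names and the fixed lane count are assumptions for illustration; the inner fixed-count loop stands in for a single `<8 x i32>` load, add, and store.

```rust
/// Lane count of the hypothetical target vector type (<8 x i32>).
const LANES: usize = 8;

/// The scalar loop a source program might contain.
fn add_scalar(a: &[i32], b: &[i32], out: &mut [i32]) {
    assert!(a.len() >= out.len() && b.len() >= out.len());
    for i in 0..out.len() {
        // wrapping_add mirrors LLVM's plain `add`, which wraps on overflow.
        out[i] = a[i].wrapping_add(b[i]);
    }
}

/// The same computation in the shape an auto-vectorizer would produce.
fn add_vector_shape(a: &[i32], b: &[i32], out: &mut [i32]) {
    assert!(a.len() >= out.len() && b.len() >= out.len());
    let n = out.len();
    let main_end = n - n % LANES; // largest multiple of LANES not exceeding n
    let mut i = 0;
    while i < main_end {
        // Stands in for one vector load/add/store over LANES elements.
        for lane in 0..LANES {
            out[i + lane] = a[i + lane].wrapping_add(b[i + lane]);
        }
        i += LANES;
    }
    // Scalar tail for the elements that do not fill a whole vector.
    for j in main_end..n {
        out[j] = a[j].wrapping_add(b[j]);
    }
}
```

Both functions compute the same result for any length. A real pass would also have to prove that the loop iterations are independent (for example, that `out` does not alias `a` or `b`) before rewriting.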

Acceptance criteria

  • All vector InstrKind variants produce valid (non-panicking) x86 output at O0 (scalarized)
  • SSE4.2 lowering for the 12 most common patterns, each with an encoding test
  • At least one end-to-end test: vector .ll → ELF → link → run → correct result
  • Feature-gated: SIMD instructions are only emitted when the target CPU supports them (via a TargetFeatures flag)
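
One possible shape for the feature gate in the last criterion is sketched below. The issue names a `TargetFeatures` flag but does not define it, so the fields here, the `Lowering` enum, and `select_lowering` are all assumptions made for illustration.

```rust
/// Hypothetical feature set; the field names are assumptions.
#[derive(Debug, Clone, Copy, Default)]
struct TargetFeatures {
    sse4_2: bool,
    avx2: bool,
}

/// Which backend path handles a given vector operation.
#[derive(Debug, PartialEq, Eq)]
enum Lowering {
    Scalarize, // Phase 1 fallback
    Sse128,    // Phase 2
    Avx256,    // Phase 3
}

/// Pick a lowering for a vector of `vector_bits` total width.
/// Anything the target cannot handle natively falls back to scalarization.
fn select_lowering(vector_bits: u32, features: TargetFeatures) -> Lowering {
    match vector_bits {
        128 if features.sse4_2 => Lowering::Sse128,
        256 if features.avx2 => Lowering::Avx256,
        _ => Lowering::Scalarize,
    }
}
```

With the default (all-false) feature set every vector operation is scalarized, which matches the O0 criterion above. Splitting a 256-bit operation into two 128-bit halves on SSE-only targets would be a further refinement that this sketch does not attempt.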
