Description
Background
LLVM IR has first-class vector types (`<4 x i32>`, `<8 x float>`, etc.) and vector instructions (`shufflevector`, `insertelement`, `extractelement`). The parser and IR types already support these, but the x86 codegen backend currently has no lowering for them — any vector instruction panics or emits a NOP.
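For reference, a minimal `.ll` fragment exercising the vector type and the three vector instructions named above (function and value names are illustrative):

```llvm
define <4 x i32> @demo(<4 x i32> %a, <4 x i32> %b) {
  ; lane-wise add on a first-class vector type
  %sum = add <4 x i32> %a, %b
  ; pull lane 0 out, bump it, put it back
  %lane0  = extractelement <4 x i32> %sum, i32 0
  %bumped = add i32 %lane0, 1
  %v2     = insertelement <4 x i32> %sum, i32 %bumped, i32 0
  ; reverse the lanes with a constant shuffle mask
  %rev = shufflevector <4 x i32> %v2, <4 x i32> undef,
                       <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i32> %rev
}
```

Every instruction in this fragment currently reaches the backend with no lowering path.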
Vectorization is one of the highest-leverage optimizations in a compiler. A single AVX2 instruction can process 8 floats simultaneously, delivering up to 8× throughput for numerical workloads.
Goals
Phase 1 — Scalar lowering of vector IR (baseline)
Scalarize vector operations into scalar loops so they at least produce correct (if slow) code. This is the fallback when the target doesn't support SIMD.
Phase 2 — SSE4.2 codegen
Lower common vector patterns to SSE4.2 instructions:
- `<4 x i32>` arithmetic → `PADDD`, `PSUBD`, `PMULLD`
- `<4 x float>` arithmetic → `ADDPS`, `MULPS`, `DIVPS`
- `<2 x double>` arithmetic → `ADDPD`, `MULPD`
- Loads/stores → `MOVDQU`, `MOVAPS`
Phase 3 — AVX2 codegen
Extend to 256-bit registers for 2× throughput on modern CPUs:
- `<8 x i32>`, `<8 x float>`, `<4 x double>` element types
- `VPADDD`, `VMULPS`, etc.
Phase 4 — Auto-vectorization pass
A middle-end pass that detects scalar loops over arrays and transforms them into vector IR, enabling the backend to emit SIMD instructions even for scalar source programs.
Acceptance criteria
- All vector `InstrKind` variants produce valid (non-panicking) x86 output at O0 (scalarized)
- SSE4.2 lowering for the 12 most common patterns, each with an encoding test
- At least one end-to-end test: vector `.ll` → ELF → link → run → correct result
- Feature-gated: vectorization only emitted when the target CPU supports it (via a `TargetFeatures` flag)