O001: ONNX Runtime Migration (post-F026) #125

@sefaertunc

Description

Summary

Replace the torch dependency with onnxruntime for embedding inference in the [ml] optional dependency group. Reduces install size by ~1.1GB.

Background

From tech stack audit discussion (2026-03-03). PyTorch CPU is ~1.8GB installed. onnxruntime is ~50MB. The transformers library (~500MB) remains needed for tokenization, so realistic savings are ~1.1GB, not the full 1.8GB.

Two Phases

Phase A: MiniLM (trivial)

  • Use sentence-transformers[onnx] with backend="onnx"
  • Native support since sentence-transformers v3.2.0
  • Pre-exported ONNX files exist in HuggingFace repo (optimum/all-MiniLM-L6-v2)

Phase B: UniXcoder (complex)

  • Known ONNX export issue with 3D attention mask (microsoft/CodeBERT#198)
  • Workaround exists (simplify to 2D) but produces different results
  • Needs empirical validation: cosine similarity >0.999 vs torch output
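The validation step above can be sketched with plain numpy; the embeddings here are hypothetical placeholders standing in for the torch and ONNX outputs on the same input:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def passes_parity_check(torch_emb: np.ndarray, onnx_emb: np.ndarray,
                        threshold: float = 0.999) -> bool:
    """Acceptance check: ONNX output must stay within the cosine threshold of torch output."""
    return cosine_similarity(torch_emb, onnx_emb) > threshold

# Hypothetical embeddings: a tiny perturbation passes, an orthogonal vector fails.
ref = np.array([1.0, 0.0, 0.5])
near = ref + 1e-4
ortho = np.array([0.0, 1.0, 0.0])
print(passes_parity_check(ref, near))   # True
print(passes_parity_check(ref, ortho))  # False
```

In practice this check should run over a sample of real inputs (including long sequences that exercise the attention mask), since the 2D-mask workaround may only diverge on some of them.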

Acceptance Criteria

  • Install size <500MB for [ml] dependency group
  • Cosine similarity >0.999 vs torch output for both models
  • CEDAR retrieval results unchanged for reference corpus
  • No torch in dependency tree
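The last criterion can be enforced in CI with a small check after installing the [ml] group (a sketch; the function name is illustrative):

```python
import importlib.util

def torch_absent() -> bool:
    """True when torch is not importable, i.e. it was not pulled into the environment."""
    return importlib.util.find_spec("torch") is None

print("torch absent:", torch_absent())
```

A CI job would install the package with only the [ml] extra and fail if this returns False.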

Dependencies

  • Blocked by: F026 (Dual Embedding) — models must be finalized first
  • Decision: Start with minimal path (keep sentence-transformers, swap backend only)

Metadata

Assignees: none
Status: Backlog
Milestone: none