Summary
Replace the `torch` dependency with `onnxruntime` for embedding inference in the `[ml]` optional dependency group. This reduces the install size by ~1.1GB.
Background
From the tech stack audit discussion (2026-03-03). PyTorch CPU is ~1.8GB installed; onnxruntime is ~50MB. The transformers library (~500MB) is still needed for tokenization, so the realistic savings are ~1.1GB, not the full 1.8GB.
Two Phases
Phase A: MiniLM (trivial)
- Use `sentence-transformers[onnx]` with `backend="onnx"`
- Native support since sentence-transformers v3.2.0
- Pre-exported ONNX files exist in the HuggingFace repo (`optimum/all-MiniLM-L6-v2`)
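The Phase A swap is essentially a one-argument change. A minimal sketch, assuming sentence-transformers >= 3.2.0 is installed with the `[onnx]` extra (model name as used in this issue):

```python
# Sketch: load MiniLM via the ONNX backend instead of torch.
# Requires: pip install "sentence-transformers[onnx]" (>= 3.2.0).
from sentence_transformers import SentenceTransformer

# backend="onnx" makes sentence-transformers fetch the pre-exported
# ONNX weights and run inference through onnxruntime; no torch involved.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", backend="onnx")
embeddings = model.encode(["def hello(): pass", "print('hi')"])
print(embeddings.shape)  # (2, 384) for MiniLM-L6
```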
Phase B: UniXcoder (complex)
- Known ONNX export issue with the 3D attention mask (microsoft/CodeBERT#198)
- A workaround exists (simplify the mask to 2D) but produces different results
- Needs empirical validation: cosine similarity >0.999 vs torch output
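The validation step can be scripted once both backends have produced embeddings for the same inputs. A sketch, assuming the torch and ONNX embeddings are available as `(n, d)` arrays (function names here are illustrative, not part of the codebase):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, d) embedding matrices."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)

def validate(torch_emb: np.ndarray, onnx_emb: np.ndarray,
             threshold: float = 0.999) -> bool:
    """Acceptance check: every embedding pair must exceed the threshold."""
    sims = cosine_similarity(torch_emb, onnx_emb)
    return bool(sims.min() > threshold)
```

Running this over a reference corpus would give a pass/fail answer for the 2D-mask workaround rather than relying on spot checks.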
Acceptance Criteria
- Install size <500MB for the `[ml]` dependency group
- Cosine similarity >0.999 vs torch output for both models
- CEDAR retrieval results unchanged for the reference corpus
- No `torch` in the dependency tree
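The size and no-torch criteria can be checked mechanically in CI. One way to sketch it, measuring the on-disk footprint of an installed package (the function name is illustrative):

```python
import importlib.util
from pathlib import Path

def installed_size_mb(module_name: str) -> float:
    """Approximate on-disk size of an importable package, in MB."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(module_name)
    pkg_dir = Path(spec.origin).parent
    return sum(f.stat().st_size for f in pkg_dir.rglob("*") if f.is_file()) / 1e6

# Example gates for the acceptance criteria (run in a clean [ml] venv):
# assert installed_size_mb("onnxruntime") < 500
# assert importlib.util.find_spec("torch") is None  # no torch in the tree
```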
Dependencies
- Blocked by: F026 (Dual Embedding); the models must be finalized first
- Decision: Start with minimal path (keep sentence-transformers, swap backend only)
Status: Backlog