O001: ONNX Runtime Migration (post-F026) #125

@sefaertunc

Description

Summary

Replace the torch dependency with onnxruntime for embedding inference in the [ml] optional dependency group. Reduces install size by ~1.1GB.

Background

From tech stack audit discussion (2026-03-03). PyTorch CPU is ~1.8GB installed. onnxruntime is ~50MB. The transformers library (~500MB) remains needed for tokenization, so realistic savings are ~1.1GB, not the full 1.8GB.

Two Phases

Phase A: MiniLM (trivial)

  • Use sentence-transformers[onnx] with backend="onnx"
  • Native support since sentence-transformers v3.2.0
  • Pre-exported ONNX files exist in HuggingFace repo (optimum/all-MiniLM-L6-v2)

Phase B: UniXcoder (complex)

  • Known ONNX export issue with 3D attention mask (microsoft/CodeBERT#198)
  • Workaround exists (simplify to 2D) but produces different results
  • Needs empirical validation: cosine similarity >0.999 vs torch output
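The validation step above can be sketched with plain numpy; the embeddings here are hypothetical placeholders standing in for the torch and ONNX outputs on the same input:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def passes_parity_check(torch_emb: np.ndarray, onnx_emb: np.ndarray,
                        threshold: float = 0.999) -> bool:
    """Acceptance check: ONNX output must stay within the cosine threshold of torch output."""
    return cosine_similarity(torch_emb, onnx_emb) > threshold

# Hypothetical embeddings: a tiny perturbation passes, an orthogonal vector fails.
ref = np.array([1.0, 0.0, 0.5])
near = ref + 1e-4
ortho = np.array([0.0, 1.0, 0.0])
print(passes_parity_check(ref, near))   # True
print(passes_parity_check(ref, ortho))  # False
```

In practice this check should run over a sample of real inputs (including long sequences that exercise the attention mask), since the 2D-mask workaround may only diverge on some of them.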

Acceptance Criteria

  • Install size <500MB for [ml] dependency group
  • Cosine similarity >0.999 vs torch output for both models
  • CEDAR retrieval results unchanged for reference corpus
  • No torch in dependency tree
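The last criterion can be enforced in CI with a small check after installing the [ml] group (a sketch; the function name is illustrative):

```python
import importlib.util

def torch_absent() -> bool:
    """True when torch is not importable, i.e. it was not pulled into the environment."""
    return importlib.util.find_spec("torch") is None

print("torch absent:", torch_absent())
```

A CI job would install the package with only the [ml] extra and fail if this returns False.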

Dependencies

  • Blocked by: F026 (Dual Embedding) — models must be finalized first
  • Decision: Start with minimal path (keep sentence-transformers, swap backend only)

Metadata

Assignees: none
Status: Backlog
Milestone: none