[ENH] Add optional embeddings layer for all-MiniLM-L6-v2 on top of ort #45

@tazarov

Summary

Add an optional high-level embeddings layer (separate from ort core) for all-MiniLM-L6-v2 that provides Python-parity preprocessing and postprocessing.

Why

ort should remain a low-level inference primitive, but a practical embedding developer experience needs utilities that most users would otherwise have to reimplement:

  • tokenizer loading
  • truncation/padding policy
  • multi-input tensor assembly (input_ids, attention_mask, token_type_ids)
  • mean pooling with attention mask
  • L2 normalization

Without this layer, users can run inference but do not get drop-in embedding behavior comparable to the Chroma Python and cgo paths.

Proposed approach

  • Keep ort package low-level and unchanged in responsibility.
  • Add a separate package (for example embeddings/minilm or examples/embeddings) built on top of ort.
  • Implement EmbedDocuments/EmbedQuery convenience APIs for MiniLM.

Functional requirements

  • Tokenizer behavior aligned with Python reference:
    • truncation to 256 tokens
    • fixed padding to 256 tokens
  • ONNX inputs:
    • input_ids (int64)
    • attention_mask (int64)
    • token_type_ids (int64 zeros)
  • Output handling:
    • read last_hidden_state
    • mean pooling weighted by attention mask
    • clip the pooling denominator (the attention-mask sum) to a minimum of 1e-9
    • row-wise L2 normalization with a zero-safe epsilon (1e-12) so zero vectors do not divide by zero
  • Return deterministic []float32 embeddings of length 384 per input.

Non-goals

  • Do not move embedding logic into ort core.
  • Do not block low-level inference roadmap on tokenizer/provider abstractions.
  • Do not require network download at runtime if model/tokenizer paths are already provided.

Acceptance criteria

  • New high-level package exists and is documented as optional.
  • End-to-end embedding output shape is N x 384 for N docs.
  • A golden/consistency test compares output against the current cgo/Python logic on sample inputs.
  • Core ort API remains low-level and backward compatible.

References

  • Python reference:
    • /Users/tazarov/RustroverProjects/chroma/chromadb/utils/embedding_functions/onnx_mini_lm_l6_v2.py
  • Existing cgo implementation:
    • /Users/tazarov/GolandProjects/chroma-go/pkg/embeddings/default_ef/default_ef.go
    • /Users/tazarov/GolandProjects/chroma-go/pkg/embeddings/default_ef/tensors_utils.go
