Summary
Add an optional high-level embeddings layer (separate from ort core) for all-MiniLM-L6-v2 that provides Python-parity preprocessing and postprocessing.
Why
ort should remain a low-level inference primitive, but practical embedding DX needs utilities that most users otherwise must reimplement:
- tokenizer loading
- truncation/padding policy
- multi-input tensor assembly (`input_ids`, `attention_mask`, `token_type_ids`)
- mean pooling with attention mask
- L2 normalization
Without this layer, users can run inference but do not get drop-in embedding behavior comparable to Chroma Python/cgo paths.
Proposed approach
- Keep the `ort` package low-level and unchanged in responsibility.
- Add a separate package (for example `embeddings/minilm` or `examples/embeddings`) built on top of `ort`.
- Implement `EmbedDocuments`/`EmbedQuery` convenience APIs for MiniLM.
Functional requirements
- Tokenizer behavior aligned with Python reference:
  - truncation to 256 tokens
  - fixed padding to 256 tokens
- ONNX inputs:
  - `input_ids` (int64)
  - `attention_mask` (int64)
  - `token_type_ids` (int64 zeros)
- Output handling:
  - read `last_hidden_state`
  - mean pooling weighted by attention mask
  - clip denominator with epsilon (`1e-9`)
  - row-wise L2 normalization with zero-safe epsilon (`1e-12`)
- Return deterministic `[]float32` embeddings of length `384` per input.
Non-goals
- Do not move embedding logic into `ort` core.
- Do not block low-level inference roadmap on tokenizer/provider abstractions.
- Do not require network download at runtime if model/tokenizer paths are already provided.
Acceptance criteria
- New high-level package exists and is documented as optional.
- End-to-end embedding output shape is `N x 384` for `N` docs.
- Golden/consistency test compares output behavior against current cgo/Python logic on sample inputs.
- Core `ort` API remains low-level and backward compatible.
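A golden/consistency check could be as simple as an element-wise comparison with a tolerance. The helper below is a sketch only: the name `compareEmbeddings` is hypothetical, and the tolerance is left to the caller because this issue does not specify one. The reference vectors would come from the existing cgo or Python implementation.

```go
package main

import (
	"fmt"
	"math"
)

// compareEmbeddings checks that got and want both have shape N x dim and
// that every element differs by at most tol. NaN values are rejected.
func compareEmbeddings(got, want [][]float32, dim int, tol float64) error {
	if len(got) != len(want) {
		return fmt.Errorf("row count mismatch: got %d, want %d", len(got), len(want))
	}
	for i := range got {
		if len(got[i]) != dim || len(want[i]) != dim {
			return fmt.Errorf("row %d: expected length %d, got %d and %d",
				i, dim, len(got[i]), len(want[i]))
		}
		for j := range got[i] {
			d := math.Abs(float64(got[i][j]) - float64(want[i][j]))
			if math.IsNaN(d) || d > tol {
				return fmt.Errorf("row %d, col %d: |%g - %g| exceeds %g",
					i, j, got[i][j], want[i][j], tol)
			}
		}
	}
	return nil
}
```

In a real test, `dim` would be 384 and `want` would be loaded from a fixture generated by the reference implementation on the same sample inputs.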
References
- Python reference: `/Users/tazarov/RustroverProjects/chroma/chromadb/utils/embedding_functions/onnx_mini_lm_l6_v2.py`
- Existing cgo implementation:
  - `/Users/tazarov/GolandProjects/chroma-go/pkg/embeddings/default_ef/default_ef.go`
  - `/Users/tazarov/GolandProjects/chroma-go/pkg/embeddings/default_ef/tensors_utils.go`