Add multiple embedding providers for SemanticSimilarity #2373

@linear

Description

Summary

SemanticSimilarity currently defaults to OpenAI's text-embedding-3-small. Although it can be extended via BaseEmbeddingModel, no pre-built alternatives are provided. Add out-of-the-box support for common embedding providers.

Motivation

Not all users have OpenAI API keys, and Ragas already supports multiple embedding providers. Local embedding models (Sentence-Transformers) enable offline testing, which matters both for CI/CD pipelines that should not need API key management and for cost-sensitive users.

Implementation Guide

Steps

  1. Create pre-built embedding model implementations in libs/giskard-checks/src/giskard/checks/utils/embeddings.py

  2. Implement providers:

    • SentenceTransformerEmbedding: uses the sentence-transformers library (local, free)
    • Document how to use the existing BaseEmbeddingModel for custom providers
  3. Add optional dependencies in pyproject.toml:

    [project.optional-dependencies]
    local-embeddings = ["sentence-transformers>=2.0,<4"]
  4. Update the set_default_embedding_model() documentation
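A minimal sketch of the provider from step 2, assuming BaseEmbeddingModel exposes a single embed(texts) method that returns one vector per input text. The BaseEmbeddingModel below is a hypothetical stand-in (the real giskard.checks class may use different method names or signatures), and the error message wording is illustrative. Importing sentence_transformers inside __init__ keeps the optional dependency from step 3 off the core import path, so importing the module never fails on a core install.

```python
from abc import ABC, abstractmethod


class BaseEmbeddingModel(ABC):
    """Hypothetical stand-in for giskard.checks' BaseEmbeddingModel.

    ASSUMPTION: the real base class may use different names or signatures.
    """

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one embedding vector per input text."""


class SentenceTransformerEmbedding(BaseEmbeddingModel):
    """Local embeddings via sentence-transformers (no API key required)."""

    def __init__(self, model_name: str) -> None:
        # Lazy import: defining this class does not require the optional
        # dependency; only instantiating it does.
        try:
            from sentence_transformers import SentenceTransformer
        except ImportError as exc:
            raise ImportError(
                "SentenceTransformerEmbedding requires the optional "
                "'local-embeddings' dependencies (sentence-transformers>=2.0,<4)."
            ) from exc
        self.model_name = model_name
        # Downloads the model on first use, then loads it from the local cache.
        self._model = SentenceTransformer(model_name)

    def embed(self, texts: list[str]) -> list[list[float]]:
        # encode() returns a NumPy array by default; convert to plain lists.
        return self._model.encode(list(texts)).tolist()
```

Loading the model in the constructor fails fast when the extra is missing; deferring the import to the first embed() call would be the alternative if providers are often constructed before they are needed.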

Example usage

```python
from giskard.checks import SemanticSimilarity, set_default_embedding_model
from giskard.checks.utils.embeddings import SentenceTransformerEmbedding

# Use local embeddings (no API key needed)
set_default_embedding_model(SentenceTransformerEmbedding("all-MiniLM-L6-v2"))

check = SemanticSimilarity(reference="Hello world", threshold=0.8)
```
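The issue does not state which metric SemanticSimilarity computes. Assuming it compares the cosine similarity of the reference and output embeddings against threshold (a common choice, not confirmed here), the pass/fail rule reduces to the sketch below. cosine_similarity and passes_threshold are hypothetical helper names, and using >= rather than > at the boundary is also an assumption. The rule is independent of the provider, which is why swapping the default embedding model should not require changes to the check itself.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as unrelated to everything
    return dot / (norm_a * norm_b)


def passes_threshold(
    reference_vec: Sequence[float], output_vec: Sequence[float], threshold: float
) -> bool:
    """Hypothetical pass rule: the similarity score must reach the threshold."""
    return cosine_similarity(reference_vec, output_vec) >= threshold
```

For example, vectors [1, 1] and [1, 0] score about 0.707, so they would fail a threshold of 0.8.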

Acceptance Criteria

  • Sentence-Transformers provider works out of the box
  • Clear documentation for custom providers
  • Optional dependency: does not break the core install
  • Tests verify that local embeddings produce valid similarity scores
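One way to cover the last criterion, sketched with the standard-library unittest module. A hypothetical assert_valid_similarity helper checks that scores are finite, lie in [-1, 1], equal 1 for identical texts, and rank an unrelated text lower. A toy hashed bag-of-words embedder (not a semantic model) keeps part of the suite runnable with no model at all; the second test calls sentence-transformers directly, standing in for SentenceTransformerEmbedding, and is skipped when the library is not installed. all-MiniLM-L6-v2 is downloaded on its first run, so a CI job may want to cache it.

```python
import importlib.util
import math
import unittest
import zlib


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def toy_embed(texts: list[str], dim: int = 64) -> list[list[float]]:
    """Toy offline embedding: hashed bag of lowercase words (not semantic)."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # crc32 is deterministic across runs, unlike the built-in hash() for str
            vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
        vectors.append(vec)
    return vectors


def assert_valid_similarity(embed) -> None:
    """Hypothetical helper: properties any valid similarity score should satisfy."""
    same_a, same_b, other = embed(
        ["Hello world", "Hello world", "Quarterly revenue grew by four percent"]
    )
    same = cosine(same_a, same_b)
    diff = cosine(same_a, other)
    for score in (same, diff):
        assert math.isfinite(score)
        assert -1.0 - 1e-6 <= score <= 1.0 + 1e-6
    assert math.isclose(same, 1.0, abs_tol=1e-5)  # identical texts
    assert same > diff  # an unrelated text scores lower


class LocalEmbeddingScoreTests(unittest.TestCase):
    def test_toy_embedding_scores_are_valid(self) -> None:
        assert_valid_similarity(toy_embed)

    @unittest.skipUnless(
        importlib.util.find_spec("sentence_transformers") is not None,
        "sentence-transformers is not installed",
    )
    def test_sentence_transformer_scores_are_valid(self) -> None:
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer("all-MiniLM-L6-v2")  # downloaded on first run
        assert_valid_similarity(lambda texts: model.encode(texts).tolist())
```

Saved as a test_*.py file, this runs under python -m unittest; pytest also collects unittest.TestCase classes.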

Metadata

Assignees

No one assigned

    Labels

    Help wanted (Extra attention is needed)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests