Add embeddings support across providers (OpenAI, Gemini, local OSS) #9

@ebowwa

Feature Request

Add comprehensive embeddings support across multiple providers, including local open-source models.

Motivation

  • Vector embeddings are crucial for RAG, semantic search, and similarity matching
  • Different providers excel at different embedding tasks
  • Gemini uniquely supports multimodal embeddings (text, image, video)
  • Local embeddings keep data on-device for privacy-sensitive applications

Proposed Implementation

1. Base Embeddings Interface

# ai_proxy_core/embeddings.py
from typing import Any, Dict, List, Union

class EmbeddingsHandler:
    async def create_embeddings(
        self,
        input: Union[str, List[str], bytes],
        model: str = "text-embedding-ada-002",
        input_type: str = "text"  # text, image, or video
    ) -> Dict[str, Any]:
        # Route to the appropriate provider based on the model name;
        # input_type is advisory for multimodal providers
        if model.startswith("text-embedding"):
            return await self._openai_embeddings(input, model)
        elif model.startswith("models/embedding"):
            return await self._gemini_embeddings(input, model)
        elif EMBEDDING_MODELS.get(model, {}).get("provider") in ("local", "ollama"):
            return await self._local_embeddings(input, model)
        raise ValueError(f"Unknown embedding model: {model}")

2. OpenAI Embeddings

async def _openai_embeddings(self, input, model):
    response = await openai_client.embeddings.create(
        input=input,
        model=model  # text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
    )
    return self._format_response(response)
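
`_format_response` is left undefined above; a minimal sketch, assuming an OpenAI-style response object whose `.data` items carry `.embedding` vectors (other providers would need their own branches):

def _format_response(self, response):
    # Hypothetical normalizer: assumes an OpenAI-style response with
    # a .data list of items that each carry an .embedding vector;
    # Gemini and local results would need their own handling
    return {
        "embeddings": [item.embedding for item in response.data],
        "model": getattr(response, "model", None),
    }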

3. Gemini Multimodal Embeddings

async def _gemini_embeddings(self, input, model):
    # Gemini supports text, image, AND video embeddings!
    if isinstance(input, str):
        content = types.Content(parts=[types.Part.from_text(input)])
    elif self._is_image(input):
        content = types.Content(parts=[types.Part.from_image(input)])
    elif self._is_video(input):
        content = types.Content(parts=[types.Part.from_video(input)])
    else:
        raise ValueError("Unsupported input type for Gemini embeddings")

    response = await gemini_client.models.embed_content(
        model=model,  # models/embedding-001
        content=content
    )
    return self._format_response(response)
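
The `_is_image` / `_is_video` helpers are not defined here; a rough sketch that sniffs magic bytes (real code might simply trust the explicit `input_type` argument instead of guessing):

def _is_image(self, data: bytes) -> bool:
    # Hypothetical sniffing helper: PNG and JPEG magic bytes only
    return data[:8] == b"\x89PNG\r\n\x1a\n" or data[:3] == b"\xff\xd8\xff"

def _is_video(self, data: bytes) -> bool:
    # Hypothetical sniffing helper: MP4-family files carry "ftyp"
    # at byte offset 4
    return len(data) > 11 and data[4:8] == b"ftyp"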

4. Local OSS Embeddings

# Support for sentence-transformers, instructor-embeddings, etc.
async def _local_embeddings(self, input, model):
    if model == "all-MiniLM-L6-v2":
        from sentence_transformers import SentenceTransformer
        encoder = SentenceTransformer(model)  # don't shadow the model name
        embeddings = encoder.encode(input).tolist()
    elif model.startswith("instructor-"):
        from InstructorEmbedding import INSTRUCTOR
        encoder = INSTRUCTOR(model)
        embeddings = encoder.encode(input).tolist()
    elif model.startswith("ollama/"):
        # Ollama's embedding endpoint already returns plain lists
        embeddings = await self._ollama_embeddings(input, model)
    else:
        raise ValueError(f"Unsupported local model: {model}")

    return {"embeddings": embeddings}
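
`_ollama_embeddings` is referenced but not shown; a minimal sketch against Ollama's /api/embeddings endpoint, assuming a local server on the default port (that endpoint takes one prompt per call):

import httpx

async def _ollama_embeddings(self, input, model):
    # Sketch: POST each prompt to a local Ollama server; the
    # "ollama/" prefix is stripped to get the actual model name
    texts = [input] if isinstance(input, str) else list(input)
    vectors = []
    async with httpx.AsyncClient() as client:
        for text in texts:
            resp = await client.post(
                "http://localhost:11434/api/embeddings",
                json={"model": model.removeprefix("ollama/"), "prompt": text},
            )
            resp.raise_for_status()
            vectors.append(resp.json()["embedding"])
    return vectors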

5. Model Registry

EMBEDDING_MODELS = {
    # OpenAI
    "text-embedding-ada-002": {"dim": 1536, "provider": "openai"},
    "text-embedding-3-small": {"dim": 1536, "provider": "openai"},
    "text-embedding-3-large": {"dim": 3072, "provider": "openai"},
    
    # Gemini
    "models/embedding-001": {"dim": 768, "provider": "gemini", "multimodal": True},
    
    # Local OSS
    "all-MiniLM-L6-v2": {"dim": 384, "provider": "local"},
    "all-mpnet-base-v2": {"dim": 768, "provider": "local"},
    "instructor-xl": {"dim": 768, "provider": "local"},
    "instructor-large": {"dim": 1024, "provider": "local"},
    
    # Ollama
    "ollama/mxbai-embed-large": {"dim": 1024, "provider": "ollama"},
    "ollama/nomic-embed-text": {"dim": 768, "provider": "ollama"}
}
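
The registry can double as validation; a hypothetical lookup helper keeps routing and error messages in one place:

def resolve_model(model: str) -> dict:
    # Hypothetical helper: validate a model name against the registry
    # and return its metadata (dim, provider, multimodal flag)
    try:
        return EMBEDDING_MODELS[model]
    except KeyError:
        raise ValueError(
            f"Unknown embedding model {model!r}; "
            f"known models: {', '.join(sorted(EMBEDDING_MODELS))}"
        )

For example, `resolve_model("text-embedding-3-small")["dim"]` returns 1536, which callers can use to size their vector store.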

Unique Features

Gemini Video Embeddings

# Extract embeddings from video content!
video_embedding = await handler.create_embeddings(
    input=video_bytes,
    model="models/embedding-001",
    input_type="video"
)

# Use for video similarity search, content matching, etc.

Batch Processing

# Efficient batch embedding generation
embeddings = await handler.create_embeddings(
    input=["text1", "text2", "text3"],
    model="text-embedding-3-small"
)
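
Providers cap how many inputs a single request may carry, so large corpora need chunking; an illustrative wrapper (the helper name and batch size are assumptions, not documented limits):

async def embed_in_batches(handler, texts, model, batch_size=256):
    # Split a large list of texts into fixed-size batches and
    # concatenate the resulting vectors in order
    out = []
    for i in range(0, len(texts), batch_size):
        resp = await handler.create_embeddings(texts[i:i + batch_size], model=model)
        out.extend(resp["embeddings"])
    return out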

Hybrid Search Support

# Combine different embedding models for hybrid search
text_emb = await handler.create_embeddings(text, model="text-embedding-3-large")
image_emb = await handler.create_embeddings(image, model="models/embedding-001")
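
Note the two vectors live in different spaces (3072-d vs. 768-d), so they cannot be compared directly; each modality is scored within its own model's space and the scores are blended. A sketch with illustrative names and weights:

import math

def cosine(a, b):
    # Cosine similarity between two raw (unnormalized) vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query, candidate, text_weight=0.7):
    # Weighted blend of per-modality similarities; the weight is a
    # tuning knob, not something this proposal prescribes
    return (text_weight * cosine(query["text"], candidate["text"])
            + (1 - text_weight) * cosine(query["image"], candidate["image"]))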

Configuration

# Embedding-specific settings
embedding_config:
  cache_embeddings: true
  normalize_vectors: true
  default_model: "text-embedding-3-small"
  local_model_path: "~/.cache/embeddings"
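
A sketch of what `normalize_vectors` and `cache_embeddings` might mean in practice (both helpers are hypothetical, and the cache key assumes text input):

import hashlib
import json
import math

def _normalize(vec):
    # L2-normalize so cosine similarity reduces to a dot product
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def _cache_key(input, model):
    # Hash of model name + input; bytes inputs (images, video)
    # would need separate handling
    payload = json.dumps({"model": model, "input": input}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()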

Benefits

  • Unified API for all embedding providers
  • Multimodal support via Gemini (text, image, video)
  • Privacy options with local models
  • Cost optimization by choosing appropriate models
  • Dimension flexibility for different use cases

Use Cases

  1. RAG pipelines - Generate embeddings for document chunks (see the retrieval sketch after this list)
  2. Semantic search - Find similar content across modalities
  3. Recommendation systems - Compute similarity scores
  4. Clustering - Group similar items
  5. Video search - Search videos by visual content (Gemini)
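
A toy retrieval loop covering use cases 1-2, assuming the handler sketched above (the `retrieve` helper and model choice are illustrative):

import math

async def retrieve(handler, query: str, chunks: list, k: int = 3):
    # Embed the corpus and the query with the same model, then
    # rank chunks by cosine similarity to the query vector
    chunk_embs = (await handler.create_embeddings(
        chunks, model="text-embedding-3-small"))["embeddings"]
    query_emb = (await handler.create_embeddings(
        query, model="text-embedding-3-small"))["embeddings"][0]

    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a))
                      * math.sqrt(sum(y * y for y in b)))

    scored = sorted(zip(chunks, chunk_embs),
                    key=lambda p: cos(query_emb, p[1]), reverse=True)
    return [c for c, _ in scored[:k]]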

Provider Comparison

Provider    Models   Dimensions  Multimodal    Cost  Speed
OpenAI      3        1536-3072   —             $$    Fast
Gemini      1        768         ✅ (video!)   $     Fast
Local OSS   Many     384-1024    —             Free  Varies
Ollama      Several  768-1024    —             Free  Fast

Note: Anthropic does not currently offer an embeddings API.
