Feature Request
Add comprehensive embeddings support across multiple providers, including local open-source models.
Motivation
- Vector embeddings are crucial for RAG, semantic search, and similarity matching
- Different providers excel at different embedding tasks
- Gemini uniquely supports multimodal embeddings (text, image, video)
- Local embeddings for privacy-sensitive applications
Proposed Implementation
1. Base Embeddings Interface
```python
# ai_proxy_core/embeddings.py
from typing import Any, Dict, List, Union

class EmbeddingsHandler:
    async def create_embeddings(
        self,
        input: Union[str, List[str], bytes],
        model: str = "text-embedding-ada-002",
        input_type: str = "text",  # "text", "image", or "video"
    ) -> Dict[str, Any]:
        # Route to the appropriate provider based on the model name
        if model.startswith("text-embedding"):
            return await self._openai_embeddings(input, model)
        elif model.startswith("models/embedding"):
            return await self._gemini_embeddings(input, model)
        elif model in LOCAL_MODELS:
            return await self._local_embeddings(input, model)
        else:
            raise ValueError(f"Unknown embedding model: {model}")
```

2. OpenAI Embeddings
```python
async def _openai_embeddings(self, input, model):
    # Supported: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
    response = await openai_client.embeddings.create(
        input=input,
        model=model,
    )
    return self._format_response(response)
```

3. Gemini Multimodal Embeddings
```python
async def _gemini_embeddings(self, input, model):
    # Gemini supports text, image, AND video embeddings!
    if isinstance(input, str):
        content = types.Content(parts=[types.Part.from_text(input)])
    elif self._is_image(input):
        content = types.Content(parts=[types.Part.from_image(input)])
    elif self._is_video(input):
        content = types.Content(parts=[types.Part.from_video(input)])
    else:
        raise ValueError("Unsupported input type for Gemini embeddings")
    response = await gemini_client.models.embed_content(
        model=model,  # models/embedding-001
        content=content,
    )
    return self._format_response(response)
```

4. Local OSS Embeddings
```python
# Support for sentence-transformers, instructor-embeddings, etc.
async def _local_embeddings(self, input, model):
    if model == "all-MiniLM-L6-v2":
        from sentence_transformers import SentenceTransformer
        st_model = SentenceTransformer(model)
        embeddings = st_model.encode(input).tolist()
    elif model.startswith("instructor-"):
        from InstructorEmbedding import INSTRUCTOR
        instructor_model = INSTRUCTOR(model)
        embeddings = instructor_model.encode(input).tolist()
    elif model.startswith("ollama/"):
        # Ollama's embedding endpoint already returns plain lists
        embeddings = await self._ollama_embeddings(input, model)
    else:
        raise ValueError(f"Unknown local model: {model}")
    return {"embeddings": embeddings}
```

5. Model Registry
```python
EMBEDDING_MODELS = {
    # OpenAI
    "text-embedding-ada-002": {"dim": 1536, "provider": "openai"},
    "text-embedding-3-small": {"dim": 1536, "provider": "openai"},
    "text-embedding-3-large": {"dim": 3072, "provider": "openai"},
    # Gemini
    "models/embedding-001": {"dim": 768, "provider": "gemini", "multimodal": True},
    # Local OSS
    "all-MiniLM-L6-v2": {"dim": 384, "provider": "local"},
    "all-mpnet-base-v2": {"dim": 768, "provider": "local"},
    "instructor-large": {"dim": 768, "provider": "local"},
    "instructor-xl": {"dim": 768, "provider": "local"},
    # Ollama
    "ollama/mxbai-embed-large": {"dim": 1024, "provider": "ollama"},
    "ollama/nomic-embed-text": {"dim": 768, "provider": "ollama"},
}
```

Unique Features
Gemini Video Embeddings
```python
# Extract embeddings from video content!
video_embedding = await handler.create_embeddings(
    input=video_bytes,
    model="models/embedding-001",
    input_type="video",
)
# Use for video similarity search, content matching, etc.
```

Batch Processing
```python
# Efficient batch embedding generation
embeddings = await handler.create_embeddings(
    input=["text1", "text2", "text3"],
    model="text-embedding-3-small",
)
```

Hybrid Search Support
```python
# Combine different embedding models for hybrid search
text_emb = await handler.create_embeddings(text, model="text-embedding-3-large")
image_emb = await handler.create_embeddings(image, model="models/embedding-001")
```

Configuration
```yaml
# Embedding-specific settings
embedding_config:
  cache_embeddings: true
  normalize_vectors: true
  default_model: "text-embedding-3-small"
  local_model_path: "~/.cache/embeddings"
```

Benefits
- Unified API for all embedding providers
- Multimodal support via Gemini (text, image, video)
- Privacy options with local models
- Cost optimization by choosing appropriate models
- Dimension flexibility for different use cases
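The `cache_embeddings` and `normalize_vectors` options from the configuration above could be implemented along these lines. This is a minimal sketch: the function names and the SHA-256 cache-key scheme are assumptions, not part of the proposal.

```python
import hashlib
import math

def normalize(vec):
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cache_key(text: str, model: str) -> str:
    """Deterministic cache key combining model name and input text."""
    return hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
```

Caching avoids paying twice for repeated inputs, and normalizing at write time lets downstream vector stores use plain dot products for similarity.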
Use Cases
- RAG pipelines - Generate embeddings for document chunks
- Semantic search - Find similar content across modalities
- Recommendation systems - Compute similarity scores
- Clustering - Group similar items
- Video search - Search videos by visual content (Gemini)
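Most of these use cases reduce to nearest-neighbor search over the returned vectors. A minimal retrieval sketch, using pure Python and made-up 2-dimensional "embeddings" (a real pipeline would use a vector store):

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / ((na * nb) or 1.0)

def top_k(query_vec, chunks, k=2):
    """Rank embedded document chunks by similarity to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return ranked[:k]

# Toy corpus: in practice, embeddings come from create_embeddings()
chunks = [
    {"text": "dogs", "embedding": [1.0, 0.0]},
    {"text": "cats", "embedding": [0.9, 0.1]},
    {"text": "cars", "embedding": [0.0, 1.0]},
]
```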
Provider Comparison
| Provider | Models | Dimensions | Multimodal | Cost | Speed |
|---|---|---|---|---|---|
| OpenAI | 3 | 1536-3072 | ❌ | $$ | Fast |
| Gemini | 1 | 768 | ✅ (video!) | $ | Fast |
| Local OSS | Many | 384-1024 | ❌ | Free | Varies |
| Ollama | Several | 768-1024 | ❌ | Free | Fast |
Note: Anthropic does not currently offer an embeddings API.
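Whichever provider is selected, the model registry proposed above can drive validation before any network call is made. A sketch under that assumption (the helper names are hypothetical, and the registry is abbreviated here for brevity):

```python
EMBEDDING_MODELS = {
    "text-embedding-3-small": {"dim": 1536, "provider": "openai"},
    "models/embedding-001": {"dim": 768, "provider": "gemini", "multimodal": True},
    "all-MiniLM-L6-v2": {"dim": 384, "provider": "local"},
}

def resolve_model(name: str) -> dict:
    """Look up provider metadata, failing fast on unknown models."""
    try:
        return EMBEDDING_MODELS[name]
    except KeyError:
        raise ValueError(f"Unsupported embedding model: {name}") from None

def supports_modality(name: str, input_type: str) -> bool:
    """Only multimodal models (currently Gemini) may embed image/video input."""
    return input_type == "text" or bool(resolve_model(name).get("multimodal", False))
```

Rejecting an image sent to a text-only model locally gives a clearer error than whatever the provider API would return.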