Flexible, vendor-agnostic embeddings generation for MCP Gateway Registry's semantic search functionality.
The MCP Gateway Registry provides semantic search capabilities across MCP servers, tools, and AI agents. You can choose from three embedding provider options to power this search:
- Sentence Transformers (Default) - Local models
- OpenAI - Cloud embeddings via API
- Any LiteLLM-supported provider - Amazon Bedrock Titan, Cohere, and 100+ other models
Switch between providers with simple configuration changes - no code modifications required.
- Vendor-agnostic: Switch between embeddings providers with configuration changes
- Local & Cloud Support: Use local models or cloud APIs (OpenAI, Cohere, Amazon Bedrock, etc.)
- Backward Compatible: Works seamlessly with existing FAISS indices
- Easy Configuration: Simple environment variable setup
- Extensible: Easy to add new providers
- AWS Deployable: Terraform support for AWS deployments
Local embedding models that run on your infrastructure.
# In .env
EMBEDDINGS_PROVIDER=sentence-transformers
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
EMBEDDINGS_MODEL_DIMENSIONS=384Characteristics:
- Runs locally on your infrastructure
- No API costs
- No external network calls required
- Requires CPU/GPU resources
- Model files stored locally
- Data stays within your infrastructure
Cloud-based embedding service via OpenAI API.
# In .env
EMBEDDINGS_PROVIDER=litellm
EMBEDDINGS_MODEL_NAME=openai/text-embedding-ada-002
EMBEDDINGS_MODEL_DIMENSIONS=1536
EMBEDDINGS_API_KEY=sk-your-openai-api-keyCharacteristics:
- Cloud-based service
- Requires API key
- API costs per 1K tokens
- No local compute resources needed
- Network dependency
- Data sent to OpenAI
Cloud-based embedding service via AWS Bedrock.
# In .env
EMBEDDINGS_PROVIDER=litellm
EMBEDDINGS_MODEL_NAME=bedrock/amazon.titan-embed-text-v1
EMBEDDINGS_MODEL_DIMENSIONS=1536
EMBEDDINGS_AWS_REGION=us-east-1
# No API key needed - uses IAMCharacteristics:
- Cloud-based service
- Uses IAM authentication (no API key required)
- Integrates with AWS security model
- API costs apply
- Requires AWS credentials
- Available in select AWS regions
| Variable | Description | Default | Required |
|---|---|---|---|
EMBEDDINGS_PROVIDER |
Provider type: sentence-transformers or litellm |
sentence-transformers |
No |
EMBEDDINGS_MODEL_NAME |
Model identifier | all-MiniLM-L6-v2 |
Yes |
EMBEDDINGS_MODEL_DIMENSIONS |
Embedding dimension | 384 |
Yes |
EMBEDDINGS_API_KEY |
API key for cloud provider (OpenAI, Cohere, etc.) | - | For cloud* |
EMBEDDINGS_API_BASE |
Custom API endpoint (LiteLLM only) | - | No |
EMBEDDINGS_AWS_REGION |
AWS region for Bedrock (LiteLLM only) | - | For Bedrock |
*Not required for AWS Bedrock - use standard AWS credential chain (IAM roles, environment variables, ~/.aws/credentials)
For AWS ECS deployments, configure embeddings in your terraform.tfvars:
# Local embeddings - no additional configuration needed
# Uses defaults: sentence-transformers with all-MiniLM-L6-v2embeddings_provider = "litellm"
embeddings_model_name = "openai/text-embedding-ada-002"
embeddings_model_dimensions = 1536
embeddings_api_key = "sk-proj-YOUR-OPENAI-API-KEY"embeddings_provider = "litellm"
embeddings_model_name = "bedrock/amazon.titan-embed-text-v1"
embeddings_model_dimensions = 1536
embeddings_aws_region = "us-east-1"
embeddings_api_key = "" # Empty for Bedrock (uses IAM)See terraform/aws-ecs/terraform.tfvars.example for complete examples.
| Model | Dimensions | Description |
|---|---|---|
all-MiniLM-L6-v2 |
384 | Fast, lightweight (default) |
all-mpnet-base-v2 |
768 | High quality |
paraphrase-multilingual-MiniLM-L12-v2 |
384 | Multilingual |
Any model from Hugging Face sentence-transformers is supported.
LiteLLM supports 100+ embedding models from various providers:
openai/text-embedding-3-small(1536 dimensions)openai/text-embedding-3-large(3072 dimensions)openai/text-embedding-ada-002(1536 dimensions)
cohere/embed-english-v3.0(1024 dimensions)cohere/embed-multilingual-v3.0(1024 dimensions)
bedrock/amazon.titan-embed-text-v1(1536 dimensions)bedrock/cohere.embed-english-v3(1024 dimensions)bedrock/cohere.embed-multilingual-v3(1024 dimensions)
- Azure OpenAI
- Anthropic (Claude)
- Google Vertex AI
- Hugging Face Inference API
- And 100+ more via LiteLLM
When you switch embedding providers or models with different dimensions, the registry automatically:
- Detects dimension mismatch
- Rebuilds the FAISS index
- Regenerates embeddings for all registered items
Example logs when switching from sentence-transformers (384) to OpenAI (1536):
WARNING: Embedding dimension mismatch detected
Expected: 384 (from existing index)
Got: 1536 (from current model)
Rebuilding FAISS index with new dimensions...
Regenerating embeddings for all items...
Index rebuild complete
Just update your environment variables or Terraform configuration:
# From
EMBEDDINGS_PROVIDER=sentence-transformers
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
EMBEDDINGS_MODEL_DIMENSIONS=384
# To
EMBEDDINGS_PROVIDER=litellm
EMBEDDINGS_MODEL_NAME=openai/text-embedding-ada-002
EMBEDDINGS_MODEL_DIMENSIONS=1536
EMBEDDINGS_API_KEY=sk-your-keyRestart the service and the index will be automatically rebuilt.
For Amazon Bedrock embeddings, ensure your ECS task role has the following permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel"
],
"Resource": [
"arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v1"
]
}
]
}IAM Roles (Recommended for ECS/EC2/EKS)
# No additional configuration needed
# ECS task, EC2 instance, or EKS pod automatically uses attached IAM roleEmbeddingsClient (Abstract Base Class)
├── SentenceTransformersClient (Local models)
└── LiteLLMClient (Cloud APIs via LiteLLM)
The embeddings module integrates seamlessly with the FAISS search service:
# In registry/search/service.py
from registry.embeddings import create_embeddings_client
class FaissService:
async def _load_embedding_model(self):
self.embedding_model = create_embeddings_client(
provider=settings.embeddings_provider,
model_name=settings.embeddings_model_name,
api_key=settings.embeddings_api_key,
aws_region=settings.embeddings_aws_region,
embedding_dimension=settings.embeddings_model_dimensions,
)- Runs on your infrastructure (CPU/GPU)
- No external API calls
- No per-request costs
- Model files stored locally
- Network-independent operation
- Runs on provider infrastructure
- Requires network connectivity
- API costs apply (varies by provider)
- No local compute requirements
- Data transmitted to provider
If the embedding model fails to load or is unreachable (e.g., invalid model name, expired API key, network failure), the search system automatically falls back to lexical-only search instead of returning errors.
What happens:
- The embeddings client caches the load error (
_load_error) to avoid repeated download/API attempts - The search repository detects the failure and sets
_embedding_unavailable = True - All subsequent searches use keyword matching (regex on path, name, description, tags, tools) instead of vector similarity
- Servers and agents are still indexed, but without embeddings (stored with empty vectors)
- The API response includes
"search_mode": "lexical-only"to indicate reduced search quality
How to detect:
- Check the API response
search_modefield:"hybrid"(normal) vs."lexical-only"(fallback) - Look for log warnings:
"Embedding model unavailable, falling back to lexical-only search" - During indexing:
"Embedding model unavailable, indexing '<name>' without embeddings"
How to recover:
Fix the embedding configuration and restart the service. On restart, the error cache is cleared and the system will attempt to load the model again. If successful, search returns to full hybrid mode automatically.
See Hybrid Search Architecture for details on lexical-only scoring.
Failed to load SentenceTransformer model: sentence-transformers/my-model is not a local folder
and is not a valid model identifier listed on 'https://huggingface.co/models'
Solution: Verify the model name in EMBEDDINGS_MODEL_NAME is correct. Check the Hugging Face model hub for valid names. The system will continue operating with lexical-only search until the model is fixed.
RuntimeError: LiteLLM is not installed. Install it with: uv add litellm
Solution:
uv add litellmWARNING: Embedding dimension mismatch: expected 384, got 1536
Solution: Update EMBEDDINGS_MODEL_DIMENSIONS to match your model's actual output dimension. The system will automatically rebuild the index.
OpenAI:
# Verify API key is set correctly
echo $EMBEDDINGS_API_KEY
# Should start with sk-Bedrock:
# Verify AWS credentials
aws sts get-caller-identity
# Check Bedrock access
aws bedrock list-foundation-models --region us-east-1If using AWS ECS and Bedrock, ensure the task execution role has access to the embeddings API key secret:
# Check IAM policy in terraform/aws-ecs/modules/mcp-gateway/iam.tf
# Should include: aws_secretsmanager_secret.embeddings_api_key.arnfrom registry.embeddings import create_embeddings_client
client = create_embeddings_client(
provider: str, # "sentence-transformers" or "litellm"
model_name: str, # Model identifier
api_key: Optional[str] = None, # API key (litellm only)
aws_region: Optional[str] = None, # AWS region (Bedrock only)
embedding_dimension: Optional[int] = None,
)Generate Embeddings:
embeddings = client.encode(["text1", "text2"])
# Returns: numpy array of shape (n_texts, embedding_dim)Get Dimension:
dim = client.get_embedding_dimension()
# Returns: int (e.g., 384, 1536)- Choose the provider that matches your deployment requirements
- Consider IAM authentication if deploying on AWS
- Monitor costs when using cloud APIs - implement caching if needed
- Keep dimension consistent - changing models requires index rebuild
- Test search results after switching providers to ensure they meet your requirements
- LiteLLM Documentation
- OpenAI Embeddings Guide
- Amazon Bedrock Embeddings
- Sentence Transformers Models
- FAISS Search Implementation
To add a new embeddings provider:
- Create a new client class inheriting from
EmbeddingsClient - Implement
encode()andget_embedding_dimension()methods - Update
create_embeddings_client()factory function - Add configuration options to
registry/core/config.py - Update this documentation
Apache 2.0 - See LICENSE file for details