Embeddings Configuration

Flexible, vendor-agnostic embeddings generation for MCP Gateway Registry's semantic search functionality.

Overview

The MCP Gateway Registry provides semantic search capabilities across MCP servers, tools, and AI agents. You can choose from three embedding provider options to power this search:

Sentence Transformers (Default) - Local models
OpenAI - Cloud embeddings via API
Any LiteLLM-supported provider - Amazon Bedrock Titan, Cohere, and 100+ other models

Switch between providers with simple configuration changes - no code modifications required.

Features

Vendor-agnostic: Switch between embeddings providers with configuration changes
Local & Cloud Support: Use local models or cloud APIs (OpenAI, Cohere, Amazon Bedrock, etc.)
Backward Compatible: Works seamlessly with existing FAISS indices
Easy Configuration: Simple environment variable setup
Extensible: Easy to add new providers
AWS Deployable: Terraform support for AWS deployments

Quick Start

Option 1: Sentence Transformers (Default)

Local embedding models that run on your infrastructure.

# In .env
EMBEDDINGS_PROVIDER=sentence-transformers
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
EMBEDDINGS_MODEL_DIMENSIONS=384

Characteristics:

Runs locally on your infrastructure
No API costs
No external network calls required
Requires CPU/GPU resources
Model files stored locally
Data stays within your infrastructure

Option 2: OpenAI

Cloud-based embedding service via OpenAI API.

# In .env
EMBEDDINGS_PROVIDER=litellm
EMBEDDINGS_MODEL_NAME=openai/text-embedding-ada-002
EMBEDDINGS_MODEL_DIMENSIONS=1536
EMBEDDINGS_API_KEY=sk-your-openai-api-key

Characteristics:

Cloud-based service
Requires API key
Usage is billed per token
No local compute resources needed
Network dependency
Data sent to OpenAI

Option 3: Amazon Bedrock Titan

Cloud-based embedding service via AWS Bedrock.

# In .env
EMBEDDINGS_PROVIDER=litellm
EMBEDDINGS_MODEL_NAME=bedrock/amazon.titan-embed-text-v1
EMBEDDINGS_MODEL_DIMENSIONS=1536
EMBEDDINGS_AWS_REGION=us-east-1
# No API key needed - uses IAM

Characteristics:

Cloud-based service
Uses IAM authentication (no API key required)
Integrates with AWS security model
API costs apply
Requires AWS credentials
Available in select AWS regions

Configuration

Environment Variables

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `EMBEDDINGS_PROVIDER` | Provider type: `sentence-transformers` or `litellm` | `sentence-transformers` | No |
| `EMBEDDINGS_MODEL_NAME` | Model identifier | `all-MiniLM-L6-v2` | Yes |
| `EMBEDDINGS_MODEL_DIMENSIONS` | Embedding dimension | `384` | Yes |
| `EMBEDDINGS_API_KEY` | API key for cloud provider (OpenAI, Cohere, etc.) | - | For cloud* |
| `EMBEDDINGS_API_BASE` | Custom API endpoint (LiteLLM only) | - | No |
| `EMBEDDINGS_AWS_REGION` | AWS region for Bedrock (LiteLLM only) | - | For Bedrock |

*Not required for AWS Bedrock - use standard AWS credential chain (IAM roles, environment variables, ~/.aws/credentials)
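
The table above can be read as a small validation routine. The following is a minimal sketch, not the registry's actual settings code: `EmbeddingsConfig` and `load_embeddings_config` are hypothetical names, and the rule that any non-Bedrock LiteLLM model needs an API key is an assumption drawn from the "For cloud*" note.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"sentence-transformers", "litellm"}


@dataclass(frozen=True)
class EmbeddingsConfig:
    """Hypothetical container for the EMBEDDINGS_* variables."""
    provider: str
    model_name: str
    dimensions: int
    api_key: Optional[str] = None
    api_base: Optional[str] = None
    aws_region: Optional[str] = None


def load_embeddings_config(env: Mapping[str, str] = os.environ) -> EmbeddingsConfig:
    """Read the EMBEDDINGS_* variables and apply the documented defaults."""
    provider = env.get("EMBEDDINGS_PROVIDER", "sentence-transformers")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"Unsupported EMBEDDINGS_PROVIDER: {provider!r}")

    model_name = env.get("EMBEDDINGS_MODEL_NAME", "all-MiniLM-L6-v2")
    dimensions = int(env.get("EMBEDDINGS_MODEL_DIMENSIONS", "384"))
    if dimensions <= 0:
        raise ValueError("EMBEDDINGS_MODEL_DIMENSIONS must be a positive integer")

    # Treat empty strings as unset, matching the empty Terraform value for Bedrock.
    api_key = env.get("EMBEDDINGS_API_KEY") or None
    api_base = env.get("EMBEDDINGS_API_BASE") or None
    aws_region = env.get("EMBEDDINGS_AWS_REGION") or None

    if provider == "litellm":
        is_bedrock = model_name.startswith("bedrock/")
        if is_bedrock and aws_region is None:
            raise ValueError("EMBEDDINGS_AWS_REGION is required for Bedrock models")
        if not is_bedrock and api_key is None:
            # Assumption: every non-Bedrock cloud model needs a key.
            raise ValueError("EMBEDDINGS_API_KEY is required for this cloud model")

    return EmbeddingsConfig(provider, model_name, dimensions, api_key, api_base, aws_region)
```

Passing a plain dictionary instead of `os.environ` makes the rules easy to exercise in a unit test before a deployment.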

Terraform Configuration

For AWS ECS deployments, configure embeddings in your terraform.tfvars:

Using Sentence Transformers (Default)

# Local embeddings - no additional configuration needed
# Uses defaults: sentence-transformers with all-MiniLM-L6-v2

Using OpenAI

embeddings_provider         = "litellm"
embeddings_model_name       = "openai/text-embedding-ada-002"
embeddings_model_dimensions = 1536
embeddings_api_key          = "sk-proj-YOUR-OPENAI-API-KEY"

Using Amazon Bedrock

embeddings_provider         = "litellm"
embeddings_model_name       = "bedrock/amazon.titan-embed-text-v1"
embeddings_model_dimensions = 1536
embeddings_aws_region       = "us-east-1"
embeddings_api_key          = ""  # Empty for Bedrock (uses IAM)

See terraform/aws-ecs/terraform.tfvars.example for complete examples.

Supported Models

Sentence Transformers (Local)

| Model | Dimensions | Description |
|-------|------------|-------------|
| `all-MiniLM-L6-v2` | 384 | Fast, lightweight (default) |
| `all-mpnet-base-v2` | 768 | High quality |
| `paraphrase-multilingual-MiniLM-L12-v2` | 384 | Multilingual |

Any model from Hugging Face sentence-transformers is supported.

LiteLLM (Cloud-based)

LiteLLM supports 100+ embedding models from various providers:

OpenAI

openai/text-embedding-3-small (1536 dimensions)
openai/text-embedding-3-large (3072 dimensions)
openai/text-embedding-ada-002 (1536 dimensions)

Cohere

cohere/embed-english-v3.0 (1024 dimensions)
cohere/embed-multilingual-v3.0 (1024 dimensions)

Amazon Bedrock

bedrock/amazon.titan-embed-text-v1 (1536 dimensions)
bedrock/cohere.embed-english-v3 (1024 dimensions)
bedrock/cohere.embed-multilingual-v3 (1024 dimensions)

Other Providers

Azure OpenAI
Voyage AI
Google Vertex AI
Hugging Face Inference API
And 100+ more via LiteLLM
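
Because `EMBEDDINGS_MODEL_DIMENSIONS` must match the model, the dimensions listed in this section can double as a sanity check. The sketch below assumes a hypothetical helper, `check_dimensions`, that is not part of the registry. The lookup table contains only the models and default output sizes named above.

```python
from typing import Optional

# Default output dimensions, copied from the model lists in this section.
KNOWN_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "paraphrase-multilingual-MiniLM-L12-v2": 384,
    "openai/text-embedding-3-small": 1536,
    "openai/text-embedding-3-large": 3072,
    "openai/text-embedding-ada-002": 1536,
    "cohere/embed-english-v3.0": 1024,
    "cohere/embed-multilingual-v3.0": 1024,
    "bedrock/amazon.titan-embed-text-v1": 1536,
    "bedrock/cohere.embed-english-v3": 1024,
    "bedrock/cohere.embed-multilingual-v3": 1024,
}


def check_dimensions(model_name: str, configured: int) -> Optional[str]:
    """Return a warning if the configured dimension looks wrong, else None."""
    expected = KNOWN_DIMENSIONS.get(model_name)
    if expected is None:
        return None  # Unknown model: nothing to compare against.
    if expected != configured:
        return (
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but EMBEDDINGS_MODEL_DIMENSIONS is {configured}"
        )
    return None
```

Models outside the table are passed through without a warning, since any Hugging Face or LiteLLM model may be configured.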

Migration Between Providers

Switching Providers

When you switch embedding providers or models with different dimensions, the registry automatically:

Detects dimension mismatch
Rebuilds the FAISS index
Regenerates embeddings for all registered items

Example logs when switching from sentence-transformers (384) to OpenAI (1536):

WARNING: Embedding dimension mismatch detected
  Expected: 384 (from existing index)
  Got: 1536 (from current model)
Rebuilding FAISS index with new dimensions...
Regenerating embeddings for all items...
Index rebuild complete
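
The detect, rebuild, and regenerate sequence can be illustrated with a toy in-memory index. This is a sketch under stated assumptions: `ToyIndex` and `ensure_index_matches` are hypothetical stand-ins, and the real service rebuilds a FAISS index rather than a dictionary.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("embeddings.migration")

Vector = List[float]


class ToyIndex:
    """Hypothetical in-memory stand-in for a vector index with a fixed dimension."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.vectors: Dict[str, Vector] = {}

    def add(self, item_id: str, vector: Vector) -> None:
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} values, got {len(vector)}")
        self.vectors[item_id] = vector


def ensure_index_matches(
    index: ToyIndex,
    model_dimension: int,
    items: Dict[str, str],
    encode: Callable[[str], Vector],
) -> ToyIndex:
    """Return the index unchanged, or a rebuilt one if the dimensions differ."""
    if index.dimension == model_dimension:
        return index

    logger.warning(
        "Embedding dimension mismatch detected: expected %d, got %d",
        index.dimension,
        model_dimension,
    )
    rebuilt = ToyIndex(model_dimension)
    for item_id, text in items.items():
        rebuilt.add(item_id, encode(text))  # Regenerate every embedding.
    return rebuilt
```

The rebuild cost grows with the number of registered items, and with a cloud provider each regenerated embedding is a billable API call.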

No Code Changes Required

Just update your environment variables or Terraform configuration:

# From
EMBEDDINGS_PROVIDER=sentence-transformers
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
EMBEDDINGS_MODEL_DIMENSIONS=384

# To
EMBEDDINGS_PROVIDER=litellm
EMBEDDINGS_MODEL_NAME=openai/text-embedding-ada-002
EMBEDDINGS_MODEL_DIMENSIONS=1536
EMBEDDINGS_API_KEY=sk-your-key

Restart the service and the index will be automatically rebuilt.

AWS Bedrock Setup

IAM Permissions

For Amazon Bedrock embeddings, ensure your ECS task role has the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v1"
      ]
    }
  ]
}

Authentication Methods

IAM Roles (Recommended for ECS/EC2/EKS)

# No additional configuration needed
# ECS task, EC2 instance, or EKS pod automatically uses attached IAM role

Architecture

Embeddings Module Design

EmbeddingsClient (Abstract Base Class)
├── SentenceTransformersClient (Local models)
└── LiteLLMClient (Cloud APIs via LiteLLM)
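
The hierarchy above follows a common abstract-base-class pattern with a factory that selects the subclass. The sketch below shows only the shape of that pattern. `StubLocalClient`, `StubCloudClient`, and `create_client_sketch` are hypothetical names, the stubs return zero vectors instead of calling a model, and plain lists stand in for the numpy arrays the real clients return.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class EmbeddingsClient(ABC):
    """Interface shared by every provider."""

    @abstractmethod
    def encode(self, texts: List[str]) -> List[List[float]]:
        """Return one vector per input text."""

    @abstractmethod
    def get_embedding_dimension(self) -> int:
        """Return the length of every vector."""


class StubLocalClient(EmbeddingsClient):
    """Stand-in for a local-model client; returns zero vectors."""

    def __init__(self, model_name: str, embedding_dimension: int) -> None:
        self.model_name = model_name
        self._dimension = embedding_dimension

    def encode(self, texts: List[str]) -> List[List[float]]:
        return [[0.0] * self._dimension for _ in texts]

    def get_embedding_dimension(self) -> int:
        return self._dimension


class StubCloudClient(StubLocalClient):
    """Stand-in for a cloud client; only adds the credential fields."""

    def __init__(
        self,
        model_name: str,
        embedding_dimension: int,
        api_key: Optional[str] = None,
        aws_region: Optional[str] = None,
    ) -> None:
        super().__init__(model_name, embedding_dimension)
        self.api_key = api_key
        self.aws_region = aws_region


def create_client_sketch(
    provider: str,
    model_name: str,
    embedding_dimension: int = 384,
    api_key: Optional[str] = None,
    aws_region: Optional[str] = None,
) -> EmbeddingsClient:
    """Dispatch on the provider string, as a factory function would."""
    if provider == "sentence-transformers":
        return StubLocalClient(model_name, embedding_dimension)
    if provider == "litellm":
        return StubCloudClient(model_name, embedding_dimension, api_key, aws_region)
    raise ValueError(f"Unknown embeddings provider: {provider!r}")
```

Callers depend only on `EmbeddingsClient`, which is what lets the provider change through configuration alone.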

Integration with FAISS Search

The embeddings module integrates seamlessly with the FAISS search service:

# In registry/search/service.py
from registry.embeddings import create_embeddings_client

class FaissService:
    async def _load_embedding_model(self):
        self.embedding_model = create_embeddings_client(
            provider=settings.embeddings_provider,
            model_name=settings.embeddings_model_name,
            api_key=settings.embeddings_api_key,
            aws_region=settings.embeddings_aws_region,
            embedding_dimension=settings.embeddings_model_dimensions,
        )

Performance Considerations

Local Models (Sentence Transformers)

Runs on your infrastructure (CPU/GPU)
No external API calls
No per-request costs
Model files stored locally
Network-independent operation

Cloud APIs (LiteLLM)

Runs on provider infrastructure
Requires network connectivity
API costs apply (varies by provider)
No local compute requirements
Data transmitted to provider

Graceful Degradation

Lexical Fallback When Model Unavailable

If the embedding model fails to load or is unreachable (e.g., invalid model name, expired API key, network failure), the search system automatically falls back to lexical-only search instead of returning errors.

What happens:

The embeddings client caches the load error (_load_error) to avoid repeated download/API attempts
The search repository detects the failure and sets _embedding_unavailable = True
All subsequent searches use keyword matching (regex on path, name, description, tags, tools) instead of vector similarity
Servers and agents are still indexed, but without embeddings (stored with empty vectors)
The API response includes "search_mode": "lexical-only" to indicate reduced search quality
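
The fallback behavior described above can be sketched as follows. This is an illustration, not the registry's search repository: `FallbackSearch`, `lexical_match`, and the scorer callable are hypothetical, and the "hybrid" branch here ranks by a single score where the real system blends vector and lexical signals.

```python
import re
from typing import Callable, Dict, List, Optional

Item = Dict[str, str]
Scorer = Callable[[str, Item], float]
FIELDS = ("path", "name", "description", "tags", "tools")


def lexical_match(query: str, item: Item) -> bool:
    """Case-insensitive keyword match over the text fields of one item."""
    tokens = query.split()
    if not tokens:
        return False
    pattern = re.compile("|".join(re.escape(t) for t in tokens), re.IGNORECASE)
    return any(pattern.search(item.get(field, "")) for field in FIELDS)


class FallbackSearch:
    """Try to load the model once; search lexically if loading failed."""

    def __init__(self, load_scorer: Callable[[], Scorer]) -> None:
        self._load_scorer = load_scorer
        self._scorer: Optional[Scorer] = None
        self._load_error: Optional[Exception] = None

    def _get_scorer(self) -> Optional[Scorer]:
        if self._scorer is None and self._load_error is None:
            try:
                self._scorer = self._load_scorer()
            except Exception as exc:
                self._load_error = exc  # Cached: later searches skip the load.
        return self._scorer

    def search(self, query: str, items: List[Item]) -> Dict[str, object]:
        scorer = self._get_scorer()
        if scorer is None:
            hits = [item for item in items if lexical_match(query, item)]
            return {"search_mode": "lexical-only", "results": hits}
        ranked = sorted(items, key=lambda item: scorer(query, item), reverse=True)
        return {"search_mode": "hybrid", "results": ranked}
```

Caching the error means a bad model name costs one failed load per process rather than one per query, which is also why a restart is needed to retry.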

How to detect:

Check the API response search_mode field: "hybrid" (normal) vs. "lexical-only" (fallback)
Look for log warnings: "Embedding model unavailable, falling back to lexical-only search"
During indexing: "Embedding model unavailable, indexing '<name>' without embeddings"
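
A monitoring script could watch for the fallback by inspecting the response body. The sketch below assumes the search response is a JSON object with `search_mode` as a top-level field. The function name `is_degraded` is hypothetical.

```python
import json
import logging

logger = logging.getLogger("search.monitor")


def is_degraded(response_body: str) -> bool:
    """Return True when a search response reports lexical-only mode."""
    payload = json.loads(response_body)
    if payload.get("search_mode") == "lexical-only":
        logger.warning("Search is running in lexical-only mode; check the embeddings configuration")
        return True
    return False
```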

How to recover:

Fix the embedding configuration and restart the service. On restart, the error cache is cleared and the system will attempt to load the model again. If successful, search returns to full hybrid mode automatically.

See Hybrid Search Architecture for details on lexical-only scoring.

Troubleshooting

Embedding Model Not Found

Failed to load SentenceTransformer model: sentence-transformers/my-model is not a local folder
and is not a valid model identifier listed on 'https://huggingface.co/models'

Solution: Verify the model name in EMBEDDINGS_MODEL_NAME is correct. Check the Hugging Face model hub for valid names. The system will continue operating with lexical-only search until the model is fixed.

LiteLLM Not Installed

RuntimeError: LiteLLM is not installed. Install it with: uv add litellm

Solution:

uv add litellm

Dimension Mismatch

WARNING: Embedding dimension mismatch: expected 384, got 1536

Solution: Update EMBEDDINGS_MODEL_DIMENSIONS to match your model's actual output dimension. The system will automatically rebuild the index.

API Authentication Errors

OpenAI:

# Verify the API key is set without printing the whole secret (bash/zsh)
echo "${EMBEDDINGS_API_KEY:0:3}"
# Should print sk-

Bedrock:

# Verify AWS credentials
aws sts get-caller-identity

# Check Bedrock access
aws bedrock list-foundation-models --region us-east-1

Missing IAM Permissions

If using AWS ECS and Bedrock, ensure the task execution role has access to the embeddings API key secret:

# Check IAM policy in terraform/aws-ecs/modules/mcp-gateway/iam.tf
# Should include: aws_secretsmanager_secret.embeddings_api_key.arn

API Reference

Factory Function

from registry.embeddings import create_embeddings_client

# Parameter types are noted in the comments; the values are examples.
client = create_embeddings_client(
    provider="litellm",                          # str: "sentence-transformers" or "litellm"
    model_name="openai/text-embedding-ada-002",  # str: model identifier
    api_key="sk-your-openai-api-key",            # Optional[str] = None: API key (litellm only)
    aws_region=None,                             # Optional[str] = None: AWS region (Bedrock only)
    embedding_dimension=1536,                    # Optional[int] = None
)

Client Methods

Generate Embeddings:

embeddings = client.encode(["text1", "text2"])
# Returns: numpy array of shape (n_texts, embedding_dim)

Get Dimension:

dim = client.get_embedding_dimension()
# Returns: int (e.g., 384, 1536)
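
Once texts are encoded, ranking by vector similarity is a short computation. The sketch below uses cosine similarity over plain Python sequences as an illustration. It makes no claim about which distance metric the registry's FAISS index uses, and `cosine_similarity` and `rank` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(
    query_vector: Sequence[float],
    document_vectors: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scored = [
        (i, cosine_similarity(query_vector, vector))
        for i, vector in enumerate(document_vectors)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The rows of the array returned by `client.encode(...)` can be passed as `document_vectors`, since the helpers only iterate over their inputs.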

Best Practices

Choose the provider that matches your deployment requirements
Consider IAM authentication if deploying on AWS
Monitor costs when using cloud APIs - implement caching if needed
Keep EMBEDDINGS_MODEL_DIMENSIONS in sync with the model - switching to a model with a different dimension triggers a full index rebuild
Test search results after switching providers to ensure they meet your requirements
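
For the caching suggestion above, one option is to wrap the encode call so that repeated texts are embedded only once. This is a minimal sketch with a hypothetical `CachedEncoder` class. It keeps vectors in process memory and keys them by model name plus text, so a model change never returns stale vectors.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class CachedEncoder:
    """Wrap an encode function and reuse vectors for texts seen before."""

    def __init__(self, encode: Callable[[List[str]], List[Vector]], model_name: str) -> None:
        self._encode = encode
        self._model_name = model_name
        self._cache: Dict[str, Vector] = {}

    def _key(self, text: str) -> str:
        raw = f"{self._model_name}\0{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def encode(self, texts: List[str]) -> List[Vector]:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        missing = [t for t in dict.fromkeys(texts) if self._key(t) not in self._cache]
        if missing:
            for text, vector in zip(missing, self._encode(missing)):
                self._cache[self._key(text)] = vector
        return [self._cache[self._key(t)] for t in texts]
```

An unbounded dictionary is fine for a sketch; a long-running service would likely want an eviction policy or an external cache.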

Contributing

To add a new embeddings provider:

Create a new client class inheriting from EmbeddingsClient
Implement encode() and get_embedding_dimension() methods
Update create_embeddings_client() factory function
Add configuration options to registry/core/config.py
Update this documentation
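
As an illustration of the first two steps, the sketch below defines a toy provider. `HashingClient` is hypothetical and produces deterministic pseudo-embeddings from SHA-256 digests, which are useful for tests but carry no semantic meaning. The `EmbeddingsClient` class here is a stand-in reduced to the two required methods, and lists are used in place of numpy arrays.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class EmbeddingsClient(ABC):
    """Stand-in for the project's base class."""

    @abstractmethod
    def encode(self, texts: List[str]) -> List[List[float]]:
        """Return one vector per input text."""

    @abstractmethod
    def get_embedding_dimension(self) -> int:
        """Return the length of every vector."""


class HashingClient(EmbeddingsClient):
    """Hypothetical provider: vectors derived from the SHA-256 digest of each text."""

    def __init__(self, embedding_dimension: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, which caps the dimension of this toy.
        if not 1 <= embedding_dimension <= 32:
            raise ValueError("embedding_dimension must be between 1 and 32")
        self._dimension = embedding_dimension

    def encode(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale each byte into the range [0.0, 1.0].
            vectors.append([byte / 255.0 for byte in digest[: self._dimension]])
        return vectors

    def get_embedding_dimension(self) -> int:
        return self._dimension
```

Wiring such a class into `create_embeddings_client()` and the configuration (steps three and four) depends on the project's code and is not shown here.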

License

Apache 2.0 - See LICENSE file for details
