# Squirrel: Enterprise-Grade LLM Gateway

Unified API proxy for OpenAI, Anthropic, and compatible LLM providers.

Squirrel is a high-performance, production-ready proxy service that unifies access to multiple Large Language Model (LLM) providers. It acts as an intelligent gateway between your applications and LLM services, providing seamless failover, load balancing, comprehensive observability, and a modern management dashboard, now with first-class OpenAI Responses support and smooth protocol conversion across OpenAI Chat, OpenAI Responses, and Anthropic Messages.
## Why Squirrel?

- **Single Integration Point**: Connect once, access multiple LLM providers through a unified API
- **Zero Code Changes**: Drop-in replacement compatible with OpenAI and Anthropic SDKs
- **Cost Optimization**: Route requests intelligently across providers based on rules, priority, or cost
- **Production Ready**: Built-in retry logic, failover mechanisms, and detailed request logging
- **Full Visibility**: Track every request with token usage, latency metrics, and cost analytics
## Features

### Protocol Support

- **OpenAI Compatible**: Full support for `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`, `/v1/audio/*`, and `/v1/images/*`
- **OpenAI Responses Compatible**: `/v1/responses` with streaming and tool calls
- **Anthropic Compatible**: Native support for the `/v1/messages` endpoint
- **Protocol Conversion**: Smoothly convert between OpenAI Chat ↔ OpenAI Responses ↔ Anthropic Messages (requests, responses, and streaming), powered by the built-in `llm_api_converter`
- **Streaming Support**: Full Server-Sent Events (SSE) support for real-time responses
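To make the protocol conversion concrete, here is a simplified sketch of the kind of request mapping `llm_api_converter` performs, using OpenAI Chat → Anthropic Messages as an example. The field handling shown is illustrative only; the real converter covers far more of each protocol:

```python
def openai_chat_to_anthropic(req: dict) -> dict:
    """Map an OpenAI Chat Completions request to Anthropic Messages shape.

    Two structural differences drive the conversion: Anthropic requires a
    top-level `max_tokens`, and takes system prompts as a `system` field
    rather than as a message with role "system".
    """
    system_parts = [m["content"] for m in req["messages"] if m["role"] == "system"]
    out = {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # Anthropic requires this field
        "messages": [m for m in req["messages"] if m["role"] != "system"],
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out

converted = openai_chat_to_anthropic({
    "model": "claude-sonnet-4",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hello!"},
    ],
})
```

The same idea applies in the other directions (and to responses and stream events), which is what lets a client speaking one protocol reach a backend speaking another.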
### Intelligent Routing

- **Rule-Based Routing**: Route requests based on model name, headers, message content, or token count
- **Load Balancing Strategies**:
  - **Round-Robin**: Distribute requests evenly across providers
  - **Priority-Based**: Use preferred providers first, fall back to others
  - **Weight-Based**: Distribute by custom weight ratios
  - **Cost-Based**: Automatically select the lowest-priced model based on API pricing
- **Model Mapping**: Map virtual model names to multiple backend providers
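As an illustration of the weight-based strategy, provider selection can be sketched as a weighted random choice. The provider names and weights below are made up for the demo; this is not the gateway's actual routing code:

```python
import random

def pick_provider(weights: dict[str, int]) -> str:
    """Return a provider name with probability proportional to its weight."""
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names], k=1)[0]

random.seed(42)  # deterministic demo
counts = {"openai-primary": 0, "azure-backup": 0}
for _ in range(10_000):
    counts[pick_provider({"openai-primary": 3, "azure-backup": 1})] += 1
# With a 3:1 weight ratio, "openai-primary" receives roughly 75% of requests.
```

Round-robin and priority-based strategies replace the random draw with a rotating index or an ordered fallback list, respectively.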
### Reliability

- **Automatic Retries**: Configurable retry attempts for server errors (HTTP 500+)
- **Provider Failover**: Seamlessly switch to backup providers on failure
- **Timeout Management**: Configurable request timeouts with support for long streaming responses (default: 30 minutes)
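The retry behavior corresponds to the `RETRY_MAX_ATTEMPTS` and `RETRY_DELAY_MS` settings. A minimal sketch of the idea (the `send` callable and `FakeResponse` class are stand-ins for this example, not the gateway's actual implementation):

```python
import time

def call_with_retries(send, max_attempts: int = 3, delay_ms: int = 1000):
    """Retry `send()` while it returns an HTTP 500+ status.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute.
    """
    for attempt in range(1, max_attempts + 1):
        resp = send()
        if resp.status_code < 500:   # success or client error: do not retry
            return resp
        if attempt < max_attempts:   # pause before the next attempt
            time.sleep(delay_ms / 1000)
    return resp                      # attempts exhausted; return the last error

# Demo: a fake upstream that fails twice, then succeeds.
class FakeResponse:
    def __init__(self, code: int):
        self.status_code = code

responses = iter([FakeResponse(502), FakeResponse(500), FakeResponse(200)])
result = call_with_retries(lambda: next(responses), max_attempts=3, delay_ms=0)
```

Failover extends the same loop across providers: when one provider's attempts are exhausted, the request is re-sent to the next candidate.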
### Observability

- **Full Request/Response Capture**: Complete logging of request and response bodies (including streams) to help debug issues and optimize AI system performance
- **Token Tracking**: Automatic token counting using tiktoken
- **Latency Metrics**: First-byte delay and total response time
- **Cost Analytics**: Aggregated statistics by time, model, provider, and API key
- **Data Sanitization**: Automatic redaction of sensitive information in logs
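Log sanitization can be pictured as pattern-based redaction before a log entry is stored. The two patterns below are hypothetical stand-ins for the gateway's real rules:

```python
import re

# Hypothetical redaction rules; the gateway's built-in sanitizer may differ.
_BEARER = re.compile(r"(?i)(authorization:\s*bearer\s+)\S+")
_API_KEY = re.compile(r"\b(?:sk|lgw)-[A-Za-z0-9_-]{10,}\b")

def sanitize(text: str) -> str:
    """Redact bearer tokens and API-key-shaped strings from a log line."""
    text = _BEARER.sub(r"\1[REDACTED]", text)
    return _API_KEY.sub("[REDACTED]", text)

header = sanitize("Authorization: Bearer eyJhbGciOi.example.token")
body = sanitize('{"api_key": "lgw-A1B2C3D4E5F6"}')
```

Redacting at write time, rather than at display time, ensures secrets never reach the log store at all.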
### Management Dashboard

Built with Next.js 16 + TypeScript + shadcn/ui:

- Provider management with connection testing
- Model mapping configuration with a rule editor
- API key generation and lifecycle management
- Advanced log viewer with multi-dimensional filtering
- Cost statistics and usage analytics
## Quick Start

### Docker Compose (PostgreSQL)

The fastest way to get started, using PostgreSQL:

```bash
# Clone the repository
git clone https://github.com/mylxsw/llm-gateway.git
cd llm-gateway

# Start services
docker compose -f docker-compose.prod.yml up -d
```

Access the dashboard at http://localhost:8000 (or the port you set in `LLM_GATEWAY_PORT`).
### Docker (SQLite)

Run with SQLite for simple deployments:

```bash
docker run -d \
  -p 8000:8000 \
  -v $(pwd)/data:/data \
  --name llm-gateway \
  ghcr.io/mylxsw/llm-gateway:latest
```

## Local Development

### Prerequisites

- Python 3.12+
- Node.js 18+
- npm (for the frontend)
### Backend

```bash
cd backend

# Install dependencies (choose one)
uv sync                          # Recommended: using uv
pip install -r requirements.txt  # Or using pip

# Initialize database
alembic upgrade head

# Start server
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

### Frontend

```bash
cd frontend

# Install dependencies
npm install

# Development
npm run dev

# Production build
npm run build && npm run start
```
## Usage

1. **Add a Provider**: Navigate to the Providers page and add your LLM provider (e.g., OpenAI)
   - Set the base URL (e.g., `https://api.openai.com/v1`)
   - Add your API key
   - Select the protocol (OpenAI, OpenAI Responses, or Anthropic)
2. **Create a Model Mapping**: Go to the Models page and create a mapping
   - Define a model name (e.g., `gpt-4`)
   - Associate it with one or more providers
   - Set routing priority/weight
3. **Generate an API Key**: Create a gateway API key on the API Keys page
4. **Connect Your Application**:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="lgw-your-gateway-api-key"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Or, using the Responses API:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="lgw-your-gateway-api-key"
)

response = client.responses.create(
    model="gpt-4.1-mini",
    input="Summarize this in one sentence."
)
```

## API Endpoints

### OpenAI-Compatible

| Method | Endpoint | Description |
|---|---|---|
| GET | `/v1/models` | List available models |
| POST | `/v1/chat/completions` | Chat completions |
| POST | `/v1/completions` | Text completions |
| POST | `/v1/embeddings` | Generate embeddings |
| POST | `/v1/audio/speech` | Text-to-speech |
| POST | `/v1/audio/transcriptions` | Speech-to-text |
| POST | `/v1/audio/translations` | Speech-to-text (translation) |
| POST | `/v1/images/generations` | Image generation |
| POST | `/v1/responses` | Responses API |
### Anthropic-Compatible

| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/messages` | Messages API |
### Admin API

| Resource | Endpoints |
|---|---|
| Providers | GET/POST `/api/admin/providers`, GET/PUT/DELETE `/api/admin/providers/{id}` |
| Models | GET/POST `/api/admin/models`, GET/PUT/DELETE `/api/admin/models/{model}` |
| API Keys | GET/POST `/api/admin/api-keys`, GET/PUT/DELETE `/api/admin/api-keys/{id}` |
| Logs | GET `/api/admin/logs`, GET `/api/admin/logs/stats` |

See `docs/api.md` for complete API documentation.
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `APP_NAME` | LLM Gateway | Application name |
| `DEBUG` | false | Enable debug mode |
| `DATABASE_TYPE` | sqlite | Database type: `sqlite` or `postgresql` |
| `DATABASE_URL` | `sqlite+aiosqlite:///./llm_gateway.db` | Database connection string |
| `RETRY_MAX_ATTEMPTS` | 3 | Max retry attempts for HTTP 500+ errors |
| `RETRY_DELAY_MS` | 1000 | Delay between retries (milliseconds) |
| `HTTP_TIMEOUT` | 1800 | Upstream request timeout (seconds) |
| `API_KEY_PREFIX` | `lgw-` | Prefix for generated API keys |
| `API_KEY_LENGTH` | 32 | Length of generated API keys |
| `ADMIN_USERNAME` | - | Admin login username (optional) |
| `ADMIN_PASSWORD` | - | Admin login password (optional) |
| `ADMIN_TOKEN_TTL_SECONDS` | 86400 | Admin session TTL in seconds (default: 24 hours) |
| `LOG_RETENTION_DAYS` | 7 | Log retention period (days) |
| `LOG_CLEANUP_HOUR` | 4 | Log cleanup time (UTC hour) |
| `LLM_GATEWAY_PORT` | 8000 | Host port for Docker Compose |
| `KV_STORE_TYPE` | database | KV store backend: `database` or `redis` |
| `REDIS_URL` | - | Redis connection URL (when using the Redis KV store) |
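For example, a deployment with the Redis KV store and admin login enabled might set (all values below are illustrative placeholders):

```bash
KV_STORE_TYPE=redis
REDIS_URL=redis://localhost:6379/0
ADMIN_USERNAME=admin
ADMIN_PASSWORD=change-me
LOG_RETENTION_DAYS=14
```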
### Database

SQLite (default, simple deployments):

```bash
DATABASE_TYPE=sqlite
DATABASE_URL=sqlite+aiosqlite:///./llm_gateway.db
```

PostgreSQL (recommended for production):

```bash
DATABASE_TYPE=postgresql
DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/llm_gateway
```

## Supported Providers

Squirrel can proxy requests to any OpenAI- or Anthropic-compatible API:
| Provider | Protocol | Notes |
|---|---|---|
| OpenAI | OpenAI | Full support including GPT-4, GPT-3.5, embeddings, audio, images |
| OpenAI | OpenAI Responses | Responses API via /v1/responses |
| Anthropic | Anthropic | Claude models via Messages API |
| Azure OpenAI | OpenAI | Use Azure endpoint URL |
| Local Models | OpenAI | Ollama, vLLM, LocalAI, etc. |
| Other Providers | OpenAI/Anthropic | Any compatible API endpoint |
## Project Structure

```
llm-gateway/
├── backend/
│   ├── app/
│   │   ├── api/              # API routes (proxy, admin)
│   │   ├── services/         # Business logic
│   │   ├── providers/        # Protocol adapters
│   │   ├── repositories/     # Data access layer
│   │   ├── db/               # Database models
│   │   ├── domain/           # DTOs and domain models
│   │   ├── rules/            # Rule evaluation engine
│   │   └── common/           # Utilities
│   ├── migrations/           # Alembic migrations
│   └── tests/                # Test suite
├── llm_api_converter/        # Protocol conversion SDK (OpenAI/Responses/Anthropic)
├── frontend/
│   └── src/
│       ├── app/              # Next.js App Router pages
│       ├── components/       # React components
│       └── lib/              # Utilities and API client
├── docker-compose.yml
└── Dockerfile
```
## Development

### Running Tests

```bash
cd backend
pytest
```

### Database Migrations

```bash
cd backend

# Create a new migration
alembic revision --autogenerate -m "description"

# Apply migrations
alembic upgrade head
```

---

Made with care for the LLM community


