Unified LLM API Gateway is a scalable, extensible platform that aggregates and normalises calls to multiple LLM backends (OpenAI, Hugging Face, Groq, Anthropic, Gemini, and more).
It provides a unified API with built-in caching, rate limiting, authentication, logging, metrics, and production-ready deployment manifests for Docker and Kubernetes.
- API Gateway (Go):
  - Accepts client requests
  - Handles authentication, routing, and request transformation
  - Aggregates and fans out requests to LLM backends
- LLM Adapters (microservices):
  - Wrap each provider’s API (OpenAI, Hugging Face, etc.) with a unified internal interface (see the sketch after this list)
- Cache Layer:
  - Redis for result caching (prompt + params as the cache key)
- Rate Limiter:
  - Redis-based leaky-bucket or token-bucket, shared across instances (see the second sketch after this list)
- Auth & Quotas:
  - API keys / JWT, per-key quotas (Redis or DB)
- Observability:
  - Structured logs (JSON), Prometheus metrics, traces
- Deployment:
  - Docker images, Helm charts, Kubernetes manifests, CI builds
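For illustration, a minimal sketch of what the unified internal adapter interface could look like in Go. The package, type, and method names here are assumptions, not the repository's actual definitions:

```go
package adapters

import "context"

// QueryRequest is the provider-agnostic request the gateway hands to an adapter.
type QueryRequest struct {
	Prompt string            // user prompt, passed through unchanged
	Params map[string]string // provider-specific options (model, temperature, ...)
}

// QueryResponse is the normalised result every adapter returns.
type QueryResponse struct {
	Text     string // the LLM output
	Provider string // which backend served the request
}

// Adapter is implemented once per provider (openai-adapter, hf-adapter, ...).
type Adapter interface {
	// Name returns the provider identifier used in the public API ("openai", "hf", ...).
	Name() string
	// Query sends the prompt to the backing provider and normalises the reply.
	Query(ctx context.Context, req QueryRequest) (QueryResponse, error)
}
```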
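And a sketch of the shared token bucket, assuming the github.com/redis/go-redis/v9 client; the key prefix, hash fields, and parameters are illustrative. Because the bucket state and the refill/spend logic live in Redis (the Lua script runs atomically), every gateway instance enforces the same per-key limit:

```go
package ratelimit

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// tokenBucket refills up to `cap` tokens at `rate` tokens/second, then spends
// one token if available. It returns 1 if the request is allowed, 0 otherwise.
var tokenBucket = redis.NewScript(`
local cap    = tonumber(ARGV[1])
local rate   = tonumber(ARGV[2])
local now    = tonumber(ARGV[3])
local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens') or cap)
local ts     = tonumber(redis.call('HGET', KEYS[1], 'ts') or now)
tokens = math.min(cap, tokens + (now - ts) * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', KEYS[1], 3600)
return allowed
`)

// Allow spends one token from the caller's bucket, identified by API key.
func Allow(ctx context.Context, rdb *redis.Client, apiKey string, capacity, perSec int) (bool, error) {
	n, err := tokenBucket.Run(ctx, rdb,
		[]string{"rl:" + apiKey},
		capacity, perSec, time.Now().Unix(),
	).Int()
	if err != nil {
		return false, err
	}
	return n == 1, nil
}
```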
Repository layout:

```
llm-api-gateway/
├── README.md
├── LICENSE
├── .github/                 # CI/CD workflows
├── infra/                   # Docker Compose & Kubernetes manifests
│   ├── k8s/
│   └── docker-compose.yml
├── gateway/                 # Go API gateway
│   ├── cmd/server/
│   ├── internal/
│   │   ├── handlers/
│   │   ├── adapters/
│   │   ├── cache/
│   │   ├── ratelimit/
│   │   └── metrics/
│   ├── go.mod
│   └── Dockerfile
├── adapters/                # Per-provider adapters (microservices)
│   ├── openai-adapter/
│   └── hf-adapter/
├── admin/                   # NestJS admin dashboard (API keys, usage, logs)
│   ├── src/
│   ├── package.json
│   └── Dockerfile
└── tooling/
    └── tests/               # e2e test helpers
```
- Start all services: `docker-compose up --build`
- Gateway API: http://localhost:3020/gateway/query
- Admin Dashboard: http://localhost:3040
- OpenAI (GPT-3.5, GPT-4, GPT-4o, etc.)
- Hugging Face Inference API
- Groq
- OpenRouter
- Anthropic (Claude)
- Gemini (Google)
- More coming soon!
- Unified API: One endpoint for all LLMs
- Authentication: API key/JWT middleware
- Caching: Redis-based, prompt+params as key (see the key-derivation sketch after this list)
- Rate Limiting: Per-key, Redis-backed
- Logging: Structured, JSON logs
- Monitoring: Prometheus metrics endpoint
- Adapters: Microservices for each provider
- Kubernetes & Docker: Production-ready manifests
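As a sketch of the prompt+params cache key mentioned above; the `llmcache:` prefix and the exact fields hashed are assumptions, not the repository's actual code:

```go
package cache

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
)

// Key builds a stable SHA-256 cache key from provider + prompt + params, so
// identical requests map to the same Redis entry. Params are sorted because
// Go map iteration order is random and must not change the key.
func Key(provider, prompt string, params map[string]string) string {
	names := make([]string, 0, len(params))
	for name := range params {
		names = append(names, name)
	}
	sort.Strings(names)

	h := sha256.New()
	h.Write([]byte(provider))
	h.Write([]byte{0}) // separator so "a"+"bc" never collides with "ab"+"c"
	h.Write([]byte(prompt))
	for _, name := range names {
		h.Write([]byte{0})
		h.Write([]byte(name + "=" + params[name]))
	}
	return "llmcache:" + hex.EncodeToString(h.Sum(nil))
}
```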
Example request:

```http
POST /gateway/query
Authorization: <your-gateway-api-key>
Content-Type: application/json

{
  "provider": "openai" | "hf" | "groq" | "openrouter" | "anthropic" | "gemini",
  "prompt": "Your prompt here"
}
```

Example response:

```json
{
  "cached": false,
  "response": "LLM output"
}
```
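For reference, a minimal Go client matching the example above; the endpoint, header, and JSON fields follow the request/response shown, and all values are placeholders:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Build the request body from the example above.
	body, _ := json.Marshal(map[string]string{
		"provider": "openai",
		"prompt":   "Your prompt here",
	})

	req, _ := http.NewRequest(http.MethodPost,
		"http://localhost:3020/gateway/query", bytes.NewReader(body))
	req.Header.Set("Authorization", "<your-gateway-api-key>")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode the unified response shape: {"cached": ..., "response": ...}.
	var out struct {
		Cached   bool   `json:"cached"`
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Printf("cached=%v\n%s\n", out.Cached, out.Response)
}
```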
- Unified `/query` endpoint
- OpenAI, Hugging Face, Groq, OpenRouter, Anthropic, Gemini support
- Redis caching
- API key authentication
- Rate limiting
- Logging
- Per-provider adapters as microservices
- Unified internal API for adapters
- Docker Compose & K8s manifests
- Prometheus metrics
- Admin dashboard (NestJS)
- Usage quotas & billing
- Tracing (OpenTelemetry)
- Multi-provider aggregation/fan-out
- Request/response transforms
- Fine-grained quotas & billing
- User/project management
- Webhooks & streaming
- Model selection & fallback
- More adapters (Cohere, Mistral, etc.)
- Add more LLM providers & adapters
- Streaming & webhooks support
- Advanced admin features (usage, billing, analytics)
- Helm charts for K8s
- OpenAPI/Swagger docs
Contributions are welcome! Please open issues or PRs for bugs, features, or improvements.
MIT