LLM Gateway Core is a production-grade infrastructure component designed to abstract multiple Large Language Model (LLM) providers behind a single, unified API. It implements intelligent routing, distributed caching, atomic rate limiting, and comprehensive observability to provide reliable and cost-effective LLM access.
The gateway is built on a high-performance FastAPI backend, utilizing a provider-agnostic interface that allows for seamless integration of both cloud-based and local model providers.
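
The adapter contract itself is not reproduced in this README; as a rough sketch, a provider-agnostic interface might look like this (class and method names are illustrative, not the gateway's actual code):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ChatResult:
    """Normalized response shape shared by all providers (illustrative)."""
    text: str
    provider: str
    latency_ms: float


class LLMProvider(ABC):
    """Contract each provider adapter implements, so the router, cache,
    and rate limiter can treat cloud and local backends identically."""

    name: str

    @abstractmethod
    async def chat(self, messages: list[dict], **options) -> ChatResult:
        """Send a chat completion request and return a normalized result."""
```

The main components built on top of this interface are: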
- API Layer: FastAPI-based REST API providing standardized chat completion endpoints.
- Provider Router: Dynamically selects the optimal model provider based on request hints (`online`, `local`, `fast`, `secure`); a routing sketch follows this list.
- Redis Integration:
  - Distributed Cache: Persistently stores provider responses to reduce latency and API costs (a caching sketch follows this list).
  - Rate Limiter: Implements a token bucket algorithm via Redis Lua scripts for atomic, distributed request throttling.
- Monitoring Stack: Full observability with Prometheus for metrics collection and Grafana for visualization.
- Streamlit Frontend: A clean, responsive interface for demonstration and testing purposes.
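
The routing sketch referenced in the list above might look like the following (the hint-to-provider table and fallback behavior are assumptions for illustration):

```python
# Illustrative routing table: request hints map to provider names.
# The gateway's actual selection logic may also weigh latency or cost.
HINT_TO_PROVIDER = {
    "online": "gemini",
    "fast": "gemini",
    "local": "ollama",
    "secure": "ollama",
}


def select_provider(hint: str, default: str = "gemini") -> str:
    """Resolve a request hint to a provider name, falling back to a default."""
    return HINT_TO_PROVIDER.get(hint, default)
```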
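
And the caching sketch referenced above: a hypothetical redis-py flow keyed on a hash of the normalized request (the key scheme is an assumption; the README only specifies that responses are stored in Redis):

```python
import hashlib
import json

import redis.asyncio as redis

r = redis.Redis.from_url("redis://redis:6379/0")


def cache_key(model: str, messages: list[dict]) -> str:
    """Deterministic cache key derived from the normalized request."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return "llm:cache:" + hashlib.sha256(payload.encode()).hexdigest()


async def get_or_call(model: str, messages: list[dict], call) -> str:
    """Serve from Redis when possible; otherwise invoke the provider and
    store the response (no expiry, matching the persistent-cache design)."""
    key = cache_key(model, messages)
    cached = await r.get(key)
    if cached is not None:
        return cached.decode()
    result = await call(model, messages)
    await r.set(key, result)
    return result
```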
The gateway currently supports the following providers:
- Google Gemini: High-performance cloud integration for the `online` and `fast` request modes.
- Ollama: Local integration for the `local` and `secure` request modes, enabling private, on-premise inference.
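
For illustration, a client request selecting a mode might look like this (the endpoint path, payload shape, and bearer-token auth are assumptions based on the configuration shown later):

```python
import httpx

response = httpx.post(
    "http://localhost:8000/v1/chat/completions",  # assumed endpoint path
    headers={"Authorization": "Bearer sk-gateway-123"},  # a key from API_KEYS
    json={
        "mode": "local",  # routes to Ollama; "online" would route to Gemini
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(response.json())
```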
The Streamlit frontend provides a simplified interface for interacting with the gateway, allowing users to select the execution mode and submit queries.
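
The frontend source is not shown here; in spirit it reduces to a few Streamlit widgets like the sketch below (labels and layout are illustrative):

```python
import streamlit as st

st.title("LLM Gateway Demo")

mode = st.selectbox("Execution mode", ["online", "fast", "local", "secure"])
prompt = st.text_area("Prompt")

if st.button("Send") and prompt:
    # POST the prompt to the gateway as in the request example above,
    # passing the selected mode, then render the reply:
    st.write(f"(would send {prompt!r} in {mode!r} mode)")
```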
The system exports detailed metrics to Prometheus, allowing for real-time monitoring of request rates, provider latency, cache performance, and rate limiting status.
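
The exact metric names are not documented here; with prometheus_client, definitions covering those four areas could look like this (names and labels are assumptions):

```python
from prometheus_client import Counter, Histogram

# Hypothetical metric definitions mirroring what the dashboard tracks.
REQUESTS_TOTAL = Counter(
    "gateway_requests_total", "Requests handled", ["provider", "status"]
)
PROVIDER_LATENCY = Histogram(
    "gateway_provider_latency_seconds", "Provider round-trip time", ["provider"]
)
CACHE_EVENTS = Counter(
    "gateway_cache_events_total", "Cache hits and misses", ["outcome"]
)
RATE_LIMITED = Counter(
    "gateway_rate_limited_total", "Requests rejected with HTTP 429", ["client"]
)

# Example usage inside a request handler:
# REQUESTS_TOTAL.labels(provider="gemini", status="200").inc()
# with PROVIDER_LATENCY.labels(provider="gemini").time():
#     ...call the provider...
```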
The gateway ensures system stability by enforcing per-client rate limits. Requests that exceed the configured threshold are rejected with an HTTP 429 (Too Many Requests) response.
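
The token bucket described above is typically implemented as a single Lua script so that the read-refill-take sequence executes atomically inside Redis. A sketch using redis-py (key layout, rate, and capacity values are illustrative):

```python
import time

import redis

TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local rate     = tonumber(ARGV[1])  -- tokens refilled per second
local capacity = tonumber(ARGV[2])  -- maximum bucket size
local now      = tonumber(ARGV[3])  -- current time in seconds

local bucket = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(bucket[1]) or capacity
local ts     = tonumber(bucket[2]) or now

-- Refill based on elapsed time, capped at capacity.
tokens = math.min(capacity, tokens + (now - ts) * rate)

local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return allowed
"""

r = redis.Redis.from_url("redis://redis:6379/0")
check_limit = r.register_script(TOKEN_BUCKET_LUA)


def allow_request(client_id: str, rate: float = 5.0, capacity: int = 10) -> bool:
    """Atomically take one token from the client's bucket; False means
    the caller should respond with HTTP 429."""
    return bool(check_limit(keys=[f"ratelimit:{client_id}"],
                            args=[rate, capacity, time.time()]))
```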
Running the full stack requires:

- Docker and Docker Compose
- Google Gemini API Key (for online providers)
- Local Ollama instance (for local providers)
System configuration is managed via environment variables in a `.env` file:

```env
PROVIDER_TIMEOUT_SECONDS=60
PROVIDER_MAX_RETRIES=3
GEMINI_API_KEY=your_api_key_here
REDIS_URL=redis://redis:6379/0
OLLAMA_BASE_URL=http://host.docker.internal:11434
API_KEYS=sk-gateway-123
```
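
How the gateway consumes these variables internally is not shown; one common pattern is pydantic-settings, sketched here with assumed field names:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Maps the .env values above onto typed fields (names assumed)."""
    model_config = SettingsConfigDict(env_file=".env")

    provider_timeout_seconds: int = 60
    provider_max_retries: int = 3
    gemini_api_key: str = ""
    redis_url: str = "redis://redis:6379/0"
    ollama_base_url: str = "http://host.docker.internal:11434"
    api_keys: str = ""  # comma-separated accepted keys, e.g. sk-gateway-123


settings = Settings()
```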
Deploy the entire stack using Docker Compose:

```bash
docker-compose up -d --build
```

The services will be available at:
- Streamlit Frontend: http://localhost:8501
- Gateway API: http://localhost:8000
- Grafana Dashboard: http://localhost:3000