LLM Gateway Core

Built with FastAPI, Python, Redis, Docker, Streamlit, Prometheus, Grafana, and Google Gemini.

LLM Gateway Core is a production-grade infrastructure component designed to abstract multiple Large Language Model (LLM) providers behind a single, unified API. It implements intelligent routing, distributed caching, atomic rate limiting, and comprehensive observability to provide reliable and cost-effective LLM access.

System Architecture

The gateway is built on a high-performance FastAPI backend, utilizing a provider-agnostic interface that allows for seamless integration of both cloud-based and local model providers.

Core Components

  • API Layer: FastAPI-based REST API providing standardized chat completion endpoints.
  • Provider Router: Dynamically selects the optimal model provider based on request hints (online, local, fast, secure).
  • Redis Integration:
    • Distributed Cache: Stores provider responses in Redis so repeated requests skip the upstream call, reducing latency and API costs (a minimal sketch follows this list).
    • Rate Limiter: Implements a token bucket algorithm via Redis Lua scripts for atomic, distributed request throttling.
  • Monitoring Stack: Full observability with Prometheus for metrics collection and Grafana for visualization.
  • Streamlit Frontend: A clean, responsive interface for demonstration and testing purposes.
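The caching behavior referenced above can be outlined as follows. This is a minimal sketch rather than the repository's actual implementation: the key scheme, TTL, and function names are assumptions.

import hashlib
import json

import redis.asyncio as redis

r = redis.from_url("redis://redis:6379/0", decode_responses=True)
CACHE_TTL_SECONDS = 3600  # assumed TTL

def cache_key(model: str, messages: list[dict]) -> str:
    # Deterministic key derived from the model name and message history.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return "llm:cache:" + hashlib.sha256(payload.encode()).hexdigest()

async def cached_completion(model: str, messages: list[dict], call_provider):
    # call_provider is any awaitable that performs the real upstream request.
    key = cache_key(model, messages)
    hit = await r.get(key)
    if hit is not None:
        return json.loads(hit)  # cache hit: no provider call, no API cost
    response = await call_provider(model, messages)
    await r.set(key, json.dumps(response), ex=CACHE_TTL_SECONDS)
    return response

Identical requests served within the TTL are answered directly from Redis, which is where the latency and cost savings come from.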

Integrated Providers

The gateway currently supports the following providers:

  • Google Gemini: High-performance cloud integration for 'online' and 'fast' request modes.
  • Ollama: Local integration for 'local' and 'secure' request modes, enabling private, on-premise inference.
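A minimal sketch of how the request hints listed above could map onto these two providers; the class name, fallback policy, and mapping are illustrative assumptions, not the gateway's actual router.

from dataclasses import dataclass

@dataclass
class ProviderChoice:
    name: str
    base_url: str

# Assumed mapping, mirroring the modes described above.
HINT_TO_PROVIDER = {
    "online": ProviderChoice("gemini", "https://generativelanguage.googleapis.com"),
    "fast": ProviderChoice("gemini", "https://generativelanguage.googleapis.com"),
    "local": ProviderChoice("ollama", "http://host.docker.internal:11434"),
    "secure": ProviderChoice("ollama", "http://host.docker.internal:11434"),
}

def select_provider(hint: str) -> ProviderChoice:
    # Assumed policy: unknown hints fall back to the private, local provider.
    return HINT_TO_PROVIDER.get(hint, HINT_TO_PROVIDER["local"])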

User Interface

The Streamlit frontend provides a simplified interface for interacting with the gateway, allowing users to select the execution mode and submit queries.

Gemini Integration (Online Mode)

Gemini Interface

Ollama Integration (Local Mode)

Ollama Interface
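For orientation, a hedged sketch of what the frontend does under the hood: select an execution mode and forward the prompt to the gateway. The endpoint path, payload shape, and header are assumptions; consult the repository for the real values.

import requests
import streamlit as st

GATEWAY_URL = "http://localhost:8000/v1/chat/completions"  # assumed endpoint
API_KEY = "sk-gateway-123"  # matches the example API_KEYS value below

mode = st.selectbox("Execution mode", ["online", "fast", "local", "secure"])
prompt = st.text_area("Prompt")

if st.button("Send") and prompt:
    resp = requests.post(
        GATEWAY_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"messages": [{"role": "user", "content": prompt}], "hint": mode},
        timeout=60,
    )
    st.json(resp.json())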

Monitoring and Observability

The system exports detailed metrics to Prometheus, allowing for real-time monitoring of request rates, provider latency, cache performance, and rate limiting status.

Performance Dashboard

Grafana Monitoring
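The instrumentation behind the dashboard could look roughly like the following; the metric and label names are illustrative, not the gateway's actual metric names.

from prometheus_client import Counter, Histogram

REQUESTS_TOTAL = Counter(
    "gateway_requests_total", "Chat completion requests", ["provider", "status"]
)
PROVIDER_LATENCY = Histogram(
    "gateway_provider_latency_seconds", "Upstream provider latency", ["provider"]
)
CACHE_HITS = Counter("gateway_cache_hits_total", "Responses served from the Redis cache")

def record_request(provider: str, status: str, latency_s: float, cache_hit: bool) -> None:
    # Called once per request after the provider (or cache) has responded.
    REQUESTS_TOTAL.labels(provider=provider, status=status).inc()
    PROVIDER_LATENCY.labels(provider=provider).observe(latency_s)
    if cache_hit:
        CACHE_HITS.inc()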

Operational Reliability

Distributed Rate Limiting

The gateway ensures system stability by enforcing per-client rate limits. Requests exceeding the defined threshold are rejected with a standard 429 status code.

Rate Limiting Test
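In outline, the Redis Lua token bucket mentioned under Core Components works as sketched below. Key names, capacity, and refill rate are assumptions; the point is that refill and consumption happen atomically inside Redis, so the limit holds across multiple gateway instances.

import time

import redis

r = redis.from_url("redis://redis:6379/0")

TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])   -- tokens per second
local now      = tonumber(ARGV[3])

local data   = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or capacity
local ts     = tonumber(data[2]) or now

tokens = math.min(capacity, tokens + (now - ts) * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HMSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 60)
return allowed
"""

bucket = r.register_script(TOKEN_BUCKET_LUA)

def allow_request(client_id: str, capacity: int = 10, rate: float = 1.0) -> bool:
    # False means the bucket is empty and the gateway should answer 429.
    return bool(bucket(keys=[f"ratelimit:{client_id}"], args=[capacity, rate, time.time()]))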

Getting Started

Prerequisites

  • Docker and Docker Compose
  • Google Gemini API Key (for online providers)
  • Local Ollama instance (for local providers)

Configuration

System configuration is managed via environment variables in a .env file:

PROVIDER_TIMEOUT_SECONDS=60
PROVIDER_MAX_RETRIES=3
GEMINI_API_KEY=your_api_key_here
REDIS_URL=redis://redis:6379/0
OLLAMA_BASE_URL=http://host.docker.internal:11434
API_KEYS=sk-gateway-123
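One plausible way to load these variables in the application is via pydantic-settings; the field names mirror the keys above, but this is an assumption rather than the project's actual settings module.

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Field names map 1:1 onto the .env keys shown above (case-insensitive).
    model_config = SettingsConfigDict(env_file=".env")

    provider_timeout_seconds: int = 60
    provider_max_retries: int = 3
    gemini_api_key: str = ""
    redis_url: str = "redis://redis:6379/0"
    ollama_base_url: str = "http://host.docker.internal:11434"
    api_keys: str = "sk-gateway-123"

settings = Settings()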

Deployment

Deploy the entire stack using Docker Compose:

docker-compose up -d --build

The services will be available on the ports defined in docker-compose.yml.
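Once the stack is up, a quick way to see the gateway and the Redis cache working is to send the same request twice and compare timings; the port, path, and header here are assumptions, so adjust them to your deployment.

import time

import requests

def ask(prompt: str) -> float:
    start = time.perf_counter()
    requests.post(
        "http://localhost:8000/v1/chat/completions",  # assumed gateway endpoint
        headers={"Authorization": "Bearer sk-gateway-123"},
        json={"messages": [{"role": "user", "content": prompt}], "hint": "online"},
        timeout=60,
    )
    return time.perf_counter() - start

print("first call :", ask("What is an LLM gateway?"))
print("second call:", ask("What is an LLM gateway?"))  # typically a cache hit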
