
🚀 Unified LLM API Gateway



✨ Overview

Unified LLM API Gateway is a scalable, extensible platform that aggregates and normalises calls to multiple LLM backends (OpenAI, Hugging Face, Groq, Anthropic, Gemini, and more).
It provides a unified API with built-in caching, rate limiting, authentication, logging, metrics, and production-ready deployment manifests for Docker and Kubernetes.




🏗️ Architecture

  • API Gateway (Go):
    • Accepts client requests
    • Handles authentication, routing and request transformation
  • Aggregates or fans out requests to LLM backends
  • LLM Adapters (microservices):
    • Wrap each provider’s API (OpenAI, Hugging Face, etc.) with a unified internal interface
  • Cache Layer:
    • Redis for result caching (prompt+params as cache key)
  • Rate Limiter:
    • Redis-based leaky-bucket or token-bucket (shared across instances)
  • Auth & Quotas:
    • API keys / JWT, per-key quotas (Redis or DB)
  • Observability:
    • Structured logs (JSON), Prometheus metrics, traces
  • Deployment:
    • Docker images, Helm charts, Kubernetes manifests, CI builds
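
The cache layer above keys results on prompt+params. A minimal sketch of how such a key might be derived (the `CacheKey` helper, the `llmgw:cache:` prefix, and the field layout are illustrative assumptions, not the gateway's actual code):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// CacheKey hashes the provider, prompt, and serialized generation
// parameters into a fixed-length Redis key, so identical requests
// hit the same cache entry regardless of prompt length.
func CacheKey(provider, prompt, params string) string {
	// NUL separators prevent ambiguous concatenations
	// (e.g. "ab"+"c" vs "a"+"bc") from colliding.
	h := sha256.Sum256([]byte(provider + "\x00" + prompt + "\x00" + params))
	return "llmgw:cache:" + hex.EncodeToString(h[:])
}

func main() {
	fmt.Println(CacheKey("openai", "Hello", `{"temperature":0.7}`))
}
```

Hashing keeps keys bounded in size and avoids storing raw prompts in key names; the params string must be serialized deterministically (e.g. sorted JSON keys) for the cache to be effective.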

📁 Monorepo Layout

llm-api-gateway/
├── README.md
├── LICENSE
├── .github/           # CI/CD workflows
├── infra/             # Docker Compose & Kubernetes manifests
│   ├── k8s/
│   └── docker-compose.yml
├── gateway/           # Go API gateway
│   ├── cmd/server/
│   ├── internal/
│   │   ├── handlers/
│   │   ├── adapters/
│   │   ├── cache/
│   │   ├── ratelimit/
│   │   └── metrics/
│   ├── go.mod
│   └── Dockerfile
├── adapters/          # Per-provider adapters (microservices)
│   ├── openai-adapter/
│   └── hf-adapter/
├── admin/             # NestJS admin dashboard (API keys, usage, logs)
│   ├── src/
│   ├── package.json
│   └── Dockerfile
└── tooling/
    └── tests/         # e2e test helpers

⚡ Quickstart

  1. Start all services:
    docker-compose up --build
  2. Gateway API:
    http://localhost:3020/gateway/query
  3. Admin Dashboard:
    http://localhost:3040

🔌 Supported Providers

  • OpenAI (GPT-3.5, GPT-4, GPT-4o, etc.)
  • Hugging Face Inference API
  • Groq
  • OpenRouter
  • Anthropic (Claude)
  • Gemini (Google)
  • More coming soon!

🛡️ Features

  • Unified API: One endpoint for all LLMs
  • Authentication: API key/JWT middleware
  • Caching: Redis-based, prompt+params as key
  • Rate Limiting: Per-key, Redis-backed
  • Logging: Structured, JSON logs
  • Monitoring: Prometheus metrics endpoint
  • Adapters: Microservices for each provider
  • Kubernetes & Docker: Production-ready manifests

🧑‍💻 API Usage

Request

POST /gateway/query
Authorization: <your-gateway-api-key>
Content-Type: application/json

{
  "provider": "openai" | "hf" | "groq" | "openrouter" | "anthropic" | "gemini",
  "prompt": "Your prompt here"
}

Response

{
  "cached": false,
  "response": "LLM output"
}
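
The request above can be issued from Go like this. The endpoint, header, and body fields come from the format shown; the base URL, API key, and the `newQueryRequest` helper are illustrative assumptions:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// QueryRequest mirrors the POST /gateway/query body shown above.
type QueryRequest struct {
	Provider string `json:"provider"`
	Prompt   string `json:"prompt"`
}

// newQueryRequest builds the HTTP request with the Authorization and
// Content-Type headers the gateway expects.
func newQueryRequest(base, apiKey string, q QueryRequest) (*http.Request, error) {
	body, err := json.Marshal(q)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, base+"/gateway/query", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := newQueryRequest("http://localhost:3020", "my-key",
		QueryRequest{Provider: "openai", Prompt: "Hello"})
	fmt.Println(req.Method, req.URL.Path) // POST /gateway/query
	// Send with http.DefaultClient.Do(req) and decode the
	// {"cached": ..., "response": ...} body from the response.
}
```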

🚦 Development Phases

Phase 1: Core Gateway

  • Unified /query endpoint
  • OpenAI, Hugging Face, Groq, OpenRouter, Anthropic, Gemini support
  • Redis caching
  • API key authentication
  • Rate limiting
  • Logging

Phase 2: Adapters & Extensibility

  • Per-provider adapters as microservices
  • Unified internal API for adapters
  • Docker Compose & K8s manifests

Phase 3: Observability & Admin

  • Prometheus metrics
  • Admin dashboard (NestJS)
  • Usage quotas & billing
  • Tracing (OpenTelemetry)

Phase 4: Advanced Features (Planned)

  • Multi-provider aggregation/fan-out
  • Request/response transforms
  • Fine-grained quotas & billing
  • User/project management
  • Webhooks & streaming
  • Model selection & fallback
  • More adapters (Cohere, Mistral, etc.)

📈 Roadmap

  • Add more LLM providers & adapters
  • Streaming & webhooks support
  • Advanced admin features (usage, billing, analytics)
  • Helm charts for K8s
  • OpenAPI/Swagger docs

🤝 Contributing

Contributions are welcome! Please open issues or PRs for bugs, features, or improvements.


📄 License

MIT
