
llm-api-gateway

Unified LLM API Gateway — aggregates multiple LLM providers behind a single API with caching, rate limiting, and observability.

Quickstart (dev)

  1. cd infra && docker-compose up --build
  2. Gateway: http://localhost:8080/v1/llm/chat (example request below)
  3. Admin: http://localhost:3000
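
A minimal Go client for the chat endpoint is sketched below. The request body fields (model, prompt) and the bearer-token auth header are assumptions for illustration; the actual contract is defined by the gateway's handlers.

// example_client.go: hypothetical client call; the JSON schema is assumed.
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Assumed request shape: the gateway normalizes provider-specific APIs,
    // so a model name plus a prompt is a plausible minimal payload.
    body, _ := json.Marshal(map[string]any{
        "model":  "gpt-4o-mini",
        "prompt": "Hello from the gateway!",
    })
    req, _ := http.NewRequest(http.MethodPost,
        "http://localhost:8080/v1/llm/chat", bytes.NewReader(body))
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Authorization", "Bearer <api-key>") // keys come from the admin service

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    out, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(out))
}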

Components

  • gateway (Go): edge gateway
  • adapters: provider adapters
  • admin (NestJS): API key & usage dashboard
  • infra: docker-compose / k8s manifests

Running tests

  • Gateway unit tests: cd gateway && go test ./...
  • Admin tests: cd admin && npm test

Unified LLM API Gateway

Project summary

Unified LLM API Gateway is a scalable API gateway that aggregates and normalizes calls to multiple LLM backends (OpenAI, HF Inference, self-hosted models), with caching, rate limiting, logging, metrics, and deployment manifests for Docker/Kubernetes.


1) High-level architecture

  • API Gateway (edge): accepts client requests and handles auth, routing, request transforms, and aggregation/fan-out to the LLM backends.
  • LLM Adapters (microservices): small services that wrap each provider’s API (OpenAI, Hugging Face, etc.) to expose a unified internal interface (see the sketch after this list).
  • Cache layer: Redis result caching, with cache keys derived from prompt+params.
  • Rate limiter: Redis-based leaky-bucket or token-bucket, shared across instances (a token-bucket sketch appears after the repo layout below).
  • Auth & Quotas: API keys / JWT plus per-key quotas stored in Redis or a DB.
  • Observability: structured logs (JSON), metrics exported in Prometheus format, traces.
  • Deployment: Docker images, Helm charts or Kubernetes manifests, CI builds and image publishing.
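
To make the unified internal interface concrete, here is a minimal sketch of what the adapters might expose to the gateway, plus a cache-key derivation from prompt+params as described above. All names here (Adapter, CompletionRequest, CacheKey) are illustrative assumptions, not the repo's actual types:

// adapter.go: illustrative sketch only; type and method names are assumptions.
package adapters

import (
    "context"
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

// CompletionRequest is the normalized request the gateway sends to every
// provider adapter, regardless of the upstream API (OpenAI, HF Inference, ...).
type CompletionRequest struct {
    Model       string  `json:"model"`
    Prompt      string  `json:"prompt"`
    MaxTokens   int     `json:"max_tokens"`
    Temperature float64 `json:"temperature"`
}

// CompletionResponse is the normalized response returned to the gateway.
type CompletionResponse struct {
    Text       string `json:"text"`
    Provider   string `json:"provider"`
    TokensUsed int    `json:"tokens_used"`
}

// Adapter is the unified interface each provider wrapper implements.
type Adapter interface {
    Name() string
    Complete(ctx context.Context, req CompletionRequest) (*CompletionResponse, error)
}

// CacheKey derives a deterministic Redis key from the normalized request, so
// identical prompt+params pairs hit the cache regardless of which gateway
// instance serves them.
func CacheKey(req CompletionRequest) string {
    raw, _ := json.Marshal(req) // deterministic: struct fields marshal in order
    sum := sha256.Sum256(raw)
    return "llmcache:" + hex.EncodeToString(sum[:])
}

The gateway can then fan out a request to one or more Adapter implementations and store the first successful CompletionResponse under CacheKey.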

2) Repo & workspace layout (monorepo)

llm-api-gateway/
├── README.md
├── LICENSE
├── .github/
│   └── workflows/ci.yml
├── infra/
│   ├── k8s/                  # k8s manifests or Helm charts
│   └── docker-compose.yml
├── gateway/                  # Go API gateway
│   ├── cmd/
│   │   └── server/
│   │       └── main.go
│   ├── internal/
│   │   ├── handlers/
│   │   ├── adapters/
│   │   ├── cache/
│   │   ├── ratelimit/
│   │   └── metrics/
│   ├── go.mod
│   └── Dockerfile
├── adapters/                 # per-provider adapters
│   ├── openai-adapter/
│   └── hf-adapter/
├── admin/                    # NestJS service for keys, dashboard, logs
│   ├── src/
│   ├── package.json
│   └── Dockerfile
└── tooling/
    └── tests/                # e2e test helpers
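
As an example of what internal/ratelimit from the layout above could contain, here is a sketch of the shared token bucket described in the architecture section, using the github.com/redis/go-redis/v9 client with a Lua script so that refill-and-take is atomic across gateway replicas. The key naming, parameters, and script are assumptions for illustration, not the repo's actual code:

// ratelimit.go: illustrative token-bucket sketch; not the repo's actual code.
package ratelimit

import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

// tokenBucket refills `rate` tokens per second up to `burst` and takes one
// token per request, all inside a single atomic Lua call.
var tokenBucket = redis.NewScript(`
local key    = KEYS[1]
local rate   = tonumber(ARGV[1])
local burst  = tonumber(ARGV[2])
local now    = tonumber(ARGV[3])

local data   = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(data[1]) or burst
local ts     = tonumber(data[2]) or now

tokens = math.min(burst, tokens + (now - ts) * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 60)
return allowed
`)

type Limiter struct {
    rdb   *redis.Client
    rate  float64 // tokens added per second
    burst float64 // bucket capacity
}

func New(rdb *redis.Client, rate, burst float64) *Limiter {
    return &Limiter{rdb: rdb, rate: rate, burst: burst}
}

// Allow reports whether the given API key may make one more request now.
func (l *Limiter) Allow(ctx context.Context, apiKey string) (bool, error) {
    now := float64(time.Now().UnixMilli()) / 1000.0
    res, err := tokenBucket.Run(ctx, l.rdb, []string{"rl:" + apiKey},
        l.rate, l.burst, now).Int()
    if err != nil {
        return false, err
    }
    return res == 1, nil
}

Keeping the bucket state in Redis rather than in process memory is what lets the per-key limit hold when the gateway is scaled horizontally, as the architecture section requires.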
