A Rails engine that turns your app into an OpenAI-compatible API gateway. Any provider RubyLLM supports (OpenAI, Anthropic, Gemini, DeepSeek, Mistral, Ollama, ...) gets exposed through a single endpoint with automatic fallback, multi-key routing, rate limiting, caching, and a full admin dashboard.
Point Cursor, Windsurf, Continue.dev, OpenCode, or any OpenAI-compatible tool at your gateway and go.
```ruby
# Gemfile
gem 'ruby_llm-gateway'
```

```shell
bundle install
rails generate ruby_llm_gateway:install
rails db:migrate
```

The generator creates six migrations (gateway_api_keys, gateway_usage_logs, gateway_provider_keys, gateway_provider_settings, gateway_model_chains, gateway_audit_logs) and an initializer.
```ruby
# config/routes.rb
mount RubyLLM::Gateway::Engine, at: '/llm'
```

| Method | Path | Description |
|---|---|---|
| POST | `/llm/v1/chat/completions` | Chat completions (sync + streaming + tool calling) |
| POST | `/llm/v1/embeddings` | Text embeddings |
| GET | `/llm/v1/models` | List available models |
| GET | `/llm/v1/models/:id` | Single model details |

| Path | Description |
|---|---|
| `/llm/admin` | Dashboard — provider keys, routing, fallback chains, API keys, usage stats |
```ruby
# config/initializers/ruby_llm_gateway.rb
RubyLLM::Gateway.configure do |config|
  # Gateway API keys
  config.api_key_prefix = "sk-gw-"
  config.default_rate_limit_rpm = 60

  # Logging & tracking
  config.log_requests = true
  config.cost_tracking_enabled = true
  config.audit_logging_enabled = false

  # Rate limiting
  config.rate_limiting_enabled = true

  # Fallback chains
  config.fallback_enabled = true

  # Response caching
  config.cache_enabled = false
  config.cache_ttl = 3600 # seconds

  # Circuit breaker
  config.circuit_breaker_threshold = 5 # failures before open
  config.circuit_breaker_cooldown = 60 # seconds before half-open

  # Model aliasing
  config.model_aliases = {
    "fast"  => "gpt-4o-mini",
    "smart" => "claude-sonnet-4-20250514",
    "code"  => "deepseek-coder"
  }

  # Security
  config.max_request_body_size = 10_485_760 # 10 MB

  # Webhooks (HMAC-signed)
  config.webhooks = [
    { url: "https://your-app.com/hooks/llm", secret: "whsec_...", events: [:budget_exceeded, :provider_down] }
  ]

  # Admin authentication
  config.admin_authenticator = ->(request) {
    # Return true/false. Example with Devise:
    # request.env['warden'].user&.admin?
    false
  }
end
```

Manage backend provider credentials through the admin dashboard or the Rails console. Provider keys are encrypted at rest via Rails `encrypts`.
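On the receiving side of the HMAC-signed webhooks configured above: the signature header name and scheme are not documented here, so treat the `X-Gateway-Signature` header and HMAC-SHA256-over-the-raw-body convention below as assumptions. A receiving endpoint might verify payloads like this:

```ruby
require "openssl"

# Hypothetical verifier: assumes the gateway sends an HMAC-SHA256 hex digest
# of the raw request body in an X-Gateway-Signature header.
def valid_webhook?(raw_body, signature, secret)
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, raw_body)
  # Constant-time comparison to avoid leaking timing information
  OpenSSL.secure_compare(expected, signature.to_s)
end
```

Reject the request (e.g. with a 401) whenever this returns false.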
```ruby
# Add an OpenAI key
RubyLLM::Gateway::ProviderKey.create!(
  provider: "openai",
  name: "Production Key #1",
  api_key: "sk-proj-..."
)

# Add an Anthropic key with a custom base URL
RubyLLM::Gateway::ProviderKey.create!(
  provider: "anthropic",
  name: "Anthropic Main",
  api_key: "sk-ant-...",
  api_base: "https://custom-proxy.example.com"
)

# Multiple keys per provider enable load balancing
RubyLLM::Gateway::ProviderKey.create!(
  provider: "openai",
  name: "Production Key #2",
  api_key: "sk-proj-...",
  position: 1
)
```

When a provider has multiple keys, choose how to distribute requests:
| Strategy | Description |
|---|---|
| `round_robin` | Rotate evenly across keys (default) |
| `fill_first` | Use the first healthy key until it fails |
| `least_used` | Pick the key with the fewest recent requests |
| `cost_optimized` | Pick the key with the lowest recent cost |
| `random` | Random selection |

```ruby
RubyLLM::Gateway::ProviderSetting.for_provider("openai").update!(routing_strategy: :round_robin)
```

```ruby
# Test a provider key's connection
key = RubyLLM::Gateway::ProviderKey.find(1)
result = RubyLLM::Gateway::ConnectionTester.new(key).test
result[:success] # true/false
result[:models]  # ["gpt-4o", "gpt-4o-mini", ...]
```

Configure automatic failover across providers. When the primary model fails, the gateway tries the next model in the chain.
```ruby
chain = RubyLLM::Gateway::ModelChain.for("default_model")
chain.add_entry(provider: "anthropic", model: "claude-sonnet-4-20250514")
chain.add_entry(provider: "openai", model: "gpt-4o")
chain.add_entry(provider: "gemini", model: "gemini-2.0-flash")

# Reorder, toggle, remove
chain.reorder_entries([2, 0, 1])
chain.toggle_entry(index: 1)
chain.remove_entry(index: 2)
```

Chain types: `default_model`, `embedding`, `image`, `moderation`, `transcription`, `audio`.
Issue keys to users/teams for accessing the gateway:

```ruby
api_key, raw_key = RubyLLM::Gateway::ApiKey.generate(
  name: "Dev Team Key",
  user: current_user,                            # polymorphic (optional)
  allowed_models: ["gpt-4o", "claude-*"],        # wildcard patterns
  rate_limit_rpm: 120,                           # requests per minute
  model_aliases: { "default" => "gpt-4o-mini" }  # per-key aliases
)
puts raw_key # sk-gw-abc123... (save this! shown only once)

# Budget limits
api_key.update!(budget_limit_cents: 5000, budget_period: "monthly")

# IP restrictions
api_key.update!(allowed_ips: ["10.0.0.0/8"], blocked_ips: ["10.0.0.99"])

# Revoke
api_key.update!(active: false)
```

Three-tier resolution: per-key alias → global alias → raw model ID.
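That lookup order can be sketched as a pure Ruby function — an illustration of the documented behavior, not the gateway's actual internals:

```ruby
# Resolve a requested model name: per-key aliases win over global aliases,
# and an unaliased name falls through as a raw model ID. Illustrative only.
def resolve_model(requested, key_aliases: {}, global_aliases: {})
  key_aliases[requested] || global_aliases[requested] || requested
end
```

So with a global `"fast" => "gpt-4o-mini"` and a per-key `"fast" => "deepseek-chat"`, a request for `"fast"` on that key resolves to `deepseek-chat`.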
```ruby
# Global aliases (config)
config.model_aliases = { "fast" => "gpt-4o-mini", "smart" => "claude-sonnet-4-20250514" }

# Per-key aliases (override global)
api_key.update!(model_aliases: { "fast" => "deepseek-chat" })

# Client sends model: "fast" → resolves to the actual model
```

```shell
# Chat completion
curl http://localhost:3000/llm/v1/chat/completions \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'

# Streaming
curl http://localhost:3000/llm/v1/chat/completions \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}],"stream":true}'

# Embeddings
curl http://localhost:3000/llm/v1/embeddings \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"text-embedding-3-small","input":"Hello world"}'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/llm/v1",
    api_key="sk-gw-your-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```

```ruby
require "openai"

client = OpenAI::Client.new(
  uri_base: "http://localhost:3000/llm/v1",
  access_token: "sk-gw-your-key"
)

response = client.chat(parameters: {
  model: "claude-sonnet-4-20250514",
  messages: [{ role: "user", content: "Hello from Ruby!" }]
})
```

Settings > Models > OpenAI API Key:
- API Key: `sk-gw-your-key`
- Base URL: `http://your-server/llm/v1`
```yaml
models:
  - title: Gateway
    provider: openai
    model: claude-sonnet-4-20250514
    apiBase: http://your-server/llm/v1
    apiKey: sk-gw-your-key
```

```json
{
  "provider": {
    "openai": {
      "apiKey": "sk-gw-your-key",
      "baseURL": "http://your-server/llm/v1"
    }
  }
}
```

- Base URL: `http://your-server/llm/v1`
- API Key: your gateway key
- Model: any model ID from `GET /llm/v1/models`
Access at `/llm/admin` (requires `config.admin_authenticator`).
- Provider Keys — CRUD, toggle enable/disable, test connection, fetch available models. Turbo Stream live updates.
- Routing Strategies — per-provider dropdown (`round_robin`, `fill_first`, `least_used`, `cost_optimized`, `random`). Auto-submit.
- Fallback Chains — add/remove/reorder models per chain type. ↑↓ buttons, toggle enable, Turbo Stream updates.
- Gateway API Keys — generate, view prefix, deactivate. Raw key shown once on creation.
- Usage Stats — requests today, tokens, cost, error rate. Top models and keys.
API Gateway
- OpenAI-compatible chat completions with full format compliance
- SSE streaming with proper `text/event-stream` handling
- Tool/function calling passthrough for agentic workflows
- Embeddings endpoint
- Works with any RubyLLM provider (14+ supported)
Key Management
- Gateway API keys (SHA-256 hashed, per-user, wildcards, expiration)
- Provider keys (encrypted at rest, multi-key per provider)
- BYOK multi-tenant (per-user provider keys → global fallback)
- Model aliasing (3-tier: per-key → global → raw)
- Budget limits (monthly/total, auto-enforce)
Reliability
- Fallback chains with automatic failover across providers
- Circuit breaker per provider (closed → open → half-open)
- 5 load balancing strategies (round_robin, fill_first, least_used, cost_optimized, random)
- Rate limiting (RPM per key, 429 + Retry-After)
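Clients should honor the `Retry-After` header when the gateway returns 429. A minimal client-side retry helper (not part of the gem, shown as a sketch) might look like:

```ruby
require "net/http"

# Hypothetical client-side helper: runs the request block, and on an HTTP 429
# sleeps for the Retry-After interval before retrying, up to max_retries times.
def with_rate_limit_retry(max_retries: 3)
  attempts = 0
  loop do
    response = yield
    return response unless response.code == "429"

    attempts += 1
    raise "rate limited after #{attempts} attempts" if attempts > max_retries

    # Fall back to 0 and cap the wait so a bad header can't stall the client
    sleep(response["Retry-After"].to_i.clamp(0, 60))
  end
end
```

Usage: wrap whatever issues the HTTP request, e.g. `with_rate_limit_retry { http.request(req) }`.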
Observability
- Request logging (model, tokens, latency, cost, IP)
- Cost tracking per request and per key
- Audit logging (key CRUD, chain changes, routing updates)
- Admin dashboard with real-time stats
Security
- IP allowlist/blocklist per key (CIDR support)
- Request size limits
- API key rotation support
- HMAC-signed webhooks (budget_exceeded, provider_down, high_error_rate)
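CIDR-based IP rules like the `allowed_ips`/`blocked_ips` example earlier can be expressed with Ruby's stdlib `IPAddr`. A sketch of the idea (the gem's actual enforcement logic may differ):

```ruby
require "ipaddr"

# True if ip matches some allowlist CIDR and no blocklist entry.
# Blocklist entries win over allowlist entries.
def ip_allowed?(ip, allowed:, blocked: [])
  addr = IPAddr.new(ip)
  return false if blocked.any? { |cidr| IPAddr.new(cidr).include?(addr) }
  allowed.any? { |cidr| IPAddr.new(cidr).include?(addr) }
end
```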
Caching
- Exact-match response cache (SHA256 of request)
- Rails.cache backend (Redis, Memcached, file, memory)
- X-Gateway-Cache headers (hit/miss/skip)
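An exact-match cache keyed on a SHA-256 of the request, as described above, reduces to something like the following sketch; the gem's actual canonicalization is not specified here, so the key format and top-level key sorting are assumptions:

```ruby
require "digest"
require "json"

# Derive a deterministic cache key from a request payload with string keys.
# Sorting top-level keys makes logically identical requests hash the same
# (nested hashes are left as-is here, for illustration only).
def cache_key_for(payload)
  canonical = JSON.generate(payload.sort.to_h)
  "gateway:response:#{Digest::SHA256.hexdigest(canonical)}"
end
```

Any `Rails.cache` backend (Redis, Memcached, file, memory) can then store the response under that key for `cache_ttl` seconds.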
| Table | Purpose |
|---|---|
| `gateway_api_keys` | User-facing API keys (`sk-gw-xxx`) with permissions, budget, IP rules |
| `gateway_usage_logs` | Request log: model, tokens, cost, latency, status |
| `gateway_provider_keys` | Backend provider credentials (encrypted) |
| `gateway_provider_settings` | Per-provider routing strategy + state |
| `gateway_model_chains` | Fallback chain entries per chain type |
| `gateway_audit_logs` | Audit trail for admin actions |
Coming in future versions:
- Semantic caching (vector similarity)
- Image generation + audio transcription endpoints
- OpenTelemetry + Prometheus metrics
- System prompt injection (global + per-key)
- Guardrails (PII detection, prompt injection, content moderation)
- A/B testing (model variant traffic splitting)
- LLM evaluations (golden set testing)
See CONTRIBUTING.md in the main RubyLLM repository.
Released under the MIT License.