
ruby_llm-gateway


A Rails engine that turns your app into an OpenAI-compatible API gateway. Any provider RubyLLM supports (OpenAI, Anthropic, Gemini, DeepSeek, Mistral, Ollama, ...) gets exposed through a single endpoint with automatic fallback, multi-key routing, rate limiting, caching, and a full admin dashboard.

Point Cursor, Windsurf, Continue.dev, OpenCode, or any OpenAI-compatible tool at your gateway and go.

Installation

gem 'ruby_llm-gateway'
bundle install
rails generate ruby_llm_gateway:install
rails db:migrate

The generator creates 6 migrations (gateway_api_keys, gateway_usage_logs, gateway_provider_keys, gateway_provider_settings, gateway_model_chains, gateway_audit_logs) and an initializer.

Mount the engine

# config/routes.rb
mount RubyLLM::Gateway::Engine, at: '/llm'

API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /llm/v1/chat/completions | Chat completions (sync + streaming + tool calling) |
| POST | /llm/v1/embeddings | Text embeddings |
| GET | /llm/v1/models | List available models |
| GET | /llm/v1/models/:id | Single model details |

Admin Dashboard

| Path | Description |
|------|-------------|
| /llm/admin | Dashboard — provider keys, routing, fallback chains, API keys, usage stats |

Configuration

# config/initializers/ruby_llm_gateway.rb
RubyLLM::Gateway.configure do |config|
  # Gateway API keys
  config.api_key_prefix = "sk-gw-"
  config.default_rate_limit_rpm = 60

  # Logging & tracking
  config.log_requests = true
  config.cost_tracking_enabled = true
  config.audit_logging_enabled = false

  # Rate limiting
  config.rate_limiting_enabled = true

  # Fallback chains
  config.fallback_enabled = true

  # Response caching
  config.cache_enabled = false
  config.cache_ttl = 3600                    # seconds

  # Circuit breaker
  config.circuit_breaker_threshold = 5       # failures before open
  config.circuit_breaker_cooldown = 60       # seconds before half-open

  # Model aliasing
  config.model_aliases = {
    "fast"  => "gpt-4o-mini",
    "smart" => "claude-sonnet-4-20250514",
    "code"  => "deepseek-coder"
  }

  # Security
  config.max_request_body_size = 10_485_760  # 10MB

  # Webhooks (HMAC-signed)
  config.webhooks = [
    { url: "https://your-app.com/hooks/llm", secret: "whsec_...", events: [:budget_exceeded, :provider_down] }
  ]

  # Admin authentication
  config.admin_authenticator = ->(request) {
    # Return true/false. Example with Devise:
    # request.env['warden'].user&.admin?
    false
  }
end

Provider Keys

Manage backend provider credentials through the admin dashboard or the Rails console. Provider keys are encrypted at rest with Active Record encryption (Rails' encrypts).

# Add an OpenAI key
RubyLLM::Gateway::ProviderKey.create!(
  provider: "openai",
  name: "Production Key #1",
  api_key: "sk-proj-..."
)

# Add an Anthropic key with custom base URL
RubyLLM::Gateway::ProviderKey.create!(
  provider: "anthropic",
  name: "Anthropic Main",
  api_key: "sk-ant-...",
  api_base: "https://custom-proxy.example.com"
)

# Multiple keys per provider enable load balancing
RubyLLM::Gateway::ProviderKey.create!(
  provider: "openai",
  name: "Production Key #2",
  api_key: "sk-proj-...",
  position: 1
)

Routing Strategies

When a provider has multiple keys, choose how to distribute requests:

| Strategy | Description |
|----------|-------------|
| round_robin | Rotate evenly across keys (default) |
| fill_first | Use first healthy key until it fails |
| least_used | Pick the key with fewest recent requests |
| cost_optimized | Pick the key with lowest recent cost |
| random | Random selection |

RubyLLM::Gateway::ProviderSetting.for_provider("openai").update!(routing_strategy: :round_robin)
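The default round_robin strategy can be sketched in a few lines of plain Ruby. The `KeyRotator` class below is purely illustrative (it is not part of the gem's API); it just shows the even-rotation behavior:

```ruby
# Minimal round-robin sketch (illustrative only; the gem's internal
# implementation may differ). Rotates evenly through a list of keys.
class KeyRotator
  def initialize(keys)
    @keys  = keys
    @index = 0
  end

  # Returns the next key and advances the cursor, wrapping around.
  def next_key
    key = @keys[@index % @keys.size]
    @index += 1
    key
  end
end

rotator = KeyRotator.new(%w[key_a key_b])
rotator.next_key # => "key_a"
rotator.next_key # => "key_b"
rotator.next_key # => "key_a"
```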

Connection Testing

key = RubyLLM::Gateway::ProviderKey.find(1)
result = RubyLLM::Gateway::ConnectionTester.new(key).test
result[:success]  # true/false
result[:models]   # ["gpt-4o", "gpt-4o-mini", ...]

Fallback Chains

Configure automatic failover across providers. When the primary model fails, the gateway tries the next in the chain.

chain = RubyLLM::Gateway::ModelChain.for("default_model")
chain.add_entry(provider: "anthropic", model: "claude-sonnet-4-20250514")
chain.add_entry(provider: "openai",    model: "gpt-4o")
chain.add_entry(provider: "gemini",    model: "gemini-2.0-flash")

# Reorder, toggle, remove
chain.reorder_entries([2, 0, 1])
chain.toggle_entry(index: 1)
chain.remove_entry(index: 2)

Chain types: default_model, embedding, image, moderation, transcription, audio.
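The failover behavior amounts to trying each chain entry in order and returning the first success. A simplified sketch, where the block stands in for the real provider call:

```ruby
# Illustrative failover loop: try each chain entry in order, return
# the first successful response, and raise only when every entry fails.
def with_fallback(entries)
  last_error = nil
  entries.each do |entry|
    begin
      return yield(entry)
    rescue StandardError => e
      last_error = e # remember the failure, move on to the next entry
    end
  end
  raise last_error || RuntimeError.new("no chain entries configured")
end

chain = [
  { provider: "anthropic", model: "claude-sonnet-4-20250514" },
  { provider: "openai",    model: "gpt-4o" }
]

result = with_fallback(chain) do |entry|
  raise "down" if entry[:provider] == "anthropic" # simulate an outage
  "response from #{entry[:model]}"
end
# result == "response from gpt-4o"
```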

Gateway API Keys

Issue keys to users/teams for accessing the gateway:

api_key, raw_key = RubyLLM::Gateway::ApiKey.generate(
  name: "Dev Team Key",
  user: current_user,                          # polymorphic (optional)
  allowed_models: ["gpt-4o", "claude-*"],      # wildcard patterns
  rate_limit_rpm: 120,                         # requests per minute
  model_aliases: { "default" => "gpt-4o-mini"} # per-key aliases
)
puts raw_key  # sk-gw-abc123... (save this! shown only once)

# Budget limits
api_key.update!(budget_limit_cents: 5000, budget_period: "monthly")

# IP restrictions
api_key.update!(allowed_ips: ["10.0.0.0/8"], blocked_ips: ["10.0.0.99"])

# Revoke
api_key.update!(active: false)
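The CIDR matching behind allowed_ips/blocked_ips can be done with Ruby's stdlib IPAddr. A sketch of the likely semantics (assumptions: a blocked range always wins, and an empty allowlist permits everyone):

```ruby
require "ipaddr"

# Sketch of CIDR-based IP filtering (assumed semantics, not the gem's
# exact code): blocklist wins; empty allowlist means allow all.
def ip_allowed?(ip, allowed:, blocked:)
  addr = IPAddr.new(ip)
  return false if blocked.any? { |cidr| IPAddr.new(cidr).include?(addr) }
  allowed.empty? || allowed.any? { |cidr| IPAddr.new(cidr).include?(addr) }
end

ip_allowed?("10.0.0.5",    allowed: ["10.0.0.0/8"], blocked: ["10.0.0.99"]) # => true
ip_allowed?("10.0.0.99",   allowed: ["10.0.0.0/8"], blocked: ["10.0.0.99"]) # => false
ip_allowed?("192.168.1.1", allowed: ["10.0.0.0/8"], blocked: [])            # => false
```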

Model Aliasing

Three-tier resolution: per-key alias → global alias → raw model ID.

# Global aliases (config)
config.model_aliases = { "fast" => "gpt-4o-mini", "smart" => "claude-sonnet-4-20250514" }

# Per-key aliases (override global)
api_key.update!(model_aliases: { "fast" => "deepseek-chat" })

# Client sends model: "fast" → resolves to actual model
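The resolution order can be sketched as a pure function (illustrative only, not the gem's internal API):

```ruby
# Three-tier alias resolution sketch: the per-key alias wins, then the
# global alias, and otherwise the name is passed through as a raw model ID.
def resolve_model(name, key_aliases:, global_aliases:)
  key_aliases[name] || global_aliases[name] || name
end

global  = { "fast" => "gpt-4o-mini" }
per_key = { "fast" => "deepseek-chat" }

resolve_model("fast",   key_aliases: per_key, global_aliases: global) # => "deepseek-chat"
resolve_model("fast",   key_aliases: {},      global_aliases: global) # => "gpt-4o-mini"
resolve_model("gpt-4o", key_aliases: per_key, global_aliases: global) # => "gpt-4o"
```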

Usage

curl

curl http://localhost:3000/llm/v1/chat/completions \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'

Streaming

curl http://localhost:3000/llm/v1/chat/completions \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}],"stream":true}'

Embeddings

curl http://localhost:3000/llm/v1/embeddings \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"text-embedding-3-small","input":"Hello world"}'

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/llm/v1",
    api_key="sk-gw-your-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Ruby

require "openai"

client = OpenAI::Client.new(
  uri_base: "http://localhost:3000/llm/v1",
  access_token: "sk-gw-your-key"
)

response = client.chat(parameters: {
  model: "claude-sonnet-4-20250514",
  messages: [{ role: "user", content: "Hello from Ruby!" }]
})

IDE Configuration

Cursor

Settings > Models > OpenAI API Key:

  • API Key: sk-gw-your-key
  • Base URL: http://your-server/llm/v1

Continue.dev

models:
  - title: Gateway
    provider: openai
    model: claude-sonnet-4-20250514
    apiBase: http://your-server/llm/v1
    apiKey: sk-gw-your-key

OpenCode

{
  "provider": {
    "openai": {
      "apiKey": "sk-gw-your-key",
      "baseURL": "http://your-server/llm/v1"
    }
  }
}

Windsurf / Any OpenAI-compatible client

  • Base URL: http://your-server/llm/v1
  • API Key: your gateway key
  • Model: any model ID from GET /llm/v1/models

Admin Dashboard

Access at /llm/admin (requires config.admin_authenticator).

Provider Keys — CRUD, toggle enable/disable, test connection, fetch available models. Turbo Stream live updates.

Routing Strategies — Per-provider dropdown (round_robin, fill_first, least_used, cost_optimized, random). Auto-submit.

Fallback Chains — Add/remove/reorder models per chain type. ↑↓ buttons, toggle enable, Turbo Stream updates.

Gateway API Keys — Generate, view prefix, deactivate. Raw key shown once on creation.

Usage Stats — Requests today, tokens, cost, error rate. Top models and keys.

Features

API Gateway

  • OpenAI-compatible chat completions with full format compliance
  • SSE streaming with proper text/event-stream handling
  • Tool/function calling passthrough for agentic workflows
  • Embeddings endpoint
  • Works with any RubyLLM provider (14+ supported)

Key Management

  • Gateway API keys (SHA-256 hashed, per-user, wildcards, expiration)
  • Provider keys (encrypted at rest, multi-key per provider)
  • BYOK multi-tenant (per-user provider keys → global fallback)
  • Model aliasing (3-tier: per-key → global → raw)
  • Budget limits (monthly/total, auto-enforce)

Reliability

  • Fallback chains with automatic failover across providers
  • Circuit breaker per provider (closed → open → half-open)
  • 5 load balancing strategies (round_robin, fill_first, least_used, cost_optimized, random)
  • Rate limiting (RPM per key, 429 + Retry-After)
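The closed → open → half-open lifecycle can be sketched in a small standalone class (illustrative only; the injectable clock exists just to make the cooldown testable, and the defaults mirror the config values above):

```ruby
# Minimal circuit-breaker sketch (closed → open → half-open).
# Not the gem's implementation; defaults mirror the config above.
class CircuitBreaker
  def initialize(threshold: 5, cooldown: 60, clock: -> { Time.now.to_i })
    @threshold = threshold
    @cooldown  = cooldown
    @clock     = clock
    @failures  = 0
    @opened_at = nil
  end

  # Closed circuits allow everything. Open circuits reject calls until
  # the cooldown elapses, then allow a single probe (half-open).
  def allow_request?
    return true if @opened_at.nil?
    @clock.call - @opened_at >= @cooldown
  end

  def record_success
    @failures  = 0
    @opened_at = nil   # probe succeeded: close the circuit
  end

  def record_failure
    @failures += 1
    @opened_at = @clock.call if @failures >= @threshold
  end
end
```

A successful half-open probe resets the failure count and closes the circuit; a failed probe re-opens it for another cooldown.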

Observability

  • Request logging (model, tokens, latency, cost, IP)
  • Cost tracking per request and per key
  • Audit logging (key CRUD, chain changes, routing updates)
  • Admin dashboard with real-time stats

Security

  • IP allowlist/blocklist per key (CIDR support)
  • Request size limits
  • API key rotation support
  • HMAC-signed webhooks (budget_exceeded, provider_down, high_error_rate)
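Webhook receivers can verify the HMAC signature with OpenSSL. A sketch assuming a hex-encoded HMAC-SHA256 of the raw payload (the exact header name and encoding are assumptions; check the gem's webhook docs):

```ruby
require "openssl"

# Verify an HMAC-SHA256 webhook signature in constant time.
# Hex encoding of the raw body is an assumption for illustration.
def valid_signature?(payload, signature, secret)
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)
  OpenSSL.secure_compare(expected, signature)
end

secret  = "whsec_test"
payload = '{"event":"budget_exceeded"}'
sig     = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)

valid_signature?(payload, sig, secret)     # => true
valid_signature?(payload, "bogus", secret) # => false
```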

Caching

  • Exact-match response cache (SHA256 of request)
  • Rails.cache backend (Redis, Memcached, file, memory)
  • X-Gateway-Cache headers (hit/miss/skip)
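An exact-match cache key boils down to hashing the request body. A sketch of one way to derive it (the key-sorting normalization and the key prefix are assumptions; the gem may hash the raw body instead):

```ruby
require "digest"
require "json"

# Sketch of an exact-match cache key: hash a canonical JSON form of the
# request so equivalent requests map to the same cache entry. Sorting
# top-level keys is an illustrative normalization, not the gem's code.
def cache_key(params)
  canonical = JSON.generate(params.sort.to_h)
  "llm_gateway:#{Digest::SHA256.hexdigest(canonical)}"
end

a = cache_key(model: "gpt-4o", messages: [{ role: "user", content: "Hi" }])
b = cache_key(messages: [{ role: "user", content: "Hi" }], model: "gpt-4o")
a == b # => true
```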

Database Tables

| Table | Purpose |
|-------|---------|
| gateway_api_keys | User-facing API keys (sk-gw-xxx) with permissions, budget, IP rules |
| gateway_usage_logs | Request log: model, tokens, cost, latency, status |
| gateway_provider_keys | Backend provider credentials (encrypted) |
| gateway_provider_settings | Per-provider routing strategy + state |
| gateway_model_chains | Fallback chain entries per chain type |
| gateway_audit_logs | Audit trail for admin actions |

Roadmap

Coming in future versions:

  • Semantic caching (vector similarity)
  • Image generation + audio transcription endpoints
  • OpenTelemetry + Prometheus metrics
  • System prompt injection (global + per-key)
  • Guardrails (PII detection, prompt injection, content moderation)
  • A/B testing (model variant traffic splitting)
  • LLM evaluations (golden set testing)

Contributing

See CONTRIBUTING.md in the main RubyLLM repository.

License

Released under the MIT License.