
ruby_llm-gateway


A Rails engine that turns your app into an OpenAI-compatible API gateway. Any provider RubyLLM supports (OpenAI, Anthropic, Gemini, DeepSeek, Mistral, Ollama, ...) gets exposed through a single endpoint with automatic fallback, multi-key routing, rate limiting, caching, and a full admin dashboard.

Point Cursor, Windsurf, Continue.dev, OpenCode, or any OpenAI-compatible tool at your gateway and go.

Installation

gem 'ruby_llm-gateway'
bundle install
rails generate ruby_llm_gateway:install
rails db:migrate

The generator creates 6 migrations (gateway_api_keys, gateway_usage_logs, gateway_provider_keys, gateway_provider_settings, gateway_model_chains, gateway_audit_logs) and an initializer.

Mount the engine

# config/routes.rb
mount RubyLLM::Gateway::Engine, at: '/llm'

API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /llm/v1/chat/completions | Chat completions (sync + streaming + tool calling) |
| POST | /llm/v1/embeddings | Text embeddings |
| GET | /llm/v1/models | List available models |
| GET | /llm/v1/models/:id | Single model details |

Admin Dashboard

| Path | Description |
|------|-------------|
| /llm/admin | Dashboard — provider keys, routing, fallback chains, API keys, usage stats |

Configuration

# config/initializers/ruby_llm_gateway.rb
RubyLLM::Gateway.configure do |config|
  # Gateway API keys
  config.api_key_prefix = "sk-gw-"
  config.default_rate_limit_rpm = 60

  # Logging & tracking
  config.log_requests = true
  config.cost_tracking_enabled = true
  config.audit_logging_enabled = false

  # Rate limiting
  config.rate_limiting_enabled = true

  # Fallback chains
  config.fallback_enabled = true

  # Response caching
  config.cache_enabled = false
  config.cache_ttl = 3600                    # seconds

  # Circuit breaker
  config.circuit_breaker_threshold = 5       # failures before open
  config.circuit_breaker_cooldown = 60       # seconds before half-open

  # Model aliasing
  config.model_aliases = {
    "fast"  => "gpt-4o-mini",
    "smart" => "claude-sonnet-4-20250514",
    "code"  => "deepseek-coder"
  }

  # Security
  config.max_request_body_size = 10_485_760  # 10MB

  # Webhooks (HMAC-signed)
  config.webhooks = [
    { url: "https://your-app.com/hooks/llm", secret: "whsec_...", events: [:budget_exceeded, :provider_down] }
  ]

  # Admin authentication
  config.admin_authenticator = ->(request) {
    # Return true/false. Example with Devise:
    # request.env['warden'].user&.admin?
    false
  }
end

Provider Keys

Manage backend provider credentials through the admin dashboard or the Rails console. Provider keys are encrypted at rest with Active Record encryption (Rails' encrypts).

# Add an OpenAI key
RubyLLM::Gateway::ProviderKey.create!(
  provider: "openai",
  name: "Production Key #1",
  api_key: "sk-proj-..."
)

# Add an Anthropic key with custom base URL
RubyLLM::Gateway::ProviderKey.create!(
  provider: "anthropic",
  name: "Anthropic Main",
  api_key: "sk-ant-...",
  api_base: "https://custom-proxy.example.com"
)

# Multiple keys per provider enable load balancing
RubyLLM::Gateway::ProviderKey.create!(
  provider: "openai",
  name: "Production Key #2",
  api_key: "sk-proj-...",
  position: 1
)

Routing Strategies

When a provider has multiple keys, choose how to distribute requests:

| Strategy | Description |
|----------|-------------|
| round_robin | Rotate evenly across keys (default) |
| fill_first | Use first healthy key until it fails |
| least_used | Pick the key with fewest recent requests |
| cost_optimized | Pick the key with lowest recent cost |
| random | Random selection |

RubyLLM::Gateway::ProviderSetting.for_provider("openai").update!(routing_strategy: :round_robin)
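The default round_robin strategy can be sketched in a few lines of plain Ruby. The `KeyRotator` class below is purely illustrative (it is not part of the gem's API); it just shows the even-rotation behavior:

```ruby
# Minimal round-robin sketch (illustrative only; the gem's internal
# implementation may differ). Rotates evenly through a list of keys.
class KeyRotator
  def initialize(keys)
    @keys  = keys
    @index = 0
  end

  # Returns the next key and advances the cursor, wrapping around.
  def next_key
    key = @keys[@index % @keys.size]
    @index += 1
    key
  end
end

rotator = KeyRotator.new(%w[key_a key_b])
rotator.next_key # => "key_a"
rotator.next_key # => "key_b"
rotator.next_key # => "key_a"
```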

Connection Testing

key = RubyLLM::Gateway::ProviderKey.find(1)
result = RubyLLM::Gateway::ConnectionTester.new(key).test
result[:success]  # true/false
result[:models]   # ["gpt-4o", "gpt-4o-mini", ...]

Fallback Chains

Configure automatic failover across providers. When the primary model fails, the gateway tries the next in the chain.

chain = RubyLLM::Gateway::ModelChain.for("default_model")
chain.add_entry(provider: "anthropic", model: "claude-sonnet-4-20250514")
chain.add_entry(provider: "openai",    model: "gpt-4o")
chain.add_entry(provider: "gemini",    model: "gemini-2.0-flash")

# Reorder, toggle, remove
chain.reorder_entries([2, 0, 1])
chain.toggle_entry(index: 1)
chain.remove_entry(index: 2)

Chain types: default_model, embedding, image, moderation, transcription, audio.
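The failover behavior amounts to trying each chain entry in order and returning the first success. A simplified sketch, where the block stands in for the real provider call:

```ruby
# Illustrative failover loop: try each chain entry in order, return
# the first successful response, and raise only when every entry fails.
def with_fallback(entries)
  last_error = nil
  entries.each do |entry|
    begin
      return yield(entry)
    rescue StandardError => e
      last_error = e # remember the failure, move on to the next entry
    end
  end
  raise last_error || RuntimeError.new("no chain entries configured")
end

chain = [
  { provider: "anthropic", model: "claude-sonnet-4-20250514" },
  { provider: "openai",    model: "gpt-4o" }
]

result = with_fallback(chain) do |entry|
  raise "down" if entry[:provider] == "anthropic" # simulate an outage
  "response from #{entry[:model]}"
end
# result == "response from gpt-4o"
```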

Gateway API Keys

Issue keys to users/teams for accessing the gateway:

api_key, raw_key = RubyLLM::Gateway::ApiKey.generate(
  name: "Dev Team Key",
  user: current_user,                          # polymorphic (optional)
  allowed_models: ["gpt-4o", "claude-*"],      # wildcard patterns
  rate_limit_rpm: 120,                         # requests per minute
  model_aliases: { "default" => "gpt-4o-mini"} # per-key aliases
)
puts raw_key  # sk-gw-abc123... (save this! shown only once)

# Budget limits
api_key.update!(budget_limit_cents: 5000, budget_period: "monthly")

# IP restrictions
api_key.update!(allowed_ips: ["10.0.0.0/8"], blocked_ips: ["10.0.0.99"])

# Revoke
api_key.update!(active: false)
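The CIDR matching behind allowed_ips/blocked_ips can be done with Ruby's stdlib IPAddr. A sketch of the likely semantics (assumptions: a blocked range always wins, and an empty allowlist permits everyone):

```ruby
require "ipaddr"

# Sketch of CIDR-based IP filtering (assumed semantics, not the gem's
# exact code): blocklist wins; empty allowlist means allow all.
def ip_allowed?(ip, allowed:, blocked:)
  addr = IPAddr.new(ip)
  return false if blocked.any? { |cidr| IPAddr.new(cidr).include?(addr) }
  allowed.empty? || allowed.any? { |cidr| IPAddr.new(cidr).include?(addr) }
end

ip_allowed?("10.0.0.5",    allowed: ["10.0.0.0/8"], blocked: ["10.0.0.99"]) # => true
ip_allowed?("10.0.0.99",   allowed: ["10.0.0.0/8"], blocked: ["10.0.0.99"]) # => false
ip_allowed?("192.168.1.1", allowed: ["10.0.0.0/8"], blocked: [])            # => false
```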

Model Aliasing

Three-tier resolution: per-key alias → global alias → raw model ID.

# Global aliases (config)
config.model_aliases = { "fast" => "gpt-4o-mini", "smart" => "claude-sonnet-4-20250514" }

# Per-key aliases (override global)
api_key.update!(model_aliases: { "fast" => "deepseek-chat" })

# Client sends model: "fast" → resolves to actual model
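The resolution order can be sketched as a pure function (illustrative only, not the gem's internal API):

```ruby
# Three-tier alias resolution sketch: the per-key alias wins, then the
# global alias, and otherwise the name is passed through as a raw model ID.
def resolve_model(name, key_aliases:, global_aliases:)
  key_aliases[name] || global_aliases[name] || name
end

global  = { "fast" => "gpt-4o-mini" }
per_key = { "fast" => "deepseek-chat" }

resolve_model("fast",   key_aliases: per_key, global_aliases: global) # => "deepseek-chat"
resolve_model("fast",   key_aliases: {},      global_aliases: global) # => "gpt-4o-mini"
resolve_model("gpt-4o", key_aliases: per_key, global_aliases: global) # => "gpt-4o"
```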

Usage

curl

curl http://localhost:3000/llm/v1/chat/completions \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'

Streaming

curl http://localhost:3000/llm/v1/chat/completions \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}],"stream":true}'

Embeddings

curl http://localhost:3000/llm/v1/embeddings \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"text-embedding-3-small","input":"Hello world"}'

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/llm/v1",
    api_key="sk-gw-your-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Ruby

require "openai"

client = OpenAI::Client.new(
  uri_base: "http://localhost:3000/llm/v1",
  access_token: "sk-gw-your-key"
)

response = client.chat(parameters: {
  model: "claude-sonnet-4-20250514",
  messages: [{ role: "user", content: "Hello from Ruby!" }]
})

IDE Configuration

Cursor

Settings > Models > OpenAI API Key:

  • API Key: sk-gw-your-key
  • Base URL: http://your-server/llm/v1

Continue.dev

models:
  - title: Gateway
    provider: openai
    model: claude-sonnet-4-20250514
    apiBase: http://your-server/llm/v1
    apiKey: sk-gw-your-key

OpenCode

{
  "provider": {
    "openai": {
      "apiKey": "sk-gw-your-key",
      "baseURL": "http://your-server/llm/v1"
    }
  }
}

Windsurf / Any OpenAI-compatible client

  • Base URL: http://your-server/llm/v1
  • API Key: your gateway key
  • Model: any model ID from GET /llm/v1/models

Admin Dashboard

Access at /llm/admin (requires config.admin_authenticator).

Provider Keys — CRUD, toggle enable/disable, test connection, fetch available models. Turbo Stream live updates.

Routing Strategies — Per-provider dropdown (round_robin, fill_first, least_used, cost_optimized, random). Auto-submit.

Fallback Chains — Add/remove/reorder models per chain type. ↑↓ buttons, toggle enable, Turbo Stream updates.

Gateway API Keys — Generate, view prefix, deactivate. Raw key shown once on creation.

Usage Stats — Requests today, tokens, cost, error rate. Top models and keys.

Features

API Gateway

  • OpenAI-compatible chat completions with full format compliance
  • SSE streaming with proper text/event-stream handling
  • Tool/function calling passthrough for agentic workflows
  • Embeddings endpoint
  • Works with any RubyLLM provider (14+ supported)

Key Management

  • Gateway API keys (SHA-256 hashed, per-user, wildcards, expiration)
  • Provider keys (encrypted at rest, multi-key per provider)
  • BYOK multi-tenant (per-user provider keys → global fallback)
  • Model aliasing (3-tier: per-key → global → raw)
  • Budget limits (monthly/total, auto-enforce)

Reliability

  • Fallback chains with automatic failover across providers
  • Circuit breaker per provider (closed → open → half-open)
  • 5 load balancing strategies (round_robin, fill_first, least_used, cost_optimized, random)
  • Rate limiting (RPM per key, 429 + Retry-After)
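The closed → open → half-open lifecycle can be sketched in a small standalone class (illustrative only; the injectable clock exists just to make the cooldown testable, and the defaults mirror the config values above):

```ruby
# Minimal circuit-breaker sketch (closed → open → half-open).
# Not the gem's implementation; defaults mirror the config above.
class CircuitBreaker
  def initialize(threshold: 5, cooldown: 60, clock: -> { Time.now.to_i })
    @threshold = threshold
    @cooldown  = cooldown
    @clock     = clock
    @failures  = 0
    @opened_at = nil
  end

  # Closed circuits allow everything. Open circuits reject calls until
  # the cooldown elapses, then allow a single probe (half-open).
  def allow_request?
    return true if @opened_at.nil?
    @clock.call - @opened_at >= @cooldown
  end

  def record_success
    @failures  = 0
    @opened_at = nil   # probe succeeded: close the circuit
  end

  def record_failure
    @failures += 1
    @opened_at = @clock.call if @failures >= @threshold
  end
end
```

A successful half-open probe resets the failure count and closes the circuit; a failed probe re-opens it for another cooldown.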

Observability

  • Request logging (model, tokens, latency, cost, IP)
  • Cost tracking per request and per key
  • Audit logging (key CRUD, chain changes, routing updates)
  • Admin dashboard with real-time stats

Security

  • IP allowlist/blocklist per key (CIDR support)
  • Request size limits
  • API key rotation support
  • HMAC-signed webhooks (budget_exceeded, provider_down, high_error_rate)
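Webhook receivers can verify the HMAC signature with OpenSSL. A sketch assuming a hex-encoded HMAC-SHA256 of the raw payload (the exact header name and encoding are assumptions; check the gem's webhook docs):

```ruby
require "openssl"

# Verify an HMAC-SHA256 webhook signature in constant time.
# Hex encoding of the raw body is an assumption for illustration.
def valid_signature?(payload, signature, secret)
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)
  OpenSSL.secure_compare(expected, signature)
end

secret  = "whsec_test"
payload = '{"event":"budget_exceeded"}'
sig     = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)

valid_signature?(payload, sig, secret)     # => true
valid_signature?(payload, "bogus", secret) # => false
```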

Caching

  • Exact-match response cache (SHA256 of request)
  • Rails.cache backend (Redis, Memcached, file, memory)
  • X-Gateway-Cache headers (hit/miss/skip)
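An exact-match cache key boils down to hashing the request body. A sketch of one way to derive it (the key-sorting normalization and the key prefix are assumptions; the gem may hash the raw body instead):

```ruby
require "digest"
require "json"

# Sketch of an exact-match cache key: hash a canonical JSON form of the
# request so equivalent requests map to the same cache entry. Sorting
# top-level keys is an illustrative normalization, not the gem's code.
def cache_key(params)
  canonical = JSON.generate(params.sort.to_h)
  "llm_gateway:#{Digest::SHA256.hexdigest(canonical)}"
end

a = cache_key(model: "gpt-4o", messages: [{ role: "user", content: "Hi" }])
b = cache_key(messages: [{ role: "user", content: "Hi" }], model: "gpt-4o")
a == b # => true
```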

Database Tables

| Table | Purpose |
|-------|---------|
| gateway_api_keys | User-facing API keys (sk-gw-xxx) with permissions, budget, IP rules |
| gateway_usage_logs | Request log: model, tokens, cost, latency, status |
| gateway_provider_keys | Backend provider credentials (encrypted) |
| gateway_provider_settings | Per-provider routing strategy + state |
| gateway_model_chains | Fallback chain entries per chain type |
| gateway_audit_logs | Audit trail for admin actions |

Roadmap

Coming in future versions:

  • Semantic caching (vector similarity)
  • Image generation + audio transcription endpoints
  • OpenTelemetry + Prometheus metrics
  • System prompt injection (global + per-key)
  • Guardrails (PII detection, prompt injection, content moderation)
  • A/B testing (model variant traffic splitting)
  • LLM evaluations (golden set testing)

Contributing

See CONTRIBUTING.md in the main RubyLLM repository.

License

Released under the MIT License.