ruby_llm-gateway

A Rails engine that turns your app into an OpenAI-compatible API gateway. Any provider RubyLLM supports (OpenAI, Anthropic, Gemini, DeepSeek, Mistral, Ollama, ...) gets exposed through a single endpoint with automatic fallback, multi-key routing, rate limiting, caching, and a full admin dashboard.

Point Cursor, Windsurf, Continue.dev, OpenCode, or any OpenAI-compatible tool at your gateway and go.

Installation

# Gemfile
gem 'ruby_llm-gateway'

bundle install
rails generate ruby_llm_gateway:install
rails db:migrate

The generator creates 6 migrations (gateway_api_keys, gateway_usage_logs, gateway_provider_keys, gateway_provider_settings, gateway_model_chains, gateway_audit_logs) and an initializer.

Mount the engine

# config/routes.rb
mount RubyLLM::Gateway::Engine, at: '/llm'

API Endpoints

Method  Path                      Description
POST    /llm/v1/chat/completions  Chat completions (sync + streaming + tool calling)
POST    /llm/v1/embeddings        Text embeddings
GET     /llm/v1/models            List available models
GET     /llm/v1/models/:id        Single model details

Admin Dashboard

Path Description
/llm/admin Dashboard — provider keys, routing, fallback chains, API keys, usage stats

Configuration

# config/initializers/ruby_llm_gateway.rb
RubyLLM::Gateway.configure do |config|
  # Gateway API keys
  config.api_key_prefix = "sk-gw-"
  config.default_rate_limit_rpm = 60

  # Logging & tracking
  config.log_requests = true
  config.cost_tracking_enabled = true
  config.audit_logging_enabled = false

  # Rate limiting
  config.rate_limiting_enabled = true

  # Fallback chains
  config.fallback_enabled = true

  # Response caching
  config.cache_enabled = false
  config.cache_ttl = 3600                    # seconds

  # Circuit breaker
  config.circuit_breaker_threshold = 5       # failures before open
  config.circuit_breaker_cooldown = 60       # seconds before half-open

  # Model aliasing
  config.model_aliases = {
    "fast"  => "gpt-4o-mini",
    "smart" => "claude-sonnet-4-20250514",
    "code"  => "deepseek-coder"
  }

  # Security
  config.max_request_body_size = 10_485_760  # 10MB

  # Webhooks (HMAC-signed)
  config.webhooks = [
    { url: "https://your-app.com/hooks/llm", secret: "whsec_...", events: [:budget_exceeded, :provider_down] }
  ]

  # Admin authentication
  config.admin_authenticator = ->(request) {
    # Return true/false. Example with Devise:
    # request.env['warden'].user&.admin?
    false
  }
end
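The webhooks configured above are HMAC-signed. A minimal sketch of verifying such a signature on the receiving end, assuming a hex-encoded SHA-256 HMAC of the raw request body (the exact header name and signing scheme are assumptions — check the gem's webhook documentation):

```ruby
require "openssl"

# Verify an HMAC-signed webhook payload. The hex-SHA256-of-raw-body scheme
# is an assumption for illustration; confirm against the gem's docs.
def valid_webhook?(payload, signature, secret)
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)
  # Constant-time comparison; both strings are fixed-length hex digests.
  OpenSSL.fixed_length_secure_compare(expected, signature)
end

payload = '{"event":"budget_exceeded","api_key":"sk-gw-..."}'
secret  = "whsec_test"
sig     = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)

valid_webhook?(payload, sig, secret)       # => true
valid_webhook?(payload, "0" * 64, secret)  # => false
```

Using a constant-time compare matters here: a plain `==` can leak signature prefixes through timing differences.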

Provider Keys

Manage backend provider credentials through the admin dashboard or Rails console. Provider keys are encrypted at rest via Active Record's encrypts.

# Add an OpenAI key
RubyLLM::Gateway::ProviderKey.create!(
  provider: "openai",
  name: "Production Key #1",
  api_key: "sk-proj-..."
)

# Add an Anthropic key with custom base URL
RubyLLM::Gateway::ProviderKey.create!(
  provider: "anthropic",
  name: "Anthropic Main",
  api_key: "sk-ant-...",
  api_base: "https://custom-proxy.example.com"
)

# Multiple keys per provider enable load balancing
RubyLLM::Gateway::ProviderKey.create!(
  provider: "openai",
  name: "Production Key #2",
  api_key: "sk-proj-...",
  position: 1
)

Routing Strategies

When a provider has multiple keys, choose how to distribute requests:

Strategy        Description
round_robin     Rotate evenly across keys (default)
fill_first      Use first healthy key until it fails
least_used      Pick the key with fewest recent requests
cost_optimized  Pick the key with lowest recent cost
random          Random selection

RubyLLM::Gateway::ProviderSetting.for_provider("openai").update!(routing_strategy: :round_robin)
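For intuition, round_robin (the default) simply cycles through a provider's keys in order. A toy sketch, not the gem's internals:

```ruby
# Toy illustration of round_robin key selection.
keys = ["Production Key #1", "Production Key #2"]
counter = -1
pick = -> { keys[(counter += 1) % keys.length] }

pick.call  # => "Production Key #1"
pick.call  # => "Production Key #2"
pick.call  # => "Production Key #1"
```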

Connection Testing

key = RubyLLM::Gateway::ProviderKey.find(1)
result = RubyLLM::Gateway::ConnectionTester.new(key).test
result[:success]  # true/false
result[:models]   # ["gpt-4o", "gpt-4o-mini", ...]

Fallback Chains

Configure automatic failover across providers. When the primary model fails, the gateway tries the next in the chain.

chain = RubyLLM::Gateway::ModelChain.for("default_model")
chain.add_entry(provider: "anthropic", model: "claude-sonnet-4-20250514")
chain.add_entry(provider: "openai",    model: "gpt-4o")
chain.add_entry(provider: "gemini",    model: "gemini-2.0-flash")

# Reorder, toggle, remove
chain.reorder_entries([2, 0, 1])
chain.toggle_entry(index: 1)
chain.remove_entry(index: 2)

Chain types: default_model, embedding, image, moderation, transcription, audio.

Gateway API Keys

Issue keys to users/teams for accessing the gateway:

api_key, raw_key = RubyLLM::Gateway::ApiKey.generate(
  name: "Dev Team Key",
  user: current_user,                          # polymorphic (optional)
  allowed_models: ["gpt-4o", "claude-*"],      # wildcard patterns
  rate_limit_rpm: 120,                         # requests per minute
  model_aliases: { "default" => "gpt-4o-mini" } # per-key aliases
)
puts raw_key  # sk-gw-abc123... (save this! shown only once)

# Budget limits
api_key.update!(budget_limit_cents: 5000, budget_period: "monthly")

# IP restrictions
api_key.update!(allowed_ips: ["10.0.0.0/8"], blocked_ips: ["10.0.0.99"])

# Revoke
api_key.update!(active: false)
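The allow/block rules above accept CIDR ranges. How such rules can be evaluated with Ruby's stdlib IPAddr (illustrative only — the gem's own matching logic and rule precedence may differ):

```ruby
require "ipaddr"

# Block rules win; with a non-empty allowlist, the IP must match one entry.
def ip_allowed?(ip, allowed:, blocked:)
  addr = IPAddr.new(ip)
  return false if blocked.any? { |cidr| IPAddr.new(cidr).include?(addr) }
  allowed.empty? || allowed.any? { |cidr| IPAddr.new(cidr).include?(addr) }
end

ip_allowed?("10.1.2.3",  allowed: ["10.0.0.0/8"], blocked: ["10.0.0.99"])  # => true
ip_allowed?("10.0.0.99", allowed: ["10.0.0.0/8"], blocked: ["10.0.0.99"])  # => false
```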

Model Aliasing

Three-tier resolution: per-key alias → global alias → raw model ID.

# Global aliases (config)
config.model_aliases = { "fast" => "gpt-4o-mini", "smart" => "claude-sonnet-4-20250514" }

# Per-key aliases (override global)
api_key.update!(model_aliases: { "fast" => "deepseek-chat" })

# Client sends model: "fast" → resolves to actual model
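The three-tier lookup can be sketched as a simple chained hash lookup (illustrative, not the gem's actual internals):

```ruby
# Resolve per-key alias first, then global alias, else use the raw model ID.
def resolve_model(requested, per_key:, global:)
  per_key[requested] || global[requested] || requested
end

global  = { "fast" => "gpt-4o-mini", "smart" => "claude-sonnet-4-20250514" }
per_key = { "fast" => "deepseek-chat" }

resolve_model("fast",   per_key: per_key, global: global)  # => "deepseek-chat"
resolve_model("smart",  per_key: per_key, global: global)  # => "claude-sonnet-4-20250514"
resolve_model("gpt-4o", per_key: per_key, global: global)  # => "gpt-4o"
```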

Usage

curl

curl http://localhost:3000/llm/v1/chat/completions \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'

Streaming

curl http://localhost:3000/llm/v1/chat/completions \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}],"stream":true}'

Embeddings

curl http://localhost:3000/llm/v1/embeddings \
  -H "Authorization: Bearer sk-gw-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"text-embedding-3-small","input":"Hello world"}'

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/llm/v1",
    api_key="sk-gw-your-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Ruby

require "openai"

client = OpenAI::Client.new(
  uri_base: "http://localhost:3000/llm/v1",
  access_token: "sk-gw-your-key"
)

response = client.chat(parameters: {
  model: "claude-sonnet-4-20250514",
  messages: [{ role: "user", content: "Hello from Ruby!" }]
})

IDE Configuration

Cursor

Settings > Models > OpenAI API Key:

  • API Key: sk-gw-your-key
  • Base URL: http://your-server/llm/v1

Continue.dev

models:
  - title: Gateway
    provider: openai
    model: claude-sonnet-4-20250514
    apiBase: http://your-server/llm/v1
    apiKey: sk-gw-your-key

OpenCode

{
  "provider": {
    "openai": {
      "apiKey": "sk-gw-your-key",
      "baseURL": "http://your-server/llm/v1"
    }
  }
}

Windsurf / Any OpenAI-compatible client

  • Base URL: http://your-server/llm/v1
  • API Key: your gateway key
  • Model: any model ID from GET /llm/v1/models

Admin Dashboard

Access at /llm/admin (requires config.admin_authenticator).

Provider Keys — CRUD, toggle enable/disable, test connection, fetch available models. Turbo Stream live updates.

Routing Strategies — Per-provider dropdown (round_robin, fill_first, least_used, cost_optimized, random). Auto-submit.

Fallback Chains — Add/remove/reorder models per chain type. ↑↓ buttons, toggle enable, Turbo Stream updates.

Gateway API Keys — Generate, view prefix, deactivate. Raw key shown once on creation.

Usage Stats — Requests today, tokens, cost, error rate. Top models and keys.

Features

API Gateway

  • OpenAI-compatible chat completions with full format compliance
  • SSE streaming with proper text/event-stream handling
  • Tool/function calling passthrough for agentic workflows
  • Embeddings endpoint
  • Works with any RubyLLM provider (14+ supported)

Key Management

  • Gateway API keys (SHA-256 hashed, per-user, wildcards, expiration)
  • Provider keys (encrypted at rest, multi-key per provider)
  • BYOK multi-tenant (per-user provider keys → global fallback)
  • Model aliasing (3-tier: per-key → global → raw)
  • Budget limits (monthly/total, auto-enforce)

Reliability

  • Fallback chains with automatic failover across providers
  • Circuit breaker per provider (closed → open → half-open)
  • 5 load balancing strategies (round_robin, fill_first, least_used, cost_optimized, random)
  • Rate limiting (RPM per key, 429 + Retry-After)
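When a key exceeds its RPM limit, the gateway answers 429 with a Retry-After header. A client-side sketch that honors it, where the block stands in for any HTTP call returning status, headers, and body:

```ruby
# Retry on 429, sleeping for the server-suggested Retry-After seconds.
def with_rate_limit_retry(max_retries: 3)
  attempts = 0
  loop do
    status, headers, body = yield
    return [status, headers, body] unless status == 429 && attempts < max_retries
    attempts += 1
    sleep(headers.fetch("Retry-After", "1").to_i)
  end
end

# Usage with a fake transport that is rate-limited once, then succeeds:
responses = [[429, { "Retry-After" => "0" }, ""], [200, {}, "ok"]]
status, = with_rate_limit_retry { responses.shift }
status  # => 200
```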

Observability

  • Request logging (model, tokens, latency, cost, IP)
  • Cost tracking per request and per key
  • Audit logging (key CRUD, chain changes, routing updates)
  • Admin dashboard with real-time stats

Security

  • IP allowlist/blocklist per key (CIDR support)
  • Request size limits
  • API key rotation support
  • HMAC-signed webhooks (budget_exceeded, provider_down, high_error_rate)

Caching

  • Exact-match response cache (SHA256 of request)
  • Rails.cache backend (Redis, Memcached, file, memory)
  • X-Gateway-Cache headers (hit/miss/skip)
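Conceptually, an exact-match cache keys each response by a SHA-256 digest of the request body, so identical requests hit and any change misses. Which fields the gem actually canonicalizes into the key is an assumption here:

```ruby
require "digest"
require "json"

# Identical request bodies hash to the same cache key; any change misses.
request   = { model: "gpt-4o", messages: [{ role: "user", content: "Hello!" }] }
cache_key = Digest::SHA256.hexdigest(JSON.generate(request))

cache_key.length  # => 64 (hex digest)
```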

Database Tables

Table                      Purpose
gateway_api_keys           User-facing API keys (sk-gw-xxx) with permissions, budget, IP rules
gateway_usage_logs         Request log: model, tokens, cost, latency, status
gateway_provider_keys      Backend provider credentials (encrypted)
gateway_provider_settings  Per-provider routing strategy + state
gateway_model_chains       Fallback chain entries per chain type
gateway_audit_logs         Audit trail for admin actions

Roadmap

Coming in future versions:

  • Semantic caching (vector similarity)
  • Image generation + audio transcription endpoints
  • OpenTelemetry + Prometheus metrics
  • System prompt injection (global + per-key)
  • Guardrails (PII detection, prompt injection, content moderation)
  • A/B testing (model variant traffic splitting)
  • LLM evaluations (golden set testing)

Contributing

See CONTRIBUTING.md in the main RubyLLM repository.

License

Released under the MIT License.
