RubyLLM::RedCandle v0.1.0

11 Dec 18:48
f162ab6


The first RubyLLM provider that runs models locally.

What is it?

RubyLLM::RedCandle is a plugin for RubyLLM that enables local LLM execution using quantized GGUF models. While other RubyLLM providers connect to cloud APIs (OpenAI, Anthropic, etc.), Red Candle runs models directly in your Ruby process using the Candle ML framework via Rust bindings.

Why it matters

  • No API costs - Run inference for free once models are downloaded
  • No network latency - Models execute locally, no round-trips to external servers
  • Data privacy - Your prompts and responses never leave your machine
  • Offline capable - Works without internet after initial model download
  • Hardware acceleration - Metal (Apple Silicon), CUDA (NVIDIA), or CPU

Features

Full RubyLLM compatibility

Drop-in replacement for cloud providers - same RubyLLM.chat interface you already know:

chat = RubyLLM.chat(provider: :red_candle, model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')
response = chat.ask("What is the capital of France?")

Streaming output

Token-by-token generation for responsive UIs:

chat.ask("Explain recursion") { |chunk| print chunk.content }

Structured output (JSON Schema)

Grammar-constrained generation ensures valid JSON every time:

schema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    age: { type: 'integer' }
  },
  required: ['name', 'age']
}

response = chat.with_schema(schema).ask("Generate a person profile")
response.content  # => {"name"=>"Alice", "age"=>30}

Schema validation

Invalid schemas fail fast with helpful error messages showing what's wrong and how to fix it.
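For illustration, a sketch of that fail-fast behavior (the exact error class raised is an assumption, not stated in these notes):

# Hypothetical example: 'not_a_type' is not a valid JSON Schema type.
bad_schema = {
  type: 'object',
  properties: {
    age: { type: 'not_a_type' }
  }
}

begin
  chat.with_schema(bad_schema).ask("Generate a person profile")
rescue => e
  puts e.message  # points at the invalid property and suggests a fix
end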

Configurable JSON instruction templates

Tune the prompt template used for structured generation to optimize for different models.
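The release notes don't show the configuration hook, so the following is a hypothetical sketch; the option name is illustrative only and may differ from the actual API:

# Hypothetical option name - check the gem's README for the real setting.
RubyLLM.configure do |config|
  config.red_candle_json_instruction = "Return only valid JSON matching this schema: %{schema}"
end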

Helpful error messages

Raw Rust/Candle errors are wrapped with context-specific guidance (OOM, context length, model loading issues).
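As a rough sketch of what that looks like in practice (the specific error class here is assumed, not confirmed by these notes):

begin
  chat.ask(prompt_that_exceeds_the_context_window)
rescue RubyLLM::Error => e
  # Instead of a raw Rust/Candle backtrace, the message explains the likely
  # cause (e.g. context length exceeded) and what to try next.
  puts e.message
end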

Supported Models

Model                    Context  Size     Use Case
TinyLlama-1.1B           2K       ~600MB   Testing, prototypes
Qwen2.5-1.5B-Instruct    32K      ~900MB   General chat, long context
Gemma-3-4B               8K       ~2.5GB   Balanced quality/speed
Phi-3-mini-4k            4K       ~2GB     Reasoning tasks
Mistral-7B-Instruct      32K      ~4GB     High quality responses

Models are automatically downloaded from HuggingFace on first use.

Requirements

  • Ruby 3.1+
  • Rust toolchain (for native extension compilation)
  • RubyLLM >= 1.2

Installation

gem 'ruby_llm-red_candle'
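Then run bundle install. The require path below follows the usual gem naming convention and is assumed rather than confirmed by these notes:

# Assumed require path based on the gem name; RubyLLM plugins typically
# register their provider when required.
require 'ruby_llm/red_candle'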

Links