RubyLLM::RedCandle v0.1.0

11 Dec 18:48
f162ab6


The first RubyLLM provider that runs models locally.

What is it?

RubyLLM::RedCandle is a plugin for RubyLLM that enables local LLM execution using quantized GGUF models. While other RubyLLM providers connect to cloud APIs (OpenAI, Anthropic, etc.), Red Candle runs models directly in your Ruby process using the Candle ML framework via Rust bindings.

Why it matters

  • No API costs - Run inference for free once models are downloaded
  • No network latency - Models execute locally, no round-trips to external servers
  • Data privacy - Your prompts and responses never leave your machine
  • Offline capable - Works without internet after initial model download
  • Hardware acceleration - Metal (Apple Silicon), CUDA (NVIDIA), or CPU

Features

Full RubyLLM compatibility

Drop-in replacement for cloud providers - same RubyLLM.chat interface you already know:

chat = RubyLLM.chat(provider: :red_candle, model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')
response = chat.ask("What is the capital of France?")

Streaming output

Token-by-token generation for responsive UIs:

chat.ask("Explain recursion") { |chunk| print chunk.content }

Structured output (JSON Schema)

Grammar-constrained generation ensures valid JSON every time:

schema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    age: { type: 'integer' }
  },
  required: ['name', 'age']
}

response = chat.with_schema(schema).ask("Generate a person profile")
response.content  # => {"name"=>"Alice", "age"=>30}

Schema validation

Invalid schemas fail fast with helpful error messages showing what's wrong and how to fix it.
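For illustration, a sketch of that fail-fast behavior (the exact error class raised is an assumption, not stated in these notes):

# Hypothetical example: 'not_a_type' is not a valid JSON Schema type.
bad_schema = {
  type: 'object',
  properties: {
    age: { type: 'not_a_type' }
  }
}

begin
  chat.with_schema(bad_schema).ask("Generate a person profile")
rescue => e
  puts e.message  # points at the invalid property and suggests a fix
end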

Configurable JSON instruction templates

Tune the prompt template used for structured generation to optimize for different models.
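The release notes don't show the configuration hook, so the following is a hypothetical sketch; the option name is illustrative only and may differ from the actual API:

# Hypothetical option name - check the gem's README for the real setting.
RubyLLM.configure do |config|
  config.red_candle_json_instruction = "Return only valid JSON matching this schema: %{schema}"
end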

Helpful error messages

Raw Rust/Candle errors are wrapped with context-specific guidance (OOM, context length, model loading issues).
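As a rough sketch of what that looks like in practice (the specific error class here is assumed, not confirmed by these notes):

begin
  chat.ask(prompt_that_exceeds_the_context_window)
rescue RubyLLM::Error => e
  # Instead of a raw Rust/Candle backtrace, the message explains the likely
  # cause (e.g. context length exceeded) and what to try next.
  puts e.message
end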

Supported Models

Model                    Context  Size     Use Case
TinyLlama-1.1B           2K       ~600MB   Testing, prototypes
Qwen2.5-1.5B-Instruct    32K      ~900MB   General chat, long context
Gemma-3-4B               8K       ~2.5GB   Balanced quality/speed
Phi-3-mini-4k            4K       ~2GB     Reasoning tasks
Mistral-7B-Instruct      32K      ~4GB     High quality responses

Models are automatically downloaded from HuggingFace on first use.

Requirements

  • Ruby 3.1+
  • Rust toolchain (for native extension compilation)
  • RubyLLM >= 1.2

Installation

gem 'ruby_llm-red_candle'
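Then run bundle install. The require path below follows the usual gem naming convention and is assumed rather than confirmed by these notes:

# Assumed require path based on the gem name; RubyLLM plugins typically
# register their provider when required.
require 'ruby_llm/red_candle'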

Links