Common questions about rlm-rs and troubleshooting tips.
RLM-RS is a CLI tool implementing the Recursive Language Model (RLM) pattern from arXiv:2512.24601. It enables AI assistants to process documents up to 100x larger than their context window by chunking content and using pass-by-reference architecture.
RLM-RS is specifically designed for AI assistant workflows with:
- Multiple chunking strategies (semantic, code-aware, fixed, parallel)
- Automatic embeddings for semantic search
- Hybrid search combining semantic + BM25
- Pass-by-reference architecture to reduce token usage
- SQLite persistence for reliable state management
- Claude Code integration via MCP plugin
No, rlm-rs is a standalone CLI tool. However, it's optimized for use with Claude Code via the rlm-rs plugin.
RLM-RS works with any text-based file format:
- Markdown (.md) - Use semantic chunker
- Source code (.rs, .py, .js, .ts, .go, .java, etc.) - Use code chunker
- Plain text (.txt) - Use fixed or semantic chunker
- Logs - Use fixed chunker with overlap
- JSON/YAML - Use fixed or semantic chunker
Binary files are not supported.
Three options:
- Cargo (recommended): `cargo install rlm-cli`
- Homebrew: `brew install zircote/tap/rlm-rs`
- From source: See Getting Started
- OS: macOS, Linux, or Windows
- Rust: 1.88+ (if building from source)
- Disk: ~50MB for binaries + embeddings model
- Memory: 512MB minimum, 2GB+ recommended for large files
By default, RLM-RS creates .rlm/rlm-state.db in your current directory.
You can override this with the RLM_DB_PATH environment variable:
```bash
export RLM_DB_PATH=/path/to/custom/rlm-state.db
```
```bash
rlm-cli init
```

To reset all state:

```bash
rlm-cli reset
```

Warning: This deletes all buffers, chunks, and state. It cannot be undone.
Alternatively, manually delete the database:
```bash
rm -rf .rlm/
```

| Content Type | Recommended Strategy | Why |
|---|---|---|
| Markdown, documentation | semantic | Preserves logical structure (headings, paragraphs) |
| Source code | code | Respects function/class boundaries |
| Logs, plain text | fixed | Predictable chunk sizes |
| Large files (>10MB) | parallel | Faster processing via multi-threading |
Example:
```bash
rlm-cli load docs.md --chunker semantic
rlm-cli load src/main.rs --chunker code
rlm-cli load app.log --chunker fixed --chunk-size 150000
```

Default: 50,000 bytes (50KB)
Guidelines:
- Smaller chunks (10-30KB): Better precision, more chunks to search
- Larger chunks (50-100KB): Better context, fewer chunks
- Very large chunks (100KB+): Risk losing granularity
Recommendation: Start with defaults, adjust based on your content.
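As a rough illustration of the trade-off above (this is back-of-the-envelope arithmetic, not the chunkers' exact boundary logic), you can estimate how many chunks a file will produce for a given chunk size and overlap:

```python
import math

def estimate_chunks(file_bytes: int, chunk_size: int = 50_000, overlap: int = 0) -> int:
    """Estimate chunk count for fixed-size chunking.

    Illustrative arithmetic only; the actual chunkers' boundary
    handling may differ slightly.
    """
    stride = chunk_size - overlap  # each new chunk advances by this much
    return max(1, math.ceil((file_bytes - overlap) / stride))

# A 10MB file at the 50KB default yields about 200 chunks.
print(estimate_chunks(10_000_000))                  # 200
# Smaller chunks with overlap yield many more.
print(estimate_chunks(10_000_000, 30_000, 1_000))   # 345
```

Halving the chunk size roughly doubles both the chunk count and the embedding work, which is the precision-versus-cost trade-off described above.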
```bash
rlm-cli load file.txt --chunk-size 30000 --overlap 1000
```

| Search Type | How It Works | Best For |
|---|---|---|
| Semantic | Finds similar meaning using embeddings | Conceptual queries ("how to install") |
| BM25 | Finds keyword matches with ranking | Exact terms ("error code 404") |
| Hybrid | Combines both via RRF | Most use cases (default) |
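Reciprocal rank fusion (RRF) merges the semantic and BM25 result lists by scoring each document as the sum of 1/(k + rank) across lists. A minimal sketch of the idea (the constant `k = 60` is the conventional choice from the RRF literature; rlm-rs's exact parameters aren't specified here):

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists via reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, so documents
    that rank well in several lists float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["a", "b", "c"]  # hypothetical chunk IDs, semantic order
bm25 = ["b", "c", "d"]      # hypothetical chunk IDs, BM25 order
print(rrf_merge([semantic, bm25]))  # ['b', 'c', 'a', 'd']
```

Note how `b` and `c`, which appear in both lists, outrank `a`, the top semantic hit that BM25 never returned.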
Example:
```bash
# Semantic search
rlm-cli search "installation process" --mode semantic

# Keyword search
rlm-cli search "error" --mode bm25

# Hybrid (default)
rlm-cli search "database connection error" --mode hybrid
```

Omit the `--buffer` flag to search all buffers:
```bash
rlm-cli search "error handling"
```

Or specify multiple buffers:
```bash
# Load files
rlm-cli load src/lib.rs --name lib
rlm-cli load src/main.rs --name main

# Search both
rlm-cli search "parse" --buffer lib --buffer main
```

Note: The current implementation searches one buffer at a time when `--buffer` is specified.
Use update-buffer:
```bash
rlm-cli update-buffer readme --file README-updated.md
```

This re-chunks and re-embeds the content while preserving the buffer ID.
Yes:
```bash
# Export all buffers to JSON
rlm-cli export-buffers > buffers.json

# Export individual buffer content
rlm-cli show readme > readme-content.txt

# Export chunks to individual files
rlm-cli write-chunks readme --output-dir ./chunks/
```

Embedding time depends on:
- Chunk count: ~10-50ms per chunk on CPU
- Model: BGE-M3 (default) is optimized for CPU
- Hardware: Faster on GPU (not yet supported)
Example: 100 chunks ≈ 1-5 seconds on modern CPU.
Tip: Embeddings are generated automatically during load and cached. Re-loading the same content reuses existing embeddings.
Not directly, but you can:
- Use BM25-only search: `--mode bm25`
- Build without fastembed: `cargo build --no-default-features`
- Database: Proportional to content size (roughly 1.5-2x source file size)
- Embeddings: ~4KB per chunk (1024 dimensions × 4 bytes)
- Models: ~150MB for BGE-M3 (downloaded once)
Example: 10MB document with 200 chunks ≈ 20MB database + 800KB embeddings.
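The embedding figure follows directly from the vector shape: a quick sanity check, assuming 1024-dimensional float32 vectors as stated above:

```python
def embedding_bytes(chunks: int, dims: int = 1024, bytes_per_dim: int = 4) -> int:
    """Storage for chunk embeddings: dims float32 values per chunk."""
    return chunks * dims * bytes_per_dim

print(embedding_bytes(1))    # 4096 bytes, i.e. ~4KB per chunk
print(embedding_bytes(200))  # 819200 bytes, i.e. ~800KB for 200 chunks
```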
Use parallel chunking:
```bash
rlm-cli load huge-log.txt --chunker parallel --chunk-size 100000
```

Tips:
- Use `--chunk-size` to control memory usage
- Consider splitting files externally if they exceed 1GB
- Use `grep` for keyword search instead of loading the entire file
You need to initialize the database first:
```bash
rlm-cli init
```

This creates `.rlm/rlm-state.db`.
Cause: Network issue or insufficient disk space.
Solutions:
- Check your network connection
- Verify disk space (~150MB needed)
- Try a manual download: models are cached in `~/.cache/rlm-rs/`
- Build without embeddings: `cargo build --no-default-features`
Cause: Buffer name or ID doesn't exist.
Solution: List buffers to verify:
```bash
rlm-cli list
```

Possible causes:
- No embeddings: Check `rlm-cli chunk status`
- Wrong buffer: Verify with `rlm-cli list`
- Query mismatch: Try a different search mode or keywords
Debug:
```bash
# Check embedding status
rlm-cli chunk status

# Try keyword search
rlm-cli search "your query" --mode bm25

# Try a broader query
rlm-cli search "error" --top-k 20
```

Causes:
- Large chunk sizes
- Many buffers loaded
- Memory-mapped files not released
Solutions:
- Use a smaller `--chunk-size`
- Delete unused buffers: `rlm-cli delete <buffer>`
- Restart the CLI to release memory maps
Solutions:
- Enable HNSW: Build with `--features usearch-hnsw`
- Reduce the search space: Use `--buffer` to limit scope
- Use BM25-only: `--mode bm25` (faster than semantic)
- Reduce `--top-k`: Fewer results mean a faster search
Install the rlm-rs MCP plugin:
- Add it to `.vscode/mcp.json` or `~/.config/Claude/claude_desktop_config.json`
- Restart Claude Code
- Use RLM commands in conversations
- Use RLM commands in conversations
See Plugin Integration for details.
Yes! RLM-RS is a standard CLI tool. Integration requires:
- Ability to execute shell commands
- Parsing JSON output (use `--format json`)
- Managing state (buffer IDs, chunk IDs)
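As a sketch of that integration loop from Python (the exact shape of the `load` JSON output is an assumption here; verify it against `--format json` output from your rlm-cli version):

```python
import json
import subprocess

def parse_chunks(load_json: str) -> list:
    """Extract chunk records from the load command's JSON output.

    Assumes a top-level "chunks" array; this shape is a guess and
    should be checked against real rlm-cli output.
    """
    return json.loads(load_json)["chunks"]

def load_buffer(path: str, name: str) -> list:
    """Load a file via rlm-cli and return its chunk records."""
    result = subprocess.run(
        ["rlm-cli", "load", path, "--name", name, "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_chunks(result.stdout)

# Hypothetical output shape, for illustration:
sample = '{"chunks": [{"id": "c1"}, {"id": "c2"}]}'
print([c["id"] for c in parse_chunks(sample)])  # ['c1', 'c2']
```

Keeping the returned chunk IDs in your tool's state lets you fetch individual chunks later instead of re-reading the whole file, which is the pass-by-reference idea.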
Absolutely:
```bash
#!/bin/bash
rlm-cli init
rlm-cli load document.md --name doc --format json > load-result.json
chunk_id=$(jq -r '.chunks[0].id' load-result.json)
rlm-cli chunk get "$chunk_id"
```

Use `--format json` for machine-readable output.
```bash
git clone https://github.com/zircote/rlm-rs.git
cd rlm-rs
cargo build --release
# Binary at: target/release/rlm-cli
```

| Feature | Description | Default |
|---|---|---|
| `fastembed-embeddings` | BGE-M3 semantic embeddings | ✅ Enabled |
| `usearch-hnsw` | HNSW approximate search | ❌ Disabled |
| `full-search` | Both embeddings + HNSW | ❌ Disabled |
See Features Guide for details.
See CONTRIBUTING.md for:
- Development setup
- Code style guidelines
- Testing requirements
- PR process
- Unit tests: `src/**/*.rs` (`#[cfg(test)]` modules)
- Integration tests: `tests/integration_test.rs`
- Property tests: Using `proptest` for property-based testing
Run tests:
```bash
cargo test
```

- Getting Started - Quick start tutorial
- Troubleshooting - Detailed troubleshooting guide
- CLI Reference - Complete command reference
- Examples - Practical examples and workflows
Last Updated: 2026-02-18
Version: 1.2.4
Still have questions?
- Open an issue: GitHub Issues
- Start a discussion: GitHub Discussions