Real-time AI code completion and refactoring on a Clockwork Pi uConsole (Raspberry Pi CM4), powered by VPS inference and intelligent context management.
Streaming-First Design
Token-by-token Server-Sent Events (SSE) for real-time code generation feedback
Intelligent Context Management
Smart chunking reduces 8KB+ prompts by up to 62% while preserving refactoring instructions
Edge Orchestration
Raspberry Pi CM4 handles routing, chunking, and request coordination with <100ms overhead
Production-Grade Resilience
Circuit breakers, connection pooling, load shedding, and request correlation IDs throughout
Language-Aware Model Routing
Automatic selection between Qwen-7B (Python/JS) and CodeLLaMA-7B (Rust/C/C++) per request
Native Qt/QML Interface (Debian)
uConsole GUI with real-time streaming chat, live CPU monitoring, health indicators, and system metrics
Optional Lightweight REPL CLI
Streaming REPL with multiline support, command history, and live token display for headless environments
Cerebrum was designed to run alongside the uConsole Cyberdeck Router, with a single Raspberry Pi CM4 handling both VPN routing and AI orchestration simultaneously.
Note: Cerebrum does not require the Cyberdeck Router for the uConsole and can run standalone on any compatible edge device.
┌─────────────────────────────────────────────────────────┐
│ Raspberry Pi CM4 (Orchestrator + VPN Router) │
│ ┌───────────────────────────────────────────────────┐ │
│ │ Cyberdeck Router (isolated) │ │
│ └───────────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ FastAPI Server (Port 7000) │ │
│ │ • Instruction extraction & prompt assembly │ │
│ │ • Smart chunking (1000 char blocks, 150 overlap) │ │
│ │ • Deduplication (hash-based fingerprinting) │ │
│ │ • Load shedding (max 2 concurrent requests) │ │
│ │ • Request tracking (UUID correlation) │ │
│ │ • Zero impact on VPN throughput │ │
│ └───────────────────────────────────────────────────┘ │
└───────────────────┬─────────────────────────────────────┘
│
│ HTTP/Tailscale (Streaming SSE)
│ Chunked prompts → Token stream
│
┌───────────────────▼─────────────────────────────────────┐
│ VPS (Inference) Backend │
│ ┌───────────────────────────────────────────────────┐ │
│ │ llama.cpp Runtime (Port 9000) │ │
│ │ • Model: qwen-7b-q4.gguf / codellama-7b-q4.gguf │ │
│ │ • Inference: ~1.6 tok/s (CPU, single-threaded) │ │
│ │ • Connection pool: Persistent httpx client │ │
│ │ • Circuit breaker: 10s cooldown on failures │ │
│ └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Data Flow:
- User prompt → CM4 extracts instructions
- CM4 chunks large code (if >1500 chars)
- CM4 deduplicates repeated patterns
- CM4 selects top 3 relevant chunks
- CM4 assembles instruction-first prompt
- VPS streams tokens back via SSE
- CM4 proxies the stream to the client in real time (see the sketch after this list)
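The relay step can be pictured as a small FastAPI proxy on the CM4 that forwards the assembled prompt to the VPS and re-emits its token stream as Server-Sent Events. A minimal sketch, assuming an illustrative VPS endpoint and request schema rather than the project's actual API surface:

```python
# Hypothetical CM4-side proxy: forward an assembled prompt to the VPS and
# relay its token stream to the client as SSE. VPS_URL and the endpoint
# paths are assumptions for illustration only.
import httpx
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()
VPS_URL = "http://vps.tailnet:9000/v1/stream"   # assumed backend endpoint
vps = httpx.AsyncClient(timeout=None)           # persistent connection pool

class CompletionRequest(BaseModel):
    prompt: str
    model: str = "qwen_7b"

@app.post("/v1/complete")
async def complete(req: CompletionRequest):
    async def relay():
        # Stream tokens from the VPS and re-emit them as SSE data lines.
        async with vps.stream("POST", VPS_URL, json=req.model_dump()) as resp:
            async for line in resp.aiter_lines():
                if line:
                    yield f"data: {line}\n\n"
    return StreamingResponse(relay(), media_type="text/event-stream")
```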
Intelligent Prompt Handling
- Instruction extraction (e.g. refactor / rewrite / TODO directives)
- Instruction-first prompt assembly for base code models (see the sketch after this list)
- Automatic fallback to raw prompts when transformation is not beneficial
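A minimal sketch of the instruction-first idea, with illustrative names only (the real logic lives in cerebrum/retrieval/instruction_parser.py and assembler.py):

```python
# Hypothetical sketch: pull refactor / rewrite / TODO directives out of the
# raw prompt and place them ahead of the code so base code models see the
# task first. Falls back to the raw prompt when nothing is extracted.
import re

DIRECTIVES = re.compile(r"^\s*(?:#\s*)?(refactor|rewrite|todo)\b.*$",
                        re.IGNORECASE | re.MULTILINE)

def extract_instructions(prompt: str) -> list[str]:
    """Collect directive lines such as '# refactor to async/await'."""
    return [m.group(0).strip() for m in DIRECTIVES.finditer(prompt)]

def assemble(prompt: str, chunks: list[str]) -> str:
    """Instruction-first assembly with a raw-prompt fallback."""
    instructions = extract_instructions(prompt)
    if not instructions:
        return prompt                      # transformation not beneficial
    header = "\n".join(instructions)
    body = "\n\n".join(chunks) if chunks else prompt
    return f"{header}\n\n{body}"
```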
Smart Chunking & Deduplication
- Chunks only when prompts exceed safe thresholds
- Deduplicates overlapping code blocks
- Uses task-aware ranking (instruction-driven, not naive similarity)
- Skips chunking entirely when the reduction is insignificant (see the sketch after this list)
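The chunk → deduplicate → rank pipeline can be sketched with the figures quoted elsewhere in this README (1000-char blocks, 150-char overlap, 1500-char threshold, top-3 selection); the function names are illustrative, not the project's module API:

```python
# Hypothetical sketch of smart chunking, hash-based deduplication, and
# task-aware (instruction-driven) ranking.
import hashlib

def chunk(text: str, size: int = 1000, overlap: int = 150) -> list[str]:
    if len(text) <= 1500:                  # below the safe threshold: skip chunking
        return [text]
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def deduplicate(chunks: list[str]) -> list[str]:
    """Drop repeated code blocks via hash-based fingerprinting."""
    seen, unique = set(), []
    for c in chunks:
        fp = hashlib.sha1(c.strip().encode()).hexdigest()
        if fp not in seen:
            seen.add(fp)
            unique.append(c)
    return unique

def top_chunks(chunks: list[str], instructions: list[str], k: int = 3) -> list[str]:
    """Rank by overlap with instruction keywords, not naive similarity."""
    keywords = {w.lower() for line in instructions for w in line.split()}
    score = lambda c: sum(w in keywords for w in c.lower().split())
    return sorted(chunks, key=score, reverse=True)[:k]
```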
Streaming Inference:
- Small prompts (<100 chars): ~17s for 33 tokens (1.9 tok/s)
- Large prompts (8KB): ~182s for 129 tokens (0.7 tok/s) after 62% chunking reduction
- CM4 overhead: <100ms for chunking + routing
Context Management:
- Input: 8,344 chars (repeated synchronous code)
- After chunking: 3,167 chars (62% reduction)
- Result: Actual async/await refactored code (not TODO lists!)
Resource-Aware Design:
- Max concurrent: 2 requests (load shedding)
- Circuit breaker: 10s cooldown after VPS failures (see the sketch after this list)
- Request timeout: Configurable per endpoint
- Connection pooling: Persistent HTTP client (no repeated initialization)
- Zero degradation in VPN connection quality or throughput (Cyberdeck Router / WireGuard)
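A minimal sketch of how the 2-request load shedder and 10-second circuit breaker can be combined around a pooled httpx client; the names and status codes are illustrative, not the actual middleware or vps_client implementation:

```python
# Hypothetical sketch of the resource controls listed above.
import asyncio
import time
import httpx
from fastapi import HTTPException

MAX_CONCURRENT = 2                              # load shedding limit
COOLDOWN_S = 10.0                               # circuit breaker cooldown
_slots = asyncio.Semaphore(MAX_CONCURRENT)
_tripped_until = 0.0
client = httpx.AsyncClient(timeout=30.0)        # persistent, pooled client

async def call_vps(url: str, payload: dict) -> dict:
    global _tripped_until
    if time.monotonic() < _tripped_until:
        raise HTTPException(503, "circuit open: VPS cooling down")
    if _slots.locked():
        raise HTTPException(429, "load shed: too many concurrent requests")
    async with _slots:
        try:
            resp = await client.post(url, json=payload)
            resp.raise_for_status()
            return resp.json()
        except httpx.HTTPError:
            _tripped_until = time.monotonic() + COOLDOWN_S
            raise HTTPException(502, "VPS failure: breaker tripped")
```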
Interactive REPL + API
- Bash-based interactive shell for fast iteration
- Full FastAPI surface for automation and tooling
Prerequisites
- Raspberry Pi CM4 (4GB RAM, Lite variant with no eMMC), or other compatible edge device
- VPS with 4GB+ RAM (8GB+ for multiple large models running simultaneously)
- Base OS: Debian 12 (Bookworm) installed on both CM4 and VPS
- Python 3.11+
- Deployment models pre-installed (see below)
# On VPS
cd ~/cerebrum-backend
./start.sh
# Verify health
curl http://localhost:9000/health
# On Raspberry Pi
cd /opt/cerebrum-pi
./start.sh
# Verify health
curl http://localhost:7000/health
cd /opt/cerebrum-pi/scripts
./cerebrum_repl.sh
REPL Commands:
>>> :help Show commands
>>> :model qwen_7b Switch model
>>> :lang python Set language
>>> :multi Toggle multiline mode
>>> def fibonacci(n): Generate code!
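The same streaming surface can be scripted from Python for automation. The endpoint path and payload fields below are assumptions; check docs/api/API.md for the actual request format:

```python
# Illustrative client for the CM4 API with live token display.
import httpx

payload = {"prompt": "def fibonacci(n):", "language": "python"}
with httpx.Client(timeout=None) as client:
    with client.stream("POST", "http://localhost:7000/v1/complete",
                       json=payload) as resp:
        for line in resp.iter_lines():
            if line.startswith("data: "):
                print(line[6:], end="", flush=True)   # stream tokens as they arrive
```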
Cerebrum is composed of two independently deployed systems.
The CM4 never runs large models. It decides what to send, how much to send, and how to stream results back efficiently.
- Runs continuously on the Pi
- Handles all user interaction
- Enforces safety and performance constraints
The VPS backend runs heavy LLM inference using llama.cpp with strict resource controls.
- The backend supports multiple GGUF models via llama.cpp-compatible runtimes
- Models are selected dynamically at request time (see the sketch after this list)
- Exposes inference and streaming endpoints
- Tuned for CPU/GPU efficiency
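Language-aware routing amounts to a small per-request lookup. A sketch using the model files named in the architecture diagram; the fallback default for unknown languages is an assumption:

```python
# Hypothetical sketch of language-aware model routing
# (Qwen-7B for Python/JS, CodeLLaMA-7B for Rust/C/C++).
MODEL_BY_LANGUAGE = {
    "python": "qwen-7b-q4.gguf",
    "javascript": "qwen-7b-q4.gguf",
    "rust": "codellama-7b-q4.gguf",
    "c": "codellama-7b-q4.gguf",
    "cpp": "codellama-7b-q4.gguf",
}

def select_model(language: str | None) -> str:
    """Pick a GGUF model per request; unknown languages fall back to Qwen."""
    return MODEL_BY_LANGUAGE.get((language or "").lower(), "qwen-7b-q4.gguf")
```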
Note:
The root of this repository is not directly executable.
All runtime instructions live in the component-specific READMEs above.
Cerebrum/ # 🎩 Root
│
├── cerebrum-pi/ # 🔹 CM4 Orchestrator (Raspberry Pi) - Debian 12
│ ├── cerebrum/
│ │ ├── api/ # 💫 FastAPI Application (Active)
│ │ │ ├── main.py # Application entry point
│ │ │ ├── middleware/ # Request processing
│ │ │ │ ├── request_id.py # UUID correlation
│ │ │ │ ├── log_context.py # Request logging
│ │ │ │ └── load_shed.py # Concurrency limiting
│ │ │ │
│ │ │ ├── routes/ # ✨ API endpoints
│ │ │ │ ├── inference.py # Streaming code completion
│ │ │ │ ├── _chunking_helper.py # Smart prompt processing
│ │ │ │ ├── health.py # Health checks
│ │ │ │ ├── models.py # Model listing
│ │ │ │ └── stats.py # System statistics
│ │ │ │
│ │ │ └── schemas/ # 🔮 API schemas / future Pydantic models
│ │ │
│ │ ├── core/ # 🪄 VPS Integration (Active)
│ │ │ └── vps_client.py # Connection pooling, circuit breaker
│ │ │
│ │ ├── retrieval/ # 🧬 Context Management (Active)
│ │ │ ├── chunker.py # Text chunking (1000 char blocks)
│ │ │ ├── ranker.py # Relevance ranking + deduplication
│ │ │ ├── assembler.py # Prompt assembly
│ │ │ └── instruction_parser.py # Instruction extraction
│ │ │
│ │ ├── orchestration/ # 🔮 Future: Multi-step task coordination
│ │ ├── reasoning/ # 🔮 Future: Symbolic / constraint-based reasoning
│ │ ├── tasks/ # 🔮 Future: Reusable task templates
│ │ └── utils/ # 🔮 Future: Shared helper functions
│ │
│ ├── scripts/
│ │ └── cerebrum_repl.sh # Interactive streaming CLI
│ │
│ ├── config/
│ │ └── cerebrum-tunnel.service # Tailscale VPN systemd service
│ │
│ ├── data/ # 📄 Runtime data
│ │ ├── cache/
│ │ ├── embeddings/
│ │ └── knowledge_base/
│ │
│ ├── tests/ # 🧪 Test suites
│ │ ├── test_api/
│ │ ├── test_core/
│ │ └── test_integration/
│ │
│ ├── start.sh # Start orchestrator
│ ├── stop.sh # Stop orchestrator
│ └── requirements.txt # Python dependencies
│
├── cerebrum-backend/ # 🔸 VPS Inference Backend - Debian 12
│ ├── vps_server/ # ⚙️ Inference Engine (Active)
│ │ └── main.py # FastAPI + llama.cpp streaming
│ │
│ ├── scripts/
│ │ ├── start.sh
│ │ ├── test.sh # Health check tests
│ │ └── generate_api_key.sh # API key generation
│ │
│ ├── config/ # Configuration files
│ ├── logs/ # Runtime logs
│ ├── cerebrum-backend.service # Systemd service
│ └── requirements.txt # Python dependencies
│
├── deployment/ # 🔮 Future: Deployment Automation
│ ├── scripts/
│ └── systemd/
│
├── docs/ # 📚 Documentation
│ ├── api/
│ │ └── API.md
│ ├── architecture/
│ │ └── ARCHITECTURE.md
│ ├── diagrams/
│ │ └── images/
│ ├── guides/
│ │ └── DEVELOPMENT.md
│ └── optimization/
│ └── PERFORMANCE.md
│
├── scripts/ # 🔧 Development Tools
│ ├── sync_to_cm4.sh # Rsync to Raspberry Pi
│ └── sync_to_vps.sh # Rsync to VPS
│
└── shared/ # 🧺 Shared Resources
├── embeddings/ # Vector embeddings cache
├── knowledge_base/ # Curated reference material
│ ├── code_snippets/ # Reusable code examples
│ ├── documentation/ # External reference materials
│ │ └── vendor_docs/ # Third-party API docs, language specs
│ └── examples/ # Sample projects
│
└── models/
├── download_scripts/ # Model acquisition utilities
│ └── download_models.sh
└── lists/ # Model manifests / allowlists
- Edit on macOS (VS Code + VS Code Insiders)
- Sync to CM4 (rsync)
- Sync to VPS (rsync)
- Test locally via REPL or API
- Iterate without redeploying the full system
See the docs/ directory for detailed information:
- API - Available endpoints, request formats, and streaming behavior
- Architecture - System design, data flow, and component boundaries
- Development - Local workflow, testing, and contribution notes
- Optimization - Performance characteristics and tuning considerations
Cerebrum™ © 2025 Robert Hall. All rights reserved.
This project is licensed under the MIT License.
This project uses Qt, which is licensed under LGPL v3.
See Qt's Open Source Licensing for details.
The Cerebrum project is authored by a sole developer and maintainer.
Bug reports, documentation fixes, and design suggestions are always welcome and appreciated. If you encounter an issue or have an idea to share, please open an issue in Cerebrum/issues.
At this time, direct write access and unsolicited feature pull requests are not accepted. All code changes are curated by the maintainer to ensure architectural consistency and system stability.
Built with:
- FastAPI - High-performance async web framework
- llama.cpp - Efficient LLM inference
- httpx - Modern HTTP client with connection pooling
- Qwen - Alibaba's excellent code model
- Qt - GUI framework with Qt Design Studio for native GUI development
- Debian Project - Bookworm base system foundations
- Raspberry Pi is a trademark of Raspberry Pi Ltd
Inspired by the challenge of running production AI on a Raspberry Pi.
