Version 2.4.0 | Fully local AI stack deployed on user hardware with a single command
Dream Server is a self-hosted AI platform that orchestrates 19 microservices via Docker Compose across four GPU backends (NVIDIA, AMD, Apple Silicon, Intel Arc) and CPU-only fallback. The system is structured in two layers: an outer wrapper (installer scripts, CI, resources) and the core product (dream-server/) containing all deployable code.
The architecture follows a layered compose model: a base compose file defines core services, GPU-specific overlays configure hardware acceleration, and extension compose files add optional services. A registry-driven CLI (dream-cli) manages the lifecycle.
graph TB
subgraph External["User Access (localhost only)"]
Browser["Browser"]
end
subgraph Core["Core Services"]
LLAMA["llama-server<br/>:8080<br/>LLM Inference"]
WEBUI["open-webui<br/>:3000<br/>Chat UI"]
DASH["dashboard<br/>:3001<br/>Control Center"]
DAPI["dashboard-api<br/>:3002<br/>System Status API"]
end
subgraph Gateway["API Gateway"]
LITE["litellm<br/>:4000<br/>OpenAI-compatible proxy"]
end
subgraph Voice["Voice Pipeline"]
WHISPER["whisper<br/>:9000<br/>Speech-to-Text"]
TTS["tts / Kokoro<br/>:8880<br/>Text-to-Speech"]
end
subgraph Search["Search & Research"]
SEARX["searxng<br/>:8888<br/>Metasearch"]
PERP["perplexica<br/>:3004<br/>Deep Research"]
end
subgraph Agents["Agents & Automation"]
CLAW["openclaw<br/>:7860<br/>Agent Framework"]
APE["ape<br/>:7890<br/>Policy Engine"]
N8N["n8n<br/>:5678<br/>Workflows"]
end
subgraph RAG["RAG Pipeline"]
QDRANT["qdrant<br/>:6333<br/>Vector DB"]
EMBED["embeddings<br/>:8090<br/>TEI Vectors"]
end
subgraph Media["Media Generation"]
COMFY["comfyui<br/>:8188<br/>Image Gen"]
end
subgraph Privacy["Privacy & Observability"]
SHIELD["privacy-shield<br/>:8085<br/>PII Protection"]
SPY["token-spy<br/>:3005<br/>Usage Monitor"]
LANG["langfuse<br/>:3006<br/>LLM Tracing"]
end
subgraph Dev["Development"]
CODE["opencode<br/>:3003<br/>Web IDE"]
end
Browser --> WEBUI
Browser --> DASH
Browser --> LITE
Browser --> CODE
WEBUI --> LLAMA
WEBUI --> COMFY
WEBUI --> WHISPER
WEBUI --> TTS
WEBUI --> SEARX
LITE --> LLAMA
DASH --> DAPI
DAPI --> LLAMA
DAPI --> N8N
DAPI --> SPY
DAPI --> SHIELD
CLAW --> LLAMA
CLAW --> SEARX
PERP --> LLAMA
PERP --> SEARX
SHIELD --> LLAMA
The LLM inference engine (llama-server) is the foundation. GPU overlays select the correct container image and runtime:
| Backend | Image | Acceleration |
|---|---|---|
| NVIDIA | llama.cpp:server-cuda-b8248 |
CUDA, all GPUs reserved |
| AMD | Custom dream-lemonade-server |
ROCm / Vulkan / NPU via Lemonade |
| Apple | llama.cpp:server-b8248 (ARM64) |
CPU in Docker (Metal on host) |
| Intel Arc | SYCL backend | Experimental |
| CPU | llama.cpp:server-b8248 |
Pure CPU fallback |
LiteLLM (port 4000) sits in front as an OpenAI-compatible proxy, enabling cloud fallback in hybrid mode and standardized API access for all consumers.
- open-webui (port 3000) — Primary chat interface with integrated image generation (ComfyUI/SDXL), voice I/O (Whisper + Kokoro), and web search (SearXNG)
- dashboard (port 3001) — React/Vite control center for feature discovery, service health, setup wizard, model management
- dashboard-api (port 3002) — FastAPI backend with routers for setup, features, agents, privacy, workflows, and updates
- searxng (port 8888) — Privacy-respecting metasearch engine
- perplexica (port 3004) — Deep research combining search results with LLM reasoning
- openclaw (port 7860) — AI agent framework with tool access (exec, read, write, web), up to 20 concurrent subagents
- ape (port 7890) — Agent Policy Engine enforcing allow/deny rules on tool access
- n8n (port 5678) — Visual workflow automation with a pre-built catalog
- qdrant (port 6333) — Vector database for document retrieval
- embeddings (port 8090) — HuggingFace TEI for generating vector embeddings
- whisper (port 9000) — Speech-to-text (OpenAI-compatible API)
- tts/Kokoro (port 8880) — Text-to-speech (OpenAI-compatible API)
- comfyui (port 8188) — Image generation with SDXL Lightning (4-step)
- privacy-shield (port 8085) — PII detection and scrubbing middleware
- token-spy (port 3005) — Token usage and cost tracking
- langfuse (port 3006) — LLM observability and tracing
- opencode (port 3003) — Web IDE (runs as host systemd service, not Docker)
The installer is a 13-phase pipeline orchestrated by install-core.sh. Libraries in installers/lib/ are pure functions (no side effects); phases in installers/phases/ execute sequentially.
graph LR
subgraph Libraries["installers/lib/ (pure functions)"]
C[constants] --> D[detection]
D --> T[tier-map]
T --> CS[compose-select]
P[packaging]
U[ui]
L[logging]
end
subgraph Phases["installers/phases/ (sequential)"]
P01["01 Preflight"] --> P02["02 Detection"]
P02 --> P03["03 Features"]
P03 --> P04["04 Requirements"]
P04 --> P05["05 Docker"]
P05 --> P06["06 Directories"]
P06 --> P07["07 DevTools"]
P07 --> P08["08 Images"]
P08 --> P09["09 Offline"]
P09 --> P10["10 AMD Tuning"]
P10 --> P11["11 Services"]
P11 --> P12["12 Health"]
P12 --> P13["13 Summary"]
end
Libraries --> Phases
| Phase | Purpose |
|---|---|
| 01 Preflight | Root/OS/tools checks, existing install detection |
| 02 Detection | GPU hardware detection, tier assignment, compose config selection |
| 03 Features | Interactive feature selection (voice, workflows, RAG, images, etc.) |
| 04 Requirements | RAM, disk, GPU, port availability checks |
| 05 Docker | Install Docker, Compose, NVIDIA Container Toolkit |
| 06 Directories | Create dirs, copy source, generate .env, configure services |
| 07 DevTools | Install Claude Code, Codex CLI, OpenCode |
| 08 Images | Build image pull list, download all Docker images |
| 09 Offline | Configure air-gapped operation |
| 10 AMD Tuning | AMD APU sysctl, modprobe, GRUB, tuned setup |
| 11 Services | Download GGUF model, generate models.ini, launch stack |
| 12 Health | Verify all services responding, pre-download STT models |
| 13 Summary | Generate URLs, desktop shortcuts, summary JSON |
The stack uses compose file merging. The resolver script dynamically discovers enabled extensions and composes the full stack:
graph TB
BASE["docker-compose.base.yml<br/>(core services)"]
GPU["docker-compose.{nvidia,amd,apple,cpu}.yml<br/>(GPU overlay)"]
EXT1["extensions/services/comfyui/compose.yaml"]
EXT2["extensions/services/n8n/compose.yaml"]
EXT3["extensions/services/.../compose.yaml"]
EXTGPU["extensions/services/.../compose.nvidia.yaml"]
BASE --> MERGE["resolve-compose-stack.sh"]
GPU --> MERGE
EXT1 --> MERGE
EXT2 --> MERGE
EXT3 --> MERGE
EXTGPU --> MERGE
MERGE --> STACK["Final Docker Compose Stack"]
install.sh → install-core.sh → sources installers/lib/*.sh → sources installers/phases/01..13.sh sequentially. Each phase reads state set by prior phases via exported variables. Hardware detection (phase 02) drives all downstream decisions: tier assignment selects the model GGUF, context window, batch size, and compose overlays.
dream-cli start → resolve-compose-stack.sh reads enabled services from .env → assembles docker compose -f base -f gpu-overlay -f ext1 -f ext2 ... → docker compose up -d. Health checks gate dependent services (e.g., open-webui waits for llama-server healthy).
Browser → open-webui:3000 → llama-server:8080/v1/chat/completions → GPU inference → response streamed back. If hybrid mode: open-webui → litellm:4000 → tries llama-server first, falls back to cloud API.
Browser → openclaw:7860 → agent spawns with tools (exec, read, write, web) → tool calls hit searxng:8888 for search, llama-server:8080 for reasoning → ape:7890 enforces policy on each tool invocation → results streamed back.
Browser → dashboard:3001 → dashboard-api:3002/api/features → API reads all service manifests, checks container health via Docker socket, cross-references GPU capabilities and VRAM → returns feature list with status (enabled, available, insufficient_vram, services_needed) and recommendations.
| Variable | Default | Controls |
|---|---|---|
GPU_BACKEND |
detected | nvidia, amd, apple, cpu |
GGUF_FILE |
tier-dependent | Model file in /data/models/ |
CTX_SIZE |
16384 |
Context window (tokens) |
DREAM_MODE |
local |
local, cloud, hybrid |
LITELLM_KEY |
generated | API gateway authentication |
DASHBOARD_API_KEY |
generated | Dashboard API authentication |
All services bind to 127.0.0.1 (localhost only). Canonical port assignments live in config/ports.json.
| Port | Service | Port | Service |
|---|---|---|---|
| 3000 | open-webui | 6333 | qdrant |
| 3001 | dashboard | 7860 | openclaw |
| 3002 | dashboard-api | 7890 | ape |
| 3003 | opencode | 8080 | llama-server |
| 3004 | perplexica | 8085 | privacy-shield |
| 3005 | token-spy | 8090 | embeddings |
| 3006 | langfuse | 8188 | comfyui |
| 4000 | litellm | 8880 | tts |
| 5678 | n8n | 8888 | searxng |
| 9000 | whisper |
Every service is an extension under extensions/services/<id>/ with:
manifest.yaml— Service contract (id, port, health endpoint, category, GPU backends, dependencies, features)compose.yaml— Docker Compose service definition (optional; core services live indocker-compose.base.yml)compose.{nvidia,amd}.yaml— GPU-specific overlaysDockerfile— Custom image build (if needed)
The manifest schema is enforced by extensions/schema/service-manifest.v1.json. The service registry library (lib/service-registry.sh) provides lookup functions for the CLI and installer.
| Workflow | Purpose |
|---|---|
test-linux.yml |
Integration suite: smoke, manifests, health, BATS, tier map, contracts |
matrix-smoke.yml |
Multi-distro smoke (Ubuntu, Debian, Fedora, Arch, openSUSE) |
validate-compose.yml |
Docker Compose file validation |
validate-env.yml |
Environment variable schema validation |
dashboard.yml |
Dashboard build and lint |
lint-shell.yml |
ShellCheck on all .sh files |
lint-python.yml |
Python linting (ruff, black) |
type-check-python.yml |
Python type checking (mypy) |
secret-scan.yml |
GitLeaks secret detection |
lint-powershell.yml |
PowerShell linting for Windows installer |
Priority when principles conflict: Let It Crash > KISS > Pure Functions > SOLID
- Let It Crash: No broad catches, no silent swallowing. Errors propagate visibly. Bash uses
set -euo pipefaileverywhere. - KISS: Readable over clever. One function, one job. No premature abstraction.
- Pure Functions: Installer libraries (
installers/lib/) are the pure functional core; phases are the imperative shell. - SOLID: Extend via config/data (manifests, backend JSON), not code modification.