Local mode is a first-class path in this project, but models are not shipped inside Docker images. You must download and mount models into `./models` (default) or provide equivalent paths via environment variables.
- Predictable “local stack” that boots reliably
- Clear expectations for CPU/RAM/GPU
- Explicit build profiles so contributors don’t accidentally ship multi-GB images
### local-core

Use when you want “fully local” call handling with a predictable, smaller stack:
- STT: Vosk
- LLM: llama.cpp (GGUF) (e.g., Phi-3-mini)
- TTS: Piper
- No Sherpa, no Kokoro, no embedded Kroko
Run:

```shell
docker compose -f docker-compose.yml -f docker-compose.local-core.yml build local_ai_server
docker compose up -d
```
### local-full

Enables additional backends (Sherpa, Kokoro, embedded Kroko). This profile is heavier, increases build times, and may exceed CI runner disk limits.
Run (default settings):

```shell
docker compose build local_ai_server
```
### GPU

Uses Dockerfile.gpu for CUDA-enabled llama.cpp + Faster Whisper. Includes all backends from local-full plus CUDA support.
Run:

```shell
docker compose -f docker-compose.yml -f docker-compose.gpu.yml build local_ai_server
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d local_ai_server
```
Prerequisites: NVIDIA GPU, nvidia-container-toolkit. See LOCAL_ONLY_SETUP.md for full GPU setup.
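Before the first GPU build, it can help to sanity-check the prerequisites. This is a sketch: the CUDA image tag below is only an example, not something this project pins.

```shell
# Sanity-check the GPU prerequisites before building the GPU profile.
if ! command -v nvidia-smi >/dev/null 2>&1; then
  echo "nvidia-smi not found: install the NVIDIA driver first"
else
  nvidia-smi
  # Confirm Docker can reach the GPU via nvidia-container-toolkit
  # (the CUDA image tag here is only an example):
  docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi \
    || echo "Docker could not reach the GPU: check nvidia-container-toolkit"
fi
```

If the inner `docker run` prints the same GPU table as the host `nvidia-smi`, the toolkit is wired up correctly.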
INCLUDE_KROKO_EMBEDDED is off by default (lighter image). Enable it only if you specifically want the embedded Kroko ONNX websocket server inside local_ai_server.
- Enable for a build:

  ```shell
  INCLUDE_KROKO_EMBEDDED=true docker compose build local_ai_server
  ```
Performance expectations:
- CPU-only “core”: expect multi-second LLM turns on small CPUs; prioritize fewer concurrent local calls.
- GPU (if used): large variance by GPU + model; tune `LOCAL_LLM_GPU_LAYERS` and thread counts.
- RAM: ensure the model(s) you choose fit comfortably; leave headroom for Docker + Asterisk.
These are approximate and vary by model variant/quantization and OS memory behavior. Verify with `du -sh` on your actual model files.
**Core profile:**

| Component | Example default | Disk (approx) | RAM (approx) | Notes |
|---|---|---|---|---|
| STT | Vosk `vosk-model-en-us-0.22` | ~1.5–2.0 GB | ~0.5–1.0 GB | Larger models improve accuracy; CPU-only |
| LLM | Phi-3-mini GGUF Q4_K_M | ~2.0–2.6 GB | ~3–5 GB | CPU-only can be slow; GPU layers reduce CPU load |
| TTS | Piper `en_US-lessac-medium.onnx` | ~50–150 MB | ~0.2–0.6 GB | Voice/model dependent |
| Total (core) | Vosk + Phi-3-mini + Piper | ~3.6–4.8 GB | ~4–7 GB | Add headroom for Docker + Asterisk |
**GPU profile:**

| Component | Example default | Disk (approx) | RAM/VRAM (approx) | Notes |
|---|---|---|---|---|
| STT | Faster Whisper `base` | ~150 MB | ~0.5 GB VRAM | CUDA accelerated; tiny to large-v3 available |
| LLM | Phi-3-mini GGUF Q4_K_M | ~2.0–2.6 GB | ~3–5 GB VRAM | `LOCAL_LLM_GPU_LAYERS=-1` for full offload |
| TTS | Kokoro (HF mode) | ~300–500 MB | ~0.5–1.0 GB | Auto-downloads from HuggingFace Hub |
| Total (GPU) | Faster Whisper + Phi-3 + Kokoro | ~2.5–3.3 GB | ~4–7 GB VRAM | Validated ~665ms E2E on RTX 4090 |
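The disk estimates above can be checked against what you actually downloaded, e.g. with a short loop over the default `./models` layout (paths are the compose defaults; adjust if yours differ):

```shell
# Print the on-disk size of each model directory so the tables above
# can be compared against your actual downloads.
for d in ./models/stt ./models/llm ./models/tts; do
  if [ -d "$d" ]; then
    du -sh "$d"
  else
    echo "missing: $d (not downloaded yet?)"
  fi
done
```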
Practical guidance:
- CPU-only host: 8 GB RAM minimum; 16 GB recommended for smoother turns and fewer OOM surprises.
- GPU host: VRAM needs depend on how many layers you offload; start with `LOCAL_LLM_GPU_LAYERS=0`, then increase gradually.
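That progression can be sketched as successive `.env` values — set one at a time and rebuild/restart between attempts. Only `LOCAL_LLM_GPU_LAYERS` itself is a real setting here; the intermediate value is an illustrative example.

```shell
LOCAL_LLM_GPU_LAYERS=0     # CPU-only baseline
LOCAL_LLM_GPU_LAYERS=20    # partial offload (example value); watch VRAM with nvidia-smi
LOCAL_LLM_GPU_LAYERS=-1    # full offload, as noted in the GPU table
```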
Mounted by docker-compose.yml:
- Host: `./models`
- Container: `/app/models`
Defaults used by local_ai_server:
- STT: `LOCAL_STT_MODEL_PATH=/app/models/stt/...`
- LLM: `LOCAL_LLM_MODEL_PATH=/app/models/llm/...`
- TTS: `LOCAL_TTS_MODEL_PATH=/app/models/tts/...`
Adjust these in `.env` to match your downloaded model locations.
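For example, a `.env` might look like the following. The file names are illustrative, drawn from the example defaults in the tables above — the exact GGUF file name in particular is hypothetical and depends on which quantization you downloaded.

```shell
# Illustrative .env overrides — replace file names with your actual downloads.
LOCAL_STT_MODEL_PATH=/app/models/stt/vosk-model-en-us-0.22
LOCAL_LLM_MODEL_PATH=/app/models/llm/phi-3-mini-q4_k_m.gguf   # hypothetical file name
LOCAL_TTS_MODEL_PATH=/app/models/tts/en_US-lessac-medium.onnx
```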