
Technical Reference

System architecture and internals for developers and power users. For API endpoints, see API.md.


Architecture Overview

main.py (runner with restart loop)
└── sapphire.py (VoiceChatSystem)
    ├── LLMChat (core/chat/)
    │   ├── llm_providers → Claude, OpenAI, Fireworks, LM Studio, Responses
    │   ├── plugin_loader → plugins/*, user/plugins/*
    │   ├── function_manager → functions/*, scopes, story tools
    │   └── session_manager → chat history (SQLite)
    ├── Continuity (core/modules/continuity/)
    │   ├── scheduler → cron-based task runner
    │   └── executor → context isolation, task execution
    ├── TTS Server (core/tts/) → port 5012 (HTTP subprocess)
    ├── STT (core/stt/) → thread in main process (hot-toggleable)
    ├── Wake Word (core/wakeword/) → thread (hot-toggleable)
    ├── FastAPI Server (core/api_fastapi.py) → 0.0.0.0:8073
    └── Event Bus (core/event_bus.py) → SSE pub/sub

Process model: main.py is a runner that spawns sapphire.py with automatic restart on crash or restart request (exit code 42). sapphire.py spawns the TTS server as a subprocess via ProcessManager. STT runs as a thread. The FastAPI/uvicorn server handles all web traffic directly (auth, static files, API, SSE) on a single port. Everything else runs in the main process.
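The restart loop described above can be sketched as follows. This is a minimal illustration, not the actual main.py: the function names, the `run` injection point, and the `max_crash_restarts` cap are hypothetical. Only the exit code 42 convention comes from the text.

```python
import subprocess

RESTART_EXIT_CODE = 42  # from the text: exit code 42 requests a restart


def run_once(cmd):
    """Run the child process to completion and return its exit code."""
    return subprocess.call(cmd)


def supervise(cmd, run=run_once, max_crash_restarts=3):
    """Restart the child on exit code 42 or on a crash; stop on a clean exit.

    max_crash_restarts is a hypothetical safety cap, not taken from the source.
    """
    crashes = 0
    while True:
        code = run(cmd)
        if code == 0:
            return code  # clean shutdown, do not restart
        if code == RESTART_EXIT_CODE:
            continue  # explicit restart request
        crashes += 1
        if crashes > max_crash_restarts:
            return code  # give up after repeated crashes
```

A runner would call something like `supervise([sys.executable, "sapphire.py"])`. Passing `run` as a parameter keeps the loop testable without spawning real processes.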


Scopes Architecture

Seven scope types isolate data per-chat via ContextVars in function_manager.py:

| Scope | What it isolates | Overlay |
|---|---|---|
| scope_memory | Memory slot | Yes (sees own + global) |
| scope_goal | Goal set | Yes |
| scope_knowledge | Knowledge tabs | Yes |
| scope_people | Contacts | Yes |
| scope_email | Email account | No |
| scope_bitcoin | Wallet | No |
| scope_rag | Per-chat documents | No (strict) |

Global overlay: Memory, goals, knowledge, and people scopes see both their own data AND entries in the "global" scope. RAG is strict — only the chat's own documents.
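One way to picture the overlay rule is as a function that returns the list of scope names a query should read from. The helper below is hypothetical and assumes the shared scope is literally named "global" and that "none" disables the system, as described in this section.

```python
GLOBAL_SCOPE = "global"


def visible_scopes(scope, overlay):
    """Return the scope names a lookup should read from.

    scope:   the chat's configured scope name, or "none"/None when disabled
    overlay: True for overlay scopes (memory, goals, knowledge, people),
             False for strict ones (email, bitcoin, RAG)
    """
    if scope is None or scope == "none":
        return []  # system disabled for this chat
    if overlay and scope != GLOBAL_SCOPE:
        return [scope, GLOBAL_SCOPE]  # own data plus global entries
    return [scope]  # strict: only the chat's own data
```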

Setting scopes: Per-chat in Chat Settings sidebar → Mind Scopes. Set to "none" to disable a system for that chat.

ContextVars: Thread/async-safe isolation. Each chat execution context gets its own scope values via function_manager.set_*_scope().
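A short sketch of why ContextVars give this isolation: each asyncio task runs in its own copy of the context, so a value set inside one chat's task is invisible to the others. The variable name, the "global" default, and `set_memory_scope()` are illustrative stand-ins for the `set_*_scope()` family, not the real function_manager.py code.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for one of the seven scope variables.
scope_memory = ContextVar("scope_memory", default="global")


def set_memory_scope(name):
    """Set the memory scope for the current execution context only."""
    scope_memory.set(name)


async def run_chat(name, results):
    set_memory_scope(name)
    await asyncio.sleep(0)  # yield so the two tasks interleave
    results[name] = scope_memory.get()  # still this task's own value


async def main():
    results = {}
    # gather() wraps each coroutine in a Task with its own context copy.
    await asyncio.gather(run_chat("work", results), run_chat("home", results))
    return results
```

Running `asyncio.run(main())` leaves each chat with its own scope value, and the caller's context still sees the default.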


User Directory

All user customization lives in user/ (gitignored). The directory is created on first run.

user/
├── settings.json           # Your settings overrides
├── settings/
│   └── chat_defaults.json  # Defaults for new chats
├── prompts/
│   ├── prompt_monoliths.json
│   ├── prompt_pieces.json
│   └── prompt_spices.json
├── personas/
│   ├── personas.json       # Persona definitions
│   └── avatars/            # Persona avatar images
├── toolsets/
│   └── toolsets.json       # Custom toolsets
├── continuity/
│   ├── tasks.json          # Scheduled task definitions
│   └── activity.json       # Task execution log
├── story_presets/           # Custom story presets
├── webui/
│   └── plugins/            # Plugin settings (HA, email, etc.)
├── functions/              # Your custom tools
├── plugins/                # Your private plugins
├── history/
│   └── sapphire_history.db # Chat sessions (SQLite WAL)
├── public/
│   └── avatars/            # User/assistant avatars
├── memory.db               # Long-term memory (SQLite)
├── knowledge.db            # Knowledge + people (SQLite)
├── goals.db                # Goals + progress (SQLite)
├── ssl/                    # Self-signed cert (10yr, persistent)
└── logs/                   # Application logs

Bootstrap: On first run, core/setup.py copies factory defaults from core/modules/system/ to user/.


Configuration System

config.py (thin proxy)
    ↓
core/settings_manager.py
    ↓ merges
core/settings_defaults.json  ← Factory defaults (don't edit)
        +
user/settings.json           ← Your overrides
        =
Runtime config

Access pattern: import config then config.TTS_ENABLED, config.LLM_PROVIDERS, etc.
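The merge and proxy can be sketched as below. This is an assumption-laden illustration: the source does not say whether the merge is shallow or recursive (a recursive merge is assumed here), and `merge_settings` and `ConfigProxy` are hypothetical names.

```python
def merge_settings(defaults, overrides):
    """Return defaults with user overrides applied (nested dicts merged)."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value  # override wins for scalars and lists
    return merged


class ConfigProxy:
    """Expose merged settings as attributes, e.g. proxy.TTS_ENABLED."""

    def __init__(self, values):
        self._values = values

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None
```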

Settings Categories

| Category | Examples |
|---|---|
| identity | DEFAULT_USERNAME, DEFAULT_AI_NAME |
| network | SOCKS_ENABLED, SOCKS_HOST, SOCKS_PORT |
| privacy | START_IN_PRIVACY_MODE, PRIVACY_NETWORK_WHITELIST |
| features | MODULES_ENABLED, PLUGINS_ENABLED |
| wakeword | WAKE_WORD_ENABLED, WAKEWORD_MODEL, WAKEWORD_THRESHOLD |
| stt | STT_ENABLED, STT_MODEL_SIZE, STT_ENGINE |
| tts | TTS_ENABLED, TTS_VOICE_NAME, TTS_SPEED, TTS_PITCH_SHIFT |
| llm | LLM_PROVIDERS, LLM_FALLBACK_ORDER, LLM_MAX_HISTORY |
| audio | AUDIO_INPUT_DEVICE, AUDIO_OUTPUT_DEVICE |
| tools | MAX_TOOL_ITERATIONS, MAX_PARALLEL_TOOLS, TOOL_MAKER_VALIDATION |
| rag | RAG_SIMILARITY_THRESHOLD |
| backups | BACKUPS_ENABLED, BACKUPS_KEEP_DAILY, etc. |

Settings Reload Tiers

| Tier | When Applied | Examples |
|---|---|---|
| Hot | Immediate | Names, TTS voice/speed/pitch, LLM settings, SOCKS, privacy mode, generation params |
| Hot-toggle | Runtime on/off | Wakeword, STT (no restart needed) |
| File-watched | ~2s after save | settings.json, prompts/*.json, toolsets.json |
| Restart | Exit code 42 | Port changes, model configs, code changes |

The settings manager tracks which changes need restart via get_pending_restart_keys().

Tool-registered settings: Tool modules can declare SETTINGS and SETTINGS_HELP dicts. These are registered at startup via register_tool_settings() and appear in the Settings UI under Custom Tools.

LLM Configuration

{
  "LLM_PROVIDERS": {
    "lmstudio": { "provider": "openai", "base_url": "http://127.0.0.1:1234/v1", "enabled": true },
    "claude": { "provider": "claude", "model": "claude-sonnet-4-5", "enabled": false },
    "fireworks": { "provider": "fireworks", "base_url": "...", "model": "...", "enabled": false },
    "openai": { "provider": "openai", "base_url": "...", "model": "gpt-4o", "enabled": false },
    "responses": { "provider": "responses", "base_url": "...", "enabled": false },
    "other": { "provider": "openai", "base_url": "...", "enabled": false }
  },
  "LLM_FALLBACK_ORDER": ["lmstudio", "claude", "fireworks", "openai"]
}

Providers are tried in fallback order. Each chat can override to use a specific provider.
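A rough sketch of the fallback behavior, assuming "tried in fallback order" means skipping disabled providers and moving to the next one when a call fails. The function names and the `send` callback are hypothetical. The config shape follows the JSON above.

```python
def enabled_in_order(providers, fallback_order):
    """List provider names from fallback_order that exist and are enabled."""
    return [
        name for name in fallback_order
        if providers.get(name, {}).get("enabled")
    ]


def chat_with_fallback(providers, fallback_order, send):
    """Call send(name, cfg) on each enabled provider until one succeeds."""
    errors = {}
    for name in enabled_in_order(providers, fallback_order):
        try:
            return name, send(name, providers[name])
        except Exception as exc:  # a real client would catch narrower errors
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {sorted(errors)}")
```

A per-chat override would bypass this loop and call the chosen provider directly.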

Claude-Friendly Settings

For prompt caching (up to 90% savings on cached input tokens):

  • Enable caching: Settings → LLM → Claude → Enable prompt caching
  • Disable Spice — Changes system prompt every turn, breaks cache
  • Disable Datetime injection — Same problem, changes every turn
  • Disable State vars in prompt — Changes on state updates, breaks cache
  • "Story in prompt" is fine — Only changes on scene advance

Cache TTL can be 5m (default) or 1h for longer sessions.


Extended Thinking & Reasoning

| Provider | Feature | How It Works |
|---|---|---|
| Claude | Extended Thinking | Structured thinking blocks with budget, thinking API param |
| GPT-5.x | Reasoning Summaries | Responses API, reasoning_summary param |
| Fireworks | Reasoning Effort | Qwen-Thinking, Kimi-K2 use reasoning_effort param |

Claude: Enable in LLM settings → Claude → Extended Thinking. The default budget is 10,000 tokens. Thinking is automatically disabled for continue mode and for tool cycles without thinking. Thinking blocks are preserved across tool calls.

GPT-5.x: Uses Responses API. Configure reasoning_effort (low/medium/high) and reasoning_summary (auto/detailed).

Fireworks: Models with "thinking" in the name return their reasoning in the reasoning_content field.

Cross-provider: Thinking blocks are stripped from history when switching to non-Claude providers.


Authentication & Credentials

Password / API Key

One bcrypt hash serves as login password, API key (X-API-Key header), and session secret.

| OS | Path |
|---|---|
| Linux | ~/.config/sapphire/secret_key |
| macOS | ~/Library/Application Support/Sapphire/secret_key |
| Windows | %APPDATA%\Sapphire\secret_key |

Reset password: Delete the secret_key file and restart.

Credential Manager

API keys, SOCKS credentials, email accounts, and wallet keys are stored separately via core/credentials_manager.py.

| OS | Path |
|---|---|
| Linux | ~/.config/sapphire/credentials.json |
| macOS | ~/Library/Application Support/Sapphire/credentials.json |
| Windows | %APPDATA%\Sapphire\credentials.json |

credentials.json is not included in backups, for security. Sensitive fields are encrypted with a Fernet key derived from machine identity.

Priority: Stored credential → Environment variable fallback (ANTHROPIC_API_KEY, OPENAI_API_KEY, FIREWORKS_API_KEY, SAPPHIRE_SOCKS_USERNAME, SAPPHIRE_SOCKS_PASSWORD)
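The lookup order can be illustrated with a small helper. The mapping from provider names to environment variables is an assumption inferred from the variable names listed above, and `resolve_api_key` is a hypothetical function, not the credentials manager's API.

```python
import os

# Assumed mapping; the variable names themselves come from the text.
ENV_FALLBACKS = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "fireworks": "FIREWORKS_API_KEY",
}


def resolve_api_key(provider, stored, environ=None):
    """Prefer the stored credential, then fall back to the environment."""
    environ = os.environ if environ is None else environ
    value = stored.get(provider)
    if value:
        return value  # stored credential wins
    env_name = ENV_FALLBACKS.get(provider)
    return environ.get(env_name) if env_name else None
```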

Credential Encryption Details

Sensitive fields (Bitcoin WIF keys, API keys, passwords) are encrypted at rest using Fernet symmetric encryption:

| Layer | Detail |
|---|---|
| Cipher | Fernet = AES-128-CBC + HMAC-SHA256 (encrypt-then-MAC) |
| Key derivation | PBKDF2-HMAC-SHA256, 100,000 iterations |
| Key input | Random 32-byte salt + machine identity (hostname:username) |
| Salt file | ~/.config/sapphire/.scramble_salt (permissions 0600) |

Machine binding: The encryption key is derived from a salt file plus the current machine's hostname and OS username. This means credentials.json cannot be decrypted on a different machine or after an OS reinstall, even if copied.
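The derivation can be approximated with the standard library alone. The sketch below assumes the machine identity string is used as the PBKDF2 password and the salt file contents as the salt. The source does not spell out that mapping, and `derive_fernet_key` is a hypothetical name. A Fernet key is 32 bytes encoded as URL-safe base64.

```python
import base64
import hashlib


def derive_fernet_key(salt: bytes, hostname: str, username: str) -> bytes:
    """Derive a Fernet-format key from the salt plus machine identity."""
    identity = f"{hostname}:{username}".encode("utf-8")
    raw = hashlib.pbkdf2_hmac(
        "sha256",   # PBKDF2-HMAC-SHA256
        identity,   # assumed password input: "hostname:username"
        salt,       # random 32-byte salt read from the salt file
        100_000,    # iteration count from the table above
        dklen=32,   # Fernet needs exactly 32 key bytes
    )
    return base64.urlsafe_b64encode(raw)
```

Because hostname and username feed the derivation, changing either one yields a different key, which is why the stored credentials become unreadable in that case.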

Permanent key loss scenarios:

  • Machine hardware failure or OS reinstall
  • ~/.config/sapphire/ directory deleted
  • .scramble_salt file deleted or corrupted
  • Username or hostname changed (different key derivation input)

Backup implications:

  • credentials.json is deliberately excluded from Sapphire's user/ backup system
  • For Bitcoin wallets: use the Export Backup button in Settings → Plugins → Bitcoin to save a plaintext WIF file you can import on any machine
  • For API keys: re-enter them in Settings after a fresh install (or set via environment variables)

Plugin Signing & Verification

Plugins are signed with ed25519 to detect tampering. The signing key lives outside the repo; the public key is baked into the app.

How Signing Works (Authors)

The signing tool (user/tools/sign_plugin.py) walks every file in a plugin directory matching SIGNABLE_EXTENSIONS (.py, .json, .js, .css, .html, .md), computes a SHA256 hash of each, builds a JSON manifest, and signs it with an ed25519 private key. The output is plugin.sig in the plugin directory.

python user/tools/sign_plugin.py plugins/stop/
python user/tools/sign_plugin.py --all          # sign all plugins in plugins/

Private key: user/plugin_signing_key.pem (gitignored). Generate with user/tools/generate_signing_key.py.

How Verification Works (App)

On plugin load (core/plugin_verify.py), the app:

  1. Loads plugin.sig and verifies the ed25519 signature against the baked-in public key
  2. Re-hashes every file listed in the manifest and compares to the signed hashes
  3. Scans for any new files not in the manifest (injection detection)

Results: verified (load), unsigned (load with warning if sideloading enabled, block if disabled), or tampered (always block).
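Steps 2 and 3 and the final load decision can be sketched as below. Signature verification (step 1) is omitted because it needs a third-party Ed25519 library. The manifest shape (path to SHA-256 hex digest), the function names, and treating an unlisted file as "tampered" are assumptions for illustration.

```python
def check_files(manifest, actual):
    """Compare signed hashes with freshly computed ones.

    manifest / actual: {relative_path: sha256_hex}
    """
    for path, digest in manifest.items():
        if actual.get(path) != digest:
            return "tampered"  # file changed or missing
    if set(actual) - set(manifest):
        return "tampered"  # new file not covered by the signature
    return "verified"


def load_decision(status, allow_unsigned):
    """Map a verification status to the action described in the text."""
    if status == "verified":
        return "load"
    if status == "unsigned":
        return "load_with_warning" if allow_unsigned else "block"
    return "block"  # tampered (or anything unexpected) is always blocked
```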

Cross-Platform Line Ending Normalization

Both the signer and verifier normalize line endings before hashing — CRLF (\r\n) is converted to LF (\n) in memory. This ensures signatures are valid regardless of OS or git core.autocrlf settings.

Without this, a plugin signed on Linux (LF) would read as tampered on Windows if git converts line endings to CRLF on checkout. The normalization is in-memory only — no files are modified on disk.
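A minimal sketch of the normalization step: CRLF is replaced with LF on the in-memory bytes before hashing, so the digest does not depend on the checkout's line endings. The function names and the manifest layout are hypothetical. The extension list is the one given for SIGNABLE_EXTENSIONS.

```python
import hashlib
from pathlib import Path

SIGNABLE_EXTENSIONS = {".py", ".json", ".js", ".css", ".html", ".md"}


def normalized_sha256(data: bytes) -> str:
    """Hash file contents with CRLF converted to LF (in memory only)."""
    return hashlib.sha256(data.replace(b"\r\n", b"\n")).hexdigest()


def build_manifest(plugin_dir):
    """Map each signable file's relative path to its normalized hash."""
    root = Path(plugin_dir)
    return {
        p.relative_to(root).as_posix(): normalized_sha256(p.read_bytes())
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix in SIGNABLE_EXTENSIONS
    }
```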

Settings

| Setting | Default | Effect |
|---|---|---|
| ALLOW_UNSIGNED_PLUGINS | true | Allow unsigned plugins with sideloading confirmation |

When false, only signed+verified plugins load. Unsigned plugins are blocked entirely.


Default Ports

| Service | Port | Binding |
|---|---|---|
| FastAPI Server | 8073 | 0.0.0.0 (all interfaces, HTTPS) |
| TTS Server | 5012 | 0.0.0.0 (configurable) |
| LM Studio (default) | 1234 | External |

Component Services

TTS (Text-to-Speech)

  • Server: core/tts/tts_server.py (Kokoro, HTTP subprocess)
  • Client: core/tts/tts_client.py
  • Null provider: core/tts/providers/null.py (when disabled, wrapped in TTSClient)

Started by ProcessManager if TTS_ENABLED=true. Auto-restarts on crash. Server auto-restarts at 3GB memory or 500 requests.

17 voices available (American and British, male and female). Pitch shifting via resampling, speed control via Kokoro parameter.

STT (Speech-to-Text)

  • Server: core/stt/server.py (faster-whisper, loaded in main process)
  • Recorder: core/stt/recorder.py (adaptive VAD, silence detection)
  • Guard: core/stt/utils.py (shared can_transcribe() check)

Runs as a thread if STT_ENABLED=true. Supports hot-toggle at runtime via VoiceChatSystem.toggle_stt(). Uses GPU (CUDA) with CPU fallback.

Wake Word

  • Detector: core/wakeword/wake_detector.py (OpenWakeWord)
  • Recorder: core/wakeword/audio_recorder.py
  • Null impl: core/wakeword/wakeword_null.py

Supports hot-toggle at runtime. Auto-suppresses when web UI mic is active. Custom models supported in user/wakeword/models/ (.onnx, .tflite).

Audio Device Manager

  • Manager: core/audio/device_manager.py (singleton)
  • Cross-platform device detection, sample rate negotiation, fallback logic
  • Shared by STT and wakeword systems

Privacy Mode

Blocks cloud LLM providers to keep conversations local.

  • is_local: True providers (lmstudio) — always allowed
  • privacy_check_whitelist: True providers — allowed if base_url passes whitelist
  • Cloud providers (claude, openai, fireworks) — blocked
  • Whitelist supports CIDR ranges (e.g., 192.168.0.0/16)

Toggle via Settings or PUT /api/privacy.
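The rules above can be sketched with the standard `ipaddress` module. The function names are hypothetical, and matching non-IP hostnames by exact string is an assumption. The `is_local` and `privacy_check_whitelist` flags and the CIDR support come from the list above.

```python
import ipaddress
from urllib.parse import urlparse


def host_in_whitelist(base_url, whitelist):
    """True if the URL's host matches a whitelist entry (IP, CIDR, or name)."""
    host = urlparse(base_url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return host in whitelist  # hostname: exact match only (assumption)
    for entry in whitelist:
        try:
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # entry is not an IP or CIDR range
    return False


def provider_allowed(cfg, whitelist, privacy_mode):
    """Apply the privacy-mode rules to one provider config dict."""
    if not privacy_mode or cfg.get("is_local"):
        return True
    if cfg.get("privacy_check_whitelist"):
        return host_in_whitelist(cfg.get("base_url", ""), whitelist)
    return False  # cloud providers are blocked
```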


Event Bus & SSE

Real-time UI updates via Server-Sent Events.

  • Backend: core/event_bus.py — thread-safe pub/sub with sync and async subscribers
  • Frontend: core/event-bus.js — EventSource client with auto-reconnect
  • Boot version tracking: detects server restarts without clearing browser state
  • 50-event replay buffer for late subscribers
  • 15-second keepalive pings

Event types: AI typing, messages, TTS/STT state, chat switches, settings/prompt/toolset changes, continuity tasks, wakeword detection, errors.
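A thread-safe pub/sub with a bounded replay buffer might look roughly like this. `ReplayBus` is a hypothetical, synchronous-only sketch and does not reproduce core/event_bus.py. Only the 50-event buffer size is taken from the list above.

```python
import threading
from collections import deque


class ReplayBus:
    """Pub/sub that replays the most recent events to late subscribers."""

    def __init__(self, replay_size=50):
        self._lock = threading.Lock()
        self._recent = deque(maxlen=replay_size)  # oldest events drop off
        self._subscribers = []

    def publish(self, event_type, data=None):
        event = {"type": event_type, "data": data}
        with self._lock:
            self._recent.append(event)
            subscribers = list(self._subscribers)
        for callback in subscribers:  # call outside the lock
            callback(event)

    def subscribe(self, callback, replay=True):
        with self._lock:
            backlog = list(self._recent) if replay else []
            self._subscribers.append(callback)
        for event in backlog:  # deliver buffered events first
            callback(event)
```

Calling callbacks outside the lock avoids deadlocks when a subscriber publishes in turn, at the cost of looser ordering guarantees while a replay is in progress.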


File Watchers

| Watcher | Files | Delay |
|---|---|---|
| Settings | user/settings.json | ~2s |
| Prompts | user/prompts/*.json | ~2s |
| Toolsets | user/toolsets/toolsets.json | ~2s |
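The source does not say how the ~2s delay is implemented. One plausible approach is a debounce on the file's modification time, sketched below with a hypothetical `DebouncedWatcher` class. The caller passes in the current mtime and clock value, which keeps the logic independent of the filesystem.

```python
class DebouncedWatcher:
    """Report a change once the file has been stable for `delay` seconds."""

    def __init__(self, delay=2.0):
        self.delay = delay
        self._last_mtime = None
        self._changed_at = None

    def poll(self, mtime, now):
        """Return True exactly once per settled change."""
        if self._last_mtime is None:
            self._last_mtime = mtime  # first observation, nothing to reload
            return False
        if mtime != self._last_mtime:
            self._last_mtime = mtime
            self._changed_at = now  # (re)start the settle timer
            return False
        if self._changed_at is not None and now - self._changed_at >= self.delay:
            self._changed_at = None
            return True  # stable long enough, trigger the reload
        return False
```

In use, a loop would call `poll(os.path.getmtime(path), time.monotonic())` at a short interval and reload the file when it returns True.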

Chat Sessions

SQLite database user/history/sapphire_history.db (WAL mode):

Schema: chats(name TEXT PRIMARY KEY, settings JSON, messages JSON, updated_at TEXT)

Each session has message history, per-chat settings (prompt, voice, toolset, LLM, spice, scopes), and metadata. Story engine state stored in state_current and state_log tables in the same database.
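The chats table can be exercised with the standard `sqlite3` module. The column list follows the schema above. The helper names, the JSON encoding of settings and messages, and the ISO 8601 timestamp format are assumptions for illustration.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_history(path):
    """Open the database, enable WAL mode, and ensure the chats table exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chats ("
        "name TEXT PRIMARY KEY, settings JSON, messages JSON, updated_at TEXT)"
    )
    return conn


def save_chat(conn, name, settings, messages):
    """Insert or overwrite one chat row."""
    conn.execute(
        "INSERT OR REPLACE INTO chats (name, settings, messages, updated_at) "
        "VALUES (?, ?, ?, ?)",
        (name, json.dumps(settings), json.dumps(messages),
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def load_messages(conn, name):
    """Return the decoded message list, or None if the chat does not exist."""
    row = conn.execute(
        "SELECT messages FROM chats WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```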


Key Source Files

| Path | Purpose |
|---|---|
| main.py | Runner with restart loop |
| sapphire.py | VoiceChatSystem entry point |
| config.py | Settings proxy |
| core/api_fastapi.py | Unified FastAPI server (221 endpoints) |
| core/auth.py | Session auth, CSRF, rate limiting |
| core/ssl_utils.py | Self-signed certificate generation |
| core/settings_manager.py | Settings merge, file watcher, restart tiers |
| core/credentials_manager.py | API keys, secrets, Fernet encryption |
| core/setup.py | Bootstrap, auth, first-run |
| core/event_bus.py | Real-time event pub/sub for SSE |
| core/chat/chat.py | LLM orchestration |
| core/chat/chat_streaming.py | SSE response streaming |
| core/chat/llm_providers/ | Claude, OpenAI, Fireworks, Responses providers |
| core/chat/function_manager.py | Tool loading, scopes, story tools |
| core/chat/history.py | Session management |
| core/story_engine/engine.py | Story state, presets, custom tools |
| core/modules/continuity/scheduler.py | Cron-based task scheduler |
| core/audio/device_manager.py | Audio device handling |
| functions/knowledge.py | Knowledge base + people |
| functions/memory.py | Long-term memory + embeddings |

Reference for AI

Sapphire architecture for troubleshooting and development.

PROCESSES:

  • main.py: Runner with restart loop (exit 42 = restart)
  • sapphire.py: Core VoiceChatSystem
  • core/api_fastapi.py: Unified FastAPI server (port 8073, HTTPS, 221 endpoints)
  • TTS server: Kokoro HTTP subprocess (port 5012, if enabled)
  • STT: Faster-whisper thread in main process

PORTS:

  • 8073: FastAPI server (HTTPS, all routes)
  • 5012: TTS server (if enabled)
  • 1234: Default LLM (LM Studio)

SCOPES (7 types, ContextVar-based):

  • scope_memory, scope_goal, scope_knowledge, scope_people: global overlay
  • scope_email, scope_bitcoin: no overlay
  • scope_rag: strict per-chat isolation
  • Set per-chat in sidebar Mind Scopes

LLM PROVIDERS:

  • lmstudio, claude, fireworks, openai, other, responses
  • LLM_FALLBACK_ORDER controls Auto mode
  • Per-chat override via session settings
  • API keys: ~/.config/sapphire/credentials.json or env vars
  • Privacy mode blocks cloud, whitelist-based for configurable endpoints

CREDENTIALS:

  • ~/.config/sapphire/secret_key: Password/API key hash
  • ~/.config/sapphire/credentials.json: LLM, SOCKS, email, bitcoin, SSH, HA
  • Not in user/ directory, not in backups
  • Sensitive fields Fernet-encrypted (machine identity key)

HOT RELOAD:

  • Settings/prompts/toolsets: ~2s after file change
  • Wakeword/STT: hot-toggle on/off at runtime
  • TTS: hot-stop/start via ProcessManager
  • LLM settings, SOCKS, privacy: immediate
  • Ports, models, code: require restart

API: See docs/API.md for all 221 endpoints

DATABASES:

  • user/history/sapphire_history.db: chats, state_current, state_log
  • user/memory.db: memories, memories_fts, memory_scopes
  • user/knowledge.db: people, knowledge_tabs, knowledge_entries, knowledge_fts
  • user/goals.db: goals, progress_journal

LOGS:

  • user/logs/sapphire.log: Main log
  • user/logs/tts.log: TTS server log