Atlas Cortex is split into multiple parts so that the core engine is portable and reusable by anyone, while extended features adapt to available infrastructure.
| | Part 1: Core Engine | Part 2: Integration | Part 2.5: Satellites | Parts 3–8: Extended |
|---|---|---|---|---|
| What | The brain — personality, memory, context, avatar, safety | Connects to real world (HA, files, network) | Distributed speakers/mics in every room | Alarms, routines, media, education, intercom |
| Requires | Any LLM backend + Python | Discovered services (HA, etc.) | Satellite hardware (Pi, ESP32) | Satellites + integrations |
| Portable? | Yes — any machine | Adapts to found services | Hardware-agnostic | Builds on Parts 1–2.5 |
When someone installs Atlas Cortex on their own system:
- Installer runs — detects hardware, finds existing LLM backends (or offers to install one)
- Selects models — recommends best models for detected GPU/RAM, pulls them
- Core starts — Atlas Cortex server (:5100) + optional Open WebUI Pipe function
- Service discovery — scans network for HA, Nextcloud, CalDAV, IMAP, NAS, etc.
- User configures — confirms services, provides credentials (CLI or via conversation with Atlas)
- Plugins activate — integrations register into Layer 2
- LLM-assisted refinement — once running, Atlas helps configure the rest conversationally
See installation.md for the full installer design.
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| C0 | Installer & Backend Abstraction | ✅ Complete | None |
| C1 | Core Pipe & Logging | ✅ Complete | C0 |
| C3a | Voice Identity (generic) | ✅ Complete | None |
| C4 | Emotional Evolution | ✅ Complete | C3a + C5 + C6 |
| C5 | Memory System (HOT/COLD) | ✅ Complete | None |
| C6 | User Profiles & Age-Awareness | ✅ Complete | C3a + C5 |
| C7 | Avatar System | ✅ Complete | None |
| C9 | Backup & Restore | ✅ Complete | None |
| C10 | Context Management & Hardware | ✅ Complete | C0 |
| C11 | Voice & Speech Engine | ✅ Complete | C0 |
| C12 | Safety Guardrails & Content Policy | ✅ Complete | C6 |
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| I1 | Service Discovery & Setup | ✅ Complete | Part 1 C1 operational |
| I2 | Home Assistant Integration | ✅ Complete | I1 + HA discovered |
| I3 | Voice Pipeline & Spatial | ✅ Complete | I1 + I2 + C3a |
| I4 | Self-Learning Engine | ✅ Complete | I2 + C1 logging |
| I5 | Knowledge Source Connectors | ✅ Complete | I1 + C5 memory + C6 profiles |
| I6 | List Management | ✅ Complete | I1 + I5 |
| I7 | Offsite Backup | ✅ Complete | I1 + C9 |
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| S2.5 | Satellite Speaker/Mic System | ⏸️ Wake word deferred | C11 (TTS) + C3a (Voice ID) |
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P3 | Alarms, Timers & Reminders | ✅ Complete | S2.5 + I2 |
| P4 | Routines & Automations | ✅ Complete | I2 + P3 |
| P5 | Proactive Intelligence | ✅ Complete | I2 + S2.5 + C5 |
| P6 | Learning & Education | ✅ Complete | C6 + C12 |
| P7 | Intercom & Broadcasting | ✅ Complete | S2.5 |
| P8 | Media & Entertainment | ✅ Complete | S2.5 + I2 |
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P9 | Self-Evolution | ✅ Complete | C5 + I4 |
| P10 | Story Time Engine | ✅ Complete | C11 + C12 + C6 |
| P11 | Atlas CLI Agent | ✅ Complete | C0 + C1 |
| P12 | Standalone Web App | ✅ Complete | P11 |
| P13 | Legacy Protocol | 🔲 Planned | I2 |
| P14 | Household Management | 🔲 Planned | I2 + P3 |
| P15 | Security & Monitoring | 🔲 Planned | I2 + P5 |
| P16 | Health & Wellness | 🔲 Planned | C6 + P3 |
| P17 | Multi-Language Support | 🔲 Planned | C6 + C11 |
| P18 | Visual Media & Casting | 🔲 Future | P8 + I2 |
Everything below works with any LLM backend — no Home Assistant, no specific servers, no network knowledge required.
See installation.md for full design.
- Abstract `LLMProvider` class: `chat()`, `embed()`, `list_models()`, `health()`
- `OllamaProvider` — talks to Ollama's `/api/chat`, `/api/embeddings`
- `OpenAICompatibleProvider` — works with vLLM, LocalAI, LM Studio, llama.cpp, etc.
- Provider selected at install time, configurable in `cortex.env`
- Separate from LLM provider (can be different backends)
- Options: Ollama, OpenAI-compatible, sentence-transformers (in-process), fastembed
- Fallback: if LLM provider has no embedding support, use in-process sentence-transformers
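The provider abstraction and embedding fallback described above can be sketched roughly as follows — class and method names come from the bullets, but the concrete signatures are assumptions, not the shipped interface:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Backend-agnostic interface; concrete classes wrap one API."""

    @abstractmethod
    def chat(self, messages: list) -> str: ...

    @abstractmethod
    def embed(self, text: str):
        """Return an embedding vector, or None if the backend has no embed API."""

    @abstractmethod
    def list_models(self) -> list: ...

    @abstractmethod
    def health(self) -> bool: ...

def pick_embedder(llm: LLMProvider, fallback):
    """Use the LLM backend's embeddings when available; otherwise fall back
    to an in-process engine (e.g. sentence-transformers)."""
    probe = llm.embed("ping") if llm.health() else None
    return llm if probe is not None else fallback
```

An `OllamaProvider` would implement `embed()` against `/api/embeddings`, while a backend without embedding support simply returns `None` and triggers the fallback.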
- GPU detection (AMD/NVIDIA/Intel/Apple/CPU-only)
- Multi-GPU discovery: enumerate all discrete GPUs, rank by VRAM
- GPU role assignment: largest → LLM, second → voice (TTS/STT), third+ → overflow
- Mixed-vendor support: generate per-GPU isolation env vars (HIP/CUDA/oneAPI)
- VRAM/RAM budgets, context window limits
- Model recommendations based on hardware tier
- Store per-GPU profiles in `hardware_gpu` table
- Already designed in C10.1 — shared implementation
- Probe localhost + local network for running LLM backends
- Support: Ollama, LM Studio, vLLM, LocalAI, llama.cpp, koboldcpp, text-gen-webui
- Offer to install if nothing found (default: Ollama)
- Validate connectivity before saving
- Detect Open WebUI → offer Pipe function mode
- Always start standalone server (:5100) with OpenAI-compatible API
- This makes Atlas work with ANY client that supports OpenAI API
- HA conversation agent can point to Atlas server directly
- `python -m cortex.install` — interactive CLI wizard
- Two-stage: deterministic setup first (no LLM), then LLM-assisted refinement
- Generates `cortex.env` with all configuration
- Creates database, pulls models, starts server
- Offers to run Part 2 discovery immediately or later
The foundational pipe function — an intelligent router that processes every message through layered analysis.
Create the Open WebUI Pipe function:
- VADER sentiment analysis (installed in pipe's `__init__`)
- Layer 0: context assembly (user identification, sentiment, time-of-day)
- Layer 1: instant answers (date, time, math, identity, greetings)
- Layer 2: plugin-based action layer — dispatches to registered integration plugins (initially empty; Part 2 adds HA, lists, etc.)
- Layer 3: filler streaming + Ollama API background call
- Auto-select model based on query complexity
- No hardcoded infrastructure — Layer 2 is a registry that plugins populate
- Create cortex.db SQLite database (mounted volume)
- Create all tables from data-model.md
- Log every interaction with full metadata
- Flag LLM fallthrough events that triggered plugins (for learning)
- Default filler pools per sentiment category
- Time-of-day aware fillers (morning, afternoon, late night)
- Confidence-aware fillers (see grounding.md)
- Background thread for Ollama streaming
- Smooth transition: inject filler context into LLM system prompt
- Register Cortex as a model in Open WebUI
- Set as default model
- If prior models exist (Turbo/Atlas/Deep), retire them (Cortex replaces all)
- Layer 2 action registry: plugins register command patterns + handlers
- Plugin lifecycle: discover → configure → activate → health check
- Plugin API: `register_patterns()`, `handle_command()`, `discover_entities()`
- Built-in plugins: none (Part 2 provides HA, lists, etc.)
- Plugin health monitoring: disable unhealthy plugins gracefully
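The registry and lifecycle above could look roughly like this — a minimal sketch assuming regex-based patterns and a health flag per plugin; the real registry will carry more metadata:

```python
import re

class PluginRegistry:
    """Layer 2 action registry: plugins register command patterns,
    the pipe dispatches matching commands to their handlers."""

    def __init__(self):
        self._entries = []   # (compiled_pattern, handler, plugin_name)
        self._healthy = {}

    def register_patterns(self, name, patterns, handler):
        self._healthy[name] = True
        for p in patterns:
            self._entries.append((re.compile(p, re.I), handler, name))

    def set_health(self, name, ok):
        """Unhealthy plugins are skipped, not unregistered (graceful disable)."""
        self._healthy[name] = ok

    def handle_command(self, text):
        for pattern, handler, name in self._entries:
            if self._healthy.get(name) and (m := pattern.search(text)):
                return handler(m)
        return None  # no plugin matched → fall through to Layer 3 (LLM)
```

Returning `None` on no match is what lets Layer 2 stay empty at install time and only gain behavior as Part 2 plugins register.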
Speaker recognition — no infrastructure dependencies. Works with any audio source.
- Docker container with resemblyzer library (CPU-based, ~200MB RAM)
- REST API:
  - `POST /enroll` — audio + user_id → store embedding
  - `POST /identify` — audio → user_id + confidence
- Cosine similarity matching against stored embeddings
- Voice command trigger: "Hey Atlas, remember my voice"
- Multi-sample enrollment (3-5 utterances for accuracy)
- Link voice profile to Open WebUI user account
- Average embeddings across samples for robustness
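Enrollment averaging and cosine-similarity identification reduce to a few lines. A sketch with plain lists standing in for resemblyzer's numpy embeddings; the 0.75 threshold is illustrative, not a tuned value:

```python
import math

def average_embedding(samples):
    """Multi-sample enrollment: mean of 3-5 utterance embeddings."""
    n = len(samples)
    return [sum(dims) / n for dims in zip(*samples)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify(query, enrolled, threshold=0.75):
    """Best match against stored embeddings; (None, score) below threshold
    routes to the unknown-speaker flow."""
    best_user = max(enrolled, key=lambda u: cosine(query, enrolled[u]))
    score = cosine(query, enrolled[best_user])
    return (best_user, score) if score >= threshold else (None, score)
```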
- Voice requests include speaker embedding in metadata
- Pipe calls speaker-id sidecar for identification
- Inject identified user context into all processing layers
- Unknown speaker handling: prompt for name, offer enrollment
- Extract pitch, cadence, speech rate from speaker-id audio
- Vocabulary complexity analysis from transcript
- Low-confidence heuristic (used as initial hint only, refined through interaction)
- Never tell a user their estimated age — only use internally for tone
The personality layer that makes Atlas feel human.
- Initialize profile on first interaction
- Track `rapport_score`: +0.01 per positive, -0.02 per frustrated
- Detect communication style from message patterns
- Store time-of-day activity patterns
- Decay rapport by 0.005/day of no interaction
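The rapport arithmetic above is simple enough to pin down in code. The clamp to [0, 1] is an assumption; the per-interaction deltas and daily decay rate come from the bullets:

```python
from datetime import date

def update_rapport(score, sentiment):
    """Per-interaction nudge: +0.01 positive, -0.02 frustrated,
    clamped to [0, 1] (clamp range is an assumption)."""
    delta = {"positive": 0.01, "frustrated": -0.02}.get(sentiment, 0.0)
    return min(1.0, max(0.0, score + delta))

def decay_rapport(score, last_seen, today=None):
    """Decay 0.005 per day of no interaction, floored at 0."""
    today = today or date.today()
    idle_days = max(0, (today - last_seen).days)
    return max(0.0, score - 0.005 * idle_days)
```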
- LLM reviews day's conversations per user
- Generates updated `relationship_notes`
- Creates new personalized filler phrases matching user's style
- Adjusts `preferred_tone` based on communication patterns
- "Personality drift" — Atlas slowly develops unique traits per relationship
- Morning: "Good morning, Derek. Coffee's probably brewing?"
- Late night: "Still at it? Here's what I found..."
- After absence: "Hey, haven't seen you in a couple days!"
- User frustrated: tone shifts to calm, direct, solution-focused
- User excited: matches energy, uses exclamation marks
- Remember user preferences ("Derek likes lights at 40% in the evening")
- Proactive suggestions ("It's 10 PM — want me to set evening mode?")
- Conversation callbacks ("How'd that Docker fix work out?")
Adapted from agentic-memory-quest. See memory-system.md for full design.
- Pull `nomic-embed-text` into Ollama (274MB, CPU-friendly)
- Verify embedding API: `POST /api/embeddings` returns 768-dim vectors
- Benchmark: target <10ms per embedding on CPU
- Deploy ChromaDB in embedded mode (inside Cortex pipe or sidecar)
- Create `cortex_memory` collection with HNSW index
- Persistent storage on mounted volume
- Metadata schema: user_id, type, source, tags, supersedes, ttl, confidence
- Compute query embedding via Ollama
- Sparse search: SQLite FTS5 (BM25 scoring)
- Dense search: ChromaDB vector similarity (cosine)
- RRF Fusion (k=60) to merge ranked lists
- Optional cross-encoder reranker (`ms-marco-MiniLM-L-6-v2`)
- Return top-K (default 8) MemoryHits, sub-50ms target
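The RRF fusion step merging the sparse (BM25) and dense (vector) rankings is worth making concrete — each list contributes 1/(k + rank) per document, with k=60 as stated above:

```python
def rrf_fuse(ranked_lists, k=60, top_k=8):
    """Reciprocal Rank Fusion over ranked lists of doc IDs.
    Documents ranked well in multiple lists accumulate higher scores."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A document that appears in both the FTS5 and ChromaDB result lists outranks one that tops only a single list, which is the behavior hybrid retrieval wants.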
- asyncio.Queue for non-blocking writes
- PII redactor (regex-based: emails, phones, SSN, CC numbers)
- Memory decider: heuristics for keep/drop/dedup (preference, fact, chit-chat)
- Embed via Ollama, upsert to ChromaDB + FTS5 mirror
- Append-only: corrections link to originals, never overwrite
- Content-hash dedup for idempotency
- Layer 0: HOT query to retrieve user memories on every request
- Layer 1: Memory-powered instant answers ("what's my daughter's name?")
- Layer 2: Memory-powered personalized defaults (via plugins)
- Layer 3: Inject memory context into LLM system prompt
- COLD path fires after every interaction to capture new memories
See user-profiles.md for full design.
- SQLite `user_profiles` table for fast structured queries
- Profile fields: age, age_group, vocabulary_level, preferred_tone, communication_style
- Append-only profile evolution with confidence scoring
- Parent-child relationships (`parent_user_id` foreign key)
- First encounter detection (new user_id or unknown voice)
- Natural "meeting someone new" dialogue flow
- Gradual profile building through conversation (not interrogation)
- "We've talked before" handling — search memory, re-link profiles
- Response profiles: toddler, child, teen, adult, unknown/neutral
- Vocabulary filtering by age group
- Content safety filtering for children
- Tone adaptation: warm+simple (toddler) → casual+respectful (teen) → personalized (adult)
- System prompt modifier injected based on detected age group
- `parental_controls` table: content filter level, allowed devices, allowed hours
- Children can only trigger actions on their allowed list
- Time-based restrictions (e.g., no actions after 9 PM for kids)
- Sensitive commands require parent confirmation
Visual face for Atlas displayed on screens. See avatar-system.md for full design.
- FastAPI + WebSocket server (atlas-avatar, port 8891)
- Receives TTS audio + phoneme timing from any TTS engine
- Receives emotion state from Cortex pipe
- Routes viseme + emotion frames to displays via WebSocket
- Serves the avatar web page (HTML/CSS/JS/SVG)
- Integrate with Piper TTS phoneme output or espeak-ng
- Generate timed phoneme sequences from TTS text
- Handle streaming chunks (sentence-boundary splitting)
- Map ~40 IPA phonemes → 13 viseme mouth shapes (Preston Blair simplified)
- Generate timed viseme sequences synced to audio timestamps
- Smooth transitions between visemes (interpolation, not snapping)
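The phoneme→viseme step above can be sketched as a lookup plus merging of consecutive identical shapes (so the mouth holds rather than snaps); the table below is a tiny illustrative subset, not the shipped Preston Blair map, and client-side interpolation handles the smoothing between frames:

```python
# Illustrative subset of the ~40-phoneme → 13-viseme mapping.
PHONEME_TO_VISEME = {
    "p": "MBP", "b": "MBP", "m": "MBP",   # closed-lip consonants
    "f": "FV", "v": "FV",                  # lip-teeth
    "AA": "AH", "AE": "AH",                # open vowels
    "iy": "EE", "s": "S", "t": "S",
}

def visemes_for(phonemes, rest="REST"):
    """Turn timed phonemes [(phoneme, start, end)] into timed visemes,
    merging consecutive identical shapes into one held frame."""
    out = []
    for ph, start, end in phonemes:
        shape = PHONEME_TO_VISEME.get(ph, rest)
        if out and out[-1][0] == shape:
            out[-1] = (shape, out[-1][1], end)  # extend the held shape
        else:
            out.append((shape, start, end))
    return out
```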
- SVG/Canvas2D face with eyes, mouth, eyebrows
- Mouth morphs between viseme shapes via CSS/JS animation
- Idle behaviors: blinking (3-6s random), breathing bob, eye drift
- Responsive — works on tablets, phones, wall displays
- Drive eye shape, eyebrow position, mouth modifier from sentiment engine
- Emotional transitions blend over 300-500ms (ease-in-out)
- Time-of-day expressions (sleepy at night, bright in morning)
- Background color/mood tinting based on emotional state
- Audio and viseme stream start at same timestamp
- 100ms buffer for network jitter absorption
- Client uses shared clock for playback + animation sync
- Incremental streaming: animate while LLM is still generating
- Text-based faces for ESP32 OLED and terminal displays
- Viseme + emotion combinations as ASCII art strings
- MQTT or WebSocket delivery to tiny displays
- Minimal resource usage
- Skin manifest format (JSON): colors, animation FPS, display requirements
- Skin directory structure: face, eyes, mouths (per viseme), brows (per emotion)
- Built-in skins: Orb (default), Bot, Buddy, Minimal, Classic (ASCII)
- Per-display or per-user skin selection
- Use ComfyUI to generate consistent avatar art for custom skins
- img2img for viseme × emotion combination sheets
- Store generated assets as skin packs
See backup-restore.md for full design.
- `python -m cortex.backup create` — manual snapshot
- `python -m cortex.backup restore --latest daily` — one-command restore
- SQLite online backup (no locks, consistent snapshot)
- ChromaDB directory copy
- Config and avatar skins included
- Compressed tar.gz archives
- Integrated into nightly evolution job (runs first, before any changes)
- Retention: 7 daily, 4 weekly, 12 monthly
- Pre-operation safety snapshots (before migrations, bulk imports, upgrades)
- Disk space monitoring and backup health checks
- "Atlas, back yourself up" → manual backup
- "Atlas, restore from yesterday" → restore with safety backup first
- "Atlas, when was your last backup?" → query backup_log
- Proactive warnings if backup health degrades
See context-management.md for full design.
- GPU detection: AMD (ROCm), NVIDIA (CUDA), Intel (oneAPI), Apple (Metal), CPU-only
- Auto-compute VRAM budget, KV cache limits, max context window, model size cap
- Store in `hardware_profile` table, re-detect on demand or after OOM
- First-run installation wizard with recommended models
- Per-request context window based on task complexity (512 for commands, 16K+ for reasoning)
- Token budget allocation: system → memory → active messages → checkpoints → generation reserve
- Thinking mode gets expanded context with pre-think compaction
- GPU memory monitoring to prevent OOM (reduce context or skip thinking when constrained)
- Tiered summarization: checkpoint summaries (oldest) → recent summary → active messages (verbatim)
- Compaction triggers at 60% and 80% of context budget
- LLM-generated checkpoint summaries preserving decisions, entities, unresolved items
- Checkpoint expansion on demand if LLM needs detail from old segment
- Transparent overflow recovery: if output exceeds generation reserve, capture partial output, compact, re-send with continuation prompt — user never sees the seam
- Chunked generation: proactive splitting for long outputs (code, plans, detailed explanations)
- Output deduplication: sentence-level fuzzy matching to remove overlap across chunks, with coherence smoothing pass
- Continuation fillers: natural bridging phrases ("Bear with me...", "...and continuing with that...") streamed during recovery latency
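The 60%/80% compaction triggers reduce to a small decision function; the threshold values come from the bullets above, while the action names are illustrative:

```python
def compaction_action(used_tokens, budget):
    """Tiered response to context pressure: checkpoint-summarize the
    oldest segment at 60% utilization, compact aggressively at 80%."""
    utilization = used_tokens / budget
    if utilization >= 0.8:
        return "compact_aggressive"   # fold recent turns into summaries too
    if utilization >= 0.6:
        return "checkpoint_oldest"    # summarize only the oldest segment
    return "none"
```

Running this check before each generation keeps the generation reserve intact, so the transparent overflow recovery path is the exception rather than the rule.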
- Auto-recommend fast/standard/thinking/embedding models based on VRAM tier
- User overridable ("Atlas, use qwen3:30b for everything")
- Model config stored in `model_config` table
- Fallback chains: if preferred model doesn't fit, downgrade gracefully
- `context_metrics` table tracking token budgets, utilization, compactions per request
- `context_checkpoints` table for conversation history compression
- Nightly evolution reviews metrics to tune default windows and thresholds
- Detect incoming messages during active generation (non-blocking poll)
- Classify interrupt type: stop, redirect, clarify, refine (pattern-based, no LLM)
- Stop: halt immediately, save partial output, natural acknowledgment
- Redirect: halt, checkpoint partial, begin new request with prior context
- Clarify: pause, answer inline, offer to resume
- Refine: halt, re-generate with refinement instruction
- Voice interruption: echo cancellation, listen-during-playback, wake word detection mid-output
See voice-engine.md for full design.
- Abstract `TTSProvider`: `synthesize()`, `list_voices()`, `supports_emotion()`
- Implementations: Orpheus (Ollama), Piper (CPU fallback), Parler, Coqui
- Provider discovered at install (C0), configurable in `cortex.env`
- Pull `legraphista/Orpheus` Q4 GGUF into Ollama (or Orpheus-FastAPI with ROCm)
- Verify audio generation, streaming, emotion tags
- VRAM management: time-multiplexed with LLM (Ollama model switching)
- 8 built-in voices with emotion support
- Map VADER sentiment → Orpheus/Parler emotion format
- Paralingual injection: `<laugh>`, `<sigh>`, `<chuckle>`, `whisper:` based on context
- Age-appropriate emotion filtering (gentler for kids)
- Night mode / quiet hours: automatic pace, volume, energy reduction
- Never repeat same paralingual consecutively
- `tts_voices` table with provider, gender, style, language
- Per-user voice preference (stored in user profile)
- Voice preview/audition: "Atlas, try a different voice"
- Seed voices for each installed provider
- Detect sentence boundaries in LLM token stream
- Pipeline: sentence complete → emotion tag → TTS → audio chunk
- Overlap: sentence N plays while sentence N+1 generates
- Fast path: Layer 1/2 → Piper CPU → <200ms total
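Sentence-boundary detection in the token stream is the piece that enables the overlap pipeline. A minimal sketch — the boundary regex is a naive heuristic (no abbreviation handling), not the shipped detector:

```python
import re

# End of sentence: terminator (optionally followed by a closing quote/bracket)
# then whitespace. Abbreviations like "Dr." would need extra handling.
_BOUNDARY = re.compile(r'([.!?]["\')\]]?)\s')

def stream_sentences(token_stream):
    """Yield complete sentences as LLM tokens arrive, so TTS can start
    synthesizing sentence N while sentence N+1 is still generating."""
    buf = ""
    for token in token_stream:
        buf += token
        while (m := _BOUNDARY.search(buf)):
            yield buf[: m.end(1)].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()  # flush the trailing partial sentence
```

Each yielded sentence would then be tagged with emotion and handed to the TTS provider while the LLM keeps streaming.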
- `POST /v1/audio/speech` (OpenAI-compatible)
- Extensions: `emotion`, `include_phonemes` for avatar sync
- Wyoming TTS adapter for HA integration
- HA uses Atlas as both conversation agent AND TTS engine
- Extract phoneme timing from Orpheus/Piper output
- Feed to avatar server (C7) for viseme animation
- Synchronized: audio playback + lip movement + emotion expression
See safety-guardrails.md.
- Resolve content tier from user profile (age_group + age_confidence)
- Default to strict when age unknown (confidence < 0.6)
- Parental control override support
- Store tier in pipeline context for all downstream layers
- Pre-pipeline checks: self-harm detection, illegal content, PII detection, prompt injection
- GuardrailResult severity levels: PASS, WARN, SOFT_BLOCK, HARD_BLOCK
- PII redaction before logging
- Crisis response protocol with pre-written empathetic responses + resources
- Input deobfuscation: decode base64, leetspeak, Unicode homoglyphs, ROT13, zero-width chars before analysis
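The deobfuscation pass above can be sketched as a normalization chain. This covers homoglyphs, zero-width characters, whole-message base64, and a small leetspeak table; ROT13 is omitted since it needs a language-likelihood check, and the ordering and tables are assumptions:

```python
import base64
import re
import unicodedata

ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))
LEET = str.maketrans("013457@$", "oieastas")  # illustrative substitutions

def deobfuscate(text: str) -> str:
    """Normalize input before guardrail analysis."""
    # 1. Unicode homoglyphs → canonical forms via compatibility normalization
    text = unicodedata.normalize("NFKC", text)
    # 2. Strip zero-width characters used to split trigger words
    text = text.translate(ZERO_WIDTH)
    # 3. Decode whole-message base64 payloads if they yield printable text
    stripped = text.strip()
    if re.fullmatch(r"[A-Za-z0-9+/=]{12,}", stripped):
        try:
            decoded = base64.b64decode(stripped).decode("utf-8")
            if decoded.isprintable():
                text = decoded
        except Exception:
            pass  # not actually base64 — keep the original
    # 4. Undo common leetspeak substitutions
    return text.lower().translate(LEET)
```

The guardrail detectors then run on both the raw and normalized forms, so obfuscation can't bypass pattern matching.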
- Post-LLM checks: explicit content scan, language appropriateness, harmful instructions, data leakage
- Content tier enforcement on vocabulary and tone
- Response replacement/rewriting when guardrails trigger
- Cross-user data isolation verification
- Output behavioral analysis: persona break, system prompt leak, tone shift, instruction echo
- Build age-appropriate system prompt prefix per content tier
- Educational mode: scientific terminology for bodies/biology at all tiers
- Profanity handling rules per tier
- Honest challenge mode: push back on bad ideas, admit uncertainty
- Anti-jailbreak instructions hardened into system prompt
- `guardrail_events` table for all triggers
- Severity-based alerting (parent notification on crisis for minors)
- Nightly evolution review of guardrail patterns to reduce false positives
- Hard limits that cannot be overridden (explicit content, CSAM, self-harm methods)
- 5-layer defense: static regex, semantic intent, system prompt, output analysis, adaptive learning
- `jailbreak_patterns` table: learned regex patterns from blocked attempts
- `jailbreak_exemplars` table: semantic embeddings of novel attacks
- Auto-extract patterns from blocked attacks, validate against known-good messages (<1% FPR)
- Hot-reload detectors when new patterns are learned
- Conversation drift monitor: track safety temperature across multi-turn escalation attempts
- Nightly clustering of attack families, meta-pattern generation, stale pattern pruning
- Attack taxonomy classification: direct override, persona swap, roleplay wrap, encoding, gradual escalation
Everything below connects Atlas to the outside world. Designed as discovery-based plugins so anyone can install Atlas and it adapts to whatever services are available.
The installer that finds what's on the network and configures integrations.
- mDNS/Zeroconf scan for common services:
  - Home Assistant (`_home-assistant._tcp`)
  - Nextcloud (WebDAV probing on common ports/paths)
  - MQTT brokers (`_mqtt._tcp`)
  - CalDAV/CardDAV servers
  - NAS shares (SMB/NFS discovery)
  - IMAP/SMTP email servers
- Manual fallback: user provides URLs/IPs for anything not auto-discovered
- Store discovered services in `discovered_services` table
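The mapping from discovered services (plus manual fallbacks) to candidate plugins is essentially a lookup with deduplication. A sketch — the service keys and plugin names are illustrative, not the shipped schema:

```python
# Illustrative mapping from discovery results to plugin names.
SERVICE_PLUGINS = {
    "_home-assistant._tcp": "home_assistant",
    "_mqtt._tcp": "mqtt",
    "webdav": "nextcloud",
    "caldav": "calendar",
    "smb": "nas",
    "imap": "email",
}

def plan_plugins(discovered, manual=()):
    """Merge auto-discovered services with manual fallback entries and
    return the plugins to offer, deduplicated, discovery-order first."""
    seen, plugins = set(), []
    for service in list(discovered) + list(manual):
        plugin = SERVICE_PLUGINS.get(service)
        if plugin and plugin not in seen:
            seen.add(plugin)
            plugins.append(plugin)
    return plugins
```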
- Interactive setup for each discovered service:
- Home Assistant: guide user to create long-lived access token
- Nextcloud: OAuth or app password flow
- Email: IMAP credentials
- NAS: mount path or SMB credentials
- Validate connectivity before saving
- Store configs in `service_config` table (encrypted credentials)
- Map discovered services → available plugins
- Auto-activate plugins for confirmed services
- Register plugin command patterns into Layer 2
- Health check each plugin on startup
- Graceful degradation: if a service goes down, plugin disables itself and re-checks periodically
- User-triggered: "Atlas, scan for new services"
- Nightly: lightweight re-scan for new/removed services
- After network change (new IP, new subnet)
- Detect when a previously-unavailable service comes online
The HA plugin — registers command patterns, discovers devices, executes actions.
- Fetch all entities from HA REST API (`/api/states`)
- Populate `ha_devices` table
- Fetch HA areas (`/api/config/area_registry/list`) and map entities to rooms
- Generate initial command patterns for common device types (lights, switches, climate, locks, covers, fans, media, sensors)
- Map friendly names → entity IDs with alias support
- Identify and register presence sensors per area into `presence_sensors` table
- Register all patterns into Layer 2 plugin registry
- Pattern-matched commands → direct HA REST API calls (no LLM)
- Room-scoped entity filtering when spatial context is available
- Response generation: "Done — bedroom lights off"
- Error handling: HA unreachable → graceful fallback to LLM (which may also fail, but at least explains)
- Subscribe to HA state change events
- Update `ha_devices.state` in real-time
- Detect new devices added to HA between nightly scans
- Feed real-time events to proactive suggestion engine (C4.4)
Connects speaker identification to HA's voice infrastructure for room-aware commands.
- Modify Wyoming STT pipeline to pass audio to speaker-id sidecar (C3a)
- Return identified user with transcribed text
- HA automation context: "Derek said turn off lights" vs "Guest said..."
- Map voice satellites to HA areas (`satellite_rooms` table)
- Query HA presence sensors in real-time during Layer 0
- Combine satellite ID + presence + speaker identity for room resolution
- Multi-mic proximity: compare audio energy across satellites for same utterance
- Ambiguity resolution: satellite+presence > satellite-only > presence-only > ask user
- Room-scoped entity filtering: "the lights" → only entities in resolved room
- Log all spatial resolutions to `room_context_log` for tuning
- "Goodnight" triggers floor/house-scoped scenes based on location
- "Turn off everything downstairs" uses floor mapping
- User's current area informs default command scope
The system that makes Cortex smarter every day — learns from HA interactions.
- Lightweight Python container with cron
- Schedule: run at 3 AM daily
- HA device discovery diff (new devices, removed devices, renamed)
- LLM-powered pattern generation for new devices
- Write results to `evolution_log`
- Query interactions where `matched_layer = 'llm'` AND tool calls contain integration actions
- Use LLM to generate regex patterns from the natural language that triggered fallthrough
- Insert learned patterns into `command_patterns` with source `'learned'`
- Confidence scoring and deduplication
- Works for ANY plugin (HA, lists, knowledge queries — not just HA)
- Track `hit_count` per pattern
- Prune zero-hit patterns after 30 days
- Boost frequently-hit patterns
- Merge similar patterns into generalized forms
- Weekly report: "X% of device commands now handled without LLM"
Connect Atlas's knowledge/privacy system (C8 framework in Part 1) to actual data sources.
- ChromaDB `cortex_knowledge` collection (separate from memory)
- SQLite `knowledge_docs` metadata table + FTS5 mirror
- Access gate: filter all queries by owner_id + access_level
- Identity confidence determines access tier (private/shared/household/public)
Each connector is a plugin discovered via I1:
- Nextcloud (WebDAV): files, photos (EXIF), notes
- Email (IMAP): subject, body, attachments
- Calendar (CalDAV): events, shared calendars
- NAS (SMB/NFS): documents on file shares
- HA history: device states, automation logs
- Chat history: prior Atlas conversations (always available)
- Text extraction: PDF, DOCX, XLSX, CSV, Markdown, plain text
- Chunking for large documents
- Owner assignment from source path / account
- Access level assignment (private default, shared/household by path convention)
- PII tagging (tag, don't redact — it's the user's own data)
- Embed via Ollama, upsert to ChromaDB + FTS5
- User-scoped queries: owner_id filter on all retrievals
- Unknown speaker: household + public data only
- Low-confidence speaker: shared + household + public only
- Cross-user data requests blocked with natural explanation
- Children's data visible to their parent (parental_controls)
- Children cannot access parent's private data
- Exclusion list: passwords, alarm codes, SSH keys, .env files, medical, financial
- Nightly full scan for all connected sources
- Real-time: HA states (WebSocket), chat history (interaction logger)
- Frequent: calendar (15min), email (30min)
- On-demand reindex triggered by user request
- Change detection via content hash (only re-embed modified docs)
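Hash-based change detection keeps reindexing cheap: only documents whose content hash differs from the stored one get re-embedded. A sketch with an in-memory index standing in for the `knowledge_docs` metadata:

```python
import hashlib

def changed_docs(docs, index):
    """Return doc IDs needing re-embedding. `docs` maps doc_id → content;
    `index` maps doc_id → last-seen SHA-256 hex digest (illustrative schema)
    and is updated in place."""
    stale = []
    for doc_id, content in docs.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if index.get(doc_id) != digest:
            stale.append(doc_id)
            index[doc_id] = digest  # record the new hash
    return stale
```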
Multi-backend lists with per-list permissions. See lists.md.
- List registry table with backend, permissions, aliases
- Backend adapters (plugins from I1): HA to-do, Nextcloud CalDAV, file-based, Grocy, Todoist
- List resolution: explicit name → category inference → conversation context → memory → ask
- Permission enforcement: public lists allow anyone, private/shared respect access control
- Auto-discovery of lists from connected services during nightly job
- Remember routing preferences so user never repeats a clarification
Extends C9 backup to push copies to discovered NAS/storage.
- rsync to NAS share after each backup
- Configurable remote path via cortex.env or discovered NAS
- Ensures recovery even if the Atlas server fails completely
PART 1 (Core Engine):
C0.1 (LLM Provider) ──┬──▶ C0.4 (Backend Discovery) ──▶ C0.5 (UI Detection)
C0.2 (Embed Provider) ─┤ │
C0.3 (Hardware) ────────┘ ▼
C0.6 (Installer)
│
┌────────────────────────────────────────┘
▼
C1.1 (Core Pipe) ──┬──▶ C1.3 (Filler Engine) ──▶ C1.4 (Register Model)
└──▶ C1.5 (Plugin Registry)
C1.2 (Logging) ────────────────────────────────────────────────────────
C0.3 (Hardware) ──▶ C10.1 ──▶ C10.2 (Context) ──▶ C10.3 (Compaction)
│ │
└──▶ C10.4 (Model Selection) ├──▶ C10.5
└──▶ C10.6
C3a.1 (Speaker Sidecar) ──▶ C3a.2 (Enrollment) ──▶ C3a.3 (Pipe Integration)
└──▶ C3a.4 (Age Est.)
C5.1 (Embedding) ──▶ C5.2 (ChromaDB) ──▶ C5.3 (HOT) ──▶ C5.4 (COLD) ──▶ C5.5
C5.5 + C3a.3 ──▶ C6.1 (Profiles) ──▶ C6.2 ──▶ C6.3 ──▶ C6.4 (Parental)
│
C4.1 (Emotion) ◀────────────────────────────┘
└──▶ C4.2 ──▶ C4.3 ──▶ C4.4
C6.4 (Parental) ──▶ C12.1 (Content Tier) ──▶ C12.2 (Input Guards)
│
C12.4 (Safety Prompt) ◀─────┤
▼
C12.3 (Output Guards) ──▶ C12.5 (Logging & Review)
│
▼
C12.6 (Adaptive Jailbreak)
C0.1 (LLM Provider) ──▶ C11.1 (TTS Provider) ──▶ C11.2 (Orpheus) ──▶ C11.3 (Emotion)
│
C11.4 (Voice Registry) ◀────────────┘
└──▶ C11.5 (Streaming) ──▶ C11.6 (TTS API)
└──▶ C11.7 (Phoneme Bridge) ──▶ C7.1 (Avatar Server)
C7.1 (Avatar Server) ──▶ C7.2 → C7.3 → C7.4 → C7.5/C7.6/C7.7/C7.8 → C7.9
C9.1 (Backup CLI) ──▶ C9.2 (Nightly) ──▶ C9.3 (Voice Backup)
PART 2 (Integration Layer):
I1.1 (Discovery) ──▶ I1.2 (Config Wizard) ──▶ I1.3 (Plugin Activation)
│ │
│ (or via conversation with Atlas) │
│ ┌────────┤────────┬────────┐
│ ▼ ▼ ▼ ▼
│ I2.1 (HA) I5.1 (Know) I6.1 I7.1
│ │ │
│ ▼ ▼
│ I2.2 → I2.3 I5.2 → I5.3 → I5.4 → I5.5
│ │
│ ┌──────────┤
│ ▼ ▼
│ I3.1 → I3.2 I4.1 → I4.2 → I4.3
│ └──▶ I3.3
| Task | Description |
|---|---|
| C0.1 | LLM provider interface (abstract class + Ollama + OpenAI-compat) |
| C0.2 | Embedding provider interface |
| C0.3 | Hardware detection (shared with C10.1) |
| C1.2 | Create database schema and logging infrastructure |
| C3a.1 | Build speaker ID sidecar container |
| C5.1 | Pull embedding model, verify API |
| C7.1 | Avatar server container skeleton |
| C9.1 | Build backup/restore CLI tool |
| Task | Description |
|---|---|
| I1.1 | Network service discovery (mDNS/Zeroconf scan) |
- C1.1 requires C0 (installer/provider interface) to know which LLM to talk to
- C3a.2+ requires speaker-id sidecar deployed
- C4.x requires profiles + memory + voice identity
- C5.2+ requires embedding model operational
- C6.x requires both memory (C5) and speaker-id (C3a)
- I2.x requires Home Assistant discovered + access token provided
- I3.x requires HA voice pipeline + speaker-id sidecar
- I5.x requires at least one knowledge source discovered
- All of Part 2 requires core pipe + plugin registry operational
- Document Classification System — standalone service that classifies documents by type, sensitivity, and access level. Consumed by Atlas Cortex (I5) for automatic `access_level` assignment, PII detection, and content categorization. Should support: file type detection, content analysis, sensitivity scoring, category tagging (financial, medical, personal, work, household). Could use a fine-tuned small model or rule-based engine. Lives outside this project as a general-purpose utility.
Distributed speaker/microphone devices for whole-house Atlas presence. See satellite-system.md for full design.
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| S2.5 | Satellite System | ⏸️ Wake word deferred | Part 1 C11 (TTS) + C3a (Voice ID) |
- Audio capture (16kHz mono), playback, agent loop
- Cross-platform: Raspberry Pi, ESP32-S3, generic Linux
- openWakeWord (default), pluggable engine interface
- Local-only processing for privacy
- Silero VAD for speech boundaries
- speexdsp AEC for barge-in support
- WebSocket client with auto-reconnect
- Audio streaming (PCM or Opus)
- Protocol: ANNOUNCE → WAKE → AUDIO_CHUNK → AUDIO_END
- Server-side `/ws/satellite` handler
- STT → pipeline → TTS → stream back to satellite
- mDNS/Zeroconf announcement from satellites
- Atlas auto-detection and DB registration
- Integrate with Home Assistant voice pipeline
- Satellites appear as HA voice assistants
- State-based LED control (idle, listening, thinking, speaking)
- NeoPixel, GPIO, OLED support
- Raspberry Pi GPIO/I2S, ESP32 I2S, generic ALSA/PulseAudio
- One-line install script for Pi
- Docker image for any Linux device
- ESP32 firmware flash tool
- Cached error TTS for server outages
- Automatic reconnection with exponential backoff
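The reconnect schedule above is capped exponential backoff; jitter keeps a house full of satellites from reconnecting in lockstep after a server restart. The base/cap/jitter constants here are illustrative defaults, not the shipped values:

```python
import random

def backoff_delays(base=1.0, cap=60.0, attempts=6, jitter=0.1, rng=random.random):
    """Delays (seconds) between reconnect attempts: doubles each attempt,
    capped at `cap`, with up to `jitter` fractional randomization."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay + jitter * delay * rng())
    return delays
```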
See alarms-timers-reminders.md for full design.
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P3 | Alarms, Timers & Reminders | 🔲 Planned | S2.5 (satellites) + I2 (HA) |
- Cron-like scheduler, DB persistence, recurring (weekday/weekend/daily)
- Sound selection or TTS message
- In-memory countdown, multiple concurrent timers
- Pause, resume, cancel, label ("pasta timer")
- Time-based, location-based (geofence via HA), event-based
- Recurring reminders with cron expressions
- Route to satellite in user's room, escalate to all, push to phone
- Priority-based delivery strategy
- Extract time, duration, recurrence from user speech
- "Every weekday at 7am", "In 15 minutes", "When I get home"
- Voice commands during active alarm: "Snooze", "Stop", "5 more minutes"
- Layer 2 plugin for alarm/timer/reminder intents
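The in-memory countdown behavior described above (multiple concurrent labeled timers with pause/resume/cancel) could look like this minimal sketch; the class and method names are illustrative, not the project's actual API:

```python
import time

class TimerEngine:
    """Sketch of concurrent labeled countdown timers ("pasta timer") with
    pause, resume, cancel, and remaining-time queries."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # label -> {"deadline": float, "paused_remaining": float | None}
        self._timers = {}

    def start(self, label: str, seconds: float) -> None:
        self._timers[label] = {"deadline": self._clock() + seconds,
                               "paused_remaining": None}

    def pause(self, label: str) -> None:
        t = self._timers[label]
        if t["paused_remaining"] is None:
            t["paused_remaining"] = max(0.0, t["deadline"] - self._clock())

    def resume(self, label: str) -> None:
        t = self._timers[label]
        if t["paused_remaining"] is not None:
            t["deadline"] = self._clock() + t["paused_remaining"]
            t["paused_remaining"] = None

    def remaining(self, label: str) -> float:
        t = self._timers[label]
        if t["paused_remaining"] is not None:
            return t["paused_remaining"]
        return max(0.0, t["deadline"] - self._clock())

    def cancel(self, label: str) -> None:
        self._timers.pop(label, None)
```

Injecting the clock makes the engine testable without real sleeps; alarms and reminders would persist to the DB instead, since they must survive restarts.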
See routines-automations.md for full design.
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P4 | Routines & Automations | 🔲 Planned | I2 (HA) + P3 (timers for delays) |
- Sequential action execution with condition checks
- Support for delays, conditional branching, error handling
- Create and edit routines through natural conversation
- "When I say X, do Y" pattern recognition
- Good Morning, Good Night, I'm Leaving, I'm Home, Movie Time, Dinner Time
- Customizable per user
- Cron-based routine execution
- HA state change subscription (door opened, motion detected, etc.)
- Layer 2 plugin matching voice trigger phrases to routines
- List, edit, delete, enable/disable via voice or API
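Sequential execution with delays and condition checks could be driven by a declarative action list; a minimal sketch, where the action schema (`ha_service`, `delay`, `condition`) and the "Good Night" payload are assumptions for illustration:

```python
def run_routine(routine, ha_call, ha_state, sleep=lambda s: None):
    """Execute a routine's actions in order: call HA services, wait on
    delays, and run conditional branches when an entity matches a state.
    `ha_call` and `ha_state` are injected so the engine stays testable."""
    for action in routine["actions"]:
        kind = action["type"]
        if kind == "ha_service":
            ha_call(action["service"], action["target"])
        elif kind == "delay":
            sleep(action["seconds"])
        elif kind == "condition":
            if ha_state(action["entity"]) == action["state"]:
                for sub in action["then"]:
                    ha_call(sub["service"], sub["target"])

# Illustrative built-in routine, stored as plain data so it can live in the DB
# and be edited conversationally.
GOOD_NIGHT = {
    "name": "Good Night",
    "actions": [
        {"type": "ha_service", "service": "light.turn_off", "target": "all"},
        {"type": "delay", "seconds": 30},
        {"type": "condition", "entity": "lock.front_door", "state": "unlocked",
         "then": [{"type": "ha_service", "service": "lock.lock",
                   "target": "lock.front_door"}]},
    ],
}
```

Keeping routines as data rather than code is what makes "create and edit routines through natural conversation" tractable: the LLM emits or edits the action list, and the engine just interprets it.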
See proactive-intelligence.md for full design.
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P5 | Proactive Intelligence | 🔲 Planned | I2 (HA) + S2.5 (satellites) + C5 (memory) |
- Evaluate triggers against HA state + external data sources
- User-configurable rules + built-in defaults
- Critical/High/Medium/Low/Passive priority levels
- Fatigue prevention: max per hour, cooldown, DND/sleep suppression
- Storm/rain/temperature/UV alerts from HA weather entities or direct API
- Usage anomalies, cost optimization, solar awareness
- Pattern-based unusual activity alerts (unusual door open, device malfunction)
- Email parsing for tracking numbers, delivery status updates
- Meeting prep, travel time calculation, birthday/event reminders
- Morning summary: weather, calendar, reminders, energy, packages
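The fatigue-prevention rules above (max per hour, per-rule cooldown, DND suppression, critical bypass) compose into a small gate; a minimal sketch, with class and parameter names assumed for illustration:

```python
import collections

class AlertThrottle:
    """Decide whether a proactive alert may fire: per-rule cooldown plus a
    global max-per-hour cap, with DND suppression. Critical alerts bypass
    all suppression."""

    def __init__(self, max_per_hour: int = 6, cooldown_s: int = 900):
        self.max_per_hour = max_per_hour
        self.cooldown_s = cooldown_s
        self._last = {}                      # rule_id -> last fire time
        self._recent = collections.deque()   # global fire timestamps

    def allow(self, rule_id: str, now: float,
              priority: str = "medium", dnd: bool = False) -> bool:
        if priority == "critical":
            return True                       # never suppress critical
        if dnd:
            return False
        last = self._last.get(rule_id)
        if last is not None and now - last < self.cooldown_s:
            return False                      # rule still cooling down
        while self._recent and now - self._recent[0] > 3600:
            self._recent.popleft()            # drop fires older than 1h
        if len(self._recent) >= self.max_per_hour:
            return False                      # global hourly budget spent
        self._last[rule_id] = now
        self._recent.append(now)
        return True
```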
See learning-education.md for full design.
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P6 | Learning & Education | 🔲 Planned | C6 (profiles) + C12 (safety) |
- Socratic method, never gives direct homework answers
- Age-adapted explanations and examples
- Topic-based questions with adaptive difficulty
- Scoring, streaks, encouragement
- Guide through problem-solving steps
- Show-your-work mode
- Safe, age-appropriate, step-by-step instructions
- Integrated timers for experiments
- Vocabulary drills, pronunciation practice via TTS
- Conversational language practice
- Per-subject proficiency scoring
- Spaced repetition scheduling
- Summary of what child learned, time spent, areas needing help
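The spaced-repetition scheduling mentioned above could follow an SM-2-style interval update; a simplified sketch (the real scheduler might differ — this just shows the interval/ease mechanics, with quality graded 0–5):

```python
def next_interval(interval_days: float, ease: float, quality: int) -> tuple[float, float]:
    """SM-2-style review scheduling: failed recall resets the interval,
    successful recall grows it by an ease factor that adapts to quality.
    Returns (new_interval_days, new_ease)."""
    if quality < 3:
        return 1.0, ease                      # failed: review again tomorrow
    # Higher quality nudges ease up, lower quality pulls it down (floor 1.3).
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if interval_days < 1.5:
        return 6.0, ease                      # second successful review: ~6 days
    return interval_days * ease, ease
```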
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P7 | Intercom & Broadcasting | 🔲 Planned | S2.5 (satellites) |
Atlas owns this entirely — HA has no intercom system. Satellites are Atlas hardware with mics and speakers, so Atlas IS the intercom.
- `cortex/intercom/engine.py` — `IntercomEngine`
- Announce: TTS to a specific room/satellite ("tell the kids dinner is ready")
- Broadcast: TTS to ALL satellites ("we're leaving in 5 minutes")
- Zone broadcast: TTS to satellite group ("announce upstairs: bedtime")
- Priority levels: normal (respects quiet hours), urgent (louder), emergency (max volume, all rooms)
- `cortex/intercom/zones.py` — `ZoneManager`
- DB table: satellite_zones (zone_id, zone_name, satellite_ids JSON)
- Create named groups: "upstairs", "kids rooms", "common areas"
- Admin UI for zone CRUD
- Voice: "create a zone called bedrooms with the kids room and master"
- Adapt announcement for target audience using user profiles (C6)
- Child in room? Simpler language, gentler tone
- Adult? Concise, direct
- Optionally use target user's preferred voice
- Bidirectional audio stream between two satellites
- "Call the garage" → open mic+speaker on both satellites
- WebSocket audio bridge in server.py
- Auto-timeout after 5 minutes of silence
- "Hang up" / "end call" to close
- One-way audio FROM a satellite (parent listening to nursery)
- "Listen to the nursery" → stream nursery mic to requesting satellite speaker
- Requires parental auth (admin only)
- Visual indicator on monitored satellite (LED pattern) for transparency
- Layer 2 plugin: "tell X", "announce", "broadcast", "call the X", "intercom"
- Natural language room/zone/person resolution
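Resolving a spoken target ("upstairs", "the kitchen", "everywhere") to satellite IDs is the core of that plugin; a minimal sketch, with the data shapes (zone name → satellite list, satellite ID → room name) assumed for illustration:

```python
def resolve_targets(phrase: str, zones: dict[str, list[str]],
                    satellites: dict[str, str]) -> list[str]:
    """Map a spoken target to satellite IDs: whole-house phrases first,
    then named zones, then individual rooms. Returns [] if unresolved,
    so the caller can ask a clarifying question."""
    phrase = phrase.lower().strip()
    if phrase in ("everywhere", "all rooms", "the whole house"):
        return sorted(satellites)
    if phrase in zones:                       # named zone, e.g. "upstairs"
        return list(zones[phrase])
    for sat_id, room in satellites.items():  # single room, e.g. "kitchen"
        if room == phrase:
            return [sat_id]
    return []
```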
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P8 | Media & Entertainment | 🔲 Planned | S2.5 (satellites) |
Atlas owns the audio pipeline end-to-end. Satellites ARE the speaker network — every room already has one. Chromecast via `pychromecast` directly (skip HA; more reliable). HA `media_player` only as a last resort for devices Atlas can't reach. Atlas talks DIRECTLY to media services (YouTube Music, Plex, Audiobookshelf).
User: "Play jazz in the kitchen"
│
Atlas (brain):
├── Understands intent: play music
├── Knows user prefers YouTube Music
├── Knows kitchen has a satellite speaker
├── Remembers "Dad likes jazz in the evening"
│
├── Direct → YouTube Music API: search "jazz", get stream URL
│
└── Playback priority:
1. Kitchen satellite → stream PCM via WebSocket (we control both ends)
2. Kitchen Chromecast → cast via pychromecast (reliable, no HA)
3. HA media_player → last resort for unknown devices
- `cortex/media/base.py` — abstract `MediaProvider`
- Methods: search(query), get_stream_url(track_id), get_playlists(), get_playback_state(), play(), pause(), skip(), set_volume()
- Each provider implements this interface
- Provider registry with priority ordering
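The priority-ordered registry could be as simple as a sorted list with first-hit search; a minimal sketch (class and method names are assumptions, not the project's actual API):

```python
class ProviderRegistry:
    """Sketch of a priority-ordered media provider registry. Lower priority
    numbers are tried first; search falls through until a provider returns
    results, which keeps the always-available local library as the last
    (or first) line of defense."""

    def __init__(self):
        self._providers = []  # list of (priority, name, provider)

    def register(self, name, provider, priority: int = 100) -> None:
        self._providers.append((priority, name, provider))
        self._providers.sort(key=lambda entry: entry[0])

    def search(self, query: str):
        for _, name, provider in self._providers:
            results = provider.search(query)
            if results:
                return name, results
        return None, []
```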
- `cortex/media/youtube_music.py`
- Uses `ytmusicapi` (OAuth auth) for search, playlists, library, queue
- Uses `yt-dlp` for stream URL extraction (audio-only)
- Robust error handling: retry on failure, degrade gracefully
- Cache search results and stream URLs (URLs expire — refresh logic)
- WAF-critical: if ytmusicapi breaks, clear error message + fallback to local
- OAuth token refresh handling
- `cortex/media/local_library.py`
- Scan configured directories for audio files (FLAC, MP3, OGG, WAV, M4A)
- Read ID3/mutagen tags (artist, album, title, genre, year)
- SQLite search index (FTS5) for fast queries
- Always available — the offline fallback
- "Play something" with no service configured → plays local
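The FTS5 search index mentioned above might look like this minimal sketch (table layout and helper names are assumptions; shown in-memory for brevity, where the real index would live on disk):

```python
import sqlite3

def build_index(tracks):
    """Create an FTS5 virtual table over tag fields. The file path is
    stored UNINDEXED: returned in results but excluded from full-text
    matching. `tracks` is an iterable of
    (artist, album, title, genre, path) tuples."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE tracks USING fts5("
               "artist, album, title, genre, path UNINDEXED)")
    db.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?, ?)", tracks)
    return db

def search(db, query: str) -> list[str]:
    """Full-text match across all indexed columns, best matches first."""
    return [row[0] for row in db.execute(
        "SELECT path FROM tracks WHERE tracks MATCH ? ORDER BY rank",
        (query,))]
```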
- `cortex/media/plex.py`
- Uses the `plexapi` library (official, well-maintained)
- Search music library, get stream URLs
- Also: movies/shows metadata for "what should we watch" queries
- Config: plex_url, plex_token
- `cortex/media/audiobookshelf.py`
- Uses `aioaudiobookshelf` or the direct REST API
- Get library, search books, get stream URL with chapter offset
- Sync progress: report current position, resume from last position
- "Continue my audiobook" → resume from exact timestamp
- "Where did I leave off in Dune?" → chapter + timestamp
- Config: abs_url, abs_token
- `cortex/media/podcasts.py`
- RSS feed parser (no external service dependency)
- DB: podcast_subscriptions, podcast_episodes, podcast_progress
- Auto-check for new episodes on schedule
- Resume position tracking per episode
- "Any new episodes of Hardcore History?"
- `cortex/media/router.py` — `PlaybackRouter`
- Decides WHERE to play based on context, with clear priority:
  1. Atlas Satellite (primary) — direct PCM stream via WebSocket. We control both ends. Rock solid. Every room has one.
  2. Chromecast — `pychromecast` library directly (NOT through HA). Mature, stable, well-maintained. Cast stream URL to device.
  3. HA media_player — last resort for devices Atlas can't reach: Sonos or other smart speakers that HA happens to expose.
- Room resolution: "kitchen" → finds kitchen satellite first, then Chromecast, then HA entity
- Transfer: "move this to the bedroom" → stop kitchen, start bedroom (same stream URL)
- Volume control routed to appropriate target
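The satellite → Chromecast → HA fallback described above reduces to a simple priority lookup; a minimal sketch (function and dict shapes are assumptions for illustration):

```python
def pick_output(room: str, satellites: dict[str, str],
                chromecasts: dict[str, str], ha_players: dict[str, str]):
    """Resolve a room name to a playback target using the router's
    priority: Atlas satellite first, then Chromecast, then an HA
    media_player entity. Returns (kind, device_id) or None."""
    if room in satellites:
        return ("satellite", satellites[room])
    if room in chromecasts:
        return ("chromecast", chromecasts[room])
    if room in ha_players:
        return ("ha_media_player", ha_players[room])
    return None
```

Transfer ("move this to the bedroom") is then two calls: stop on the current target, `pick_output("bedroom", ...)`, and start the same stream URL there.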
- `pychromecast` for Chromecast discovery + control (skip HA entirely)
- Synchronized playback across multiple satellites
- Start same stream on multiple satellites with timing sync via WebSocket
- "Play everywhere" → all satellites get the stream
- Chromecast groups for grouped casting (pychromecast supports this natively)
- Group management: "play in common areas" → resolve zone to satellites
- `cortex/media/preferences.py`
- Per-user music taste learning from history
- Time-of-day patterns: "morning playlist" vs "evening jazz"
- "Play something" → smart selection based on user + time + mood
- Genre affinity scoring from listening history
- Layer 2 plugin matching: "play X", "music", "listen to", "put on", "continue my audiobook", "any new podcasts", "what's playing", "skip", "pause", "volume", "play everywhere", "move to X"
- Resolves provider + target + action from natural language
- `cortex/media/spotify.py`
- Uses `spotipy` (official library, stable)
- Search, playlists, playback control via Spotify Connect
- Atlas controls Spotify directly, NOT through HA's integration
- Config: spotify_client_id, spotify_client_secret, redirect_uri
- MediaView.vue: configured providers, playback history, preferences
- Provider config forms (API keys, URLs, scan directories)
- Now Playing dashboard across all rooms
- Learn preferences from listening patterns
- Contextual auto-generation (morning, focus, cooking, bedtime)
- Layer 2 plugin for media voice commands
- Multi-source resolution: local → preferred service → first available
PART 2.5 → PART 3 → PART 4 (sequential foundation)
│
PART 1 (C5+C6+C11+C12) ──▶ PART 5 (proactive, needs memory + HA)
│
PART 1 (C6+C12) ──────────▶ PART 6 (education, needs profiles + safety)
│
PART 2.5 ──────────────────▶ PART 7 (intercom, needs satellites)
│
PART 2.5 + I2 ─────────────▶ PART 8 (media, needs satellites + HA)
S2.5.1-S2.5.3 ──▶ S2.5.4 ──▶ S2.5.5 ──▶ S2.5.6 ──▶ S2.5.7
S2.5.8-S2.5.9 ──────────────────────────────────────▶ S2.5.10
S2.5.11
P3.1-P3.3 ──▶ P3.4 ──▶ P3.5 ──▶ P3.7
P3.6 ──────────────────────────────┘
P4.1 ──▶ P4.2 ──▶ P4.3
P4.4 ──┐
P4.5 ──┼──▶ P4.6 ──▶ P4.7
│
P5.1 ──▶ P5.2 ──▶ P5.3-P5.7 ──▶ P5.8
P6.1 ──▶ P6.2-P6.5 ──▶ P6.6 ──▶ P6.7
P7.1 ──▶ P7.3 ──▶ P7.7
P7.2 ──────┘
P7.4 ──────────────┘
P7.5 ──────────────┘
P7.6 ──────────────┘
P8.1 ──▶ P8.2-P8.6 ──▶ P8.7 ──▶ P8.8 ──▶ P8.10
│
P8.9 ◀────┘──▶ P8.11
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P9 | Self-Evolution | 🔲 Planned | C5 (memory) + I4 (self-learning) |
- Autonomous model improvement pipeline
- Analyze conversation logs for quality gaps
- Schedule overnight training runs
- Automated QLoRA fine-tuning on consumer GPU (RTX 4060)
- Domain-specific adapter training from usage patterns
- Validation against core principles test suite
- Discover new base models from HuggingFace/Ollama
- Benchmark against current model on curated eval set
- Safety gates: promote only if passes all safety checks
- Run new model/LoRA alongside current for shadow evaluation
- User-transparent comparison, auto-promote winners
- Track personality metrics over time
- Alert if responses deviate from trained personality
- Rollback mechanism for bad evolutions
- Admin UI showing evolution history, training runs, model comparisons
- Manual approve/reject for model promotions
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P10 | Story Time | 🔲 Planned | C11 (speech) + C12 (safety) + C6 (profiles) |
- Age-appropriate story generation via LLM
- Genre selection: adventure, fantasy, science, bedtime
- Branching narratives: child makes choices that affect the story
- Map story characters to distinct voice profiles
- Fish Audio S2: multi-speaker dialogue in single pass, 15K+ emotion tags
- Zero-shot voice cloning from reference audio (10-30s sample)
- GPU memory management for RTX 4060 (8GB VRAM)
- Unload Qwen3-TTS → Load Fish Audio S2 → Generate story audio → Unload → Reload Qwen3-TTS
- During swap: conversational TTS falls back to Orpheus or Kokoro
- Pre-generate all story segments before playback
- Cache generated audio for repeat listens
- Background generation while previous segment plays
- Voice-driven story progression: child speaks choices
- "What should the knight do next?" → child responds → story continues
- Integrated with safety guardrails for age-appropriate content
- Save and revisit favorite stories
- Parent-curated collections
- Story progress tracking (bookmarks, chapters)
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P11 | Atlas CLI | ✅ Complete | C0 (providers) + C1 (pipeline) |
- python -m cortex.cli with chat/ask/agent/status subcommands
- Interactive REPL with streaming, slash commands, conversation history
- 31 tools across 7 tiers: core, network, dev, atlas, multimodal, context, LoRA
- AgentTool ABC with JSON Schema for function calling
- ToolRegistry with get_default_registry()
- Think → Act → Observe loop with text-based tool calling
- Multi-modal file input (--file for images, PDFs, logs)
- Confirmation prompts for destructive operations
- Context window management with token budgeting
- Session persistence in ~/.atlas/sessions/
- LoRA routing stub for future expert adapter hot-swap
- Connect LoRA router to actual adapter hot-swapping via Ollama
- Auto-classify tasks and load coding/reasoning/math/sysadmin LoRAs
- Benchmark LoRA vs base model for quality validation
- Embed entire repo on RTX 4060 for semantic code search
- Incremental updates as files change
- "Find code similar to this pattern" queries
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P12 | Standalone Web App | 🔲 Planned | P11 (CLI) + admin panel |
- Browser-based chat interface (no Open WebUI dependency)
- WebSocket streaming, conversation history
- Mobile-responsive
- Browser-based voice input/output via Web Audio API
- Push-to-talk and wake word modes
- Avatar display during conversation
- Merge admin panel + chat into single app
- User-facing vs admin-facing views based on role
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P13 | Legacy Protocol | 🔲 Planned | I2 (HA) |
- Maintain pipe.py function for Open WebUI integration
- Protocol versioning for backward compat
- Full Wyoming protocol support for HA voice pipeline
- Bidirectional audio streaming
- OpenAI-compatible API v1 stability guarantees
- Deprecation policy for breaking changes
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P14 | Household Management | 🔲 Planned | I2 (HA) + P3 (scheduling) |
Atlas is the brain: remembers schedules, tracks state, sends reminders. HA is the body: smart feeders, sensors, physical integrations. Existing services: grocery list apps, calendar apps — Atlas talks to them directly.
- Feeding schedule reminders via scheduling engine (Part 3)
- Vet appointment tracking via calendar (CalDAV)
- Medication reminders for pets
- Smart feeder integration: HA for device control, Atlas for schedule intelligence
- "Did you feed the dog?" → check if smart feeder ran today (HA sensor)
- "We're running low on milk" → add to grocery list (existing Lists plugin)
- Voice-managed shopping list with categories
- Expiration date tracking (manual input, reminder on approaching dates)
- "What's on the grocery list?" → reads back from list system
- DB table: chores (name, assigned_to, frequency, last_done, next_due)
- Fair rotation tracking for household members
- Voice: "assign dishes to Jake this week"
- Completion confirmation: "I finished the laundry"
- Weekly chore report via daily briefing (Part 5)
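The fair rotation mentioned above can be computed rather than stored: shift each chore's assignee by one member per week. A minimal sketch (function name and signature are assumptions):

```python
def rotate_chores(chores: list[str], members: list[str],
                  week: int) -> dict[str, str]:
    """Deterministic weekly chore rotation: chore i goes to member
    (i + week) mod len(members), so assignments cycle fairly and can be
    recomputed for any week without extra state."""
    return {chore: members[(i + week) % len(members)]
            for i, chore in enumerate(chores)}
```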
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P15 | Security & Monitoring | 🔲 Planned | I2 (HA) + P5 (proactive) |
HA handles: camera feeds, door/window sensors, alarm systems, motion detectors. Atlas adds: intelligence layer — pattern recognition, natural language queries, smart alerting, context-aware responses.
- "Is the garage door open?" → query HA entity state (already works via HA plugin)
- "Are all doors locked?" → aggregate check across lock entities
- "Who's home?" → presence detection via HA person entities
- These are mostly HA queries Atlas already supports — formalize as smart queries
- Proactive rules for security events:
- Door opened at unusual hour → alert
- Motion when house is "away" mode → alert
- Garage door left open > 30min → reminder
- Camera integration: if HA exposes camera entities, Atlas can describe "Someone is at the front door" (using vision model on 4060 for camera frames)
- "Goodnight" routine: lock all doors, close garage, arm alarm
- "Leaving" routine: lock up, set away mode
- "Away mode": simulate presence (random lights via HA, already possible)
- These are mostly routine templates — add security-specific ones
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P16 | Health & Wellness | 🔲 Planned | C6 (profiles) + P3 (scheduling) |
Atlas is the brain: tracks medication schedules, sends reminders, monitors patterns. HA provides: presence sensors, environmental sensors (air quality, temperature). No external health services — all local and private.
- DB table: medications (user_id, name, dosage, schedule, last_taken)
- Scheduled reminders via Part 3 scheduling engine
- Voice confirmation: "Did you take your vitamin?" → "Yes" → mark taken
- Missed dose tracking and escalation (remind again in 30 min)
- Privacy-critical: all data local, never sent anywhere
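The missed-dose escalation above ("remind again in 30 min") is a pure function of the schedule and confirmation state; a minimal sketch with assumed names and Unix-timestamp arguments:

```python
def next_reminder(scheduled_ts: float, taken: bool, now: float,
                  escalate_after: float = 1800.0):
    """Return the timestamp of the next escalation reminder, or None if
    the dose was taken or isn't due yet. Re-reminds every `escalate_after`
    seconds (default 30 min) after the scheduled time until confirmed."""
    if taken or now < scheduled_ts:
        return None
    missed_for = now - scheduled_ts
    intervals = int(missed_for // escalate_after) + 1
    return scheduled_ts + intervals * escalate_after
```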
- Air quality from HA sensors (if available)
- Temperature/humidity comfort tracking
- "Is the air quality good today?" → check HA + outdoor API
- Proactive rule: alert if CO2 > threshold, suggest opening windows
- "You've been sitting for 2 hours" → presence sensor + timer
- Hydration reminders on schedule
- Sleep tracking from presence sensors (when bedroom occupied)
- These are proactive rules (Part 5) with health-specific templates
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P17 | Multi-Language | 🔲 Planned | C6 (profiles) + C11 (speech) |
- Auto-detect spoken/typed language
- Per-user language preference stored in profile
- Seamless switching mid-conversation
- Language-appropriate TTS voice selection
- Multi-language STT model support (Whisper supports 99 languages)
- Accent-aware speech recognition
- Real-time translation between household members
- "Tell mom dinner is ready" → translates if needed
- Uses existing translation plugin (Part 2.7) as backbone
| Phase | Name | Status | Prerequisites |
|---|---|---|---|
| P18 | Visual Media & Casting | 🔲 Future | P8 (media) + I2 (HA) |
Audio is Part 8. Visual media (TV, video) is a different beast — different protocols, different hardware. Kept separate intentionally.
- Discovery and casting via `pychromecast`
- "Cast this to the living room TV"
- Transport controls: play/pause/stop/volume
- Browse Plex movies/shows by voice
- "Play The Office on the bedroom TV" → cast to Chromecast/Plex client
- Resume from last position
- Via the `pyatv` library
- Transport controls, app launching
- "Pause the Apple TV"
- "Move this to the bedroom TV" → stop on current, start on target
- Room-aware: knows which TV is in which room via HA entities
- Photo slideshow on idle TVs (from local photos or Google Photos)
- Weather/calendar dashboard on kitchen TV
- "Show my photos on the living room TV"
PARTS 1-2 (COMPLETE) ──▶ ALL subsequent parts
PART 2.5 ─────────────────▶ PART 7 (intercom) + PART 8 (media)
PART 3 (scheduling) ──────▶ PART 14 (household) + PART 16 (health)
PART 5 (proactive) ───────▶ PART 15 (security)
PART 9 (self-evolution) ◀── PART 1 (C5 memory + I4 learning)
PART 10 (story time) ◀──── PART 1 (C11 speech + C6 profiles)
PART 11 (CLI) ─────────────▶ PART 12 (standalone web app)
PART 13 (legacy) ◀───────── PART 2 (I2 HA)