## Overview
Enable users to create personalized dream narratives that run entirely in the browser, with no server dependencies.
## User Stories
- As a user, I want to describe my ideal dream scenario in a text box, so I can experience a personalized guided dream.
- As a user, I want my custom dreams saved locally, so I can reuse them without regenerating.
- As a user, I want the generation to work offline after initial load, so I can create dreams anywhere.
## Technical Architecture
### Client-Side Stack (all in-browser)
- LLM: SmolLM or similar lightweight model via WebLLM/Transformers.js
- TTS: Web Speech API or Piper TTS (WASM)
- Music: Existing procedural synthesis (already client-side capable)
- Storage: IndexedDB for saving generated dreams
### Generation Pipeline
User Input → LLM (narrative sections) → Add [PAUSE] markers → TTS → Mix with music → Save to IndexedDB
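The marker-insertion step in the pipeline above is a pure transformation, so it can be sketched independently of the LLM and TTS stages. The `DreamSection` shape and function name below are illustrative, not part of any existing code:

```typescript
// Hypothetical section type matching the pipeline stages above.
interface DreamSection {
  kind: "intro" | "scene" | "transition" | "outro";
  text: string;
}

// Join generated sections into one narrative, inserting [PAUSE] markers
// between them so the TTS stage can render silence at section boundaries.
function assembleNarrative(sections: DreamSection[]): string {
  return sections.map((s) => s.text.trim()).join("\n[PAUSE]\n");
}
```

Keeping this step pure makes it trivial to re-run when a single section is edited or regenerated in Phase 3.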
## Implementation Phases
### Phase 1: MVP - Text-to-Narrative
- Integrate SmolLM via WebLLM for narrative generation
- Create prompt templates for dream narrative style
- UI: Simple text input + "Generate" button
- Generate narrative in sections (intro, scenes, transitions, outro)
- Auto-insert [PAUSE] markers between sections
- Display generated narrative for review/editing
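The prompt-template piece of Phase 1 can be a pure builder that returns OpenAI-style chat messages, which is the message shape WebLLM's chat completions API accepts. The system-prompt wording and section structure below are placeholders, not a finalized design:

```typescript
// Minimal chat message shape; field names follow the OpenAI-style
// convention that WebLLM mirrors.
type ChatMessage = { role: "system" | "user"; content: string };

// Hypothetical prompt template for the dream-narrative style.
function buildDreamPrompt(userDescription: string): ChatMessage[] {
  return [
    {
      role: "system",
      content:
        "You are a calm narrator of guided dreams. Write in second person, " +
        "present tense, with slow, soothing imagery. Produce four labeled " +
        "sections: intro, scenes, transitions, outro.",
    },
    { role: "user", content: `Dream scenario: ${userDescription}` },
  ];
}
```

In the browser, these messages would be passed to the WebLLM engine's chat completion call; the exact model ID and engine setup are decided in the table below.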
### Phase 2: Audio Generation
- Integrate browser TTS (Web Speech API as fallback, Piper WASM for quality)
- Port music generation to client-side (already uses numpy-like operations)
- Combine narration + music in browser (Web Audio API)
- Export as playable audio blob
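The core of the "combine narration + music" step is mixing two sample buffers with the music ducked under the voice. A minimal sketch, assuming both tracks share a sample rate (resampling handled upstream) and an illustrative default music gain of 0.3:

```typescript
// Mix a narration buffer over a music bed at reduced gain ("ducking"),
// clamping samples to [-1, 1] to avoid distortion on export.
function mixTracks(
  narration: Float32Array,
  music: Float32Array,
  musicGain = 0.3,
): Float32Array {
  const length = Math.max(narration.length, music.length);
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const n = i < narration.length ? narration[i] : 0;
    const m = i < music.length ? music[i] : 0;
    out[i] = Math.max(-1, Math.min(1, n + m * musicGain));
  }
  return out;
}
```

In the browser, the mixed array would be copied into an `AudioBuffer` channel and rendered (e.g. via `OfflineAudioContext`) to produce the exportable blob.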
### Phase 3: Persistence & Polish
- Save custom dreams to IndexedDB
- Show custom dreams in library alongside pre-made dreams
- Allow editing/regenerating sections
- Share custom dreams (export/import JSON)
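For export/import, one lightweight option is to share only the text needed to rebuild a dream (prompt and sections), regenerating audio on import rather than embedding large blobs in the JSON. The `CustomDream` field names and version check below are illustrative assumptions:

```typescript
// Hypothetical shape of a saved custom dream; audio blobs stay in
// IndexedDB and are regenerated on import, so the export carries text only.
interface CustomDream {
  id: string;
  title: string;
  prompt: string;
  sections: string[];
  createdAt: string; // ISO 8601
}

function exportDream(dream: CustomDream): string {
  return JSON.stringify({ version: 1, dream });
}

function importDream(json: string): CustomDream {
  const parsed = JSON.parse(json);
  if (parsed?.version !== 1 || typeof parsed?.dream?.id !== "string") {
    throw new Error("Unrecognized dream export format");
  }
  return parsed.dream as CustomDream;
}
```

The version field leaves room to evolve the format (e.g. embedding audio later) without breaking older exports.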
## Key Technical Decisions
| Decision | Choice | Rationale |
|---|---|---|
| LLM Runtime | WebLLM | Best performance for in-browser LLMs |
| Model | SmolLM-360M or similar | Small enough for mobile, good enough for narratives |
| TTS | Piper WASM (primary), Web Speech API (fallback) | Quality vs. compatibility tradeoff |
| Storage | IndexedDB | Large audio file support, offline access |
## Complexity Estimate
- Phase 1: Medium (2-3 weeks) - LLM integration is the main challenge
- Phase 2: High (3-4 weeks) - Audio pipeline in browser is complex
- Phase 3: Low (1 week) - Standard CRUD operations
## References
- SmolLM demo: https://context-lab.com/llm-course/demos/chatbot-evolution/
- WebLLM: https://github.com/mlc-ai/web-llm
- Piper TTS: https://github.com/rhasspy/piper
Split from #115 - targeted dream content (pre-made narratives completed)