"The AI that teaches you to think."
ZED is a voice-first AI study assistant that uses the Socratic method to build critical thinking skills. Instead of giving answers, ZED asks guiding questions, challenges your understanding, and pushes you to master concepts through active reasoning.
ZED follows the xRx Architecture pattern, a clean separation of concerns for voice AI agents:
```
┌───────────────────────────────────────────────────────────┐
│                      ZED ARCHITECTURE                     │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  ┌───────────┐      ┌─────────────┐      ┌───────────┐    │
│  │   INPUT   │ ───▶ │  REASONING  │ ───▶ │  OUTPUT   │    │
│  │  (Ears)   │      │   (Brain)   │      │  (Mouth)  │    │
│  └─────┬─────┘      └──────┬──────┘      └─────┬─────┘    │
│        ▼                   ▼                   ▼          │
│  ┌───────────┐      ┌─────────────┐      ┌───────────┐    │
│  │  Whisper  │      │    Llama    │      │ElevenLabs │    │
│  │   (STT)   │      │    (LLM)    │      │   (TTS)   │    │
│  │   Groq    │      │    Groq     │      │           │    │
│  └───────────┘      └──────┬──────┘      └───────────┘    │
│                            ▼                              │
│                   ┌─────────────────┐                     │
│                   │     MEMORY      │                     │
│                   │   (Knowledge)   │                     │
│                   │  ┌───────────┐  │                     │
│                   │  │ ChromaDB  │  │                     │
│                   │  │   (RAG)   │  │                     │
│                   │  └───────────┘  │                     │
│                   │  ┌───────────┐  │                     │
│                   │  │  Canvas   │  │                     │
│                   │  │   (ETL)   │  │                     │
│                   │  └───────────┘  │                     │
│                   └─────────────────┘                     │
└───────────────────────────────────────────────────────────┘
```
| Layer | File | Responsibility |
|---|---|---|
| INPUT | `ears.py` | Captures audio, transcribes speech → text (Groq Whisper) |
| REASONING | `brain.py` | Socratic State Machine, RAG retrieval, LLM streaming (Groq Llama) |
| OUTPUT | `mouth.py` | Converts text → speech, plays audio (ElevenLabs) |
| MEMORY | `knowledge.py` | Vector embeddings, ChromaDB, semantic search |
| ETL | `canvas_sync.py` | Downloads PDFs from Canvas LMS, organizes by course |
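The layer separation above can be sketched in a few lines of Python. The class and method names below are hypothetical stand-ins (the real `ears.py`, `brain.py`, and `mouth.py` call Groq and ElevenLabs); the point is that each layer exposes one narrow method and the orchestrator only wires them together:

```python
from dataclasses import dataclass, field

class Ears:
    def transcribe(self, audio: bytes) -> str:
        return "what is variance"            # stand-in for Groq Whisper STT

@dataclass
class Brain:
    history: list = field(default_factory=list)

    def respond(self, text: str) -> str:     # stand-in for the Socratic LLM turn
        self.history.append(text)
        return f"Good question. First: how would you define '{text}' yourself?"

class Mouth:
    def speak(self, text: str) -> bytes:     # stand-in for ElevenLabs TTS audio
        return text.encode()

def handle_turn(ears: Ears, brain: Brain, mouth: Mouth, audio: bytes) -> bytes:
    """One voice turn: audio in, audio out; layers never touch each other."""
    return mouth.speak(brain.respond(ears.transcribe(audio)))

print(handle_turn(Ears(), Brain(), Mouth(), b"fake-webm-blob").decode())
```

Because the layers only meet in `handle_turn`, any one of them (say, the STT provider) can be swapped without touching the other two.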
ZED implements a 3-State Socratic Tutor that adapts to the user's understanding:
```
┌───────────────────────────────────────────────────────┐
│                SOCRATIC STATE MACHINE                 │
├───────────────────────────────────────────────────────┤
│                                                       │
│   ┌──────────────────┐                                │
│   │   STATE 1: GYM   │ ◀──────────────────────┐       │
│   │    (Learning)    │                        │       │
│   └────────┬─────────┘                        │       │
│            │ User gets it right               │       │
│            ▼                                  │       │
│   ┌──────────────────┐                        │       │
│   │  STATE 2: COOL-  │                        │       │
│   │  DOWN (Validate) │                        │       │
│   └────────┬─────────┘                        │       │
│            │ Immediately pivot                │       │
│            ▼                                  │       │
│   ┌──────────────────┐     User struggles     │       │
│   │     STATE 3:     │ ───────────────────────┘       │
│   │    CHALLENGE     │                                │
│   │   (Edge Cases)   │                                │
│   └────────┬─────────┘                                │
│            │ "Thank you ZED" / "I'm done"             │
│            ▼                                          │
│   ┌──────────────────┐                                │
│   │     [HANGUP]     │                                │
│   │   Session End    │                                │
│   └──────────────────┘                                │
│                                                       │
└───────────────────────────────────────────────────────┘
```
| State | Trigger | ZED's Action |
|---|---|---|
| GYM | User is wrong/learning | Ask scaffolding questions, reference slides |
| COOL-DOWN | User answers correctly | Validate briefly ("Exactly."), then immediately pivot |
| CHALLENGE | User shows understanding | Push with edge cases ("What if variance is 0?") |
| Exception: Confused | "I don't understand" | Brief explanation (2-3 sentences), then check understanding |
| Exception: Tired | "I'm done", "Thank you ZED" | Acknowledge, validate session, yield [HANGUP] |
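The transition table above can be sketched as a plain state function. This is a hedged illustration, not the actual `brain.py` code: the real system decides "correct" vs. "struggling" from LLM judgments, while here it is a boolean flag:

```python
from enum import Enum, auto

class State(Enum):
    GYM = auto()          # scaffolding questions
    COOL_DOWN = auto()    # brief validation
    CHALLENGE = auto()    # edge cases
    HANGUP = auto()       # session end

EXIT_PHRASES = ("thank you zed", "i'm done")

def next_state(state: State, user_input: str, correct: bool) -> State:
    text = user_input.lower()
    if any(p in text for p in EXIT_PHRASES):
        return State.HANGUP                       # tired: acknowledge, hang up
    if state is State.GYM:
        return State.COOL_DOWN if correct else State.GYM
    if state is State.COOL_DOWN:
        return State.CHALLENGE                    # immediately pivot
    if state is State.CHALLENGE:
        return State.CHALLENGE if correct else State.GYM  # struggles: back to GYM
    return state
```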
ZED operates like a smart speaker with ASLEEP/AWAKE states:
```
┌────────────────────────────────────────────────────┐
│              WAKE WORD STATE MACHINE               │
├────────────────────────────────────────────────────┤
│                                                    │
│  ┌────────────┐                     ┌────────────┐ │
│  │            │      "Hey ZED"      │            │ │
│  │   ASLEEP   │ ──────────────────▶ │   AWAKE    │ │
│  │ 😴 Ignore  │ ◀────────────────── │ 🟢 Listen  │ │
│  │            │ [HANGUP] / Timeout  │            │ │
│  └────────────┘                     └────────────┘ │
│                                                    │
│  • WebSocket stays open                            │
│  • Only state changes, not connection              │
│  • Frontend receives status updates                │
│                                                    │
└────────────────────────────────────────────────────┘
```
Backend:

| Technology | Purpose | Why |
|---|---|---|
| FastAPI | WebSocket server | Async, fast, modern Python |
| Groq | LLM & STT inference | Fastest inference (Llama 3.3 70B, Whisper) |
| ElevenLabs | Text-to-Speech | Natural, low-latency voice |
| ChromaDB | Vector database | Local, lightweight, persistent |
| Sentence-Transformers | Embeddings | all-MiniLM-L6-v2 for semantic search |
| PyMuPDF | PDF parsing | Fast, accurate text extraction |
| canvasapi | Canvas LMS integration | Download course materials automatically |
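The retrieval step that `knowledge.py` delegates to ChromaDB boils down to: embed the query, rank stored chunks by similarity, keep only hits above `RAG_THRESHOLD`. A toy illustration with hand-made 3-d vectors standing in for `all-MiniLM-L6-v2`'s 384-d embeddings (the chunk texts and vectors below are invented for the example):

```python
import math

RAG_THRESHOLD = 0.35  # mirrors the backend default

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b) or 1.0)

CHUNKS = {
    "Variance measures spread around the mean.": (0.9, 0.1, 0.0),
    "The cafeteria closes at 8 pm.":             (0.0, 0.2, 0.9),
}

def retrieve(query_vec, k=2):
    """Return the top-k chunk texts whose similarity clears the threshold."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in CHUNKS.items()),
        reverse=True,
    )
    return [text for score, text in scored[:k] if score >= RAG_THRESHOLD]

print(retrieve((1.0, 0.0, 0.0)))
```

The threshold is what keeps ZED from citing irrelevant slides: a weak match is dropped entirely rather than stuffed into the prompt.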
Frontend:

| Technology | Purpose | Why |
|---|---|---|
| React 18 | UI framework | Component-based, hooks |
| Vite | Build tool | Fast HMR, modern bundling |
| TypeScript | Type safety | Catch errors at compile time |
| Tailwind CSS | Styling | Utility-first, rapid prototyping |
| Framer Motion | Animations | Declarative, performant |
| Web Audio API | Voice Activity Detection | Browser-native VAD |
| MediaRecorder API | Audio capture | Browser-native recording |
| Component | Technology |
|---|---|
| Protocol | WebSocket (real-time bidirectional) |
| Audio Format | WebM/WAV → MP3 |
| Vector Store | ChromaDB (SQLite backend) |
| Session State | In-memory (per WebSocket) |
```
groq/
├── backend/
│   ├── server.py              # WebSocket server, wake word gatekeeper
│   ├── app/
│   │   ├── main.py            # CLI orchestrator (terminal mode)
│   │   └── services/
│   │       ├── brain.py       # Socratic State Machine, LLM
│   │       ├── knowledge.py   # RAG pipeline, ChromaDB
│   │       ├── ears.py        # Audio recording, Whisper STT
│   │       ├── mouth.py       # ElevenLabs TTS, audio playback
│   │       └── canvas_sync.py # Canvas LMS PDF downloader
│   ├── data/
│   │   ├── chroma_db/         # Vector embeddings (persistent)
│   │   ├── downloads/         # PDFs organized by course
│   │   └── wake_words/        # Porcupine wake word models
│   └── requirements.txt
│
├── frontend/
│   ├── src/
│   │   ├── App.tsx            # Main app, phase management
│   │   ├── components/
│   │   │   ├── MainScene.tsx  # Voice UI, conversation panel
│   │   │   └── LoginScene.tsx # Canvas login
│   │   └── hooks/
│   │       └── useVoiceInput.ts # VAD, WebSocket, audio handling
│   ├── index.html
│   └── package.json
│
└── README.md
```
- Python 3.11+
- Node.js 18+
- API Keys: `GROQ_API_KEY`, `ELEVEN_API_KEY`, `CANVAS_API_KEY` (optional)
Backend:

```bash
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Create .env file
cat > .env << EOF
GROQ_API_KEY=your_groq_key
ELEVEN_API_KEY=your_elevenlabs_key
CANVAS_API_KEY=your_canvas_key   # Optional
CANVAS_API_URL=https://your-institution.instructure.com
EOF

# Run server
python server.py
```

Frontend:

```bash
cd frontend
npm install

# Create .env file
cat > .env << EOF
VITE_WS_URL=ws://localhost:8000/ws
VITE_API_URL=http://localhost:8000
EOF

# Run dev server
npm run dev
```

Then:

- Open `http://localhost:5173` in your browser
- Allow microphone access
- Say "Hey ZED" to wake up
- Ask your question
- Say "Thank you ZED" or "I'm done" to end
Backend:

| Variable | Required | Default | Description |
|---|---|---|---|
| `GROQ_API_KEY` | ✅ | - | Groq API key for Whisper + Llama |
| `ELEVEN_API_KEY` | ✅ | - | ElevenLabs API key for TTS |
| `ELEVEN_VOICE_ID` | ❌ | `21m00Tcm4TlvDq8ikWAM` | ElevenLabs voice (Rachel) |
| `CANVAS_API_KEY` | ❌ | - | Canvas LMS API token |
| `CANVAS_API_URL` | ❌ | - | Canvas instance URL |
| `GROQ_MODEL` | ❌ | `llama-3.3-70b-versatile` | LLM model |
| `GROQ_TEMPERATURE` | ❌ | `0.4` | LLM temperature |
| `RAG_THRESHOLD` | ❌ | `0.35` | Minimum relevance score |
| `SKIP_RAG` | ❌ | `false` | Bypass RAG for testing |
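One way the backend might read this table, with the defaults applied and the two required keys enforced up front. This is a hedged sketch (the names `load_settings` and `DEFAULTS` are invented here; the real settings code may be structured differently):

```python
import os

DEFAULTS = {
    "ELEVEN_VOICE_ID": "21m00Tcm4TlvDq8ikWAM",
    "GROQ_MODEL": "llama-3.3-70b-versatile",
    "GROQ_TEMPERATURE": "0.4",
    "RAG_THRESHOLD": "0.35",
    "SKIP_RAG": "false",
}

def load_settings(env=None) -> dict:
    """Read the env-var table, failing fast if a required key is unset."""
    env = os.environ if env is None else env
    missing = [k for k in ("GROQ_API_KEY", "ELEVEN_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing required env vars: {missing}")
    def get(key):
        return env.get(key, DEFAULTS.get(key, ""))
    return {
        "groq_api_key": env["GROQ_API_KEY"],
        "eleven_api_key": env["ELEVEN_API_KEY"],
        "eleven_voice_id": get("ELEVEN_VOICE_ID"),
        "canvas_api_key": env.get("CANVAS_API_KEY"),  # optional, may be None
        "canvas_api_url": env.get("CANVAS_API_URL"),
        "groq_model": get("GROQ_MODEL"),
        "groq_temperature": float(get("GROQ_TEMPERATURE")),
        "rag_threshold": float(get("RAG_THRESHOLD")),
        "skip_rag": get("SKIP_RAG").lower() == "true",
    }
```

Failing fast on the two required keys gives a clear startup error instead of a confusing mid-session API failure.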
Frontend:

| Variable | Required | Default | Description |
|---|---|---|---|
| `VITE_WS_URL` | ❌ | `ws://localhost:8000/ws` | WebSocket endpoint |
| `VITE_API_URL` | ❌ | `http://localhost:8000` | REST API endpoint |
```
User speaks: "What is variance?"
          │
          ▼
┌───────────────────┐
│      Browser      │
│   MediaRecorder   │ ── WebM audio blob ──▶ WebSocket
└─────────┬─────────┘
          ▼
┌───────────────────┐
│     server.py     │
│   (Gatekeeper)    │ ── is_awake? ──▶ if FALSE, ignore
└─────────┬─────────┘
          │ TRUE
          ▼
┌───────────────────┐
│    Groq Whisper   │
│       (STT)       │ ──▶ "What is variance?"
└─────────┬─────────┘
          ▼
┌───────────────────┐
│      brain.py     │
│    (Reasoning)    │
│                   │
│  1. RAG search    │ ── ChromaDB ──▶ [relevant chunks]
│  2. Build prompt  │
│  3. Stream LLM    │ ── Groq Llama ──▶ tokens
└─────────┬─────────┘
          ▼
┌───────────────────┐
│     ElevenLabs    │
│       (TTS)       │ ── MP3 audio ──▶ WebSocket
└─────────┬─────────┘
          ▼
┌───────────────────┐
│      Browser      │
│   Audio.play()    │ ── 🔊 ZED speaks
└───────────────────┘
```
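The latency-critical part of this flow is step 3: tokens are streamed out of the LLM and spoken in sentence-sized chunks, so playback starts before the full answer exists. A hypothetical sketch of that chunking (the token stream here is faked; the real pipeline streams from Groq and sends each chunk to ElevenLabs):

```python
def fake_llm_stream():
    """Stand-in for streamed LLM tokens (one word per token here)."""
    yield from "Why do you think variance can never be negative? ".split()

def sentences(tokens):
    """Group a token stream into speakable chunks at sentence boundaries."""
    buf = []
    for tok in tokens:
        buf.append(tok)
        if tok.endswith((".", "?", "!")):
            yield " ".join(buf)
            buf = []
    if buf:                       # flush any trailing partial sentence
        yield " ".join(buf)

for chunk in sentences(fake_llm_stream()):
    print("TTS <-", chunk)        # each chunk would be sent to ElevenLabs
```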
- Socratic, not Spoon-feeding: ZED asks questions, never gives direct answers
- Voice-first: Optimized for spoken interaction, not typing
- Low Latency: Streaming tokens + TTS for instant feedback
- Context-aware: RAG pulls relevant course materials
- Relentless: Keeps pushing until you truly understand
- Graceful: Respects when you're done, validates your effort
MIT
Built with 🧠 and ❤️ for students who want to think, not just memorize.