
# ZED — Socratic AI Study Coach

> "The AI that teaches you to think."

ZED is a voice-first AI study assistant that uses the Socratic method to build critical thinking skills. Instead of giving answers, ZED asks guiding questions, challenges your understanding, and pushes you to master concepts through active reasoning.


πŸ—οΈ Architecture: xRx (Input β†’ Reasoning β†’ Output)

ZED follows the xRx Architecture pattern, a clean separation of concerns for voice AI agents:

```
┌──────────────────────────────────────────────────────────────────┐
│                         ZED ARCHITECTURE                         │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌─────────────┐    ┌─────────────────────┐    ┌─────────────┐  │
│   │    INPUT    │───▶│      REASONING      │───▶│   OUTPUT    │  │
│   │   (Ears)    │    │      (Brain)        │    │   (Mouth)   │  │
│   └─────────────┘    └─────────────────────┘    └─────────────┘  │
│          │                      │                      │         │
│          ▼                      ▼                      ▼         │
│    ┌───────────┐          ┌───────────┐          ┌───────────┐   │
│    │  Whisper  │          │   Llama   │          │ ElevenLabs│   │
│    │   (STT)   │          │   (LLM)   │          │   (TTS)   │   │
│    │   Groq    │          │   Groq    │          │           │   │
│    └───────────┘          └───────────┘          └───────────┘   │
│                                 │                                │
│                                 ▼                                │
│                       ┌─────────────────┐                        │
│                       │     MEMORY      │                        │
│                       │   (Knowledge)   │                        │
│                       │                 │                        │
│                       │  ┌───────────┐  │                        │
│                       │  │ ChromaDB  │  │                        │
│                       │  │   (RAG)   │  │                        │
│                       │  └───────────┘  │                        │
│                       │                 │                        │
│                       │  ┌───────────┐  │                        │
│                       │  │  Canvas   │  │                        │
│                       │  │   (ETL)   │  │                        │
│                       │  └───────────┘  │                        │
│                       └─────────────────┘                        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```

### xRx Components

| Layer     | File             | Responsibility                                                    |
|-----------|------------------|-------------------------------------------------------------------|
| INPUT     | `ears.py`        | Captures audio, transcribes speech → text (Groq Whisper)          |
| REASONING | `brain.py`       | Socratic State Machine, RAG retrieval, LLM streaming (Groq Llama) |
| OUTPUT    | `mouth.py`       | Converts text → speech, plays audio (ElevenLabs)                  |
| MEMORY    | `knowledge.py`   | Vector embeddings, ChromaDB, semantic search                      |
| ETL       | `canvas_sync.py` | Downloads PDFs from Canvas LMS, organizes them by course          |
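The three layers compose into a single turn loop. The sketch below is illustrative only: the stub bodies stand in for the real Groq Whisper, Groq Llama, and ElevenLabs calls, and the function names mirror the service files rather than quoting them.

```python
# Hypothetical sketch of one xRx turn; each stub stands in for a real service.

def listen(audio: bytes) -> str:
    """INPUT (ears.py): audio in, transcript out (really Groq Whisper)."""
    return audio.decode("utf-8")  # stand-in for STT

def think(transcript: str) -> str:
    """REASONING (brain.py): transcript in, Socratic question out."""
    return f"Before I answer: what do YOU think '{transcript}' means?"

def speak(reply: str) -> bytes:
    """OUTPUT (mouth.py): text in, audio out (really ElevenLabs TTS)."""
    return reply.encode("utf-8")  # stand-in for TTS

def handle_turn(audio: bytes) -> bytes:
    """One full Input -> Reasoning -> Output pass."""
    return speak(think(listen(audio)))
```

Because each layer only sees its neighbor's output, any one of them can be swapped (a different STT, a different voice) without touching the others.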

## 🧠 Socratic State Machine

ZED implements a **3-state Socratic tutor** that adapts to the user's understanding:

```
┌────────────────────────────────────────────────────────────────┐
│                     SOCRATIC STATE MACHINE                     │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│   ┌─────────────────┐                                          │
│   │  STATE 1: GYM   │◀────────────────────────────────┐        │
│   │  (Learning)     │                                 │        │
│   └────────┬────────┘                                 │        │
│            │ User gets it right                       │        │
│            ▼                                          │        │
│   ┌─────────────────┐                                 │        │
│   │ STATE 2: COOL-  │                                 │        │
│   │ DOWN (Validate) │                                 │        │
│   └────────┬────────┘                                 │        │
│            │ Immediately pivot                        │        │
│            ▼                                          │        │
│   ┌─────────────────┐                                 │        │
│   │ STATE 3:        │      User struggles             │        │
│   │ CHALLENGE       │─────────────────────────────────┘        │
│   │ (Edge Cases)    │                                          │
│   └────────┬────────┘                                          │
│            │ "Thank you ZED" / "I'm done"                      │
│            ▼                                                   │
│   ┌─────────────────┐                                          │
│   │    [HANGUP]     │                                          │
│   │   Session End   │                                          │
│   └─────────────────┘                                          │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```

### State Behaviors

| State | Trigger | ZED's Action |
|---|---|---|
| GYM | User is wrong / still learning | Ask scaffolding questions, reference slides |
| COOL-DOWN | User answers correctly | Validate briefly ("Exactly."), then immediately pivot |
| CHALLENGE | User shows understanding | Push with edge cases ("What if variance is 0?") |
| Exception: Confused | "I don't understand" | Brief explanation (2–3 sentences), then check understanding |
| Exception: Tired | "I'm done", "Thank you ZED" | Acknowledge, validate the session, yield `[HANGUP]` |
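The transition rules above can be sketched as a small pure function. This is an illustrative reconstruction, not the actual `brain.py` code; in particular, the `correct` flag is an assumption about how the LLM's assessment of the user's answer would be surfaced.

```python
from enum import Enum, auto

class State(Enum):
    GYM = auto()        # user is learning
    COOL_DOWN = auto()  # brief validation
    CHALLENGE = auto()  # edge-case pressure
    HANGUP = auto()     # session end

EXIT_PHRASES = ("thank you zed", "i'm done")

def next_state(state: State, user_text: str, correct: bool) -> State:
    """Apply the Socratic transition rules from the table above."""
    if any(p in user_text.lower() for p in EXIT_PHRASES):
        return State.HANGUP                      # tired -> end gracefully
    if state is State.GYM:
        return State.COOL_DOWN if correct else State.GYM
    if state is State.COOL_DOWN:
        return State.CHALLENGE                   # validate, then pivot
    if state is State.CHALLENGE:
        return State.CHALLENGE if correct else State.GYM
    return state
```

Keeping the transitions in one pure function makes the tutor's behavior easy to unit-test independently of the LLM.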

πŸŽ™οΈ Wake Word Session Management

ZED operates like a smart speaker with ASLEEP/AWAKE states:

```
┌─────────────────────────────────────────────────────────────┐
│                   WAKE WORD STATE MACHINE                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────┐         "Hey ZED"          ┌───────────┐  │
│   │             │ ────────────────────────▶  │           │  │
│   │   ASLEEP    │                            │   AWAKE   │  │
│   │  🔴 Ignore  │ ◀───────────────────────   │ 🟢 Listen │  │
│   │             │    [HANGUP] / Timeout      │           │  │
│   └─────────────┘                            └───────────┘  │
│                                                             │
│   • WebSocket stays open                                    │
│   • Only the state changes, not the connection              │
│   • Frontend receives status updates                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
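The gatekeeper in `server.py` can be thought of as a small filter in front of the pipeline. The sketch below is an assumption about its shape (the class name, phrase constants, and method are illustrative), showing only the ASLEEP/AWAKE logic:

```python
class WakeGate:
    """ASLEEP/AWAKE gate: the WebSocket stays open, only this flag flips."""

    WAKE_WORD = "hey zed"
    SLEEP_PHRASES = ("thank you zed", "i'm done")

    def __init__(self) -> None:
        self.is_awake = False

    def admit(self, transcript: str) -> bool:
        """Return True if the transcript should reach the Reasoning layer."""
        text = transcript.lower()
        if not self.is_awake:
            if self.WAKE_WORD in text:
                self.is_awake = True    # ASLEEP -> AWAKE
                return True             # process the rest of the utterance
            return False                # ASLEEP: ignore everything else
        if any(p in text for p in self.SLEEP_PHRASES):
            self.is_awake = False       # [HANGUP]: back to ASLEEP
        return True                     # the closing utterance is still handled
```

Because only the flag changes, the frontend can keep a single WebSocket open and simply render the status updates it receives.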

πŸ› οΈ Tech Stack

Backend (Python)

Technology Purpose Why
FastAPI WebSocket server Async, fast, modern Python
Groq LLM & STT inference Fastest inference (Llama 3.3 70B, Whisper)
ElevenLabs Text-to-Speech Natural, low-latency voice
ChromaDB Vector database Local, lightweight, persistent
Sentence-Transformers Embeddings all-MiniLM-L6-v2 for semantic search
PyMuPDF PDF parsing Fast, accurate text extraction
canvasapi Canvas LMS integration Download course materials automatically
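The RAG step embeds the query with Sentence-Transformers and keeps only chunks whose relevance clears a threshold (0.35 by default, via `RAG_THRESHOLD`). A dependency-free sketch of that filtering logic, using toy vectors in place of real embeddings and cosine similarity as the score (the real ranking lives in ChromaDB):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, threshold=0.35, k=3):
    """Return up to k chunk texts whose similarity clears the threshold.

    chunks: iterable of (text, embedding) pairs, as a stand-in for the
    vectors stored in ChromaDB.
    """
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in chunks),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [text for score, text in scored[:k] if score >= threshold]
```

The threshold matters for a tutor: with no sufficiently relevant slide, it is better to ask from general knowledge than to ground the question in an off-topic chunk.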

### Frontend (TypeScript)

| Technology | Purpose | Why |
|---|---|---|
| React 18 | UI framework | Component-based, hooks |
| Vite | Build tool | Fast HMR, modern bundling |
| TypeScript | Type safety | Catches errors at compile time |
| Tailwind CSS | Styling | Utility-first, rapid prototyping |
| Framer Motion | Animations | Declarative, performant |
| Web Audio API | Voice activity detection | Browser-native VAD |
| MediaRecorder API | Audio capture | Browser-native recording |

### Infrastructure

| Component | Technology |
|---|---|
| Protocol | WebSocket (real-time, bidirectional) |
| Audio format | WebM/WAV → MP3 |
| Vector store | ChromaDB (SQLite backend) |
| Session state | In-memory (per WebSocket connection) |

πŸ“ Project Structure

groq/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ server.py              # WebSocket server, wake word gatekeeper
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ main.py            # CLI orchestrator (terminal mode)
β”‚   β”‚   └── services/
β”‚   β”‚       β”œβ”€β”€ brain.py       # Socratic State Machine, LLM
β”‚   β”‚       β”œβ”€β”€ knowledge.py   # RAG pipeline, ChromaDB
β”‚   β”‚       β”œβ”€β”€ ears.py        # Audio recording, Whisper STT
β”‚   β”‚       β”œβ”€β”€ mouth.py       # ElevenLabs TTS, audio playback
β”‚   β”‚       └── canvas_sync.py # Canvas LMS PDF downloader
β”‚   β”œβ”€β”€ data/
β”‚   β”‚   β”œβ”€β”€ chroma_db/         # Vector embeddings (persistent)
β”‚   β”‚   β”œβ”€β”€ downloads/         # PDFs organized by course
β”‚   β”‚   └── wake_words/        # Porcupine wake word models
β”‚   └── requirements.txt
β”‚
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ App.tsx            # Main app, phase management
β”‚   β”‚   β”œβ”€β”€ components/
β”‚   β”‚   β”‚   β”œβ”€β”€ MainScene.tsx  # Voice UI, conversation panel
β”‚   β”‚   β”‚   └── LoginScene.tsx # Canvas login
β”‚   β”‚   └── hooks/
β”‚   β”‚       └── useVoiceInput.ts # VAD, WebSocket, audio handling
β”‚   β”œβ”€β”€ index.html
β”‚   └── package.json
β”‚
└── README.md

## 🚀 Quick Start

### Prerequisites

- Python 3.11+
- Node.js 18+
- API keys: `GROQ_API_KEY`, `ELEVEN_API_KEY`, `CANVAS_API_KEY` (optional)

### Backend

```bash
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Create the .env file
cat > .env << EOF
GROQ_API_KEY=your_groq_key
ELEVEN_API_KEY=your_elevenlabs_key
CANVAS_API_KEY=your_canvas_key  # Optional
CANVAS_API_URL=https://your-institution.instructure.com
EOF

# Run the server
python server.py
```

### Frontend

```bash
cd frontend
npm install

# Create the .env file
cat > .env << EOF
VITE_WS_URL=ws://localhost:8000/ws
VITE_API_URL=http://localhost:8000
EOF

# Run the dev server
npm run dev
```

### Usage

1. Open http://localhost:5173 in your browser
2. Allow microphone access
3. Say "Hey ZED" to wake the assistant
4. Ask your question
5. Say "Thank you ZED" or "I'm done" to end the session

## 🔧 Environment Variables

### Backend (`backend/.env`)

| Variable | Required | Default | Description |
|---|---|---|---|
| `GROQ_API_KEY` | ✅ | - | Groq API key for Whisper + Llama |
| `ELEVEN_API_KEY` | ✅ | - | ElevenLabs API key for TTS |
| `ELEVEN_VOICE_ID` | ❌ | `21m00Tcm4TlvDq8ikWAM` | ElevenLabs voice (Rachel) |
| `CANVAS_API_KEY` | ❌ | - | Canvas LMS API token |
| `CANVAS_API_URL` | ❌ | - | Canvas instance URL |
| `GROQ_MODEL` | ❌ | `llama-3.3-70b-versatile` | LLM model |
| `GROQ_TEMPERATURE` | ❌ | `0.4` | LLM temperature |
| `RAG_THRESHOLD` | ❌ | `0.35` | Minimum relevance score |
| `SKIP_RAG` | ❌ | `false` | Bypass RAG for testing |
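A sketch of how these variables might be read with the table's defaults applied (the function name and dict shape are illustrative, not the actual server code):

```python
import os

def load_backend_settings() -> dict:
    """Read backend config, enforcing required keys and table defaults."""
    groq_key = os.environ.get("GROQ_API_KEY")
    eleven_key = os.environ.get("ELEVEN_API_KEY")
    if not groq_key or not eleven_key:
        raise RuntimeError("GROQ_API_KEY and ELEVEN_API_KEY are required")
    return {
        "groq_api_key": groq_key,
        "eleven_api_key": eleven_key,
        "voice_id": os.environ.get("ELEVEN_VOICE_ID", "21m00Tcm4TlvDq8ikWAM"),
        "model": os.environ.get("GROQ_MODEL", "llama-3.3-70b-versatile"),
        "temperature": float(os.environ.get("GROQ_TEMPERATURE", "0.4")),
        "rag_threshold": float(os.environ.get("RAG_THRESHOLD", "0.35")),
        "skip_rag": os.environ.get("SKIP_RAG", "false").lower() == "true",
    }
```

Failing fast on the two required keys keeps misconfiguration errors at startup rather than mid-conversation.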

### Frontend (`frontend/.env`)

| Variable | Required | Default | Description |
|---|---|---|---|
| `VITE_WS_URL` | ✅ | `ws://localhost:8000/ws` | WebSocket endpoint |
| `VITE_API_URL` | ❌ | `http://localhost:8000` | REST API endpoint |

## 📊 Data Flow

```
User speaks: "What is variance?"
         │
         ▼
┌─────────────────┐
│  Browser        │
│  MediaRecorder  │──── WebM audio blob ────▶ WebSocket
└─────────────────┘
         │
         ▼
┌─────────────────┐
│  server.py      │
│  (Gatekeeper)   │──── is_awake? ────▶ if FALSE, ignore
└─────────────────┘
         │ TRUE
         ▼
┌─────────────────┐
│  Groq Whisper   │
│  (STT)          │──── "What is variance?" ────▶
└─────────────────┘
         │
         ▼
┌─────────────────┐
│  brain.py       │
│  (Reasoning)    │
│                 │
│  1. RAG search  │──── ChromaDB ────▶ [relevant chunks]
│  2. Build prompt│
│  3. Stream LLM  │──── Groq Llama ────▶ tokens
└─────────────────┘
         │
         ▼
┌─────────────────┐
│  ElevenLabs     │
│  (TTS)          │──── MP3 audio ────▶ WebSocket
└─────────────────┘
         │
         ▼
┌─────────────────┐
│  Browser        │
│  Audio.play()   │──── 🔊 ZED speaks
└─────────────────┘
```

## 🎯 Design Principles

1. **Socratic, not spoon-feeding**: ZED asks guiding questions instead of handing over answers
2. **Voice-first**: Optimized for spoken interaction, not typing
3. **Low latency**: Streamed tokens + TTS for near-instant feedback
4. **Context-aware**: RAG pulls in relevant course materials
5. **Relentless**: Keeps pushing until you truly understand
6. **Graceful**: Respects when you're done and validates your effort

πŸ“ License

MIT


Built with 🧠 and β˜• for students who want to think, not just memorize.