A full-stack application that combines YouTube video transcripts with AI-powered tutoring, featuring a quad-view interface with video player, live transcript, AI chat, and voice interface.
- Backend: FastAPI (Python) with Gemini AI integration
- Frontend: React + TypeScript with Vite
- AI Services: Google Gemini, ElevenLabs TTS
- Python 3.13+
- Node.js 18+
- npm or yarn
# Using uv (recommended):
uv sync
# To add new packages:
uv add <package-name>
npm install
Create a .env file in the root directory with the following variables:
Backend (.env):
GEMINI_API_KEY=your_gemini_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here # For TTS audio generation
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM # Optional, defaults to Rachel voice
ELEVENLABS_SST_API_KEY=your_elevenlabs_api_key_here
Frontend (.env or .env.local):
VITE_API_BASE_URL=http://localhost:8000 # Backend API URL (defaults to http://localhost:8000)
VITE_GEMINI_API_KEY=your_gemini_api_key_here # Optional, only if using direct Gemini calls
VITE_ELEVENLABS_API_KEY=your_elevenlabs_api_key_here # For voice cloning (frontend feature)
VITE_YOUTUBE_API_KEY=your_youtube_api_key_here # Optional, for enhanced YouTube features
Note: The frontend is connected to the backend API. When you load a video, it automatically initializes a backend session. The chatbot uses the backend's /api/ask endpoint, which provides transcript-based context.
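A quick way to catch missing keys before starting the servers is to parse the .env file and check it against a list of required names. The sketch below is only an illustration: `parse_env`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names, and treating GEMINI_API_KEY and ELEVENLABS_API_KEY as the required set is an assumption based on the variables not marked optional above.

```python
# Assumption: these are the keys the backend needs; adjust to match your setup.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks, comment lines, and inline ' # ...' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment such as "value # For TTS audio generation".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent, empty, or still a 'your_...' placeholder."""
    return [
        key for key in required
        if not env.get(key) or env[key].startswith("your_")
    ]
```

To use it, read the file with `pathlib.Path(".env").read_text()`, pass the text to `parse_env`, and print whatever `missing_keys` returns.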
Terminal 1 - Backend:
python main.py
# Or: uvicorn main:app --reload
The backend will run on http://localhost:8000
Terminal 2 - Frontend:
npm run dev
The frontend will run on http://localhost:5173 (or another port if 5173 is taken)
- GET / - Health check
- POST /api/init-video - Initialize a video session
  { "video_url": "https://www.youtube.com/watch?v=..." }
- POST /api/ask - Ask a question to the AI tutor
  { "session_id": "uuid-here", "question": "What is this video about?" }
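The two POST endpoints can be exercised from a small script using only the standard library. This is a minimal sketch, not the project's own client: the helper names (`build_post`, `init_video_request`, `ask_request`, `send`) are hypothetical, and it assumes the backend replies with a JSON body, which is not specified here.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000"  # default backend URL from the setup section


def build_post(path: str, payload: dict, base_url: str = API_BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def init_video_request(video_url: str) -> urllib.request.Request:
    """Request body for POST /api/init-video."""
    return build_post("/api/init-video", {"video_url": video_url})


def ask_request(session_id: str, question: str) -> urllib.request.Request:
    """Request body for POST /api/ask."""
    return build_post("/api/ask", {"session_id": session_id, "question": question})


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, call `send(init_video_request(...))` first, inspect the response for the session identifier (its field name is not documented here), then pass that value to `ask_request`.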
- Video Player: YouTube video playback with transcript integration
- Live Transcript: Real-time transcript display
- AI Chat Bot: Context-aware chat using Gemini AI
- Voice Interface: Voice cloning and text-to-speech with ElevenLabs
- Quad-View Layout: Four-panel interface for optimal learning experience
# Run with auto-reload
uvicorn main:app --reload
# Run dev server
npm run dev
# Build for production
npm run build
# Preview production build
npm run preview
- The backend uses in-memory session storage (sessions are lost on server restart)
- The call_elevenlabs_tts function generates audio files and saves them to static/audio/
- Audio files are served via FastAPI's static file mounting at /static/audio/
- Frontend and backend can work independently, but full integration requires connecting frontend to backend API endpoints
- Module not found errors: Make sure all dependencies are installed
- API key errors: Verify your .env file has all required keys
- CORS issues: The backend should handle CORS, but if issues occur, check FastAPI CORS middleware
- Port conflicts: Change ports in main.py (backend) or vite.config.ts (frontend)