Simo-03/TwinTutor

TwinTutor - Voice AI Challenge

A full-stack application that combines YouTube video transcripts with AI-powered tutoring. It features a quad-view interface with a video player, a live transcript, an AI chat, and a voice interface.

🏗️ Architecture

  • Backend: FastAPI (Python) with Gemini AI integration
  • Frontend: React + TypeScript with Vite
  • AI Services: Google Gemini, ElevenLabs TTS

📋 Prerequisites

  • Python 3.13+
  • Node.js 18+
  • npm or yarn

🚀 Setup Instructions

1. Install Backend Dependencies

# Using uv (recommended):
uv sync

# To add new packages:
uv add <package-name>

2. Install Frontend Dependencies

npm install

3. Environment Variables

Create a .env file in the root directory with the following variables:

Backend (.env):

GEMINI_API_KEY=your_gemini_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here  # For TTS audio generation
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM  # Optional, defaults to Rachel voice
ELEVENLABS_SST_API_KEY=your_elevenlabs_api_key_here

Frontend (.env or .env.local):

VITE_API_BASE_URL=http://localhost:8000  # Backend API URL (defaults to http://localhost:8000)
VITE_GEMINI_API_KEY=your_gemini_api_key_here  # Optional, only if using direct Gemini calls
VITE_ELEVENLABS_API_KEY=your_elevenlabs_api_key_here  # For voice cloning (frontend feature)
VITE_YOUTUBE_API_KEY=your_youtube_api_key_here  # Optional, for enhanced YouTube features

Note: The frontend is connected to the backend API. Loading a video automatically initializes a backend session, and the chat bot calls the backend's /api/ask endpoint, which answers using the video transcript as context.
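The backend variables listed above can be checked before the server starts. The following is a minimal sketch, not the project's own code: the helper names are hypothetical, and treating GEMINI_API_KEY and ELEVENLABS_API_KEY as required is an assumption based on the listing. It only inspects a mapping such as os.environ, so the .env file must already have been loaded into the environment by whatever mechanism the backend uses.

```python
import os

# Key names come from the backend .env listing. Treating these two as
# required is an assumption, not something the project states.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVENLABS_API_KEY")

# Documented default voice (Rachel), used when ELEVENLABS_VOICE_ID is unset.
DEFAULT_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"


def missing_keys(env):
    """Return the required keys that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def voice_id(env):
    """Return the configured voice ID, falling back to the documented default."""
    return env.get("ELEVENLABS_VOICE_ID") or DEFAULT_VOICE_ID


def describe_environment(env=None):
    """Summarize the configuration as a human-readable string."""
    env = os.environ if env is None else env
    missing = missing_keys(env)
    if missing:
        return "Missing environment variables: " + ", ".join(missing)
    return "Environment OK, voice ID " + voice_id(env)
```

Calling describe_environment() with no arguments inspects the current process environment. Passing a plain dict makes the check easy to try without touching real keys.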

4. Run the Application

Terminal 1 - Backend:

python main.py
# Or: uvicorn main:app --reload

The backend will run on http://localhost:8000

Terminal 2 - Frontend:

npm run dev

The frontend will run on http://localhost:5173 (or another port if 5173 is taken)

📡 API Endpoints

Backend API (FastAPI)

  • GET / - Health check
  • POST /api/init-video - Initialize a video session
    {
      "video_url": "https://www.youtube.com/watch?v=..."
    }
  • POST /api/ask - Ask a question to the AI tutor
    {
      "session_id": "uuid-here",
      "question": "What is this video about?"
    }
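The two POST endpoints above could be exercised from a small Python client. This is a sketch under stated assumptions: the request bodies follow the documented payloads, but the response shapes are not documented here, so the "session_id" field read from the /api/init-video response is an assumption. The helper names (build_post, post_json, ask_about_video) are hypothetical and use only the standard library.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000"  # documented backend address


def build_post(path, payload, base_url=API_BASE_URL):
    """Build a JSON POST request for a backend endpoint without sending it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, base_url=API_BASE_URL):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_post(path, payload, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ask_about_video(video_url, question, base_url=API_BASE_URL):
    """Initialize a session for video_url, then ask one question about it."""
    # Assumption: the init response is a JSON object with a "session_id" field.
    session = post_json("/api/init-video", {"video_url": video_url}, base_url)
    return post_json(
        "/api/ask",
        {"session_id": session["session_id"], "question": question},
        base_url,
    )
```

With the backend running, ask_about_video(url, "What is this video about?") would perform both calls in sequence. build_post is kept separate so the request can be inspected without a live server.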

🎯 Features

  • Video Player: YouTube video playback with transcript integration
  • Live Transcript: Real-time transcript display
  • AI Chat Bot: Context-aware chat using Gemini AI
  • Voice Interface: Voice cloning and text-to-speech with ElevenLabs
  • Quad-View Layout: Four-panel interface for optimal learning experience

🔧 Development

Backend Development

# Run with auto-reload
uvicorn main:app --reload

Frontend Development

# Run dev server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

📝 Notes

  • The backend uses in-memory session storage (sessions are lost on server restart)
  • The call_elevenlabs_tts function generates audio files and saves them to static/audio/
  • Audio files are served via FastAPI's static file mounting at /static/audio/
  • The frontend and backend can each run on their own, but transcript-based chat requires the frontend to reach the backend API (set VITE_API_BASE_URL accordingly)
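The first note, about in-memory session storage, can be illustrated with a small sketch. This is not the backend's actual implementation: the SessionStore class and its fields are hypothetical and only show why sessions do not survive a restart when they live in a process-local dictionary.

```python
import uuid


class SessionStore:
    """Process-local session storage; contents vanish when the process exits."""

    def __init__(self):
        self._sessions = {}

    def create(self, video_url, transcript):
        """Store a new session and return its generated ID."""
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = {
            "video_url": video_url,
            "transcript": transcript,
        }
        return session_id

    def get(self, session_id):
        """Return the stored session, or None if the ID is unknown."""
        return self._sessions.get(session_id)
```

A fresh SessionStore, which is what a restarted server would start with, has no record of IDs issued earlier. A client holding an old session_id would therefore need to initialize the video again.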

🐛 Troubleshooting

  1. Module not found errors: Reinstall dependencies with uv sync (backend) and npm install (frontend)
  2. API key errors: Verify your .env file has all required keys
  3. CORS issues: The backend should handle CORS. If requests are still blocked, check that the FastAPI CORS middleware allows the frontend origin (for example http://localhost:5173)
  4. Port conflicts: Change ports in main.py (backend) or vite.config.ts (frontend)
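For the port conflicts item, it can help to confirm that a port is actually occupied before editing any configuration. The helper below is a hypothetical standard-library sketch, not part of the project. It reports whether something is accepting TCP connections on a local port.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP listener accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

For example, port_in_use(8000) checks the documented backend port and port_in_use(5173) checks the default frontend dev server port.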
