
A realtime voice-based AI tutor web app that generates educational slides using Claude 3.5 based on topic, subject, and grade level. It incorporates human feedback to finalize content, delivers voice-guided explanations, supports bi-directional speech-to-speech Q&A, and uses OpenAI function calling for intelligent slide navigation and context switc


omgupta-iitk/AI_tutor


I designed and built a voice-based AI tutor web application as a freelance project for an Australian startup. The product was an interactive learning platform where students could engage in realtime, spoken dialogue with an AI tutor that generated and explained educational content.


🎯 Core Features:

  1. Slide generation via prompt-based input using Claude 3.5.
  2. Human-in-the-loop feedback before finalizing slides.
  3. Realtime voice tutoring referencing the slides.
  4. Bi-directional speech-to-speech Q&A with context switching.
  5. Context-aware slide navigation using function calling.
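Features 1 and 2 can be pictured as a small loop: build a prompt from subject, topic, and grade, then keep the deck in a draft state until a human reviewer approves it. The sketch below is only an illustration under assumptions. The types (`SlideRequest`, `Deck`), the function names, and the prompt wording are hypothetical and are not taken from the project's code; the call to the model itself is left out.

```typescript
// Hypothetical data model; the real app's types are not shown in this README.
interface SlideRequest {
  subject: string;
  topic: string;
  gradeLevel: number;
}

interface Slide {
  title: string;
  bullets: string[];
}

interface Deck {
  slides: Slide[];
  status: "draft" | "finalized";
  feedbackLog: string[]; // reviewer notes collected so far
}

// Build the text prompt for the slide-generation model (wording is an assumption).
function buildSlidePrompt(req: SlideRequest, feedback: string[] = []): string {
  const lines = [
    `Create teaching slides for grade ${req.gradeLevel} ${req.subject}.`,
    `Topic: ${req.topic}.`,
    "Return a title and 3-5 bullet points per slide.",
  ];
  if (feedback.length > 0) {
    lines.push("Revise the previous draft using this reviewer feedback:");
    for (const note of feedback) lines.push(`- ${note}`);
  }
  return lines.join("\n");
}

// Apply one round of human review: approval finalizes the deck,
// otherwise the note is recorded and the deck stays a draft.
function reviewDeck(deck: Deck, review: { approved: boolean; note?: string }): Deck {
  if (deck.status === "finalized") return deck;
  if (review.approved) return { ...deck, status: "finalized" };
  const feedbackLog = review.note ? [...deck.feedbackLog, review.note] : deck.feedbackLog;
  return { ...deck, feedbackLog };
}
```

In this sketch, a rejected draft would be regenerated by calling `buildSlidePrompt` again with the accumulated `feedbackLog`, so each revision sees every earlier reviewer note.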

🔧 Architecture & Tools Used:

  • Frontend: Built with React, compiled into static assets (HTML/CSS/JS).
  • Backend: A Node.js service with WebSocket support to stream audio packets in realtime for seamless two-way communication.
  • AI Services:
    • Claude 3.5 Sonnet for generating slide scaffolds based on subject, topic, and grade.
    • OpenAI GPT-4o (preview) for realtime Q&A, explanation, and voice-based tutoring.
    • OpenAI Function Calling to dynamically shift context, e.g., navigating to relevant slides mid-conversation and returning to the original flow.
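The slide navigation described in the last bullet can be sketched as two tools plus a small state reducer. This is a minimal illustration, not the project's implementation: the tool names `go_to_slide` and `resume_lesson`, the `NavState` shape, and the handler are all hypothetical. The tool list uses the `tools` shape of the OpenAI Chat Completions API, and the exact envelope may differ for other OpenAI endpoints.

```typescript
// Tool definitions in the Chat Completions "tools" shape; names are hypothetical.
const navigationTools = [
  {
    type: "function",
    function: {
      name: "go_to_slide",
      description: "Show the slide most relevant to the student's question.",
      parameters: {
        type: "object",
        properties: { slideIndex: { type: "integer", minimum: 0 } },
        required: ["slideIndex"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "resume_lesson",
      description: "Return to the slide the lesson was on before the detour.",
      parameters: { type: "object", properties: {} },
    },
  },
];

interface NavState {
  current: number;         // index of the slide being shown
  resumeAt: number | null; // slide to return to after a detour, if any
  slideCount: number;
}

// Apply one tool call from the model. Arguments arrive as a JSON string.
function handleToolCall(state: NavState, name: string, rawArgs: string): NavState {
  if (name === "go_to_slide") {
    let idx: unknown;
    try {
      idx = (JSON.parse(rawArgs) as { slideIndex?: unknown }).slideIndex;
    } catch {
      return state; // malformed arguments: leave the state unchanged
    }
    if (typeof idx !== "number" || !Number.isInteger(idx) || idx < 0 || idx >= state.slideCount) {
      return state; // missing or out-of-range index: leave the state unchanged
    }
    // Remember the original position only on the first detour.
    const resumeAt = state.resumeAt ?? state.current;
    return { ...state, current: idx, resumeAt };
  }
  if (name === "resume_lesson" && state.resumeAt !== null) {
    return { ...state, current: state.resumeAt, resumeAt: null };
  }
  return state;
}
```

Keeping `resumeAt` fixed across consecutive detours is one possible design choice: however many follow-up questions the student asks, the tutor can still return to the slide where the lesson was interrupted.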

🚀 Scalability Strategy:

  • Frontend served through the AWS CloudFront CDN for low latency and high availability.
  • Backend deployed on AWS EC2 Auto Scaling Groups behind a Load Balancer to handle concurrent sessions and scale based on CPU/memory metrics.
  • Used S3 for static asset storage and Route 53 for DNS management.
  • The backend can optionally be containerized with Docker and extended with ECS or EKS for orchestration if scaling demands grow.

This architecture ensured low-latency audio streaming, cost-efficient scaling, and high availability, all of which are critical for a smooth, interactive learning experience.
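One common way to keep streamed audio latency low is to send it in small, fixed-duration packets instead of whole utterances. The sketch below shows that framing step only. The README does not state the audio format, so 16-bit mono PCM, the sample rate, the frame length, and the function name `frameAudio` are all assumptions made for illustration.

```typescript
// Split 16-bit mono PCM samples into fixed-duration frames (assumed format).
function frameAudio(pcm: Int16Array, sampleRate: number, frameMs: number): Int16Array[] {
  const samplesPerFrame = Math.floor((sampleRate * frameMs) / 1000);
  if (samplesPerFrame <= 0) {
    throw new RangeError("frame duration too small for this sample rate");
  }
  const frames: Int16Array[] = [];
  for (let start = 0; start < pcm.length; start += samplesPerFrame) {
    // subarray returns a view on the same buffer, so no samples are copied.
    const end = Math.min(start + samplesPerFrame, pcm.length);
    frames.push(pcm.subarray(start, end));
  }
  return frames;
}
```

Each frame could then be sent as one binary WebSocket message. Shorter frames reduce the delay before the first audio arrives, at the cost of more messages per second.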

Here is a demo video of the working app. (Note: use headphones at high volume to hear the questions I ask at the end, and watch how the tutor changes the slide in response to each question.)

final3.mp4
