πŸŽ™οΈ Vocalyst

AI-Powered Communication Coaching Platform

Elevate your public speaking and presentation skills through intelligent multimodal analysis


Features • Quick Start • Docker Deployment • Technologies • Roadmap


📖 About Vocalyst

Vocalyst is an advanced AI-driven communication coaching platform that revolutionizes how individuals improve their public speaking and presentation skills. Unlike traditional tools that focus solely on text analysis, Vocalyst provides comprehensive, real-time multimodal feedback by analyzing:

  • πŸ—£οΈ Voice Analysis - Speech fluency, pacing, tone, and delivery
  • 😊 Facial Expressions - Emotional engagement and confidence levels
  • πŸ‘οΈ Eye Contact - Gaze tracking and audience engagement
  • πŸ“ Content Structure - Logical coherence, vocabulary, and engagement

Communication is more than just words: it's about how you sound, how you look, and how you structure your message. Vocalyst bridges the gap left by conventional tools by offering a unified, intelligent solution for holistic communication improvement.


✨ Features

🎯 Core Capabilities

1. Multimodal Practice Sessions

  • Real-time Feedback during presentations
  • Multiple Practice Modes: General, Persuasive, Emotive, Debate, Storytelling
  • Camera & Audio Integration for comprehensive analysis
  • Live Metrics Display with instant feedback

2. Advanced Speech Analysis

  • Speech Transcription using OpenAI Whisper
  • Filler Word Detection (um, uh, like, etc.) with frequency tracking
  • Words Per Minute (WPM) measurement
  • Clarity Scoring based on pronunciation and enunciation
  • Vocabulary Sophistication tracking
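
The filler-word and WPM metrics above reduce to simple counting once Whisper has produced a transcript and the recording duration is known. A minimal sketch of both calculations; the filler list and function names are illustrative, not Vocalyst's actual implementation:

```python
import re
from collections import Counter

FILLERS = {"um", "uh", "like", "you know", "so", "actually"}  # illustrative list

def filler_counts(transcript: str) -> Counter:
    """Count occurrences of common filler words in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w in FILLERS)
    # Two-word fillers ("you know") need a bigram pass.
    for a, b in zip(words, words[1:]):
        if f"{a} {b}" in FILLERS:
            counts[f"{a} {b}"] += 1
    return counts

def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Speaking rate: total words divided by elapsed minutes."""
    n_words = len(re.findall(r"[a-z']+", transcript.lower()))
    return n_words / (duration_seconds / 60.0) if duration_seconds > 0 else 0.0

text = "So, um, I think, like, this is, uh, a great idea"
print(filler_counts(text))        # per-filler frequencies
print(words_per_minute(text, 5))  # 132.0 (11 words over 5 seconds)
```

Frequency tracking over time then just stores these per-session numbers alongside the transcript.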

3. Visual & Emotional Intelligence

  • Eye Contact Tracking via MediaPipe
  • Facial Expression Analysis using DeepFace
  • Emotion Detection (neutral, happy, sad, angry, fear, surprise)
  • Engagement Estimation through facial cues
  • Real-time Visual Feedback during sessions

4. AI-Powered Insights

  • Dynamic Session Insights generated by Google Gemini AI
  • Personalized Recommendations based on performance
  • Trend Analysis (improving/declining/stable metrics)
  • Strengths & Weaknesses identification
  • Gamification with level/XP system
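
The trend labels above (improving/declining/stable) can be derived by comparing a recent window of sessions against the earlier history. A hedged sketch of one way to do it; the window size and 5% threshold are assumptions, not the project's actual values:

```python
def classify_trend(values, window=3, threshold=0.05):
    """Label a metric series as improving / declining / stable by
    comparing the mean of the last `window` sessions to the mean
    of everything before it."""
    if len(values) < window + 1:
        return "stable"  # not enough history to call a trend
    recent = sum(values[-window:]) / window
    earlier_vals = values[:-window]
    earlier = sum(earlier_vals) / len(earlier_vals)
    if earlier == 0:
        return "stable"
    change = (recent - earlier) / abs(earlier)
    if change > threshold:
        return "improving"
    if change < -threshold:
        return "declining"
    return "stable"

clarity_scores = [60, 62, 61, 70, 72, 74]  # one score per session
print(classify_trend(clarity_scores))       # improving
```

The same classifier can run per metric (WPM, clarity, filler rate), feeding the per-metric trend arrows on the insights page.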

5. Comprehensive Analytics

  • Performance Dashboard with historical trends
  • Session History with detailed breakdowns
  • Progress Tracking over time
  • Practice Mode Analytics with distribution charts
  • Emotional Expression Patterns
  • Reset Functionality with data archiving

6. Text-to-Speech Integration

  • Multiple Voice Options (8 high-quality AI voices)
  • Speed Control for customized playback
  • ElevenLabs & Neuphonic integration
  • Practice Prompts generation

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose (recommended), or
  • Node.js (v18+) and Python (v3.11+)

Option 1: Docker Deployment (Recommended)

  1. Clone the repository

    git clone https://github.com/Shreyyy07/Vocalyst-Main.git
    cd Vocalyst-Main
  2. Set up environment variables

    cp .env.example .env
    # Edit .env with your API keys
  3. Start with Docker Compose

    docker-compose up
  4. Access the application

    Open http://localhost:3000 for the frontend; the Flask API is served on port 5328.

Option 2: Local Development

  1. Clone and install dependencies

    git clone https://github.com/Shreyyy07/Vocalyst-Main.git
    cd Vocalyst-Main
    
    # Install Python dependencies
    pip install -r requirements.txt
    
    # Install Node.js dependencies
    npm install
  2. Set up environment variables

    cp .env.example .env
    # Add your API keys to .env
  3. Run both servers

    npm run dev

    Or run separately:

    # Terminal 1 - Frontend
    npm run next-dev
    
    # Terminal 2 - Backend
    npm run flask-dev

🐳 Docker Deployment

Architecture

Vocalyst uses a multi-container Docker setup:

  • Frontend Container: Next.js production build (Port 3000)
  • Backend Container: Flask API with ML models (Port 5328)
  • Shared Network: Bridge network for inter-container communication
  • Persistent Volumes: Session data and uploads

Configuration

Docker Commands

# Build containers
docker-compose build

# Start services
docker-compose up

# Start in detached mode
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Rebuild and restart
docker-compose down && docker-compose build && docker-compose up

Data Persistence

  • Session Data: ./api/data - Stores practice session analytics
  • Uploads: ./api/uploads - Stores recordings and emotion data
  • Archives: ./api/data/archive - Archived session data after reset
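
The archive path above suggests the reset flow: move the live sessions file into a timestamped file under ./api/data/archive, then start fresh. A sketch of that pattern; the function and file naming are illustrative, not the project's actual code:

```python
import json
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

def reset_sessions(data_dir: Path) -> Path:
    """Archive sessions.json under data/archive/ with a timestamp,
    then reset the live file to an empty list."""
    archive_dir = data_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    live = data_dir / "sessions.json"
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = archive_dir / f"sessions_{stamp}.json"
    if live.exists():
        shutil.move(str(live), target)
    live.write_text(json.dumps([]))  # fresh, empty session list
    return target

# Example against a temporary directory:
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    (data / "sessions.json").write_text('[{"wpm": 120}]')
    archived = reset_sessions(data)
    print(archived.name)                         # sessions_<timestamp>.json
    print((data / "sessions.json").read_text())  # []
```

Because the archive is a plain move rather than a delete, a reset is always reversible by copying the archived file back.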

💻 Usage

Starting a Practice Session

  1. Navigate to Practice

    • Click "Practice" in the navigation menu
    • Select a practice mode (General, Persuasive, Emotive, etc.)
  2. Record Your Presentation

    • Allow camera and microphone permissions
    • Click "Start Recording"
    • Speak naturally while the system analyzes
  3. Receive Real-time Feedback

    • Monitor live WPM, clarity, and filler word metrics
    • View eye contact and emotion tracking
    • Get instant visual feedback
  4. Review Detailed Analysis

    • View comprehensive post-session breakdown
    • Get AI-generated personalized insights
    • See scores for fluency, coherence, and engagement
    • Receive actionable recommendations

Analytics & Insights

Analytics Dashboard (/analytics):

  • View aggregated performance metrics
  • Track practice mode distribution
  • Monitor emotional expression patterns
  • Review recent session history
  • Reset analytics with data archiving

AI-Powered Insights (/get-insights):

  • Gamified progress tracking (Level/XP system)
  • Skill breakdown radar chart
  • Dynamic strengths and weaknesses
  • Performance trends (improving/declining/stable)
  • Personalized recommendations
  • Reset progress functionality
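
A Level/XP scheme like the one above can be as simple as a flat award per session plus a performance bonus, with the level derived from cumulative XP. A sketch under assumed constants (50 base XP per session, 100 XP per level); Vocalyst's actual formula may differ:

```python
def session_xp(base=50, score=0.0):
    """XP for one session: a flat award plus a bonus scaled by the
    session's overall score (0.0 to 1.0)."""
    return base + int(50 * score)

def level_from_xp(total_xp, xp_per_level=100):
    """Level 1 at 0 XP; one level gained per xp_per_level points."""
    return 1 + total_xp // xp_per_level

sessions = [0.4, 0.7, 0.9]  # overall scores of three practice sessions
total = sum(session_xp(score=s) for s in sessions)
print(total, level_from_xp(total))  # 250 3
```

Keeping the level a pure function of total XP means the insights page never has to store the level separately; a reset only needs to clear the XP counter.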

πŸ› οΈ Technologies Used

Frontend Stack

Technology       Purpose
Next.js 14       React framework with SSR and production optimization
React 18         UI component library with hooks
TypeScript       Type-safe JavaScript development
Tailwind CSS     Utility-first styling framework
Framer Motion    Smooth animations and transitions
Recharts         Data visualization and charts
Lucide React     Modern icon library

Backend Stack

Technology        Purpose
Flask             Python web framework for API
Flask-CORS        Cross-origin resource sharing
Google Gemini AI  Dynamic insights generation
Neuphonic         Enhanced TTS and speech processing
ElevenLabs        High-quality text-to-speech

AI/ML Models

Model              Purpose                         Performance
OpenAI Whisper     Speech-to-text transcription    State-of-the-art accuracy
RoBERTa (large)    Logical coherence detection     High performance
XGBoost            Speech fluency classification   93% F1 score
Google Gemini Pro  AI insights generation          Real-time analysis

Audio Processing

  • Librosa - Audio feature extraction (MFCC, ZCR, energy)
  • Neuphonic - Enhanced speech signal processing
  • OpenAI Whisper - Accurate speech transcription
  • SoundDevice - Real-time audio capture
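
Two of the Librosa features listed above, zero-crossing rate (ZCR) and short-time energy, are easy to illustrate. Librosa computes them per frame across a whole signal; this pure-Python sketch shows the underlying math for a single frame of samples:

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs.
    High ZCR correlates with noisy or unvoiced speech."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:])
        if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

def short_time_energy(samples):
    """Mean squared amplitude of the frame: a loudness proxy."""
    return sum(s * s for s in samples) / len(samples)

frame = [0.1, -0.2, 0.3, -0.1, 0.05, 0.0, -0.4]
print(zero_crossing_rate(frame))   # 5 crossings over 6 pairs
print(short_time_energy(frame))
```

In practice the fluency classifier would consume Librosa's vectorized versions of these (plus MFCCs) as per-frame feature columns.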

Computer Vision

  • MediaPipe - Face landmark detection (68 points)
  • OpenCV - Video processing and frame analysis
  • DeepFace - Facial expression and emotion recognition
  • GazeTracking - Eye contact estimation
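
Once the gaze tracker yields a per-frame gaze estimate, eye-contact scoring can reduce to the fraction of frames where the gaze stays near centre. A sketch of that aggregation; the (x, y) offset encoding and the 0.2 threshold are assumptions, not GazeTracking's actual output format:

```python
def eye_contact_ratio(gaze_points, threshold=0.2):
    """gaze_points: per-frame (x, y) gaze offsets, where (0, 0) means
    looking straight at the camera. Returns the fraction of frames
    within `threshold` of centre."""
    if not gaze_points:
        return 0.0
    on_target = sum(
        1 for x, y in gaze_points
        if (x * x + y * y) ** 0.5 <= threshold
    )
    return on_target / len(gaze_points)

frames = [(0.0, 0.05), (0.1, 0.1), (0.5, 0.0), (0.05, 0.0)]
print(eye_contact_ratio(frames))  # 0.75 -- 3 of 4 frames count as eye contact
```

Running this over a sliding window rather than the whole session gives the real-time eye-contact feedback shown during practice.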

Deployment

  • Docker - Containerization platform
  • Docker Compose - Multi-container orchestration
  • Gunicorn - Production WSGI server
  • Next.js Production Build - Optimized frontend

πŸ—οΈ System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        VOCALYST PLATFORM                        │
└─────────────────────────────────────────────────────────────────┘
                                 │
                    ┌────────────┴────────────┐
                    │                         │
            ┌───────▼────────┐       ┌───────▼────────┐
            │    Frontend    │       │    Backend     │
            │   Container    │◄─────►│   Container    │
            │   (Next.js)    │  REST │    (Flask)     │
            │   Port: 3000   │  API  │   Port: 5328   │
            └────────────────┘       └───────┬────────┘
                                             │
                    ┌────────────────────────┼────────────────────────┐
                    │                        │                        │
            ┌───────▼────────┐       ┌───────▼────────┐       ┌───────▼────────┐
            │  SPEECH MODULE │       │  VISUAL MODULE │       │  AI INSIGHTS   │
            │   (Whisper)    │       │  (MediaPipe)   │       │   (Gemini)     │
            └───────┬────────┘       └───────┬────────┘       └───────┬────────┘
                    │                        │                        │
          ┌─────────┴─────────┐              │                        │
          │                   │              │                        │
     ┌────▼────┐         ┌────▼────┐    ┌────▼────┐              ┌────▼────┐
     │ Filler  │         │   WPM   │    │DeepFace │              │ Dynamic │
     │Detection│         │Tracking │    │Emotions │              │Insights │
     └─────────┘         └─────────┘    └─────────┘              └─────────┘
                                 │
                    ┌────────────▼────────────┐
                    │   ANALYTICS DASHBOARD   │
                    │  - Session History      │
                    │  - Trends & Progress    │
                    │  - Recommendations      │
                    └─────────────────────────┘

πŸ“ Project Structure

Vocalyst-Main/
├── api/                          # Backend Flask API
│   ├── index.py                  # Main API endpoints
│   ├── simple_tts.py             # TTS subprocess handler
│   ├── tonality.py               # Tonality analysis
│   ├── data/                     # Session data storage
│   │   ├── sessions.json         # Practice sessions
│   │   └── archive/              # Archived data
│   ├── uploads/                  # User recordings
│   ├── Dockerfile                # Backend container config
│   └── requirements.txt          # Python dependencies
│
├── app/                          # Next.js frontend
│   ├── analytics/                # Analytics dashboard
│   ├── get-insights/             # AI insights page
│   ├── practice/                 # Practice session interface
│   ├── camera/                   # Camera capture
│   ├── tts/                      # Text-to-speech lab
│   ├── Dockerfile                # Frontend container config
│   └── page.tsx                  # Landing page
│
├── components/                   # Reusable React components
│   └── ui/                       # UI component library
│
├── docker-compose.yml            # Multi-container orchestration
├── .env.example                  # Environment variables template
├── .dockerignore                 # Docker ignore rules
├── .gitignore                    # Git ignore rules
├── package.json                  # Node.js dependencies
├── requirements.txt              # Python dependencies
└── README.md                     # This file

🎯 Key Features Breakdown

Session Analysis

  • Real-time Metrics: Live WPM, clarity, filler word tracking
  • Post-Session Breakdown: Comprehensive analysis with scores
  • AI Insights: Unique, personalized feedback per session
  • Historical Tracking: Progress monitoring over time

Analytics Dashboard

  • Aggregated Metrics: Average WPM, filler %, clarity, duration
  • Practice Mode Distribution: Visual breakdown by category
  • Emotional Patterns: Emotion distribution across sessions
  • Recent Sessions: Quick access to session history
  • Reset Functionality: Archive and clear data
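
The aggregated metrics above amount to a plain reduction over the stored session records. A sketch assuming an illustrative session schema (field names are hypothetical, not necessarily what sessions.json stores):

```python
from statistics import mean

def aggregate(sessions):
    """Average the headline metrics across all stored sessions."""
    if not sessions:
        return {}
    return {
        "avg_wpm": mean(s["wpm"] for s in sessions),
        "avg_filler_pct": mean(s["filler_pct"] for s in sessions),
        "avg_clarity": mean(s["clarity"] for s in sessions),
        "total_minutes": sum(s["duration_min"] for s in sessions),
    }

sessions = [
    {"wpm": 110, "filler_pct": 4.0, "clarity": 70, "duration_min": 3},
    {"wpm": 130, "filler_pct": 2.0, "clarity": 80, "duration_min": 5},
]
print(aggregate(sessions))
```

Grouping the same records by practice mode or dominant emotion before reducing yields the distribution charts described above.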

AI-Powered Insights

  • Dynamic Analysis: Real-time strengths/weaknesses calculation
  • Trend Detection: Improving/declining/stable metrics
  • Personalized Recommendations: Actionable improvement tips
  • Gamification: Level/XP system for motivation
  • Skill Visualization: Radar chart for skill breakdown

πŸ—ΊοΈ Roadmap

✅ Completed Features

  • Docker containerization and deployment
  • Dynamic AI insights with Gemini API
  • Analytics reset with data archiving
  • Enhanced insights page with gamification
  • Skill breakdown radar chart
  • Performance trend analysis
  • Multi-voice TTS integration

🚀 Upcoming Features

  • Multilingual Support - 20+ languages for global accessibility
  • Mobile Application - iOS and Android native apps
  • Real-Time Coaching - Live suggestions during presentations
  • Team Collaboration - Multi-user sessions and peer feedback
  • Custom Training Modules - Industry-specific templates
  • Integration APIs - Zoom, Teams, Meet connectivity
  • Advanced Emotion AI - Context-aware sentiment analysis
  • Voice Cloning - Personalized TTS with user's voice
  • Presentation Templates - Pre-built scenarios and scripts
  • Export Reports - PDF/PowerPoint presentation reports

πŸ” Security & Privacy

  • Local Processing: All ML models run locally in Docker containers
  • No Data Sharing: Session data stays on your machine
  • Environment Variables: Secure API key management
  • Data Archiving: Safe reset with backup functionality
  • CORS Protection: Configured cross-origin policies

🤝 Contributing

We welcome contributions! Here's how:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 for Python, ESLint for TypeScript
  • Write meaningful commit messages
  • Add comments for complex logic
  • Update documentation as needed

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👥 Author


⭐ Star this repository if you find it helpful!

πŸ› Found a bug? Open an issue

💡 Have a feature idea? Start a discussion

Made with ❤️ by Shreyyy07
