Elevate your public speaking and presentation skills through intelligent multimodal analysis
Features • Quick Start • Docker Deployment • Technologies • Roadmap
Vocalyst is an advanced AI-driven communication coaching platform that revolutionizes how individuals improve their public speaking and presentation skills. Unlike traditional tools that focus solely on text analysis, Vocalyst provides comprehensive, real-time multimodal feedback by analyzing:
- Voice Analysis - Speech fluency, pacing, tone, and delivery
- Facial Expressions - Emotional engagement and confidence levels
- Eye Contact - Gaze tracking and audience engagement
- Content Structure - Logical coherence, vocabulary, and engagement
Communication is more than just words: it's about how you sound, how you look, and how you structure your message. Vocalyst bridges the gap left by conventional tools by offering a unified, intelligent solution for holistic communication improvement.
- Real-time Feedback during presentations
- Multiple Practice Modes: General, Persuasive, Emotive, Debate, Storytelling
- Camera & Audio Integration for comprehensive analysis
- Live Metrics Display with instant feedback
- Speech Transcription using OpenAI Whisper
- Filler Word Detection (um, uh, like, etc.) with frequency tracking
- Words Per Minute (WPM) measurement
- Clarity Scoring based on pronunciation and enunciation
- Vocabulary Sophistication tracking
- Eye Contact Tracking via MediaPipe
- Facial Expression Analysis using DeepFace
- Emotion Detection (neutral, happy, sad, angry, fear, surprise)
- Engagement Estimation through facial cues
- Real-time Visual Feedback during sessions
- Dynamic Session Insights generated by Google Gemini AI
- Personalized Recommendations based on performance
- Trend Analysis (improving/declining/stable metrics)
- Strengths & Weaknesses identification
- Gamification with level/XP system
- Performance Dashboard with historical trends
- Session History with detailed breakdowns
- Progress Tracking over time
- Practice Mode Analytics with distribution charts
- Emotional Expression Patterns
- Reset Functionality with data archiving
- Multiple Voice Options (8 high-quality AI voices)
- Speed Control for customized playback
- ElevenLabs & Neuphonic integration
- Practice Prompts generation
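Several of the speech metrics above (words per minute, filler-word frequency) are straightforward to compute once a transcript and its duration are available. A minimal sketch, assuming Whisper-style transcript text as input; the filler-word list and field names here are illustrative, not Vocalyst's actual implementation:

```python
import re

# Illustrative filler-word list; the real detector may track more variants.
FILLER_WORDS = {"um", "uh", "like", "basically", "actually"}

def speech_metrics(transcript: str, duration_seconds: float) -> dict:
    """Compute WPM and filler-word frequency from a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    filler_count = sum(1 for w in words if w in FILLER_WORDS)
    wpm = len(words) / (duration_seconds / 60) if duration_seconds else 0.0
    return {
        "word_count": len(words),
        "wpm": round(wpm, 1),
        "filler_count": filler_count,
        "filler_pct": round(100 * filler_count / max(len(words), 1), 1),
    }
```

For example, a 9-word utterance spoken in 5 seconds yields a (very fast) 108 WPM, with each "um" and "uh" counted toward the filler percentage.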
- Docker & Docker Compose (recommended), or
- Node.js (v18+) and Python (v3.11+)
1. Clone the repository

   ```bash
   git clone https://github.com/Shreyyy07/Vocalyst-Main.git
   cd Vocalyst-Main
   ```

2. Set up environment variables

   ```bash
   cp .env.example .env
   # Edit .env with your API keys
   ```

3. Start with Docker Compose

   ```bash
   docker-compose up
   ```

4. Access the application

   - Frontend: http://localhost:3000
   - Backend API: http://localhost:5328
1. Clone and install dependencies

   ```bash
   git clone https://github.com/Shreyyy07/Vocalyst-Main.git
   cd Vocalyst-Main

   # Install Python dependencies
   pip install -r requirements.txt

   # Install Node.js dependencies
   npm install
   ```

2. Set up environment variables

   ```bash
   cp .env.example .env
   # Add your API keys to .env
   ```

3. Run both servers

   ```bash
   npm run dev
   ```

   Or run separately:

   ```bash
   # Terminal 1 - Frontend
   npm run next-dev

   # Terminal 2 - Backend
   npm run flask-dev
   ```
Vocalyst uses a multi-container Docker setup:
- Frontend Container: Next.js production build (Port 3000)
- Backend Container: Flask API with ML models (Port 5328)
- Shared Network: Bridge network for inter-container communication
- Persistent Volumes: Session data and uploads
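A minimal `docker-compose.yml` consistent with this layout might look like the following sketch. The service names, build contexts, network name, and in-container mount paths are assumptions for illustration; the repository's own compose file is authoritative.

```yaml
services:
  frontend:
    build: ./app          # Next.js production build
    ports:
      - "3000:3000"
    depends_on:
      - backend
    networks:
      - vocalyst-net
  backend:
    build: ./api          # Flask API with ML models
    ports:
      - "5328:5328"
    env_file: .env        # API keys stay out of the image
    volumes:
      - ./api/data:/app/data         # persistent session analytics
      - ./api/uploads:/app/uploads   # recordings and emotion data
    networks:
      - vocalyst-net

networks:
  vocalyst-net:
    driver: bridge        # shared network for inter-container REST calls
```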
```bash
# Build containers
docker-compose build

# Start services
docker-compose up

# Start in detached mode
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Rebuild and restart
docker-compose down && docker-compose build && docker-compose up
```

- Session Data: `./api/data` - Stores practice session analytics
- Uploads: `./api/uploads` - Stores recordings and emotion data
- Archives: `./api/data/archive` - Archived session data after reset
1. Navigate to Practice

   - Click "Practice" in the navigation menu
   - Select a practice mode (General, Persuasive, Emotive, etc.)

2. Record Your Presentation

   - Allow camera and microphone permissions
   - Click "Start Recording"
   - Speak naturally while the system analyzes

3. Receive Real-time Feedback

   - Monitor live WPM, clarity, and filler word metrics
   - View eye contact and emotion tracking
   - Get instant visual feedback

4. Review Detailed Analysis

   - View the comprehensive post-session breakdown
   - Get AI-generated personalized insights
   - See scores for fluency, coherence, and engagement
   - Receive actionable recommendations
Analytics Dashboard (/analytics):
- View aggregated performance metrics
- Track practice mode distribution
- Monitor emotional expression patterns
- Review recent session history
- Reset analytics with data archiving
AI-Powered Insights (/get-insights):
- Gamified progress tracking (Level/XP system)
- Skill breakdown radar chart
- Dynamic strengths and weaknesses
- Performance trends (improving/declining/stable)
- Personalized recommendations
- Reset progress functionality
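The level/XP mechanic can be sketched in a few lines. The XP formula and the 500-XP level size below are assumptions for illustration, not Vocalyst's actual tuning:

```python
def xp_for_session(duration_minutes: float, avg_score: float) -> int:
    """Award XP proportional to practice time, boosted by session quality."""
    base = int(duration_minutes * 10)      # 10 XP per minute practiced
    bonus = int(base * avg_score / 100)    # up to +100% for a perfect score
    return base + bonus

def level_from_xp(total_xp: int, xp_per_level: int = 500) -> tuple[int, int]:
    """Return (level, xp_into_current_level) on a simple linear curve."""
    return total_xp // xp_per_level + 1, total_xp % xp_per_level
```

Under these assumed numbers, a 10-minute session scored 80/100 earns 180 XP, and 1200 total XP places the user partway through level 3.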
| Technology | Purpose |
|---|---|
| Next.js 14 | React framework with SSR and production optimization |
| React 18 | UI component library with hooks |
| TypeScript | Type-safe JavaScript development |
| Tailwind CSS | Utility-first styling framework |
| Framer Motion | Smooth animations and transitions |
| Recharts | Data visualization and charts |
| Lucide React | Modern icon library |
| Technology | Purpose |
|---|---|
| Flask | Python web framework for API |
| Flask-CORS | Cross-origin resource sharing |
| Google Gemini AI | Dynamic insights generation |
| Neuphonic | Enhanced TTS and speech processing |
| ElevenLabs | High-quality text-to-speech |
| Model | Purpose | Performance |
|---|---|---|
| OpenAI Whisper | Speech-to-text transcription | State-of-the-art accuracy |
| RoBERTa (large) | Logical coherence detection | High performance |
| XGBoost | Speech fluency classification | 93% F1 Score |
| Google Gemini Pro | AI insights generation | Real-time analysis |
- Librosa - Audio feature extraction (MFCC, ZCR, energy)
- Neuphonic - Enhanced speech signal processing
- OpenAI Whisper - Accurate speech transcription
- SoundDevice - Real-time audio capture
- MediaPipe - Face landmark detection (468-point face mesh)
- OpenCV - Video processing and frame analysis
- DeepFace - Facial expression and emotion recognition
- GazeTracking - Eye contact estimation
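Eye-contact estimation from these libraries boils down to checking whether the pupil sits roughly centered between the eye corners. A simplified, hypothetical sketch of that idea; the 0.15 "centered" tolerance is an assumption, and the real pipeline works from GazeTracking/MediaPipe landmark output rather than raw x-coordinates:

```python
def horizontal_gaze_ratio(pupil_x: float, eye_left_x: float, eye_right_x: float) -> float:
    """0.0 = pupil at the left eye corner, 1.0 = at the right corner."""
    return (pupil_x - eye_left_x) / (eye_right_x - eye_left_x)

def is_eye_contact(pupil_x: float, eye_left_x: float, eye_right_x: float,
                   tolerance: float = 0.15) -> bool:
    """Treat a roughly centered pupil as looking toward the camera."""
    return abs(horizontal_gaze_ratio(pupil_x, eye_left_x, eye_right_x) - 0.5) <= tolerance
```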
- Docker - Containerization platform
- Docker Compose - Multi-container orchestration
- Gunicorn - Production WSGI server
- Next.js Production Build - Optimized frontend
```
┌─────────────────────────────────────────────────────────────────┐
│                        VOCALYST PLATFORM                        │
└─────────────────────────────────────────────────────────────────┘
                                 │
                    ┌────────────┴────────────┐
                    │                         │
           ┌────────▼────────┐       ┌────────▼────────┐
           │    Frontend     │       │     Backend     │
           │    Container    │──────►│    Container    │
           │    (Next.js)    │ REST  │     (Flask)     │
           │   Port: 3000    │  API  │   Port: 5328    │
           └─────────────────┘       └────────┬────────┘
                                              │
         ┌────────────────────────────────────┼────────────────────────────────────┐
         │                                    │                                    │
┌────────▼────────┐                  ┌────────▼────────┐                  ┌────────▼────────┐
│  SPEECH MODULE  │                  │  VISUAL MODULE  │                  │   AI INSIGHTS   │
│    (Whisper)    │                  │   (MediaPipe)   │                  │    (Gemini)     │
└────────┬────────┘                  └────────┬────────┘                  └────────┬────────┘
         │                                    │                                    │
   ┌─────┴─────┐                         ┌────▼────┐                          ┌────▼────┐
   │           │                         │DeepFace │                          │Dynamic  │
┌──▼──────┐ ┌──▼──────┐                  │Emotions │                          │Insights │
│Filler   │ │WPM      │                  └─────────┘                          └─────────┘
│Detection│ │Tracking │
└─────────┘ └─────────┘
                                              │
                               ┌──────────────▼──────────────┐
                               │     ANALYTICS DASHBOARD     │
                               │  - Session History          │
                               │  - Trends & Progress        │
                               │  - Recommendations          │
                               └─────────────────────────────┘
```
```
Vocalyst-Main/
├── api/                    # Backend Flask API
│   ├── index.py            # Main API endpoints
│   ├── simple_tts.py       # TTS subprocess handler
│   ├── tonality.py         # Tonality analysis
│   ├── data/               # Session data storage
│   │   ├── sessions.json   # Practice sessions
│   │   └── archive/        # Archived data
│   ├── uploads/            # User recordings
│   ├── Dockerfile          # Backend container config
│   └── requirements.txt    # Python dependencies
│
├── app/                    # Next.js frontend
│   ├── analytics/          # Analytics dashboard
│   ├── get-insights/       # AI insights page
│   ├── practice/           # Practice session interface
│   ├── camera/             # Camera capture
│   ├── tts/                # Text-to-speech lab
│   ├── Dockerfile          # Frontend container config
│   └── page.tsx            # Landing page
│
├── components/             # Reusable React components
│   └── ui/                 # UI component library
│
├── docker-compose.yml      # Multi-container orchestration
├── .env.example            # Environment variables template
├── .dockerignore           # Docker ignore rules
├── .gitignore              # Git ignore rules
├── package.json            # Node.js dependencies
├── requirements.txt        # Python dependencies
└── README.md               # This file
```
- Real-time Metrics: Live WPM, clarity, filler word tracking
- Post-Session Breakdown: Comprehensive analysis with scores
- AI Insights: Unique, personalized feedback per session
- Historical Tracking: Progress monitoring over time
- Aggregated Metrics: Average WPM, filler %, clarity, duration
- Practice Mode Distribution: Visual breakdown by category
- Emotional Patterns: Emotion distribution across sessions
- Recent Sessions: Quick access to session history
- Reset Functionality: Archive and clear data
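The aggregation behind these dashboard numbers can be sketched as a simple averaging pass over stored sessions. The field names (`wpm`, `filler_pct`, `clarity`, `duration_min`) are assumed for illustration; the real records live in `api/data/sessions.json` and may differ:

```python
def aggregate_sessions(sessions: list[dict]) -> dict:
    """Average per-session metrics for the analytics dashboard."""
    if not sessions:
        return {}
    n = len(sessions)
    return {
        "avg_wpm": round(sum(s["wpm"] for s in sessions) / n, 1),
        "avg_filler_pct": round(sum(s["filler_pct"] for s in sessions) / n, 1),
        "avg_clarity": round(sum(s["clarity"] for s in sessions) / n, 1),
        "total_minutes": round(sum(s["duration_min"] for s in sessions), 1),
    }
```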
- Dynamic Analysis: Real-time strengths/weaknesses calculation
- Trend Detection: Improving/declining/stable metrics
- Personalized Recommendations: Actionable improvement tips
- Gamification: Level/XP system for motivation
- Skill Visualization: Radar chart for skill breakdown
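Improving/declining/stable trend detection amounts to comparing recent sessions against earlier ones. A sketch of that idea; the 3-session window and 5% threshold below are assumptions, not Vocalyst's actual values:

```python
from statistics import mean

def detect_trend(scores: list[float], window: int = 3, threshold: float = 0.05) -> str:
    """Classify a metric's trajectory from a chronological list of scores."""
    if len(scores) < 2 * window:
        return "stable"                      # not enough history to judge
    earlier = mean(scores[-2 * window:-window])
    recent = mean(scores[-window:])
    change = (recent - earlier) / earlier if earlier else 0.0
    if change > threshold:
        return "improving"
    if change < -threshold:
        return "declining"
    return "stable"
```

For example, clarity scores of [60, 62, 61, 70, 72, 74] average 61 early and 72 recently, an 18% rise, so the metric is classified as improving.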
- Docker containerization and deployment
- Dynamic AI insights with Gemini API
- Analytics reset with data archiving
- Enhanced insights page with gamification
- Skill breakdown radar chart
- Performance trend analysis
- Multi-voice TTS integration
- Multilingual Support - 20+ languages for global accessibility
- Mobile Application - iOS and Android native apps
- Real-Time Coaching - Live suggestions during presentations
- Team Collaboration - Multi-user sessions and peer feedback
- Custom Training Modules - Industry-specific templates
- Integration APIs - Zoom, Teams, Meet connectivity
- Advanced Emotion AI - Context-aware sentiment analysis
- Voice Cloning - Personalized TTS with user's voice
- Presentation Templates - Pre-built scenarios and scripts
- Export Reports - PDF/PowerPoint presentation reports
- Local Processing: All ML models run locally in Docker containers
- No Data Sharing: Session data stays on your machine
- Environment Variables: Secure API key management
- Data Archiving: Safe reset with backup functionality
- CORS Protection: Configured cross-origin policies
We welcome contributions! Here's how:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
- Follow PEP 8 for Python, ESLint for TypeScript
- Write meaningful commit messages
- Add comments for complex logic
- Update documentation as needed
This project is licensed under the MIT License - see the LICENSE file for details.
- Shreyyy07 - GitHub Profile
Star this repository if you find it helpful!

Found a bug? Open an issue
Have a feature idea? Start a discussion

Made with ❤️ by Shreyyy07