Phoneme Recognition App

Unified web application for phoneme-level speech recognition using Wav2Vec2.

🎯 Features

📊 Post-Recording Analysis

Record audio directly in browser or upload files
Complete phoneme analysis with timing and confidence
IPA transcription with detailed breakdown
Interactive phoneme grid with visual display

⚡ Real-time Streaming

Live phoneme detection as you speak
WebSocket streaming for low-latency processing
Real-time IPA transcription updates
Confidence-based phoneme bubbles with color coding
Session statistics (phonemes/second, total count)
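
The confidence-based color coding listed above could be sketched as follows. This is a minimal illustration, not the app's actual code: the function names `confidence_color` and `make_bubble`, the 0.8 and 0.5 thresholds, and the color names are all assumptions chosen for the example.

```python
def confidence_color(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a color name.

    The thresholds below are assumed for illustration; the app may use others.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.8:  # assumed cutoff for a high-confidence phoneme
        return "green"
    if confidence >= 0.5:  # assumed cutoff for a medium-confidence phoneme
        return "orange"
    return "red"


def make_bubble(phoneme: str, confidence: float) -> dict:
    """Build a payload that a UI could render as one phoneme bubble."""
    return {
        "phoneme": phoneme,
        "confidence": round(confidence, 2),
        "color": confidence_color(confidence),
    }
```

A payload like `make_bubble("ʃ", 0.91)` could then be sent to the browser, which only needs the `color` field to style the bubble.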

🚀 Quick Start

Set up the environment (assumes a virtual environment already exists in venv/):

source venv/bin/activate
pip install -r requirements.txt

Run the web app:
```
python phoneme_app.py
```
Open browser: http://localhost:5001

📁 Files

Core Application

phoneme_app.py - Main Flask web application
templates/index.html - Web UI
phoneme_vocab.json - Phoneme vocabulary (392 phonemes)
requirements.txt - Python dependencies

Utilities

extract_phoneme_vocab.py - Downloads phoneme vocabulary from HuggingFace
test_real_audio.py - Command-line tool for testing audio files

Sample Data

sample.mp3 - Test audio file ("SHE SELLS SEA SHELLS")

🎤 Usage

Mode Selection

The app has two modes accessible via tabs:

📊 Post-Recording Analysis - Analyze complete recordings
⚡ Real-time Streaming - Live phoneme detection

Post-Recording Mode

Click "Start Recording" or "Upload Audio File"
Speak clearly into microphone or select audio file
Click "Stop" then "Analyze" (for recordings)
View complete phoneme breakdown with IPA transcription

Real-time Mode

Switch to "Real-time Streaming" tab
Click "Start Real-time"
Speak and see live phonemes appear as bubbles
View real-time IPA transcription and statistics
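
The session statistics shown in real-time mode (phonemes/second and total count) can be computed with a small counter. The sketch below is one possible approach under stated assumptions: the `SessionStats` class and its method names are hypothetical and are not taken from phoneme_app.py. Timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionStats:
    """Track detected phonemes for one streaming session (hypothetical helper)."""

    start_time: float  # session start, in seconds
    timestamps: List[float] = field(default_factory=list)

    def record(self, timestamp: float) -> None:
        """Register one detected phoneme at the given time (seconds)."""
        self.timestamps.append(timestamp)

    @property
    def total(self) -> int:
        """Total number of phonemes detected so far."""
        return len(self.timestamps)

    def rate(self, now: float) -> float:
        """Average phonemes per second since the session started."""
        elapsed = now - self.start_time
        return self.total / elapsed if elapsed > 0 else 0.0
```

In a live session, `time.monotonic()` would be a reasonable source for `start_time`, `timestamp`, and `now`, since it is unaffected by system clock changes.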

🔬 Technical Details

Model: facebook/wav2vec2-lv-60-espeak-cv-ft
Vocabulary: 392 eSpeak phonemes
Input: 16kHz audio
Output: IPA phoneme transcription with timing and confidence
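
To illustrate how timing and confidence can be derived from a CTC model such as Wav2Vec2, here is a minimal greedy decoding sketch in pure Python. It is not the app's implementation. It assumes the model output has already been converted to per-frame probabilities, that the blank token has id 0, and that each frame covers 20 ms of 16 kHz audio (the Wav2Vec2 feature encoder stride of 320 samples). The function name `ctc_greedy_decode` and the output field names are hypothetical.

```python
from typing import Dict, List, Sequence

FRAME_SECONDS = 0.02  # one Wav2Vec2 output frame per 320 samples at 16 kHz


def ctc_greedy_decode(
    frame_probs: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,  # assumed id of the CTC blank token
) -> List[Dict]:
    """Collapse repeated frames and drop blanks, keeping timing and confidence."""
    segments: List[Dict] = []
    prev_id = None
    for i, probs in enumerate(frame_probs):
        # Greedy choice: the most probable token in this frame.
        best_id = max(range(len(probs)), key=probs.__getitem__)
        if best_id != blank_id:
            if best_id != prev_id:
                # A new phoneme starts at this frame.
                segments.append(
                    {"phoneme": vocab[best_id], "first": i, "last": i, "scores": []}
                )
            # Extend the current phoneme (new or repeated) to this frame.
            segments[-1]["last"] = i
            segments[-1]["scores"].append(probs[best_id])
        prev_id = best_id
    return [
        {
            "phoneme": seg["phoneme"],
            "start": round(seg["first"] * FRAME_SECONDS, 3),
            "end": round((seg["last"] + 1) * FRAME_SECONDS, 3),
            "confidence": round(sum(seg["scores"]) / len(seg["scores"]), 2),
        }
        for seg in segments
    ]


def to_ipa(segments: List[Dict]) -> str:
    """Join decoded phonemes into a slash-delimited IPA string."""
    return "/" + " ".join(seg["phoneme"] for seg in segments) + "/"
```

Here confidence is the mean probability over the frames assigned to a phoneme. Other definitions, such as the maximum or minimum frame probability, are equally plausible.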

📊 Example Output

Input: "She sells sea shells"
Phonemes: ʃ i s a l s iː ʃ a n s
IPA: /ʃ i s a l s iː ʃ a n s/

🛠️ Troubleshooting

Ensure microphone permissions are granted
Use Chrome/Firefox for best browser compatibility
Audio files should be clear speech for best results
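
When an uploaded file gives poor results, it can help to check its format against the model's 16kHz input. The sketch below uses only Python's standard `wave` module, so it handles uncompressed WAV files only; an MP3 such as sample.mp3 would need a separate decoder. The helper name `describe_wav` is hypothetical, and whether the app resamples uploads automatically is not stated here.

```python
import wave

TARGET_RATE = 16000  # sample rate expected by the model, in Hz


def describe_wav(path: str) -> dict:
    """Report basic properties of an uncompressed WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_seconds": frames / rate if rate else 0.0,
        "matches_model_rate": rate == TARGET_RATE,
    }
```

If `matches_model_rate` is false, the audio would need resampling to 16kHz before it reaches the model.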
