Unified web application for phoneme-level speech recognition using Wav2Vec2.
Post-recording analysis:
- Record audio directly in the browser or upload files
- Complete phoneme analysis with timing and confidence
- IPA transcription with detailed breakdown
- Interactive phoneme grid with visual display
Real-time streaming:
- Live phoneme detection as you speak
- WebSocket streaming for low-latency processing
- Real-time IPA transcription updates
- Confidence-based phoneme bubbles with color coding
- Session statistics (phonemes/second, total count)
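The confidence-based color coding can be pictured with a small mapping function. This is a minimal sketch, not the app's actual logic: the name `confidence_color`, the thresholds 0.8 and 0.5, and the color names are all assumptions chosen for illustration.

```python
def confidence_color(confidence: float) -> str:
    """Map a confidence in [0, 1] to a display color.

    The 0.8 / 0.5 thresholds and the color names are hypothetical;
    the real UI may use different cut-offs or a continuous gradient.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.8:
        return "green"   # high confidence
    if confidence >= 0.5:
        return "orange"  # medium confidence
    return "red"         # low confidence


# Example: color three detected phonemes by their confidence.
bubbles = [("ʃ", 0.93), ("i", 0.61), ("s", 0.42)]
for phoneme, conf in bubbles:
    print(f"{phoneme}: {confidence_color(conf)}")
```

Keeping the mapping in one function makes the thresholds easy to tune without touching the rendering code.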
- Setup environment:
  source venv/bin/activate
  pip install -r requirements.txt
- Run the web app:
  python phoneme_app.py
- Open browser: http://localhost:5001
- phoneme_app.py: Main Flask web application
- templates/index.html: Web UI
- phoneme_vocab.json: Phoneme vocabulary (392 phonemes)
- requirements.txt: Python dependencies
- extract_phoneme_vocab.py: Downloads phoneme vocabulary from HuggingFace
- test_real_audio.py: Command-line tool for testing audio files
- sample.mp3: Test audio file ("SHE SELLS SEA SHELLS")
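To decode model output, the vocabulary file has to be turned into an id-to-phoneme lookup. The sketch below assumes `phoneme_vocab.json` is a flat JSON object mapping each token to an integer id, as in a typical HuggingFace `vocab.json`; that layout is an assumption, and the helper name `load_id_to_phoneme` is hypothetical.

```python
import json
from pathlib import Path


def load_id_to_phoneme(vocab_path):
    """Read a token -> id JSON mapping and return the inverse id -> token dict.

    Assumes the file is a flat JSON object such as {"<pad>": 0, "a": 1, ...}.
    """
    with open(vocab_path, encoding="utf-8") as f:
        token_to_id = json.load(f)
    return {int(idx): token for token, idx in token_to_id.items()}


# Only runs when the vocabulary file is present in the working directory.
if Path("phoneme_vocab.json").exists():
    id_to_phoneme = load_id_to_phoneme("phoneme_vocab.json")
    print(len(id_to_phoneme), "vocabulary entries")
```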
The app has two modes accessible via tabs:
- 📊 Post-Recording Analysis - Analyze complete recordings
- ⚡ Real-time Streaming - Live phoneme detection
Post-recording analysis:
- Click "Start Recording" or "Upload Audio File"
- Speak clearly into the microphone or select an audio file
- Click "Stop" then "Analyze" (for recordings)
- View the complete phoneme breakdown with IPA transcription
Real-time streaming:
- Switch to the "Real-time Streaming" tab
- Click "Start Real-time"
- Speak and see live phonemes appear as bubbles
- View real-time IPA transcription and statistics
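The session statistics shown during streaming (total count and phonemes/second) amount to a running counter divided by elapsed time. The following is a minimal sketch under that assumption; the class name `SessionStats` and its methods are hypothetical and not taken from `phoneme_app.py`.

```python
import time


class SessionStats:
    """Track total phonemes and the average rate since the session started."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the class can be tested deterministically.
        self._clock = clock
        self._start = clock()
        self.total = 0

    def add(self, count=1):
        """Record newly detected phonemes."""
        self.total += count

    def phonemes_per_second(self):
        """Average rate over the whole session; 0.0 if no time has elapsed."""
        elapsed = self._clock() - self._start
        return self.total / elapsed if elapsed > 0 else 0.0


# Example with a fake clock: the session starts at t=0 and is read at t=4 s.
ticks = iter([0.0, 4.0])
stats = SessionStats(clock=lambda: next(ticks))
stats.add(10)
print(stats.total, stats.phonemes_per_second())  # prints "10 2.5"
```

A monotonic clock is used by default so that system clock adjustments do not distort the rate.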
- Model: facebook/wav2vec2-lv-60-espeak-cv-ft
- Vocabulary: 392 eSpeak phonemes
- Input: audio sampled at 16 kHz
- Output: IPA phoneme transcription with timing and confidence
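One common way to get phonemes with timing and confidence from a CTC model such as Wav2Vec2 is greedy decoding over the per-frame probabilities. The sketch below illustrates that technique in plain Python; it is not necessarily how `phoneme_app.py` does it. It assumes the frame probabilities have already been computed (for example, a softmax over the model's logits), that the blank token has id 0, and that each frame covers 20 ms, which follows from Wav2Vec2's 320-sample stride at 16 kHz. The function name `greedy_ctc_decode` is hypothetical.

```python
FRAME_SEC = 0.02  # one Wav2Vec2 frame per 320 samples, i.e. 20 ms at 16 kHz


def greedy_ctc_decode(frame_probs, id_to_phoneme, blank_id=0, frame_sec=FRAME_SEC):
    """Collapse per-frame probabilities into timed phoneme segments.

    frame_probs: list of rows, one per frame, each a list of probabilities
                 over the vocabulary.
    Returns a list of dicts with keys phoneme, start, end, confidence,
    where confidence is the mean winning probability over the segment.
    """
    segments = []
    prev_id = blank_id
    for t, row in enumerate(frame_probs):
        best_id = max(range(len(row)), key=row.__getitem__)
        if best_id != blank_id:
            if best_id == prev_id:
                # Same phoneme continues: extend the current segment.
                segments[-1]["end"] = (t + 1) * frame_sec
                segments[-1]["_probs"].append(row[best_id])
            else:
                # New phoneme starts at this frame.
                segments.append({
                    "phoneme": id_to_phoneme[best_id],
                    "start": t * frame_sec,
                    "end": (t + 1) * frame_sec,
                    "_probs": [row[best_id]],
                })
        prev_id = best_id
    for seg in segments:
        probs = seg.pop("_probs")
        seg["confidence"] = sum(probs) / len(probs)
    return segments


# Toy example: 3-token vocabulary, 4 frames (ʃ, ʃ, blank, i).
vocab = {0: "<pad>", 1: "ʃ", 2: "i"}
probs = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05], [0.1, 0.1, 0.8]]
result = greedy_ctc_decode(probs, vocab)
for seg in result:
    print(f'{seg["phoneme"]} {seg["start"]:.2f}-{seg["end"]:.2f}s conf={seg["confidence"]:.2f}')
print("/" + " ".join(seg["phoneme"] for seg in result) + "/")
```

Repeated frames of the same phoneme are merged, and a blank frame between two identical phonemes keeps them as separate segments, which is the standard CTC collapsing rule.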
Input: "She sells sea shells"
Phonemes: ʃ i s a l s iː ʃ a n s
IPA: /ʃ i s a l s iː ʃ a n s/
- Ensure microphone permissions are granted
- Use Chrome/Firefox for best browser compatibility
- Audio files should be clear speech for best results