Empowering communication by mitigating accent-based discrimination through AI-driven solutions.
Demo video: `CogLingo.Demo.mp4`
CogLingo is an innovative tool addressing accent bias and discrimination by detecting and correcting mispronunciations. By leveraging cutting-edge AI, CogLingo promotes inclusivity and supports confident communication for all.
- Prejudice against non-native accents leads to inequities in:
- Hiring practices
- Career advancement
- Housing opportunities
- Access to education
- Existing tools fail to provide phoneme-level mispronunciation corrections.
To create an accessible, intuitive solution to reduce accent-based discrimination and empower individuals to embrace their unique ways of speaking.
- 92%+ Accuracy: Fine-tuned the Wav2Vec 2.0 model to detect phoneme-level mispronunciations with over 92% accuracy.
- Phonetically Diverse Dataset: Trained on 6,300 sentences spoken by 630 speakers from 8 U.S. dialect regions using the TIMIT dataset.
- Dynamic Feedback: Developed robust tools for mismatch identification and confidence scoring, ensuring actionable and reliable user insights.
- Beautiful, responsive, and intuitive UI that supports effective learning.
- Dataset: TIMIT (Texas Instruments/MIT)
- 6,300 phonetically rich sentences spoken by speakers from diverse dialects.
- Process:
- Pre-process phonemes and align them with audio-text pairs.
- Split the data into training and validation sets.
- Fine-tune Wav2Vec 2.0 to transcribe audio into phoneme sequences.
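The train/validation split step might look like the sketch below (the utterance IDs and helper are illustrative assumptions, not CogLingo's actual code; TIMIT also ships a canonical train/test partition, so this shows only how a held-out validation slice could be carved from the training portion):

```python
import random

def train_val_split(utterance_ids, val_fraction=0.1, seed=42):
    """Shuffle utterance IDs deterministically and hold out a validation slice."""
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]  # (train, val)

# Hypothetical utterance IDs in TIMIT's dialect-region/speaker/sentence style.
utterances = [f"DR{r}_SPK{s}_SA1" for r in range(1, 9) for s in range(10)]
train_ids, val_ids = train_val_split(utterances, val_fraction=0.1)
```

A fixed seed keeps the split reproducible across training runs.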
- Example output for "She had your dark suit in greasy wash water all year.":
Phoneme Output:
sh-iy-hv-ae-dcl-d-y-er-dcl-d-aa-r-kcl-k-s-ux-tcl-ih-n-gcl-g-r-iy-z-iy
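Output like the string above can be post-processed into a phoneme sequence. A minimal sketch, assuming a hyphen-delimited transcription (the helper name and the closure-dropping rule are illustrative, not CogLingo's actual code):

```python
# TIMIT marks stop consonants with separate closure/release symbols,
# e.g. "dcl" (closure) followed by "d" (release).
CLOSURES = {"bcl", "dcl", "gcl", "pcl", "tcl", "kcl"}

def parse_phonemes(output, drop_closures=True):
    """Split a hyphen-delimited phoneme string into a list; optionally drop
    closure symbols so each stop consonant appears once."""
    phones = output.split("-")
    if drop_closures:
        phones = [p for p in phones if p not in CLOSURES]
    return phones

seq = parse_phonemes("sh-iy-hv-ae-dcl-d-y-er-dcl-d-aa-r")
# seq == ["sh", "iy", "hv", "ae", "d", "y", "er", "d", "aa", "r"]
```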
Visualizations: Dynamic Time Warping (DTW) alignment and phoneme extraction.
- Phoneme Processing:
- Extract and compare phoneme sequences using Dynamic Time Warping (DTW).
- Identify mismatches with real-time confidence scoring.
- Actionable Feedback:
- Matches and mismatches clearly highlighted.
- Specific guidance on improving pronunciation.
Example Feedback for User Sentence:
"December and January are nice months to spend in Miami."
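The DTW comparison and confidence scoring described above can be sketched in pure Python. This is a simplified illustration with a 0/1 substitution cost and a naive confidence formula; CogLingo's actual distance function and scoring are assumptions here:

```python
def dtw_align(ref, hyp):
    """Align two phoneme sequences with dynamic time warping and return
    (total alignment cost, warping path of index pairs)."""
    n, m = len(ref), len(hyp)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if ref[i - 1] == hyp[j - 1] else 1.0
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack to recover the warping path (diagonal preferred on ties).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = {(i - 1, j - 1): D[i - 1][j - 1],
                 (i - 1, j): D[i - 1][j],
                 (i, j - 1): D[i][j - 1]}
        i, j = min(moves, key=moves.get)
    return D[n][m], list(reversed(path))

def mismatches(ref, hyp):
    """Report aligned phoneme pairs that differ, plus a naive confidence score."""
    total_cost, path = dtw_align(ref, hyp)
    diffs = [(ref[i], hyp[j]) for i, j in path if ref[i] != hyp[j]]
    confidence = 1.0 - total_cost / max(len(ref), len(hyp))
    return diffs, confidence

# Hypothetical phonemes for the start of "December", with "ih" spoken as "iy".
expected = ["d", "ih", "s", "eh", "m", "b", "er"]
spoken   = ["d", "iy", "s", "eh", "m", "b", "er"]
diffs, conf = mismatches(expected, spoken)
```

Here `diffs` pinpoints the single `ih`/`iy` substitution, and `conf` shrinks as more aligned pairs disagree.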
- Features:
- Over 500 phonetically diverse prompts to challenge and improve users’ pronunciation skills.
- Audio examples for each prompt.
- Side-by-side display of user phoneme input and expected output.
- Enhanced Feedback: Provide specific directions for phoneme articulation.
- Multilingual Support: Expand beyond English for broader accessibility.
- Integration with AR/VR: Incorporate emerging technologies for immersive learning.
- Personalized Learning Paths: Tailor exercises based on individual user progress.
- Advanced Analytics: Use state-of-the-art ML techniques to refine model accuracy further.
- Model: Wav2Vec 2.0 (ASR Model)
- Data: TIMIT dataset
- Techniques: Phoneme extraction/analysis, Dynamic Time Warping, fine-tuning
- Tools: PyTorch (GPU-enabled training), Gradio
- Clone the repository:

  ```bash
  git clone https://github.com/SamGu-NRX/CogLingo.git
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the application:

  ```bash
  python main.py
  ```
This project is licensed under the MIT License.



