Cog*Lingo: Breaking Accent Bias with AI

Empowering communication by mitigating accent-based discrimination through AI-driven solutions.

CogLingo.Demo.mp4

🚀 Project Overview

CogLingo addresses accent bias and discrimination by detecting mispronunciations at the phoneme level and giving corrective feedback. Built on a fine-tuned speech recognition model, it promotes inclusivity and supports confident communication for all.

Key Problem

  • Prejudice against non-native accents leads to inequities in:
    • Hiring practices
    • Career advancement
    • Housing opportunities
    • Access to education
  • Existing tools fail to provide phoneme-level mispronunciation corrections.

Our Goal

To create an accessible, intuitive solution to reduce accent-based discrimination and empower individuals to embrace their unique ways of speaking.


🏆 Our Achievements

  • 92%+ Accuracy: Fine-tuned the Wav2Vec 2.0 model to detect phoneme-level mispronunciations with over 92% accuracy.
  • Phonetically Diverse Dataset: Trained on 6,300 sentences spoken by 630 speakers from 8 U.S. dialect regions using the TIMIT dataset.
  • Dynamic Feedback: Developed robust tools for mismatch identification and confidence scoring, ensuring actionable and reliable user insights.
  • Responsive UI: Built a clean, responsive, and intuitive interface that supports effective learning.

🧠 How It Works

CogLingo Design Flowchart

1. Data Collection

  • Dataset: TIMIT (Texas Instruments/MIT)
    • 6,300 phonetically rich sentences spoken by speakers from diverse dialects.
  • Process:
    • Pre-process phonemes and align them with audio-text pairs.
    • Split data into training and validation sets.
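The preprocessing and split steps above could look roughly like the following sketch. It assumes TIMIT's `.PHN` annotation layout (one `start_sample end_sample label` triple per line). The helper names `parse_phn` and `split_by_speaker`, the 90/10 ratio, and the choice to split by speaker are illustrative assumptions, not taken from the CogLingo code.

```python
import random


def parse_phn(text):
    """Parse .PHN-style text into (start_sample, end_sample, phoneme) tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, end, label = line.split()
        segments.append((int(start), int(end), label))
    return segments


def split_by_speaker(utterances, val_fraction=0.1, seed=0):
    """Split utterances so that no speaker lands in both sets (assumed policy)."""
    speakers = sorted({u["speaker"] for u in utterances})
    random.Random(seed).shuffle(speakers)
    n_val = max(1, int(len(speakers) * val_fraction))
    val_speakers = set(speakers[:n_val])
    train = [u for u in utterances if u["speaker"] not in val_speakers]
    val = [u for u in utterances if u["speaker"] in val_speakers]
    return train, val


# Illustrative values in the .PHN layout, not read from the real corpus files.
sample_phn = "0 3050 h#\n3050 4559 sh\n4559 5723 iy\n"
phonemes = [label for _, _, label in parse_phn(sample_phn)]

# Hypothetical utterance records: 10 speakers with 2 sentences each.
utterances = [{"speaker": f"spk{i}", "sentence": s} for i in range(10) for s in ("a", "b")]
train_set, val_set = split_by_speaker(utterances)
```

Splitting by speaker rather than by sentence keeps a validation speaker's voice out of training, which gives a more honest estimate of how the model handles unseen speakers.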

2. Automatic Speech Recognition (ASR)

  • Fine-tuned Wav2Vec 2.0 to transcribe audio into phonemes.
  • Example output for "She had your dark suit in greasy wash water all year.":

Phoneme Output:
sh-iy-hv-ae-dcl-d-y-er-dcl-d-aa-r-kcl-k-s-ux-tcl-ih-n-gcl-g-r-iy-z-iy
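
Wav2Vec 2.0 is commonly fine-tuned with a CTC head: the model emits one label per audio frame, and the decoder collapses repeated labels and drops a blank token. Assuming CogLingo follows that pattern (the README does not say), greedy decoding into a hyphen-joined phoneme string can be sketched as below. The vocabulary, the `<blank>` token, and the frame IDs are made up for illustration.

```python
def ctc_greedy_decode(frame_ids, vocab, blank_id=0):
    """Collapse repeated frame labels and drop blanks (greedy CTC decoding)."""
    phonemes = []
    previous = None
    for idx in frame_ids:
        # Emit a label only when it differs from the previous frame and is not blank.
        if idx != previous and idx != blank_id:
            phonemes.append(vocab[idx])
        previous = idx
    return phonemes


# Hypothetical vocabulary and per-frame argmax IDs, not real model output.
vocab = ["<blank>", "sh", "iy", "hv", "ae"]
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 3, 4, 4, 0]

print("-".join(ctc_greedy_decode(frame_ids, vocab)))  # prints sh-iy-hv-ae
```

A blank between two identical labels is what lets the decoder keep a genuinely repeated phoneme instead of merging it.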

Visualization:

Dynamic Time Warping (DTW) Visualization

Phoneme Extraction Visualization


3. Phoneme Analysis, Scoring, and Feedback

  • Phoneme Processing:
    • Extract and compare phoneme sequences using Dynamic Time Warping (DTW).
    • Identify mismatches with real-time confidence scoring.
  • Actionable Feedback:
    • Matches and mismatches clearly highlighted.
    • Specific guidance on improving pronunciation.
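
The comparison step above can be sketched as a small DTW alignment over phoneme labels. This is a minimal illustration under stated assumptions, not the project's implementation: it uses a 0/1 cost (same label or not), and the match ratio returned by the hypothetical `summarize` helper is only a stand-in for whatever confidence score CogLingo actually computes. The phoneme sequences are illustrative.

```python
def dtw_align(expected, actual):
    """Align two non-empty phoneme sequences with DTW using a 0/1 mismatch cost."""
    n, m = len(expected), len(actual)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0.0 if expected[i - 1] == actual[j - 1] else 1.0
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, preferring the diagonal step on ties.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((expected[i - 1], actual[j - 1]))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path


def summarize(path):
    """Split aligned pairs into matches and mismatches, plus a simple match ratio."""
    matches = [(e, a) for e, a in path if e == a]
    mismatches = [(e, a) for e, a in path if e != a]
    score = len(matches) / len(path) if path else 0.0
    return matches, mismatches, score


# Illustrative sequences for the word "Miami": target vs. a user's attempt.
expected = ["m", "ay", "ae", "m", "iy"]
actual = ["m", "ay", "eh", "m", "iy"]

matches, mismatches, score = summarize(dtw_align(expected, actual))
print(mismatches, score)  # prints [('ae', 'eh')] 0.8
```

Because DTW allows one label to align with several labels in the other sequence, a held or repeated phoneme does not shift every later comparison out of place.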

Example Feedback for User Sentence:
"December and January are nice months to spend in Miami."

  • Matches: ✅

  • Mismatches: ❌

  • Detailed phoneme guidance for targeted improvements.

    User Feedback Example


💻 User Interface

  • Features:
    • Over 500 phonetically diverse prompts to challenge and improve users’ pronunciation skills.
    • Audio examples for each prompt.
    • Side-by-side display of user phoneme input and expected output.

🌟 Future Directions

  • Enhanced Feedback: Provide specific directions for phoneme articulation.
  • Multilingual Support: Expand beyond English for broader accessibility.
  • Integration with AR/VR: Incorporate emerging technologies for immersive learning.
  • Personalized Learning Paths: Tailor exercises based on individual user progress.
  • Advanced Analytics: Use state-of-the-art ML techniques to refine model accuracy further.

⚙️ Tech Stack

  • Model: Wav2Vec 2.0 (ASR Model)
  • Data: TIMIT dataset
  • Techniques: Phoneme extraction/analysis, Dynamic Time Warping, fine-tuning
  • Tools: PyTorch (GPU-enabled training), Gradio

🧩 Getting Started

  1. Clone the repository and enter its directory:
    git clone https://github.com/SamGu-NRX/CogLingo.git
    cd CogLingo
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the application:
    python main.py

📜 License

This project is licensed under the MIT License.
