joshchamo/speech-translator

title: 50-Language Speech-to-Speech Translator using Whisper & mBART
emoji: 🗣️
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: mit
short_description: 50-Language Speech Translator with Whisper & mBART.
models: openai/whisper-large-v3-turbo, facebook/mbart-large-50-many-to-many-mmt, engine gTTS (Google Text-to-Speech)
tags: speech-to-text, speech-translation, speech-to-speech, transcription, translation, automatic-speech-recognition, text-to-speech, audio-to-audio, multilingual, whisper, mbart, gtts, gradio, audio, voice
sdk_version: 6.5.1

🌍 50-Language Speech-to-Speech Translator

This Hugging Face Space is a multimodal demo that performs end-to-end speech translation by chaining together speech recognition, machine translation, and text-to-speech synthesis.

It allows users to speak in one language and hear the translated speech in another, supporting 50 languages.


🚀 How It Works

The application follows a linear processing pipeline:

  1. Automatic Speech Recognition (ASR)
    Spoken audio is transcribed into text using Whisper (Large v3 Turbo).

  2. Neural Machine Translation (NMT)
    The transcribed text is translated into a selected target language using mBART-50, which supports 50 languages.

  3. Text-to-Speech (TTS)
    The translated text is converted back into audio using gTTS (Google Text-to-Speech).

The result is a seamless speech-to-speech translation experience.
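The three stages above can be sketched as a small composition of callables. This is a minimal illustration and not the code in app.py: the `SpeechTranslator` class, the stage signatures, and `build_with_hf_models` are hypothetical names introduced here. The wiring function assumes `transformers`, `torch`, `gtts`, and ffmpeg are installed, and it uses a naive conversion from mBART-50 codes to gTTS codes that will not hold for every language.

```python
from dataclasses import dataclass
from typing import Callable

# Stage signatures are assumptions made for this sketch.
Transcribe = Callable[[str], str]           # audio path -> source-language text
Translate = Callable[[str, str, str], str]  # text, source code, target code -> translation
Synthesize = Callable[[str, str], str]      # text, target code -> path of generated audio


@dataclass
class SpeechTranslator:
    transcribe: Transcribe
    translate: Translate
    synthesize: Synthesize

    def run(self, audio_path: str, src_lang: str, tgt_lang: str) -> dict:
        """Run ASR -> NMT -> TTS in order and return every intermediate result."""
        transcript = self.transcribe(audio_path)
        translation = self.translate(transcript, src_lang, tgt_lang)
        speech_path = self.synthesize(translation, tgt_lang)
        return {
            "transcript": transcript,
            "translation": translation,
            "speech_path": speech_path,
        }


def build_with_hf_models() -> SpeechTranslator:
    """Wire the stages to the models named above (downloads large model weights)."""
    # Imported lazily so the class above stays usable without these packages.
    from gtts import gTTS
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")
    mt_name = "facebook/mbart-large-50-many-to-many-mmt"
    tokenizer = AutoTokenizer.from_pretrained(mt_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(mt_name)

    def transcribe(audio_path: str) -> str:
        return asr(audio_path)["text"]

    def translate(text: str, src: str, tgt: str) -> str:
        tokenizer.src_lang = src  # mBART-50 code such as "en_XX"
        encoded = tokenizer(text, return_tensors="pt")
        # The target language code is forced as the first generated token.
        generated = model.generate(
            **encoded, forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt)
        )
        return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

    def synthesize(text: str, tgt: str) -> str:
        out_path = "translated.mp3"  # hypothetical output location
        # Naive assumption: "fr_XX" -> "fr"; some languages need an explicit mapping.
        gTTS(text=text, lang=tgt.split("_")[0]).save(out_path)
        return out_path

    return SpeechTranslator(transcribe, translate, synthesize)
```

Because each stage is injected, the pipeline can be exercised with stub functions before any model is downloaded, and a single stage can be swapped without touching the other two.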

flowchart TD
    Start([User Records Audio]) --> ASR[Automatic Speech Recognition<br/>openai/whisper-large-v3-turbo]
    ASR --> |Transcribed Text| NMT[Neural Machine Translation<br/>facebook/mbart-large-50-many-to-many-mmt]
    NMT --> |Translated Text<br/>Target Language| TTS[Text-to-Speech<br/>gTTS - Google Text-to-Speech]
    TTS --> Output([Audio Output])
    
    style Start stroke:#2563eb,stroke-width:3px
    style ASR stroke:#dc2626,stroke-width:3px
    style NMT stroke:#7c3aed,stroke-width:3px
    style TTS stroke:#059669,stroke-width:3px
    style Output stroke:#2563eb,stroke-width:3px

πŸ› οΈ Tech Stack

  • UI: Gradio
  • Speech Recognition: Whisper
  • Translation: Facebook mBART-50
  • Text-to-Speech: gTTS
  • Hosting: Hugging Face Spaces, Vercel

🌐 Supported Languages

The demo supports 50 languages, including:

Arabic, Chinese, English, French, German, Hindi, Japanese, Korean, Portuguese, Spanish, and Vietnamese.
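mBART-50 and gTTS identify languages with different codes, so a dropdown of language names has to be resolved to both. The sketch below covers only the languages listed above. `LANGUAGE_CODES` and `resolve_language` are hypothetical names, and the table is hand-written for illustration, not taken from the Space's source.

```python
# Language name -> (mBART-50 code, gTTS code). A hand-written subset for illustration.
LANGUAGE_CODES = {
    "Arabic": ("ar_AR", "ar"),
    "Chinese": ("zh_CN", "zh-CN"),
    "English": ("en_XX", "en"),
    "French": ("fr_XX", "fr"),
    "German": ("de_DE", "de"),
    "Hindi": ("hi_IN", "hi"),
    "Japanese": ("ja_XX", "ja"),
    "Korean": ("ko_KR", "ko"),
    "Portuguese": ("pt_XX", "pt"),
    "Spanish": ("es_XX", "es"),
    "Vietnamese": ("vi_VN", "vi"),
}


def resolve_language(name: str) -> tuple[str, str]:
    """Return (mbart_code, gtts_code) for a language name, ignoring case and padding."""
    try:
        return LANGUAGE_CODES[name.strip().title()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None
```

Keeping both codes in one table avoids deriving the gTTS code from the mBART-50 code by string manipulation, which is not reliable for every language.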


βš™οΈ Design Notes

  • Models were selected to balance language coverage, latency, and availability on Hugging Face Spaces.
  • Always-on or fast-loading models were preferred to avoid cold-start delays.
  • The demo focuses on clarity and reliability rather than pushing the largest possible models.

📌 Limitations

  • Long audio inputs may increase processing time.
  • Translation quality can vary for less common language pairs.
  • TTS voices depend on gTTS language support.
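Since spoken output depends on gTTS language support, a caller may want to check the target language before synthesis and fall back to text-only output when no voice exists. The helper below is a hypothetical sketch of such a check. It is not known to be how the Space handles this case.

```python
from typing import Iterable, Optional


def choose_tts_language(requested: str, supported: Iterable[str]) -> Optional[str]:
    """Return a usable TTS code for `requested`, or None if no voice is available."""
    # Match case-insensitively but return the code in its original spelling.
    available = {code.lower(): code for code in supported}
    candidate = requested.lower()
    if candidate in available:
        return available[candidate]
    # Fall back from a regional variant to its base language, e.g. "pt-br" -> "pt".
    base = candidate.split("-")[0]
    return available.get(base)
```

With gTTS installed, the supported codes can be obtained from `gtts.lang.tts_langs()`, which returns a dictionary keyed by language code, for example `choose_tts_language("fr", tts_langs())`. A `None` result signals that the translated text should be shown without audio.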

📄 License

This project is released under the MIT License.