---
title: 50-Language Speech-to-Speech Translator using Whisper & mBART
emoji: 🗣️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: 50-Language Speech Translator with Whisper & mBART.
---
This Hugging Face Space is a multimodal demo that performs end-to-end speech translation by chaining together speech recognition, machine translation, and text-to-speech synthesis.
It allows users to speak in one language and hear the translated speech in another, supporting 50 languages.
The application follows a linear processing pipeline:

1. **Automatic Speech Recognition (ASR)**: spoken audio is transcribed into text using Whisper (Large v3 Turbo).
2. **Neural Machine Translation (NMT)**: the transcribed text is translated into the selected target language using mBART-50, which supports 50 languages.
3. **Text-to-Speech (TTS)**: the translated text is converted back into audio using gTTS (Google Text-to-Speech).

The result is a seamless speech-to-speech translation experience.
```mermaid
flowchart TD
    Start([User Records Audio]) --> ASR[Automatic Speech Recognition<br/>openai/whisper-large-v3-turbo]
    ASR --> |Transcribed Text| NMT[Neural Machine Translation<br/>facebook/mbart-large-50-many-to-many-mmt]
    NMT --> |Translated Text<br/>Target Language| TTS[Text-to-Speech<br/>gTTS - Google Text-to-Speech]
    TTS --> Output([Audio Output])
    style Start stroke:#2563eb,stroke-width:3px
    style ASR stroke:#dc2626,stroke-width:3px
    style NMT stroke:#7c3aed,stroke-width:3px
    style TTS stroke:#059669,stroke-width:3px
    style Output stroke:#2563eb,stroke-width:3px
```
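The three stages in the diagram above can be sketched in a single function. This is a minimal sketch, not the Space's actual `app.py`: the function name `translate_speech`, the file handling, and the parameter names are illustrative, while the model IDs are the ones named in the diagram. Dependencies are imported lazily inside the function so the heavy models are only pulled when it runs.

```python
def translate_speech(audio_path: str, src_code: str, tgt_code: str, tts_lang: str) -> str:
    """Transcribe audio, translate the text, synthesize speech; returns an mp3 path.

    src_code/tgt_code are mBART-50 locale codes (e.g. "en_XX", "fr_XX");
    tts_lang is the gTTS ISO code (e.g. "fr").
    """
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration, pipeline
    from gtts import gTTS

    # 1) ASR: spoken audio -> source-language text
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")
    text = asr(audio_path)["text"]

    # 2) NMT: mBART-50 many-to-many; the output language is forced
    #    by setting the target language's BOS token.
    model_id = "facebook/mbart-large-50-many-to-many-mmt"
    tok = MBart50TokenizerFast.from_pretrained(model_id, src_lang=src_code)
    model = MBartForConditionalGeneration.from_pretrained(model_id)
    batch = tok(text, return_tensors="pt")
    generated = model.generate(**batch, forced_bos_token_id=tok.lang_code_to_id[tgt_code])
    translated = tok.batch_decode(generated, skip_special_tokens=True)[0]

    # 3) TTS: translated text -> audio file
    gTTS(translated, lang=tts_lang).save("translated.mp3")
    return "translated.mp3"
```

In the real app the pipelines would typically be loaded once at startup rather than per call, which matters for latency on Spaces hardware.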
- UI: Gradio
- Speech Recognition: Whisper
- Translation: Facebook mBART-50
- Text-to-Speech: gTTS
- Hosting: Hugging Face Spaces, Vercel
The demo supports 50 languages, including:
Arabic, Chinese, English, French, German, Hindi, Japanese, Korean, Portuguese, Spanish, Vietnamese, and many more.
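Each language needs two identifiers at runtime: an mBART-50 locale code for translation and a gTTS ISO code for synthesis. A plausible mapping for the languages listed above might look like this (the dict name `LANGUAGES` and the helper `codes_for` are illustrative, not taken from the app):

```python
# Subset of the 50 supported languages, mapped to the code each stage expects.
# mBART-50 uses locale-style codes (e.g. fr_XX); gTTS uses ISO-style codes.
LANGUAGES = {
    "Arabic":     ("ar_AR", "ar"),
    "Chinese":    ("zh_CN", "zh-CN"),
    "English":    ("en_XX", "en"),
    "French":     ("fr_XX", "fr"),
    "German":     ("de_DE", "de"),
    "Hindi":      ("hi_IN", "hi"),
    "Japanese":   ("ja_XX", "ja"),
    "Korean":     ("ko_KR", "ko"),
    "Portuguese": ("pt_XX", "pt"),
    "Spanish":    ("es_XX", "es"),
    "Vietnamese": ("vi_VN", "vi"),
}

def codes_for(language: str) -> tuple[str, str]:
    """Return (mbart_code, gtts_code) for a display-name language."""
    return LANGUAGES[language]
```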
- Models were selected to balance language coverage, latency, and availability on Hugging Face Spaces.
- Always-on or fast-loading models were preferred to avoid cold-start delays.
- The demo focuses on clarity and reliability rather than pushing the largest possible models.
- Long audio inputs may increase processing time.
- Translation quality can vary for less common language pairs.
- TTS voices depend on gTTS language support.
This project is released under the MIT License.