🚀 Introducing Whisper-Flamingo, an audio-visual speech recognition and translation model! #2231
roudimit started this conversation in Show and tell
-
Excellent work. Some questions.
Thank you for your time.
-
Excited to introduce Whisper-Flamingo! Check out the video demo below; Whisper-Flamingo can transcribe and translate speech with heavy background noise!
Whisper-Flamingo.teaser.mp4
We convert Whisper into an audio-visual speech recognition model so that it can use both audio and lip-based video as input.
Our audio-visual Whisper-Flamingo significantly outperforms the audio-only Whisper model when tested on noisy audio.
Our models transcribe English speech and translate English speech into 6 languages: Greek, Spanish, French, Italian, Portuguese, and Russian.
We are releasing our audio-visual models in three sizes (Large, Medium, Small), as well as the audio-only models fine-tuned on noisy audio.
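For anyone who wants to reproduce a noisy test condition at home, here is a minimal sketch of mixing a noise clip into clean speech at a chosen signal-to-noise ratio. This is a generic illustration, not the evaluation code used for Whisper-Flamingo: the function name `mix_at_snr` and its parameters are hypothetical, and the noise types and SNR levels you pick are up to you.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR (dB).

    Hypothetical helper for illustration; both inputs are 1-D waveforms
    sampled at the same rate.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole utterance.
    noise = np.resize(noise, speech.shape)

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise clip is silent; cannot scale it to a target SNR")

    # Choose scale so that speech_power / (scale**2 * noise_power) == 10**(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

With `snr_db=0.0` the speech and the scaled noise have equal average power; lower (negative) values make the noise louder than the speech. The mixture is not renormalized, so you may want to check for clipping before saving it to a fixed-point audio format.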
Key Methods
Let me know if you have any comments or questions!