Replies: 2 comments 2 replies
-
Here is one approach to solve transcription with multiple languages (sample source code in the link)
Other possibilities to consider:
-
I have indeed implemented a long post-processing script just to unify the time slots between the speaker diarization, transcription, and translation. It works very well in general, but it struggles in some cases (for example, when 2 speakers are talking at the same time), since each model gives different timestamps. I am afraid that chunking based on the speaker diarization timestamps will reduce the accuracy. Anyway, it looks like this is the best I can do. Regarding AssemblyAI, I am looking for an offline solution, so that's not going to help much. Thank you!
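For anyone attempting the same kind of post-processing, here is a minimal sketch of one common way to reconcile mismatched timestamps: give each transcript segment the speaker whose diarization turn overlaps it the most. This is not the script described above. The `Segment` class, `assign_speakers`, and the `UNKNOWN` fallback label are hypothetical names chosen for illustration, and the timestamps are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # speaker id for diarization turns, text for transcript segments


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time shared by two segments (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(transcript, turns, unknown="UNKNOWN"):
    """Attach to each transcript segment the speaker with the largest overlap.

    Segments that overlap no diarization turn get the `unknown` label
    (a hypothetical fallback, not something any model emits).
    """
    result = []
    for seg in transcript:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        if best is None or overlap(seg, best) == 0.0:
            speaker = unknown
        else:
            speaker = best.label
        result.append((speaker, seg.start, seg.end, seg.label))
    return result


# Example: the two models disagree slightly on where the boundaries fall.
turns = [Segment(0.0, 4.8, "SPEAKER_00"), Segment(4.8, 9.0, "SPEAKER_01")]
transcript = [Segment(0.2, 4.5, "hello"), Segment(4.6, 8.7, "hi there")]
labelled = assign_speakers(transcript, turns)
```

Picking the largest overlap tolerates small boundary disagreements between models, but it does not resolve truly overlapping speech: a segment where 2 speakers talk at once still gets a single label.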
-
I'm currently using Whisper Large V3 and I'm encountering two main issues with the pipeline shared on HuggingFace:
If the audio has 2 languages, sometimes it processes them without issue, but other times it requires me to select one language. To solve this issue, I need to transcribe the audio in 2 languages separately and then do some post processing. To do so, I need a way to detect the languages present in the audio.
Also, for certain languages like Persian and Urdu (and possibly others), I must explicitly specify the language.
I am using the pipeline here, but it gives me no way to detect the language. I also checked the transcribe function here and can't find a way to explicitly specify the language. I am not sure what to do in this case!
{ "detail": "Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing language=... or make sure all input audio is of the same language." }
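One way to approach the "detect the languages present" step, sketched under assumptions: split the audio into fixed-size chunks, run a language detector on each chunk, and merge consecutive chunks that share a language into runs. Each run can then be transcribed with a single forced language, which is what the error message asks for. The `detect` callable below is a hypothetical stand-in for whatever detector is available offline, and `language_runs` is an illustrative name, not part of any library.

```python
from itertools import groupby


def language_runs(chunks, detect):
    """Merge consecutive chunks with the same detected language.

    chunks: list of (start, end) times in seconds, in order.
    detect: hypothetical callable (start, end) -> language code for that span.
    Returns a list of (start, end, language) runs.
    """
    labelled = [(start, end, detect(start, end)) for start, end in chunks]
    runs = []
    for lang, group in groupby(labelled, key=lambda c: c[2]):
        group = list(group)
        # A run spans from the first chunk's start to the last chunk's end.
        runs.append((group[0][0], group[-1][1], lang))
    return runs


# Example with a fake detector: English for the first minute, Persian after.
def fake_detect(start, end):
    return "en" if start < 60 else "fa"


chunks = [(0, 30), (30, 60), (60, 90)]
runs = language_runs(chunks, fake_detect)
```

Each `(start, end, language)` run can then be cut from the audio and transcribed on its own with that language forced. With the Hugging Face pipeline this is usually done by passing the language through `generate_kwargs` when calling the pipeline, though that is worth checking against the installed `transformers` version.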