Replies: 1 comment
-
The paper focused primarily on zero-shot robustness and didn't include fine-tuned performance. We are mildly interested in knowing Whisper's fine-tuned performance, but unfortunately we don't currently have plans to perform or publish fine-tuning studies. We tried to write decoding.py in an "object-oriented" manner, in the hope of making future extensions like language model integration easier. For example, a language model could be used at the token level by replacing the
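To illustrate the kind of token-level language model integration mentioned above, here is a minimal, self-contained sketch of shallow fusion: per-token acoustic log-probabilities are combined with LM log-probabilities before the greedy pick. The function names and interface here are purely illustrative assumptions, not the actual decoding.py API.

```python
# Hypothetical sketch of token-level LM integration via shallow fusion.
# Names (fuse_logprobs, greedy_step) are illustrative, not Whisper's API.

def fuse_logprobs(acoustic_logprobs, lm_logprobs, lm_weight=0.3):
    """Combine per-token acoustic and LM log-probs (shallow fusion)."""
    return [a + lm_weight * l for a, l in zip(acoustic_logprobs, lm_logprobs)]

def greedy_step(logprobs):
    """Pick the index (token id) with the highest score."""
    return max(range(len(logprobs)), key=logprobs.__getitem__)

# Toy scores over a 3-token vocabulary:
acoustic = [-1.0, -0.5, -2.0]   # acoustic model favors token 1
lm       = [-0.1, -3.0, -0.2]   # LM strongly disfavors token 1
fused = fuse_logprobs(acoustic, lm, lm_weight=1.0)
```

With these toy numbers the LM flips the greedy choice from token 1 (acoustic-only) to token 0 (fused), which is exactly the behavior a token-level hook would enable.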
-
First of all, thanks for releasing this under the MIT license and making it easy to test out. I've already tried it with Finnish.
I have previously fine-tuned wav2vec2-xlsr for Finnish and made a few demos with it. To figure out whether I could eventually evaluate this model for my "auto-english-subtitles demo for Finnish spoken videos" (demo available here: https://huggingface.co/spaces/Finnish-NLP/Fin-Eng-ASR-autosubtitles), I would like to know the following:
Already answered
3. Would it be possible to get word-level timestamps from this model, like with Wav2Vec2 in huggingface/transformers#11307? (It seems this is already answered in #3.)