Open Source - Free (Beta) web transcription service - fine-tuning models #1684
gustavhartz started this conversation in Show and tell
Does it work in real time? Can we add this to a video conference?
Are there any follow-up examples of using the output to fine-tune a custom Whisper model?
Voxtir
Over the past few months, @MadsFrost and I have been hard at work developing Voxtir, an open-source transcription service based on Whisper and Pyannote. It is designed specifically for interviews, but is probably useful elsewhere.
You can try the beta version here
Why it's great
Voxtir produces transcriptions in JSON and HTML formats that adhere to strict TipTap/ProseMirror schemas. While this makes backward-compatible changes slightly challenging, it offers a significant advantage: the HTML transcripts can be converted into formats suitable for training and fine-tuning your own transcription models.
The format is intended to be easy for the editors, the users who clean up the transcriptions. We are still experimenting with which formats work best for ease of use and speed. The current rule is that a "Timestamp" (one of the small green buttons) has to be placed every 30 seconds. This is easy and fast to do after the transcription has been cleaned up, but can also be done along the way.
For fine-tuning, it is quite important that these timestamps are placed correctly.
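To illustrate why timestamp placement matters, here is a minimal sketch of how a cleaned transcript could be cut into training segments at the timestamps. The node layout used here (a flat list of `timestamp` and `text` dictionaries) is a hypothetical simplification for illustration, not the actual Voxtir TipTap/ProseMirror schema, and the names `Segment` and `segments_from_nodes` are made up for this example. The 30-second limit mirrors the rule described above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def segments_from_nodes(nodes, max_len=30.0):
    """Cut a flat node list into (start, end, text) segments.

    `nodes` is a HYPOTHETICAL structure, not the real Voxtir schema:
      {"type": "timestamp", "seconds": 12.5}
      {"type": "text", "text": "some words"}
    Text before the first timestamp or after the last one is skipped,
    because it has no start or end time.
    """
    segments = []
    start = None
    parts = []
    for node in nodes:
        if node["type"] == "timestamp":
            end = node["seconds"]
            if start is not None and parts:
                if end - start > max_len:
                    raise ValueError(
                        f"segment {start}-{end}s is longer than {max_len}s"
                    )
                segments.append(Segment(start, end, " ".join(parts)))
            # The timestamp that closes one segment opens the next one.
            start = end
            parts = []
        elif node["type"] == "text":
            parts.append(node["text"].strip())
    return segments
```

A misplaced timestamp shifts the boundary between two segments, so the text of each segment no longer matches its audio. That is the failure mode correct placement avoids.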
Technology
It is hosted entirely in AWS and uses Batch Transform to run the Whisper model on audio files stored in S3. This means that the audio has to be shorter than 3 hours for the transcription to finish within 1 hour with the large model. We will introduce the other models soon, but currently everything runs on the large one.
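The figures above imply that the large model runs at roughly three times real time (3 hours of audio in 1 hour). As a rough sketch, a pre-upload check could look like the following. The function names and the default `speed_factor` are assumptions derived only from those two numbers, not measured values or part of the Voxtir code.

```python
def estimated_processing_seconds(audio_seconds, speed_factor=3.0):
    """Estimate processing time, assuming the model runs `speed_factor`
    times faster than real time (assumed: 3 h of audio in 1 h)."""
    return audio_seconds / speed_factor


def fits_time_limit(audio_seconds, limit_seconds=3600.0, speed_factor=3.0):
    """Return True if the audio is short enough to finish within the limit."""
    return estimated_processing_seconds(audio_seconds, speed_factor) < limit_seconds
```

For example, `fits_time_limit(2 * 3600)` is true for a two-hour interview, while a four-hour recording would need to be split before upload.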
If you want to see the Terraform code, please let us know. Some diagrams of the infrastructure are available in the repo.
Other
Contribute and feedback
We are working on making it an easy-to-use service for fine-tuning audio models, but also a generally good tool for interview transcriptions. If you have any feedback, please let us know by sending an email or opening an issue on the repo.