Open Source - Free (Beta) web transcription service - fine-tuning models #1684

gustavhartz · 2023-09-30T11:00:27Z

Voxtir

Over the past few months, @MadsFrost and I have been hard at work developing Voxtir, an open-source transcription service based on Whisper and Pyannote. It is designed specifically for interviews, but is probably useful elsewhere.

You can try the beta version here

Why it's great

Voxtir produces transcriptions in JSON and HTML formats that adhere to strict TipTap/ProseMirror schemas. While this makes backward-compatible changes slightly challenging, it offers a significant advantage: the HTML transcripts can be converted into formats suitable for training and fine-tuning your own transcription models.
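As a rough illustration of that conversion, here is a minimal sketch that flattens a ProseMirror-style JSON document into plain text. It relies only on the generic ProseMirror JSON shape (nodes with `type`, optional `content`, and `text` on text nodes). The sample document and the `extract_text` helper are made up for this example and are not Voxtir's actual schema or code.

```python
import json

def extract_text(node):
    """Recursively collect the text of a ProseMirror-style JSON node."""
    if node.get("type") == "text":
        return node.get("text", "")
    parts = [extract_text(child) for child in node.get("content", [])]
    # Top-level blocks (e.g. paragraphs) are separated by newlines;
    # inline children inside a block are simply concatenated.
    separator = "\n" if node.get("type") == "doc" else ""
    return separator.join(parts)

# Hypothetical sample document, not real Voxtir output.
doc = json.loads("""
{
  "type": "doc",
  "content": [
    {"type": "paragraph", "content": [{"type": "text", "text": "Hello there."}]},
    {"type": "paragraph", "content": [
      {"type": "text", "text": "How "},
      {"type": "text", "text": "are you?"}
    ]}
  ]
}
""")

plain = extract_text(doc)
print(plain)
```

The resulting string could then be paired with the matching audio to build training examples.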

The format is intended to be easy for the editors, the users who clean up the transcriptions. We are still experimenting with which formats work best for ease of use and speed. The current logic is that a "Timestamp" (one of the small green buttons) has to be placed every 30 seconds. This is easy and fast to do after the transcription has been cleaned up, but it can also be done along the way.

It's quite important for fine-tuning that these are placed correctly.
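To show why the placement matters, here is a hedged sketch that cuts a transcript into segments at timestamp markers and flags any segment longer than 30 seconds, the length of the audio window Whisper works on. The `timestamp` node type, its `attrs.seconds` field, and the helper names are assumptions made for this example; the real Voxtir schema may differ.

```python
MAX_SEGMENT_SECONDS = 30.0  # Whisper works on 30-second audio windows

def iter_leaves(node):
    """Yield leaf nodes (text and markers) in document order."""
    children = node.get("content")
    if children:
        for child in children:
            yield from iter_leaves(child)
    else:
        yield node

def split_segments(doc, total_seconds):
    """Cut the transcript at each (hypothetical) timestamp node."""
    segments = []
    start, parts = 0.0, []
    # A sentinel marker at the end of the audio closes the final segment.
    sentinel = {"type": "timestamp", "attrs": {"seconds": total_seconds}}
    for leaf in list(iter_leaves(doc)) + [sentinel]:
        if leaf.get("type") == "text":
            parts.append(leaf.get("text", ""))
        elif leaf.get("type") == "timestamp":
            end = float(leaf["attrs"]["seconds"])
            text = " ".join(" ".join(parts).split())  # normalise whitespace
            if text:
                segments.append({"start": start, "end": end, "text": text})
            start, parts = end, []
    return segments

def too_long(segments, limit=MAX_SEGMENT_SECONDS):
    """Return segments whose span exceeds the limit."""
    return [s for s in segments if s["end"] - s["start"] > limit]

# Hypothetical sample document with one marker at 28.5 s.
doc = {"type": "doc", "content": [
    {"type": "paragraph", "content": [
        {"type": "text", "text": "Welcome to the interview."},
        {"type": "timestamp", "attrs": {"seconds": 28.5}},
        {"type": "text", "text": "Thanks for having me."},
    ]},
    {"type": "paragraph", "content": [{"type": "text", "text": "Let's begin."}]},
]}

segments = split_segments(doc, total_seconds=70.0)
for seg in too_long(segments):
    print("missing a timestamp somewhere in:", seg)
```

In this sample the second segment spans 41.5 seconds, so it is reported as needing another marker.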

Technology

It's completely hosted in AWS and uses Batch Transform to run the Whisper model on audio files stored in S3. This means that the audio has to be shorter than 3 hours for the transcription to finish within 1 hour with the large model. We will introduce the other models soon, but currently, everything runs on the large one.
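The figures above imply a rough throughput of about three seconds of audio per second of processing. The sketch below turns that into a back-of-the-envelope estimate. It assumes processing time scales linearly with audio length and ignores start-up overhead, neither of which is confirmed in this thread; the function names are made up for the example.

```python
AUDIO_LIMIT_SECONDS = 3 * 3600  # longest audio quoted above
TIME_BUDGET_SECONDS = 1 * 3600  # processing window quoted above

# Implied speed: seconds of audio transcribed per second of processing.
SPEEDUP = AUDIO_LIMIT_SECONDS / TIME_BUDGET_SECONDS

def estimated_processing_seconds(audio_seconds):
    """Linear estimate; real jobs also pay model-loading overhead."""
    return audio_seconds / SPEEDUP

def fits_budget(audio_seconds):
    """True if the estimate stays within the one-hour window."""
    return estimated_processing_seconds(audio_seconds) <= TIME_BUDGET_SECONDS

ninety_minutes = 90 * 60
print(estimated_processing_seconds(ninety_minutes) / 60, "minutes (estimate)")
print(fits_budget(4 * 3600))
```

Treat the result as an upper-level sanity check before uploading, not as a guarantee.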

If you want to see the Terraform code, please let us know. Some diagrams of the infrastructure are available in the repo.

Other

It's easy to share projects
It exports transcriptions to Word in the frontend

Contribute and feedback

We are working on making it an easy-to-use service for fine-tuning audio models, but also generally good for interview transcriptions. If you have any feedback, please let us know by sending an email or opening an issue on the repo.

Images

shephinphilip · 2023-10-25T07:30:56Z

Does it work in real time? Can we add this to a video conference?

gustavhartz Oct 29, 2023
Author

Nah, it is way too slow. The smallest models could probably work, but you would need an entirely different setup and some clever streaming algorithms. This is just batch processing.

maximveksler · 2023-10-29T21:39:42Z

Are there any follow-up examples of using the output to fine-tune a custom Whisper model?

gustavhartz Oct 29, 2023
Author

You can see an example of how to fine-tune here
https://github.com/Voxtir/whisper-fine-tuning-voxtir
