Happy to announce easy_whisper #1971
gyllila started this conversation in Show and tell
4 comments · 26 replies
-
What makes it faster?
6 replies
-
More correctly: by splitting the audio into sentences, I reduced the context length to within a sentence, which makes it faster while maintaining quality. I can assure you that even with base.en you can transcribe an English audiobook almost free of errors. I'm really impressed by the good quality of OpenAI Whisper.
On 22.01.2024 at 18:42, Purfview wrote:
I took a quick look at your repo; I don't see what would make it faster. By splitting up files you lose context, that's it.
Use --condition_on_previous_text=False to get a similar effect.
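For reference, the CLI flag mentioned above corresponds to a keyword argument of the same name in openai-whisper's Python API. A minimal sketch (the audio filename is a placeholder, and `openai-whisper` must be installed):

```python
import whisper  # pip install openai-whisper

# Disable conditioning on the previous window's text, so each ~30 s
# segment is decoded independently of what came before it -- a similar
# effect to splitting the audio into pieces before transcribing.
model = whisper.load_model("base.en")
result = model.transcribe("audiobook.mp3", condition_on_previous_text=False)
print(result["text"])
```

This can make decoding faster and prevents repetition loops, at the cost of losing cross-segment context.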
5 replies
-
It doesn't work for me: empty txt files.
14 replies
-
I am getting this error. I have tried reinstalling the libraries. Can you help me?
1 reply
-
I’ve written a Python script to make transcription/translation with Whisper easier for me. Because I find it very useful, I also made a Python library out of it. Feel free to use it. :)
The main features are:
Here is the GitHub link:
https://github.com/gyllila/easy_whisper
P.S. "Faster processing" is possibly inaccurate. Out of curiosity, I just tested transcribing a 13:23 mp3 file with plain vanilla OpenAI Whisper ("short" in easy_whisper), and it took only 200 s to finish! I'm really impressed. To be fair, I used the base.en model instead of a large one, but the transcript is nonetheless of high quality, almost free of errors, and that on my more than 5-year-old laptop! These 200 s don't include the time for loading the model, because in the GUI, once a model is loaded, it won't be reloaded until another model is selected.
P.P.S. The next update, version 1.1.0, is even faster: it took only 197 s for the same audio. The higher speed comes from lower-level access to Whisper, not from splitting into sentences, which I have therefore removed. I'll also add the option of API access for those who prefer using OpenAI's online transcription/translation service.
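The "loaded model is not reloaded until another model is selected" behaviour described above is a simple caching pattern. A minimal, self-contained sketch of the idea (the `load_model` stub below is a stand-in for the real, expensive `whisper.load_model`, not easy_whisper's actual code):

```python
# Record which models were actually loaded, to illustrate the cache.
load_calls = []

def load_model(name):
    """Stand-in for an expensive model load (e.g. whisper.load_model)."""
    load_calls.append(name)
    return f"<model {name}>"

# One-slot cache: remember the last requested model name and object.
_cache = {"name": None, "model": None}

def get_model(name):
    """Return the cached model, reloading only when the name changes."""
    if _cache["name"] != name:
        _cache["model"] = load_model(name)
        _cache["name"] = name
    return _cache["model"]

get_model("base.en")   # first request: loads the model
get_model("base.en")   # same name: cache hit, no reload
get_model("small.en")  # different model selected: reloads
print(load_calls)      # -> ['base.en', 'small.en']
```

A one-slot cache matches a GUI where only one model is active at a time; a dict keyed by model name would instead keep every loaded model in memory.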