Commit 6287b60

Update markdown documentation on audio conversion
1 parent 8a5a724 commit 6287b60

whisper.cpp/doc/getting-started.md

Lines changed: 13 additions & 9 deletions
@@ -40,19 +40,23 @@ The `--no-prints` is optional. It's helpful in avoiding a lot of verbose
 logging and statistical information from being printed, which is useful
 when writing shell scripts.
 
-## Converting MP3 to WAV
+## Supported Audio Formats
 
-Whisperfile only currently understands .wav files. So if you have files
-in a different audio format, you need to convert them to wav beforehand.
-One great tool for doing that is sox (your swiss army knife for audio).
-It's easily installed and used on Debian systems as follows:
+Whisperfile prefers that the input file be a 16khz .wav file with 16-bit
+signed linear samples that's stereo or mono. Otherwise it'll attempt to
+convert your audiofile automatically using an internal library. The MP3,
+FLAC, and Ogg Vorbis Theora formats are supported across platforms.
+
+For example, here's an audio recording of a famous poem in MP3 format:
 
 ```
-sudo apt install sox libsox-fmt-all
 wget https://archive.org/download/raven/raven_poe_64kb.mp3
-sox raven_poe_64kb.mp3 -r 16k raven_poe_64kb.wav
+o//whisper.cpp/main -m whisper-tiny.en-q5_1.bin -f raven_poe_64kb.mp3 -pc
 ```
 
+Here we also passed the `-pc` flag to get color-coded terminal output
+which communicates the confidence of transcription.
+
 ## Higher Quality Models
 
 The tiny model may get some words wrong. For example, it might think
@@ -61,14 +65,14 @@ enables whisperfile to decode The Raven perfectly. However it's slower.
 
 ```
 wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.en.bin
-o//whisper.cpp/main -m ggml-medium.en.bin -f raven_poe_64kb.wav --no-prints
+o//whisper.cpp/main -m ggml-medium.en.bin -f raven_poe_64kb.mp3 --no-prints
 ```
 
 Lastly, there's the large model, which is the best, but also slowest.
 
 ```
 wget -O whisper-large-v3.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3.bin
-o//whisper.cpp/main -m whisper-large-v3.bin -f raven_poe_64kb.wav --no-prints
+o//whisper.cpp/main -m whisper-large-v3.bin -f raven_poe_64kb.mp3 --no-prints
 ```
 
 ## Installation
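
The added "Supported Audio Formats" text says the preferred input is a 16 kHz .wav file with 16-bit signed samples, mono or stereo, and that anything else goes through automatic conversion. A script could check up front whether a given file already matches that description. The following is a minimal sketch using only Python's standard-library `wave` module. The helper name `is_preferred_wav` and the constants are hypothetical and not part of whisperfile, and the check only mirrors the wording of the documentation, not whisperfile's actual detection logic.

```python
import wave

# Values copied from the documentation text: 16 kHz, 16-bit samples, mono or stereo.
PREFERRED_RATE_HZ = 16000
PREFERRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM (signed in WAV files)
PREFERRED_CHANNELS = (1, 2)


def is_preferred_wav(path):
    """Return True if `path` is a PCM WAV file in the documented preferred format.

    Hypothetical helper for illustration; whisperfile does its own detection.
    """
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getframerate() == PREFERRED_RATE_HZ
                and wav.getsampwidth() == PREFERRED_SAMPLE_WIDTH_BYTES
                and wav.getnchannels() in PREFERRED_CHANNELS
            )
    except (wave.Error, EOFError):
        # Not a WAV file the standard-library reader understands (for example, an MP3).
        return False
```

A `False` result here would only suggest that the file is expected to take the automatic conversion path described in the new text; it does not mean the file is unsupported.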
