This is a command-line tool for creating transcripts of conversations recorded as audio files. It uses the following AI models to achieve that:
- pyannote.audio for identifying speakers
- OpenAI Whisper for transcribing what the speakers say into text
Download ffmpeg and install it.
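Before running the tool you can confirm that ffmpeg is actually reachable. A minimal sketch using only the Python standard library (this check is not part of verbatim itself):

```python
import shutil

# Look up ffmpeg on PATH; audio decoding depends on it being installed.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path:
    print(f"ffmpeg found at {ffmpeg_path}")
else:
    print("ffmpeg not found; install it and make sure it is on your PATH")
```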
The script requires a Hugging Face API token for downloading the
pyannote.audio models.
Here's how to get one.
In order to download the pyannote.audio models you need to accept their terms
and conditions. More on that here.
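Rather than pasting the token directly on the command line, it can be kept in an environment variable and forwarded to the --hugging-face-token flag. A sketch; the HUGGINGFACE_TOKEN variable name is an assumption, not something verbatim reads itself:

```python
import os

# Hypothetical convention: keep the Hugging Face token in an environment
# variable so it does not end up in shell history. verbatim itself only
# takes the token via the --hugging-face-token flag.
token = os.environ.get("HUGGINGFACE_TOKEN", "")
if not token.startswith("hf_"):
    print("warning: HUGGINGFACE_TOKEN is unset or does not look like a Hugging Face token")
```

You could then invoke the tool with --hugging-face-token "$HUGGINGFACE_TOKEN".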
Clone this repository using git:
git clone https://github.com/jannawro/verbatim.git
cd verbatim
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
Run:
python verbatim/main.py \
--audio-file sample.mp3 \
--audio-format mp3 \
--hugging-face-token hf_1234567890 \
--speakers 2 \
--output transcript.txt
To see all options run:
python verbatim/main.py --help
Use Whisper model variants according to recommendations from OpenAI.
This can be set via the --whisper-model flag.
From initial testing, models smaller than "large" work fine for English. For
satisfying results with other languages, "large" is recommended.
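As a rough sizing guide, these are the approximate VRAM figures OpenAI lists for the Whisper model variants (values taken from the Whisper README; check it for current numbers):

```python
# Approximate VRAM requirements in GB per Whisper model variant,
# as listed in OpenAI's Whisper README.
WHISPER_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}

def largest_fitting_model(budget_gb: float) -> str:
    """Pick the largest variant that fits a given memory budget."""
    fitting = [m for m, gb in WHISPER_VRAM_GB.items() if gb <= budget_gb]
    return fitting[-1] if fitting else "tiny"

print(largest_fitting_model(6))  # -> medium
```

The chosen variant can then be passed via the --whisper-model flag.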
Verbatim will spin up the number of threads specified by --workers. This
number cannot be greater than the number of your CPU cores. Please note that
each worker also spins up a separate instance of Whisper for parallel
processing. Use this carefully together with --whisper-model to make sure you
have enough resources for that many Whisper instances at the chosen model size.
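The worker/resource interaction described above can be sketched as follows; both helpers are hypothetical illustrations, not part of verbatim's code:

```python
import os

def clamp_workers(requested: int) -> int:
    """Mirror the constraint above: --workers may not exceed CPU cores."""
    cores = os.cpu_count() or 1
    return max(1, min(requested, cores))

def total_model_memory_gb(workers: int, per_model_gb: float) -> float:
    """Each worker loads its own Whisper instance, so total memory
    grows linearly with the effective worker count."""
    return clamp_workers(workers) * per_model_gb

# e.g. budget check for 2 workers, each loading a ~5 GB model
print(total_model_memory_gb(2, 5.0))
```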