This is a command-line tool for creating transcripts of conversations recorded as audio files. It uses the following AI models to achieve that:
- pyannote.audio for identifying speakers
- OpenAI Whisper for transcribing what the speakers say into text
Download ffmpeg and install it.
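Before running the tool you can confirm that ffmpeg is actually reachable. A minimal sketch using only the Python standard library (this check is not part of verbatim itself):

```python
import shutil

# Look up ffmpeg on PATH; audio decoding depends on it being installed.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path:
    print(f"ffmpeg found at {ffmpeg_path}")
else:
    print("ffmpeg not found; install it and make sure it is on your PATH")
```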
The script requires a Hugging Face API token for downloading the
pyannote.audio models.
Here's how to get one.
In order to download the pyannote.audio models you need to accept their terms
and conditions. More on that here.
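Rather than pasting the token directly on the command line, it can be kept in an environment variable and forwarded to the --hugging-face-token flag. A sketch; the HUGGINGFACE_TOKEN variable name is an assumption, not something verbatim reads itself:

```python
import os

# Hypothetical convention: keep the Hugging Face token in an environment
# variable so it does not end up in shell history. verbatim itself only
# takes the token via the --hugging-face-token flag.
token = os.environ.get("HUGGINGFACE_TOKEN", "")
if not token.startswith("hf_"):
    print("warning: HUGGINGFACE_TOKEN is unset or does not look like a Hugging Face token")
```

You could then invoke the tool with --hugging-face-token "$HUGGINGFACE_TOKEN".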
Clone this repository using git:
git clone https://github.com/jannawro/verbatim.git
cd verbatim
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
Run:
python verbatim/main.py \
--audio-file sample.mp3 \
--audio-format mp3 \
--hugging-face-token hf_1234567890 \
--speakers 2 \
--output transcript.txt
To see all options run:
python verbatim/main.py --help
Use Whisper model variants according to recommendations from OpenAI.
This can be set via the --whisper-model flag.
From initial testing, models smaller than "large" work fine for English. For
satisfying results with other languages, "large" is recommended.
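As a rough sizing guide, these are the approximate VRAM figures OpenAI lists for the Whisper model variants (values taken from the Whisper README; check it for current numbers):

```python
# Approximate VRAM requirements in GB per Whisper model variant,
# as listed in OpenAI's Whisper README.
WHISPER_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}

def largest_fitting_model(budget_gb: float) -> str:
    """Pick the largest variant that fits a given memory budget."""
    fitting = [m for m, gb in WHISPER_VRAM_GB.items() if gb <= budget_gb]
    return fitting[-1] if fitting else "tiny"

print(largest_fitting_model(6))  # -> medium
```

The chosen variant can then be passed via the --whisper-model flag.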
Verbatim will spin up the number of threads specified by --workers. This
number cannot be greater than the number of your CPU cores. Please note that
each worker also spins up a separate instance of Whisper for parallel
processing. Use this carefully together with --whisper-model to make sure you
have enough resources for that many Whisper instances at the chosen model size.
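The worker/resource interaction described above can be sketched as follows; both helpers are hypothetical illustrations, not part of verbatim's code:

```python
import os

def clamp_workers(requested: int) -> int:
    """Mirror the constraint above: --workers may not exceed CPU cores."""
    cores = os.cpu_count() or 1
    return max(1, min(requested, cores))

def total_model_memory_gb(workers: int, per_model_gb: float) -> float:
    """Each worker loads its own Whisper instance, so total memory
    grows linearly with the effective worker count."""
    return clamp_workers(workers) * per_model_gb

# e.g. budget check for 2 workers, each loading a ~5 GB model
print(total_model_memory_gb(2, 5.0))
```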