Locsum: Batch Offline Transcription and Summarization of Videos


Terminal tool for batch offline transcription and summarization of audio/video files.

Hardware Requirements

Transcription can run on a CPU without a GPU, but high-quality summarization requires significant GPU resources. I initially used an NVIDIA Jetson Orin Nano Super Developer Kit. While capable, its 8GB unified memory limited me to ~8B parameter models, which produced subpar summaries.

I recently upgraded to an ASUS Ascent GX10, a lower-cost alternative to the NVIDIA DGX Spark. With 128GB of unified memory, I can now run much larger models. I am currently running a 30B parameter model (quantized) with excellent results. Theoretically, the hardware supports models up to 200B parameters.

Dependencies

Locsum depends on a number of external libraries. These libraries and their sub-dependencies are installed automatically when you install Locsum.

Installation

Prerequisites

  • Ensure ffmpeg is installed on your system
  • Install Ollama and pull a model to use for the summarization (e.g. ollama pull gemma3:4b)
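
Before installing, you can quickly confirm that both prerequisites are in place (this assumes ffmpeg and ollama are already on your PATH):

```shell
ffmpeg -version | head -n 1  # prints the installed ffmpeg version
ollama list                  # lists the models you have pulled (e.g. gemma3:4b)
```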

Installation with pipx

It is recommended to install Locsum within a virtual environment to avoid conflicts with system packages. Some Linux distributions enforce this. You can use pipx to handle the virtual environment automatically, or create one manually and use pip.

pipx installs Locsum in an isolated environment and makes it available globally.

1. Install pipx

  • Linux (Debian / Ubuntu / Mint)

    sudo apt install pipx
    pipx ensurepath
  • Linux (Other) / macOS

    python3 -m pip install --user pipx
    python3 -m pipx ensurepath
  • Windows

    python -m pip install --user pipx
    python -m pipx ensurepath

You may need to reopen your terminal for the PATH changes to take effect. If you encounter a problem, please refer to the official pipx documentation.

2. Install Locsum

pipx install locsum

Installation with pip

If you prefer to manage the virtual environment manually, you can create and activate it by following this tutorial. Then install Locsum:

pip install locsum

NVIDIA GPU Support

When you install Locsum, the PyTorch library is installed as a sub-dependency of the whisper library. However, the version installed by default doesn't include GPU support. For transcription to benefit from GPU acceleration, you need to either upgrade PyTorch or install whisper.cpp as a replacement for the original whisper library. Locsum supports both options. The first option is simpler, whereas the second requires some compiling; whisper.cpp is faster, but the speed gain will depend on your hardware.

Option 1: Upgrade PyTorch

1. Get the CUDA version

Run nvidia-smi and note the CUDA version reported in the header of its output (13.0 in my case).

2. Upgrade PyTorch

Uninstall PyTorch and reinstall the right CUDA build (cu130 in my case).

  • If Locsum is installed with pipx

    pipx runpip locsum uninstall torch
    pipx inject locsum torch --index-url https://download.pytorch.org/whl/cu130
  • If Locsum is installed with pip (with the virtual environment activated)

    pip uninstall torch
    pip install torch --index-url https://download.pytorch.org/whl/cu130

3. Verify installation

Run locsum -c to check that CUDA is available. You should see output similar to:

PyTorch 2.10.0+cu130
CUDA 13.0 is available
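
If you prefer to check manually, the same information is available through PyTorch's own API. This is a sketch of a manual check run from Python inside the environment where Locsum's PyTorch is installed; it is not part of Locsum itself:

```python
# Manual CUDA check using PyTorch's public API.
import torch

print(f"PyTorch {torch.__version__}")
if torch.cuda.is_available():
    print(f"CUDA {torch.version.cuda} is available")
else:
    print("CUDA is not available; transcription will fall back to the CPU")
```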

Option 2: Install Whisper.cpp

Whisper.cpp doesn't need PyTorch, but it still requires CUDA and cuBLAS to be installed on your system. Refer to the official whisper.cpp documentation for more information.

1. Install libraries for ffmpeg integration

If you want to transcribe files such as .aac without first converting them to .wav, you need to compile whisper.cpp with ffmpeg support. Note, however, that this option appears to be available only on Linux.

sudo apt install libavcodec-dev libavformat-dev libavutil-dev

2. Clone whisper.cpp repository

cd ~  # Or wherever you wish to install whisper.cpp
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp

3. Build whisper.cpp

To enable CUDA and ffmpeg support, you need to use the -DGGML_CUDA=1 and -DWHISPER_FFMPEG=yes arguments.

cmake -B build -DGGML_CUDA=1 -DWHISPER_FFMPEG=yes
cmake --build build -j

4. Verify installation

sh ./models/download-ggml-model.sh base.en  # Download the base.en model in ggml format
ffmpeg -i samples/jfk.wav samples/jfk.aac   # Convert the audio file to .aac format
./build/bin/whisper-cli -f samples/jfk.aac  # Transcribe the audio file

5. Configure Locsum

To enable whisper.cpp in Locsum, ensure the CLI binary and model directory are correctly specified in the configuration file (cli_path and models_path settings). If you installed whisper.cpp in your home directory, the default paths should work out of the box.
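
For illustration, the relevant part of config.toml might look like the sketch below. Only cli_path, models_path, threads, and processors are settings named in this README; the exact layout and values are assumptions, so check your generated config.toml for the real structure:

```toml
# Hypothetical sketch; verify key names and layout against your own config.toml.
cli_path = "/home/you/whisper.cpp/build/bin/whisper-cli"
models_path = "/home/you/whisper.cpp/models"
threads = 1
processors = 18
```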

For optimal speed, experiment to find the fastest combination of threads and processors on your hardware. On my device (ASUS GX10, 20-core ARM CPU), setting threads=1 and processors=18 reduced inference time by ~3× compared to defaults (threads=4 and processors=1).

Use the bench.py script included with whisper.cpp to benchmark your setup. For example:

python3 scripts/bench.py -f samples/jfk.wav -t 1,2,4,8 -p 1,2,4,8,16


Usage

Basic Usage

locsum [arguments] FILE [FILE ...]

Arguments

| Argument | Short Flag | Description |
|---|---|---|
| --help | -h | Show help message |
| --check-cuda | -c | Check if CUDA is available |
| --language | -l | Set the language of the audio |
| --no-colors | -n | Disable color output |
| --no-compact | -N | Disable PDF compaction |
| --ollama-model | -o | Set the Ollama model for summarization |
| --openai-whisper | -O | Use OpenAI's Whisper even if Whisper.cpp is available |
| --reset-config | -r | Reset configuration file to default |
| --transcribe-only | -t | Transcribe only, don't generate a summary |
| --tiny | -T | Use tiny Whisper and Ollama models for testing |
| --version | -v | Show program's version number and exit |
| --whisper-model | -w | Set the Whisper model for transcription |
| --filter-warnings | -W | Suppress warnings from PyTorch |
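
Putting the flags together, a few typical invocations might look like this (the file names are hypothetical):

```shell
# Transcribe and summarize two recordings, forcing the audio language to French
locsum -l fr interview1.mp4 interview2.mp4

# Transcription only, no summary
locsum -t lecture.aac

# Use a specific Ollama model for the summaries
locsum -o gemma3:4b podcast.mp3
```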

Configuration

When you run Locsum for the first time, a config.toml file is automatically created. Its location depends on your operating system (typical paths are listed below):

  • Linux: ~/.config/locsum
  • macOS: ~/Library/Preferences/locsum
  • Windows: C:/Users/YourUsername/AppData/Roaming/locsum

You can edit this file to customize various settings. Common customizations include the Whisper and Ollama models to use.
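
As an illustration, overriding the default models might look like the sketch below. The key names are assumptions, not taken from Locsum's actual config file, so check your generated config.toml for the real names:

```toml
# Hypothetical sketch; key names may differ in your config.toml.
whisper_model = "base.en"
ollama_model = "gemma3:4b"
```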

VPN Setup

Since the goal is to process files locally, we might as well download them as privately as possible. Here is how I installed and configured WireGuard VPN on my GX10.

First update your system with sudo apt update && sudo apt upgrade. If the kernel is updated during this step, a reboot is required before continuing.

  • Install WireGuard: sudo apt install wireguard
  • Download WireGuard configuration from my Proton VPN account
  • Copy the configuration file to /etc/wireguard/protonvpn.conf and chown root:root (with sudo)
  • Test connection manually
    • Connect: sudo wg-quick up protonvpn
    • Check connection: sudo wg
    • Check IP address: curl -4 ip.me
    • Disconnect: sudo wg-quick down protonvpn
  • Connect at boot: sudo systemctl enable --now wg-quick@protonvpn.service
  • Reboot and check VPN connection / IP address

Radio Deactivation

For a truly air-gapped system and to eliminate radiofrequency radiation, use the following methods to disable the Bluetooth and wifi radios:

  • Disable Bluetooth

    sudo systemctl disable --now bluetooth
    sudo systemctl mask bluetooth
    sudo rfkill block bluetooth
  • Disable wifi

    sudo systemctl disable --now wpa_supplicant
    sudo systemctl mask wpa_supplicant
    sudo rfkill block wifi
    nmcli radio wifi off  # Redundant, but just in case
  • Reboot and check

    sudo systemctl status bluetooth
    sudo systemctl status wpa_supplicant
    sudo rfkill list

Kernel-Level Deactivation

Even after disabling services, the firmware might still attempt background scans, emitting bursts of radiofrequency energy. To completely silence the device, you must prevent the kernel module from loading:

  • Identify the module

    lspci -k  # Look for the wireless controller and find the module name (e.g. mt7925e)
  • Blacklist the module

    # Replace [WIRELESS_MODULE] with the name found above (e.g. mt7925e)
    echo "blacklist [WIRELESS_MODULE]" | sudo tee /etc/modprobe.d/blacklist-wifi.conf
    sudo update-initramfs -u
  • Reboot and check

    lsmod | grep [WIRELESS_MODULE]  # Should return nothing

License

Copyright (c) 2026 Monsieur Linux

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

Thanks to the creators and contributors of all the powerful libraries used in this project for making it possible.