Terminal tool for batch offline transcription and summarization of audio/video files.
Transcription can run on a CPU without a GPU, but high-quality summarization requires significant GPU resources. I initially used an NVIDIA Jetson Orin Nano Super Developer Kit. While capable, its 8GB unified memory limited me to ~8B parameter models, which produced subpar summaries.
I recently upgraded to an ASUS Ascent GX10, a lower-cost alternative to the NVIDIA DGX Spark. With 128GB of unified memory, I can now run much larger models. I am currently running a 30B parameter model (quantized) with excellent results. Theoretically, the hardware supports models up to 200B parameters.
Locsum requires the following external libraries:
- markdown-it: Used for Markdown to HTML conversion
- ollama: Used for text summarization
- PyMuPDF: Used for PDF analysis
- weasyprint: Used for HTML to PDF conversion
- whisper: Used for audio transcription
These libraries and their sub-dependencies will be installed automatically when you install Locsum.
- Ensure `ffmpeg` is installed on your system
- Install Ollama and pull a model to use for the summarization (e.g. `ollama pull gemma3:4b`)
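For example, on a Debian-based system (the Ollama install script below is the one published at ollama.com; review it before piping it to a shell):

```sh
sudo apt install ffmpeg
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma3:4b
```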
It is recommended to install Locsum within a virtual environment to avoid conflicts with system packages. Some Linux distributions enforce this. You can use pipx to handle the virtual environment automatically, or create one manually and use pip.
pipx installs Locsum in an isolated environment and makes it available globally.
1. Install pipx
- Linux (Debian / Ubuntu / Mint)

  ```sh
  sudo apt install pipx
  pipx ensurepath
  ```
- Linux (Other) / macOS

  ```sh
  python3 -m pip install --user pipx
  python3 -m pipx ensurepath
  ```
- Windows

  ```sh
  python -m pip install --user pipx
  python -m pipx ensurepath
  ```
You may need to reopen your terminal for the PATH changes to take effect. If you encounter a problem, please refer to the official pipx documentation.
2. Install Locsum
```sh
pipx install locsum
```

If you prefer to manage the virtual environment manually, you can create and activate it by following this tutorial, then install Locsum with pip.
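A minimal manual setup might look like this (the directory name is just an illustration):

```sh
python3 -m venv ~/.venvs/locsum
source ~/.venvs/locsum/bin/activate
```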
```sh
pip install locsum
```

When installing Locsum, the PyTorch library is installed as a sub-dependency of the whisper library. However, the version installed by default doesn't include GPU support. For the transcription to benefit from GPU acceleration, you need to either upgrade PyTorch or install whisper.cpp as a replacement for the original whisper library. Locsum supports both options. Whisper.cpp is faster, but the speed gain will depend on your hardware. The first option is simpler, whereas the second requires some compiling.
1. Get the CUDA version
Run `nvidia-smi` to find the CUDA version supported by your driver (13.0 in my case).
2. Upgrade PyTorch
Uninstall PyTorch and reinstall the right CUDA build (cu130 in my case).
- If Locsum is installed with `pipx`:

  ```sh
  pipx runpip locsum uninstall torch
  pipx inject locsum torch --index-url https://download.pytorch.org/whl/cu130
  ```
- If Locsum is installed with `pip` (with the virtual environment activated):

  ```sh
  pip uninstall torch
  pip install torch --index-url https://download.pytorch.org/whl/cu130
  ```
3. Verify installation
Run `locsum -c` to check that CUDA is available:

```
PyTorch 2.10.0+cu130
CUDA 13.0 is available
```
Whisper.cpp doesn't need PyTorch, but it still requires CUDA and cuBLAS to be installed on your system. Refer to the official whisper.cpp documentation for more information.
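A quick way to confirm the CUDA toolkit is present (assuming `nvcc` is on your PATH):

```sh
nvcc --version  # Prints the CUDA compiler version if the toolkit is installed
```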
1. Install libraries for ffmpeg integration
If you want to be able to transcribe files such as .aac without first having to convert them to .wav, you need to compile whisper.cpp with ffmpeg support. Note that this option currently appears to be available only on Linux.
```sh
sudo apt install libavcodec-dev libavformat-dev libavutil-dev
```

2. Clone whisper.cpp repository
```sh
cd ~  # Or wherever you wish to install whisper.cpp
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
```

3. Build whisper.cpp
To enable CUDA and ffmpeg support, you need to use the `-DGGML_CUDA=1` and `-DWHISPER_FFMPEG=yes` arguments.
```sh
cmake -B build -DGGML_CUDA=1 -DWHISPER_FFMPEG=yes
cmake --build build -j
```

4. Verify installation
```sh
sh ./models/download-ggml-model.sh base.en  # Download the base.en model in ggml format
ffmpeg -i samples/jfk.wav samples/jfk.aac   # Convert the audio file to .aac format
./build/bin/whisper-cli -f samples/jfk.aac  # Transcribe the audio file
```

5. Configure Locsum
To enable whisper.cpp in Locsum, ensure the CLI binary and model directory are correctly specified in the configuration file (`cli_path` and `models_path` settings). If you installed whisper.cpp in your home directory, the default paths should work out of the box; you can confirm the current values as shown below.
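For example, on Linux (the config file location is covered in the configuration section further down):

```sh
grep -E 'cli_path|models_path' ~/.config/locsum/config.toml
```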
For optimal speed, experiment to find the fastest combination of threads and processors on your hardware. On my device (ASUS GX10, 20-core ARM CPU), setting threads=1 and processors=18 reduced inference time by ~3× compared to defaults (threads=4 and processors=1).
Use the bench.py script included with whisper.cpp to benchmark your setup. For example:
```sh
python3 scripts/bench.py -f samples/jfk.wav -t 1,2,4,8 -p 1,2,4,8,16
```
```
locsum [arguments] FILE [FILE ...]
```

| Argument | Short Flag | Description |
|---|---|---|
| `--help` | `-h` | Show help message |
| `--check-cuda` | `-c` | Check if CUDA is available |
| `--language` | `-l` | Set the language of the audio |
| `--no-colors` | `-n` | Disable color output |
| `--no-compact` | `-N` | Disable PDF compaction |
| `--ollama-model` | `-o` | Set the Ollama model for summarization |
| `--openai-whisper` | `-O` | Use OpenAI's Whisper even if Whisper.cpp is available |
| `--reset-config` | `-r` | Reset configuration file to default |
| `--transcribe-only` | `-t` | Transcribe only, don't generate a summary |
| `--tiny` | `-T` | Use tiny Whisper and Ollama models for testing |
| `--version` | `-v` | Show program's version number and exit |
| `--whisper-model` | `-w` | Set the Whisper model for transcription |
| `--filter-warnings` | `-W` | Suppress warnings from PyTorch |
When you run Locsum for the first time, a `config.toml` file is automatically created. Its location depends on your operating system (typical paths are listed below):

- Linux: `~/.config/locsum`
- macOS: `~/Library/Preferences/locsum`
- Windows: `C:/Users/YourUsername/AppData/Roaming/locsum`
You can edit this file to customize various settings. Common customizations include the Whisper and Ollama models to use.
Since the goal is to process files locally, we might as well download them as privately as possible. Here is how I installed and configured WireGuard VPN on my GX10.
First update your system with `sudo apt update && sudo apt upgrade`. If the kernel is updated during this step, a reboot is required before continuing.
- Install WireGuard: `sudo apt install wireguard`
- Download a WireGuard configuration from my Proton VPN account
- Copy the configuration file to `/etc/wireguard/protonvpn.conf` and `chown root:root` (with sudo); see the sketch after this list
- Test the connection manually
  - Connect: `sudo wg-quick up protonvpn`
  - Check connection: `sudo wg`
  - Check IP address: `curl -4 ip.me`
  - Disconnect: `sudo wg-quick down protonvpn`
- Connect at boot: `sudo systemctl enable --now wg-quick@protonvpn.service`
- Reboot and check VPN connection / IP address
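The copy step might look like this (the download location is an assumption, and the final `chmod` is an extra precaution):

```sh
sudo cp ~/Downloads/protonvpn.conf /etc/wireguard/protonvpn.conf
sudo chown root:root /etc/wireguard/protonvpn.conf
sudo chmod 600 /etc/wireguard/protonvpn.conf  # Private key readable by root only
```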
For a truly air-gapped system and to eliminate radiofrequency radiation, use the following methods to disable antennas:
- Disable Bluetooth

  ```sh
  sudo systemctl disable --now bluetooth
  sudo systemctl mask bluetooth
  sudo rfkill block bluetooth
  ```
- Disable Wi-Fi

  ```sh
  sudo systemctl disable --now wpa_supplicant
  sudo systemctl mask wpa_supplicant
  sudo rfkill block wifi
  nmcli radio wifi off  # Redundant, but just in case
  ```

- Reboot and check

  ```sh
  sudo systemctl status bluetooth
  sudo systemctl status wpa_supplicant
  sudo rfkill list
  ```
Even after disabling services, the firmware might still attempt background scans, emitting bursts of radiofrequency energy. To completely silence the device, you must prevent the kernel module from loading:
- Identify the module

  ```sh
  lspci -k  # Look for the wireless controller and find the module name (e.g. mt7925e)
  ```

- Blacklist the module

  ```sh
  # Replace [WIRELESS_MODULE] with the name found above (e.g. mt7925e)
  echo "blacklist [WIRELESS_MODULE]" | sudo tee /etc/modprobe.d/blacklist-wifi.conf
  sudo update-initramfs -u
  ```

- Reboot and check

  ```sh
  lsmod | grep [WIRELESS_MODULE]  # Should return nothing
  ```
Copyright (c) 2026 Monsieur Linux
This project is licensed under the MIT License. See the LICENSE file for details.
Thanks to the creators and contributors of all the powerful libraries used in this project for making it possible.
