Locsum: Batch Offline Transcription and Summarization of Videos


Terminal tool for batch offline transcription and summarization of audio/video files.

Hardware Requirements

Transcription can run on a CPU without a GPU, but high-quality summarization requires significant GPU resources. I initially used an NVIDIA Jetson Orin Nano Super Developer Kit. While capable, its 8GB unified memory limited me to ~8B parameter models, which produced subpar summaries.

I recently upgraded to an ASUS Ascent GX10, a lower-cost alternative to the NVIDIA DGX Spark. With 128GB of unified memory, I can now run much larger models. I am currently running a 30B parameter model (quantized) with excellent results. Theoretically, the hardware supports models up to 200B parameters.

Dependencies

Locsum depends on a number of external libraries. These libraries and their sub-dependencies are installed automatically when you install Locsum.

Installation

Prerequisites

  • Ensure ffmpeg is installed on your system
  • Install Ollama and pull a model to use for the summarization (e.g. ollama pull gemma3:4b)
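
Before installing, you can quickly confirm that both prerequisites are in place (this assumes ffmpeg and ollama are already on your PATH):

```shell
ffmpeg -version | head -n 1  # prints the installed ffmpeg version
ollama list                  # lists the models you have pulled (e.g. gemma3:4b)
```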

Installation with pipx

It is recommended to install Locsum within a virtual environment to avoid conflicts with system packages. Some Linux distributions enforce this. You can use pipx to handle the virtual environment automatically, or create one manually and use pip.

pipx installs Locsum in an isolated environment and makes it available globally.

1. Install pipx

  • Linux (Debian / Ubuntu / Mint)

    sudo apt install pipx
    pipx ensurepath
  • Linux (Other) / macOS

    python3 -m pip install --user pipx
    python3 -m pipx ensurepath
  • Windows

    python -m pip install --user pipx
    python -m pipx ensurepath

You may need to reopen your terminal for the PATH changes to take effect. If you encounter a problem, please refer to the official pipx documentation.

2. Install Locsum

pipx install locsum

Installation with pip

If you prefer to manage the virtual environment manually, you can create and activate it by following this tutorial. Then install Locsum:

pip install locsum

NVIDIA GPU Support

When you install Locsum, the PyTorch library is installed as a sub-dependency of the whisper library. However, the version installed by default doesn't include GPU support. For transcription to benefit from GPU acceleration, you need to either upgrade PyTorch or install whisper.cpp as a replacement for the original whisper library. Locsum supports both options. The first option is simpler, whereas the second requires some compiling; whisper.cpp is faster, but the speed gain will depend on your hardware.

Option 1: Upgrade PyTorch

1. Get the CUDA version

Run nvidia-smi and note the CUDA version reported in the header of its output (13.0 in my case).

2. Upgrade PyTorch

Uninstall PyTorch and reinstall the right CUDA build (cu130 in my case).

  • If Locsum is installed with pipx

    pipx runpip locsum uninstall torch
    pipx inject locsum torch --index-url https://download.pytorch.org/whl/cu130
  • If Locsum is installed with pip (with the virtual environment activated)

    pip uninstall torch
    pip install torch --index-url https://download.pytorch.org/whl/cu130

3. Verify installation

Run locsum -c to check that CUDA is available. You should see output similar to:

PyTorch 2.10.0+cu130
CUDA 13.0 is available
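
If you prefer to check manually, the same information is available through PyTorch's own API. This is a sketch of a manual check run from Python inside the environment where Locsum's PyTorch is installed; it is not part of Locsum itself:

```python
# Manual CUDA check using PyTorch's public API.
import torch

print(f"PyTorch {torch.__version__}")
if torch.cuda.is_available():
    print(f"CUDA {torch.version.cuda} is available")
else:
    print("CUDA is not available; transcription will fall back to the CPU")
```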

Option 2: Install Whisper.cpp

Whisper.cpp doesn't need PyTorch, but it still requires CUDA and cuBLAS to be installed on your system. Refer to the official whisper.cpp documentation for more information.

1. Install libraries for ffmpeg integration

If you want to transcribe files such as .aac without first converting them to .wav, you need to compile whisper.cpp with ffmpeg support. Note, however, that this option appears to be available only on Linux.

sudo apt install libavcodec-dev libavformat-dev libavutil-dev

2. Clone whisper.cpp repository

cd ~  # Or wherever you wish to install whisper.cpp
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp

3. Build whisper.cpp

To enable CUDA and ffmpeg support, you need to use the -DGGML_CUDA=1 and -DWHISPER_FFMPEG=yes arguments.

cmake -B build -DGGML_CUDA=1 -DWHISPER_FFMPEG=yes
cmake --build build -j

4. Verify installation

sh ./models/download-ggml-model.sh base.en  # Download the base.en model in ggml format
ffmpeg -i samples/jfk.wav samples/jfk.aac   # Convert the audio file to .aac format
./build/bin/whisper-cli -f samples/jfk.aac  # Transcribe the audio file

5. Configure Locsum

To enable whisper.cpp in Locsum, ensure the CLI binary and model directory are correctly specified in the configuration file (cli_path and models_path settings). If you installed whisper.cpp in your home directory, the default paths should work out of the box.
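
For illustration, the relevant part of config.toml might look like the sketch below. Only cli_path, models_path, threads, and processors are settings named in this README; the exact layout and values are assumptions, so check your generated config.toml for the real structure:

```toml
# Hypothetical sketch; verify key names and layout against your own config.toml.
cli_path = "/home/you/whisper.cpp/build/bin/whisper-cli"
models_path = "/home/you/whisper.cpp/models"
threads = 1
processors = 18
```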

For optimal speed, experiment to find the fastest combination of threads and processors on your hardware. On my device (ASUS GX10, 20-core ARM CPU), setting threads=1 and processors=18 reduced inference time by ~3× compared to defaults (threads=4 and processors=1).

Use the bench.py script included with whisper.cpp to benchmark your setup. For example:

python3 scripts/bench.py -f samples/jfk.wav -t 1,2,4,8 -p 1,2,4,8,16


Usage

Basic Usage

locsum [arguments] FILE [FILE ...]

Arguments

| Argument | Short Flag | Description |
|---|---|---|
| --help | -h | Show help message |
| --check-cuda | -c | Check if CUDA is available |
| --language | -l | Set the language of the audio |
| --no-colors | -n | Disable color output |
| --no-compact | -N | Disable PDF compaction |
| --ollama-model | -o | Set the Ollama model for summarization |
| --openai-whisper | -O | Use OpenAI's Whisper even if Whisper.cpp is available |
| --reset-config | -r | Reset configuration file to default |
| --transcribe-only | -t | Transcribe only, don't generate a summary |
| --tiny | -T | Use tiny Whisper and Ollama models for testing |
| --version | -v | Show program's version number and exit |
| --whisper-model | -w | Set the Whisper model for transcription |
| --filter-warnings | -W | Suppress warnings from PyTorch |
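
Putting the flags together, a few typical invocations might look like this (the file names are hypothetical):

```shell
# Transcribe and summarize two recordings, forcing the audio language to French
locsum -l fr interview1.mp4 interview2.mp4

# Transcription only, no summary
locsum -t lecture.aac

# Use a specific Ollama model for the summaries
locsum -o gemma3:4b podcast.mp3
```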

Configuration

When you run Locsum for the first time, a config.toml file is automatically created. Its location depends on your operating system (typical paths are listed below):

  • Linux: ~/.config/locsum
  • macOS: ~/Library/Preferences/locsum
  • Windows: C:/Users/YourUsername/AppData/Roaming/locsum

You can edit this file to customize various settings. Common customizations include the Whisper and Ollama models to use.
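
As an illustration, overriding the default models might look like the sketch below. The key names are assumptions, not taken from Locsum's actual config file, so check your generated config.toml for the real names:

```toml
# Hypothetical sketch; key names may differ in your config.toml.
whisper_model = "base.en"
ollama_model = "gemma3:4b"
```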

VPN Setup

Since the goal is to process files locally, we might as well download them as privately as possible. Here is how I installed and configured WireGuard VPN on my GX10.

First update your system with sudo apt update && sudo apt upgrade. If the kernel is updated during this step, a reboot is required before continuing.

  • Install WireGuard: sudo apt install wireguard
  • Download WireGuard configuration from my Proton VPN account
  • Copy the configuration file to /etc/wireguard/protonvpn.conf and chown root:root (with sudo)
  • Test connection manually
    • Connect: sudo wg-quick up protonvpn
    • Check connection: sudo wg
    • Check IP address: curl -4 ip.me
    • Disconnect: sudo wg-quick down protonvpn
  • Connect at boot: sudo systemctl enable --now wg-quick@protonvpn.service
  • Reboot and check VPN connection / IP address

Radio Deactivation

For a truly air-gapped system and to eliminate radiofrequency radiation, use the following methods to disable the Bluetooth and wifi radios:

  • Disable Bluetooth

    sudo systemctl disable --now bluetooth
    sudo systemctl mask bluetooth
    sudo rfkill block bluetooth
  • Disable wifi

    sudo systemctl disable --now wpa_supplicant
    sudo systemctl mask wpa_supplicant
    sudo rfkill block wifi
    nmcli radio wifi off  # Redundant, but just in case
  • Reboot and check

    sudo systemctl status bluetooth
    sudo systemctl status wpa_supplicant
    sudo rfkill list

Kernel-Level Deactivation

Even after disabling services, the firmware might still attempt background scans, emitting bursts of radiofrequency energy. To completely silence the device, you must prevent the kernel module from loading:

  • Identify the module

    lspci -k  # Look for the wireless controller and find the module name (e.g. mt7925e)
  • Blacklist the module

    # Replace [WIRELESS_MODULE] with the name found above (e.g. mt7925e)
    echo "blacklist [WIRELESS_MODULE]" | sudo tee /etc/modprobe.d/blacklist-wifi.conf
    sudo update-initramfs -u
  • Reboot and check

    lsmod | grep [WIRELESS_MODULE]  # Should return nothing

License

Copyright (c) 2026 Monsieur Linux

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

Thanks to the creators and contributors of all the powerful libraries used in this project for making it possible.