josephfried/whisper-transcribe-mac-shortcut

Local Audio Transcription Automation (whisper.cpp + macOS Shortcuts)

This project provides a fast, reliable workflow for transcribing audio and video files locally on macOS using whisper.cpp and ffmpeg. It includes:

  • a hardened shell script (transcribe.sh) that handles real-world audio issues
  • an optional macOS Shortcut (Whisper Transcribe.shortcut) for drag-and-drop or Finder-menu automation
  • support for automatic format conversion, silence handling, and duplicate-line reduction

It is designed for personal workflows, research, and offline transcription without relying on cloud services.


Features

✔ Robust audio handling

Real-world files come in many formats: .m4a, .mov, .wav, .mp3, .flac. The script passes .wav, .flac, .mp3, and .ogg files to whisper.cpp directly and converts everything else to 16 kHz mono WAV with ffmpeg, so whisper.cpp always receives input it can read.

✔ Silence trimming + reduced duplicate lines

Whisper’s overlapping context windows can produce repeated lines when long silences occur. To solve this, the script uses:

  • VAD mode
  • Silero VAD model
  • entropy threshold tuning (--entropy-thold 2.8)
  • zero context reuse (--max-context 0)

Together, these settings cut down on repeated lines in recordings that contain long pauses.

✔ Automatic, non-colliding output filenames

Every transcript is saved to the Desktop. If the filename is already taken, an incrementing suffix is appended so existing transcripts are never overwritten.

✔ macOS Shortcut Integration

Included: shortcuts/Whisper Transcribe.shortcut

Use it from Finder, the Services menu, or as part of a larger Shortcuts automation.


Repository Structure

.
├── transcribe.sh
├── README.md
├── .gitignore
└── shortcuts/
    ├── Whisper Transcribe.shortcut
    └── Whisper Transcribe.png

Requirements

macOS

Install the required tools with Homebrew:

brew install ffmpeg
brew install whisper-cpp
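
After installing, it can help to confirm that both tools are on your PATH before running the script. The following is a minimal sketch, not part of transcribe.sh: `require_tools` is a hypothetical helper, and the binary name `whisper-cli` is an assumption (it is what recent Homebrew builds of whisper-cpp install; older builds may use a different name).

```shell
#!/bin/sh
# Hypothetical pre-flight check: report every command that is not on PATH.
# Returns 0 if all tools are found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (binary names are assumptions, adjust to your install):
# require_tools ffmpeg whisper-cli || exit 1
```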

Models

Download whisper.cpp models to:

~/.cache/whisper.cpp/models/

For example:

ggml-large-v3-turbo.bin
ggml-silero-v5.1.2.bin

The script defaults to large-v3-turbo unless overridden.
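
As an illustration of how a model name could map to a file in that directory, here is a small sketch. It assumes the `ggml-<name>.bin` naming seen in the examples above; `resolve_model` and `MODEL_DIR` are hypothetical names and may not match what transcribe.sh actually does.

```shell
#!/bin/sh
# Hypothetical helper: map a model name (e.g. "medium.en") to its ggml file.
# MODEL_DIR defaults to the directory named in this README.
MODEL_DIR="${MODEL_DIR:-$HOME/.cache/whisper.cpp/models}"

resolve_model() {
  name="${1:-large-v3-turbo}"        # fall back to the README's default model
  path="$MODEL_DIR/ggml-$name.bin"   # assumed naming convention
  if [ ! -f "$path" ]; then
    echo "Model not found: $path" >&2
    return 1
  fi
  printf '%s\n' "$path"
}

# Example:
# MODEL="$(resolve_model medium.en)" || exit 1
```

Failing early with a clear message is usually easier to debug than letting the transcription step fail on a missing file.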


Using the macOS Shortcut

  1. Open the Shortcut file:

    shortcuts/Whisper Transcribe.shortcut
    
  2. macOS will prompt you to import it.

  3. The Shortcut will call transcribe.sh with the selected file.

  4. The transcript is written to your Desktop automatically.
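
If you prefer to build the Shortcut yourself, step 3 could look roughly like the following body for a "Run Shell Script" action that receives the selected files as arguments. This is a sketch under assumptions, not the contents of the bundled Shortcut: `transcribe_all` is a hypothetical helper and the default `TRANSCRIBE` path is a placeholder.

```shell
#!/bin/sh
# Sketch of a "Run Shell Script" action body, with input passed as arguments.
# TRANSCRIBE is a placeholder path; point it at your copy of transcribe.sh.
TRANSCRIBE="${TRANSCRIBE:-$HOME/whisper-transcribe-mac-shortcut/transcribe.sh}"

transcribe_all() {
  status=0
  for f in "$@"; do
    # Quote "$f" so filenames with spaces survive; keep going if one file fails.
    "$TRANSCRIBE" "$f" || status=1
  done
  return "$status"
}

# In the Shortcut action:
# transcribe_all "$@"
```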

You can trigger it via:

  • Finder right-click → Quick Actions
  • Spotlight → typing the name of the Shortcut
  • Incorporating it into automated workflows

Using the Script Directly

Basic usage

./transcribe.sh /path/to/audio.m4a

Override model

./transcribe.sh input.wav medium.en

Transcripts are saved as:

~/Desktop/<filename>.txt

If a transcript with that name already exists, an incrementing suffix is added instead of overwriting it.
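
One way to implement non-colliding names is to probe for a free filename before writing. The sketch below is illustrative only: `unique_path` is a hypothetical helper and the `-1`, `-2` suffix format is an assumption, so the actual script may name files differently.

```shell
#!/bin/sh
# Hypothetical helper: print a .txt path in directory $1 for base name $2
# that does not exist yet.
unique_path() {
  dir="$1"
  base="$2"
  candidate="$dir/$base.txt"
  n=1
  # Append -1, -2, ... until the name is free.
  while [ -e "$candidate" ]; do
    candidate="$dir/$base-$n.txt"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}

# Example: derive the base name from the input file, then pick a free name.
# name="$(basename "/path/to/audio.m4a")"
# unique_path "$HOME/Desktop" "${name%.*}"
```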


How It Works

Format normalization

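# Take the text after the last dot as the file extension
EXT="${IN##*.}"
case "$EXT" in
  wav|flac|mp3|ogg)
    # These formats are handed to whisper.cpp unchanged
    run_whisper "$IN"
    ;;
  *)
    # Everything else: convert to 16 kHz mono WAV and stream it to whisper.cpp on stdin
    ffmpeg -i "$IN" -ar 16000 -ac 1 -f wav - | run_whisper -
    ;;
esac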

Preventing duplicate lines

--vad \
--vad-model "$VADMODEL" \
--entropy-thold 2.8 \
--max-context 0

These settings reduce Whisper’s tendency to repeat segments when long silences are present.
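
To show where these flags sit in a complete invocation, here is a dry-run sketch that only prints the command line. It makes several assumptions: the binary is named `whisper-cli`, and `-m`, `-f`, `-otxt`, and `-of` are used for the model, input file, text output, and output base name. `whisper_cmd` is a hypothetical helper, and the real script may assemble its command differently.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the whisper.cpp command for one input.
# Arguments: input file, model path, VAD model path, output path without extension.
whisper_cmd() {
  in="$1"; model="$2"; vadmodel="$3"; out_base="$4"
  # Rebuild the positional parameters as the full argument list.
  set -- whisper-cli \
    -m "$model" \
    -f "$in" \
    --vad \
    --vad-model "$vadmodel" \
    --entropy-thold 2.8 \
    --max-context 0 \
    -otxt -of "$out_base"
  # Joined with spaces for display only; paths with spaces are not re-quoted.
  printf '%s\n' "$*"
}

# Example:
# whisper_cmd input.wav "$MODEL" "$VADMODEL" "$HOME/Desktop/input"
```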


Why This Exists

This workflow evolved from solving real issues with whisper.cpp:

  • inconsistent audio formats causing errors
  • duplicate lines from overlapping-context handling
  • background noise and silence introducing artifacts
  • wanting a fast, local, repeatable transcription workflow
  • needing drag-and-drop convenience via macOS Shortcuts

The result is a reliable, automated transcription system you can run completely offline.
