This project provides a fast, reliable workflow for transcribing audio and video files locally on macOS using whisper.cpp and ffmpeg. It includes:
- a hardened shell script (transcribe.sh) that handles real-world audio issues
- an optional macOS Shortcut (Whisper Transcribe.shortcut) for drag-and-drop or Finder-menu automation
- support for automatic format conversion, silence handling, and duplicate-line reduction
It is designed for personal workflows, research, and offline transcription without relying on cloud services.
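Real-world files come in many formats, such as .m4a, .mov, .wav, .mp3, and .flac.
Formats that whisper.cpp reads directly (WAV, FLAC, MP3, OGG) are passed through as-is. Everything else is converted with ffmpeg to 16 kHz mono WAV, so whisper.cpp always receives input it can decode.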
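The pass-through-or-convert decision can be sketched as a small predicate. This is a minimal sketch, not the code in transcribe.sh: the function name needs_conversion is hypothetical, and unlike the script excerpt further down it lowercases the extension so that .WAV and .wav are treated the same.

```shell
# Hypothetical helper: succeeds (exit status 0) when a file should go through
# ffmpeg before whisper.cpp sees it. The extension is lowercased first.
needs_conversion() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav|flac|mp3|ogg) return 1 ;;  # read directly by whisper.cpp
    *) return 0 ;;                 # .m4a, .mov, ...: convert to 16 kHz mono WAV
  esac
}

for f in talk.m4a interview.mov notes.WAV podcast.mp3; do
  if needs_conversion "$f"; then
    echo "$f: convert with ffmpeg"
  else
    echo "$f: pass through"
  fi
done
# talk.m4a: convert with ffmpeg
# interview.mov: convert with ffmpeg
# notes.WAV: pass through
# podcast.mp3: pass through
```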
Whisper’s overlapping context windows can produce repeated lines when long silences occur. To counter this, the script combines:
- VAD (voice activity detection) mode (--vad)
- the Silero VAD model (--vad-model)
- entropy threshold tuning (--entropy-thold 2.8)
- zero context reuse (--max-context 0)
Together, these settings greatly reduce repeated lines and silence-related artifacts in the transcript.
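If a few repeated lines still slip through, a plain uniq pass is one simple fallback. This is a swapped-in technique, not something transcribe.sh is documented to do, and collapse_repeats is a hypothetical name. Because uniq only compares adjacent lines, a sentence that legitimately recurs later in the recording is kept.

```shell
# Hypothetical post-processing filter (not part of transcribe.sh):
# collapse runs of identical consecutive lines read from stdin.
collapse_repeats() {
  uniq
}

printf '%s\n' "Welcome back." "Welcome back." "Welcome back." "Let's begin." | collapse_repeats
# Welcome back.
# Let's begin.
```

Applied to a saved transcript: collapse_repeats < talk.txt > talk-clean.txt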
Every transcript is saved to the Desktop. If a file with the same name already exists, an incrementing suffix is added instead of overwriting it.
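The collision-avoiding naming can be sketched as a loop that tries suffixes until a free name is found. This is a minimal sketch under assumptions: the helper name unique_path and the "-1", "-2", ... suffix format are hypothetical, and transcribe.sh may format its suffixes differently.

```shell
# Hypothetical helper: print a path DIR/NAME.txt that does not exist yet,
# trying NAME-1.txt, NAME-2.txt, ... when the plain name is taken.
unique_path() {
  dir=$1
  name=$2
  candidate="$dir/$name.txt"
  n=1
  while [ -e "$candidate" ]; do
    candidate="$dir/$name-$n.txt"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}

# Usage: strip directory and extension from the input, then pick a free name.
in="/path/to/meeting.m4a"
base=$(basename "$in")
base=${base%.*}
unique_path "$HOME/Desktop" "$base"
```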
Included: shortcuts/Whisper Transcribe.shortcut
Use it from Finder, the Services menu, or as part of a larger Shortcuts automation.
.
├── transcribe.sh
├── README.md
├── .gitignore
└── shortcuts/
    ├── Whisper Transcribe.shortcut
    └── Whisper Transcribe.png
Install the required tools with Homebrew:
brew install ffmpeg
brew install whisper-cpp
Download whisper.cpp models to:
~/.cache/whisper.cpp/models/
For example, the default transcription model and the Silero VAD model:
ggml-large-v3-turbo.bin
ggml-silero-v5.1.2.bin
The script defaults to large-v3-turbo unless a different model name is passed as the second argument.
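Resolving a model name to a file in the cache directory could look like the sketch below. It assumes the ggml-NAME.bin naming shown above; the helper model_path and the MODEL_DIR and VADMODEL variables are hypothetical and are not taken from transcribe.sh.

```shell
# Hypothetical helper: map a model name (default large-v3-turbo) to its
# ggml file under MODEL_DIR, failing with a message if it is missing.
MODEL_DIR="$HOME/.cache/whisper.cpp/models"

model_path() {
  name=${1:-large-v3-turbo}   # empty or missing argument -> default model
  path="$MODEL_DIR/ggml-$name.bin"
  if [ ! -f "$path" ]; then
    echo "model not found: $path" >&2
    return 1
  fi
  printf '%s\n' "$path"
}

# The VAD model is a fixed file in the same directory.
VADMODEL="$MODEL_DIR/ggml-silero-v5.1.2.bin"

# Usage inside a script, with the model name as the optional second argument:
#   MODEL=$(model_path "$2") || exit 1
```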
- Open the .shortcut file: shortcuts/Whisper Transcribe.shortcut
- macOS will prompt you to import it.
- The Shortcut calls transcribe.sh with the selected file.
- The transcript is written to your Desktop automatically.
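The Shortcut's own actions are not reproduced here. As an assumption about how such a Shortcut can be wired, a "Run Shell Script" action with its input passed as arguments could contain something like the sketch below. The script path and the helper name transcribe_all are placeholders. Shortcuts typically runs shell scripts with a minimal PATH, so Homebrew's bin directories are prepended to make ffmpeg and whisper.cpp findable.

```shell
# Sketch of a "Run Shell Script" action body (input passed as arguments).
# /opt/homebrew/bin is Homebrew's prefix on Apple silicon, /usr/local/bin on Intel.
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"

# Placeholder path: point this at your copy of transcribe.sh.
TRANSCRIBE="${TRANSCRIBE:-$HOME/path/to/transcribe.sh}"

# Transcribe every selected file; keep going if one fails, report overall status.
transcribe_all() {
  status=0
  for f in "$@"; do
    "$TRANSCRIBE" "$f" || status=1
  done
  return "$status"
}

transcribe_all "$@"
```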
You can trigger it via:
- Finder right-click → Quick Actions
- Spotlight → type the name of the Shortcut
- your own automated workflows in Shortcuts
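From the command line:
./transcribe.sh /path/to/audio.m4a
./transcribe.sh input.wav medium.en
The optional second argument selects a different model (here medium.en). Transcripts are saved as:
~/Desktop/<filename>.txt
with an incrementing suffix added if that name is already taken.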
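EXT="${IN##*.}"
case "$EXT" in
  wav|flac|mp3|ogg)
    # whisper.cpp reads these formats directly
    run_whisper "$IN"
    ;;
  *)
    # anything else (.m4a, .mov, ...): convert to 16 kHz mono WAV and pipe it in
    ffmpeg -i "$IN" -ar 16000 -ac 1 -f wav - | run_whisper -
    ;;
esac
The whisper.cpp invocation also passes:
--vad \
--vad-model "$VADMODEL" \
--entropy-thold 2.8 \
--max-context 0
These settings reduce Whisper’s tendency to repeat segments when long silences are present.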
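The excerpt calls run_whisper without showing it. A minimal sketch of what it could look like, combining the flags above, follows. Assumptions: the Homebrew binary is named whisper-cli, it reads audio from stdin when given "-" (as the ffmpeg pipe requires), and MODEL, VADMODEL and OUTBASE are set earlier in the script. WHISPER_BIN is a hypothetical override that allows a dry run with echo.

```shell
# Sketch of run_whisper: transcribe $1 (a file path, or "-" for stdin)
# and write plain text to "$OUTBASE.txt" (-of takes a path without extension).
run_whisper() {
  "${WHISPER_BIN:-whisper-cli}" \
    -m "$MODEL" \
    -f "$1" \
    --vad \
    --vad-model "$VADMODEL" \
    --entropy-thold 2.8 \
    --max-context 0 \
    -otxt \
    -of "$OUTBASE"
}

# Dry run: print the arguments instead of running whisper-cli.
MODEL="$HOME/.cache/whisper.cpp/models/ggml-large-v3-turbo.bin"
VADMODEL="$HOME/.cache/whisper.cpp/models/ggml-silero-v5.1.2.bin"
OUTBASE="$HOME/Desktop/talk"
WHISPER_BIN=echo run_whisper talk.wav
```

Setting OUTBASE from a collision-free path (minus its .txt extension) keeps whisper-cli from overwriting an earlier transcript.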
This workflow evolved from solving real issues with whisper.cpp:
- inconsistent audio formats causing errors
- duplicate lines from overlapping-context handling
- background noise and silence introducing artifacts
- wanting a fast, local, repeatable transcription workflow
- needing drag-and-drop convenience via macOS Shortcuts
The result is a reliable, automated transcription system you can run completely offline.
