Record audio from your microphone and transcribe it to text using whisper.cpp. The program records until you press Enter, then runs the recording through Whisper and prints the transcription.
Platform: macOS (capture uses AVFoundation). Linux/Windows would require changing the ffmpeg input in code.
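Porting the capture step means swapping the ffmpeg input device per OS. A minimal Go sketch of that selection is below; only the macOS `avfoundation`/`:0` case comes from this project, and the `alsa`/`dshow` device names (and the `captureArgs` helper itself) are illustrative assumptions, not what `cmd/vtt` actually does.

```go
package main

import (
	"fmt"
	"runtime"
)

// captureArgs returns hypothetical ffmpeg input arguments per platform.
// Only the darwin case mirrors this project; the Linux/Windows device
// names are common defaults, not tested values.
func captureArgs(goos string) []string {
	switch goos {
	case "darwin":
		// ":0" selects the default audio device (no video) under AVFoundation.
		return []string{"-f", "avfoundation", "-i", ":0"}
	case "linux":
		return []string{"-f", "alsa", "-i", "default"}
	case "windows":
		return []string{"-f", "dshow", "-i", "audio=Microphone"}
	default:
		return nil
	}
}

func main() {
	fmt.Println(captureArgs(runtime.GOOS))
}
```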
- Go 1.25 or later
- ffmpeg (install via Homebrew: `brew install ffmpeg`)
- whisper.cpp (included as a submodule; must be built and the model downloaded)
- Clone with submodules:

  ```sh
  git clone --recurse-submodules <repo-url>
  cd voice-to-text
  ```

  If the repo is already cloned:

  ```sh
  git submodule update --init --recursive
  ```
- Build whisper.cpp

  From the project root:

  ```sh
  cd third_party/whisper
  mkdir -p build && cd build
  cmake ..
  make
  cd ../../..
  ```

  Ensure the CLI binary is at `third_party/whisper/build/bin/whisper-cli`. If your build puts the binary elsewhere (e.g. `third_party/whisper/main`), copy or symlink it to that path.

- Download a Whisper model

  The app expects the large-v3-turbo model by default at `third_party/whisper/models/ggml-large-v3-turbo.bin`. Download it using the whisper.cpp script (from the project root):

  ```sh
  mkdir -p third_party/whisper/models
  third_party/whisper/models/download-ggml-model.sh large-v3-turbo third_party/whisper/models
  ```

  Other models: you can point the app at a different model by setting `WHISPER_MODEL`, e.g. `third_party/whisper/models/ggml-base.en.bin` for the smaller English base model. For a lighter, faster option use `ggml-base.en.bin` or `ggml-small.en.bin` and set `WHISPER_MODEL` accordingly. Model files are large (large-v3-turbo is ~1.5 GB) and are gitignored.
Run from the project root (paths to the CLI and model are relative to it):

```sh
go run ./cmd/vtt
```

Or build and run the binary:

```sh
go build -o vtt ./cmd/vtt
./vtt
```

- The program starts and prints "Recording... Press ENTER to stop."
- Speak; when done, press Enter to stop recording.
- The audio is saved to `cmd/vtt/input.wav` and then transcribed.
- The transcription is printed under "--- TRANSCRIPTION ---".

Recordings and generated .wav files are gitignored.
- `cmd/vtt/main.go` – entrypoint: recording via ffmpeg, then transcription via whisper-cli
- `third_party/whisper` – whisper.cpp as a git submodule (build output and models are local only)
- Recording: ffmpeg captures from the default system microphone (`:0`) with AVFoundation, applies light noise reduction, and outputs 16 kHz mono for Whisper.
- Transcription: the saved WAV is passed to `whisper-cli` with `-sns` (suppress non-speech tokens) and `-l en` for cleaner English output.