# parakeet-asr

Local speech recognition service based on `nvidia/parakeet-tdt-0.6b-v3`, exposing an OpenAI-compatible transcription API.

## Features
- Local-first ASR (no cloud required)
- OpenAI Whisper-compatible endpoint (`/v1/audio/transcriptions`)
- CPU by default, GPU optional on Linux
- Works on:
  - Ubuntu (tested)
  - macOS Intel (supported)
  - macOS Apple Silicon (supported, uses CPU/MPS via PyTorch)
## Quick start

```bash
git clone https://github.com/rundax/parakeet-asr.git
cd parakeet-asr
./setup.sh
./start-parakeet.sh
```

On Linux, `setup.sh` installs dependencies via apt, dnf, yum, or pacman when available.

Health check:

```bash
curl http://localhost:9001/health
```

Stop the service:

```bash
./stop-parakeet.sh
```

## API usage

Transcribe an audio file:

```bash
curl -X POST http://localhost:9001/v1/audio/transcriptions \
  -F "file=@audio.mp3" \
  -F "model=parakeet-tdt-0.6b-v3"
```

Check service health:

```bash
curl http://localhost:9001/health
```
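Because the endpoint mirrors OpenAI's transcription API, any HTTP client works. Below is a standard-library Python sketch; the URL, field names, and `"text"` response key are inferred from the curl example above, so treat it as a starting point rather than an official client:

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumption: the service listens on localhost:9001, as in the curl example.
BASE_URL = "http://localhost:9001"


def build_multipart(fields: dict, file_field: str, path: Path) -> tuple[bytes, str]:
    """Encode plain form fields plus one file as multipart/form-data (stdlib only)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}", f'Content-Disposition: form-data; name="{name}"', "", value]
    ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    lines += [
        f"--{boundary}",
        f'Content-Disposition: form-data; name="{file_field}"; filename="{path.name}"',
        f"Content-Type: {ctype}",
        "",  # blank line separating file headers from file bytes
    ]
    body = (
        "\r\n".join(lines).encode()
        + b"\r\n" + path.read_bytes()
        + f"\r\n--{boundary}--\r\n".encode()
    )
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "parakeet-tdt-0.6b-v3") -> str:
    """POST an audio file to the transcription endpoint and return its text."""
    body, content_type = build_multipart({"model": model}, "file", Path(path))
    req = urllib.request.Request(
        f"{BASE_URL}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

With the service running: `print(transcribe("audio.mp3"))`.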
## Linux setup

- Default install: CPU PyTorch wheels
- Optional CUDA 11.8 wheels:

```bash
USE_CUDA=1 ./setup.sh
```

## macOS setup

`setup.sh` uses Homebrew to install:

- ffmpeg
- libsndfile
- python@3.12

It then installs PyTorch from the standard pip wheels.
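Whether the CUDA wheels (or Apple's MPS backend) are actually used depends on what PyTorch detects at runtime. A minimal device-fallback sketch, shown for illustration only — this is an assumption about how such a service might pick its device, not the repo's actual code:

```python
def pick_device() -> str:
    """Prefer CUDA (Linux GPU), then Apple MPS, else CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    # Apple Silicon exposes its GPU through the MPS backend
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```

Running `python -c` with an equivalent check is a quick way to confirm the `USE_CUDA=1` install worked.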
## Run as a systemd service (Linux)

```bash
sed -e "s|{{USER}}|$USER|g" -e "s|{{APP_DIR}}|$PWD|g" parakeet-asr.service.template > parakeet-asr.service
sudo cp parakeet-asr.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now parakeet-asr
```

Note: the first launch downloads model weights from Hugging Face, so make sure the machine has internet access.
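Because the first start can take a while during the weight download, a script that depends on the service may want to poll `/health` before sending audio. A standard-library sketch (the URL and timeout values are assumptions, not defaults from this repo):

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str = "http://localhost:9001/health",
                    timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll the health endpoint until it answers 200 OK or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet (connection refused, name error, 5xx, ...)
        time.sleep(interval)
    return False
```

Call `wait_for_health()` once after `systemctl enable --now` (or `./start-parakeet.sh`) and only proceed if it returns `True`.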
## Troubleshooting

- Audio fails to decode: reinstall ffmpeg and libsndfile (or run `./setup.sh` again).
- Transcription is slow: use shorter audio files, or enable GPU on Linux.
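To act on the shorter-files advice programmatically, the length of a WAV input can be checked with the standard library before uploading (the helper name and the threshold in the usage note are illustrative, not part of this repo):

```python
import wave


def wav_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()
```

For example, warn (or split the file with ffmpeg) when `wav_seconds("audio.wav")` exceeds a budget such as 600 seconds.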
## OpenClaw skill

If you use OpenClaw and want one-command agent setup, use the packaged skill from this repo:

- Skill source: `openclaw-skill/parakeet-local-asr/`
- Package command:

```bash
python ~/.npm-global/lib/node_modules/openclaw/skills/skill-creator/scripts/package_skill.py openclaw-skill/parakeet-local-asr
```

This generates a `.skill` bundle you can upload to ClawHub.
## License

MIT for the code in this repo. The model weights remain under NVIDIA's terms (CC BY 4.0, as published by the model provider).