Audio feedback for Claude Code using text-to-speech. Hear summaries of Claude's responses and permission requests.
- Response summaries - After Claude responds, hear a TTS summary of what it did
- Permission announcements - Hear what permission Claude is requesting before you approve
- Interrupt support - New audio cancels currently playing audio with a transition chime
- SSH support - Works on remote servers via reverse tunnel
- REST API - Control the TTS server programmatically
Response summary (`response_summary.mp4`) - After Claude completes a task

Permission request (`permission_request.mp4`) - Before a tool requires approval
- Claude Code
- uv (Python package manager)
- Groq API key (free tier works fine) OR Ollama for local inference
```
➜ uv sync
```

Copy the example environment file and add your API key:

```
➜ cp .env.example .env
```

Edit `.env` and set your Groq API key:

```
SUMMARY_GROQ_API_KEY=your-groq-api-key-here
```

See Configuration below for all available options.

Start the TTS server:

```
➜ uv run tts-server
```

Install the hooks:

```
➜ mkdir -p ~/.claude/hooks
➜ cp claude-code-hooks/* ~/.claude/hooks/
```

Add to `~/.claude/settings.local.json`:
```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/summary-tts.sh",
            "timeout": 3
          }
        ]
      }
    ],
    "PermissionRequest": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/permission-tts.sh",
            "timeout": 3
          }
        ]
      }
    ]
  }
}
```

Test it:

```
➜ claude --model=haiku -p 'tell me a fantasy story in 1 paragraph'
```

You may need to relaunch claude and check that the hooks are loaded with the `/hooks` command.
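For reference, here is a rough Python sketch of what a Stop hook wrapper like `summary-tts.sh` does. The real hook is a shell script, and the payload field names (`transcript_path`) and request body used here are assumptions; check the actual scripts for the exact schema.

```python
#!/usr/bin/env python3
"""Illustrative stand-in for summary-tts.sh (the real hook is a shell script)."""
import json
import sys
import urllib.request

# Claude Code passes hook input as JSON on stdin; transcript_path is assumed here.
hook_input = json.load(sys.stdin)

# Forward it to the local TTS server. The 3-second hook timeout in
# settings.local.json keeps a dead server from blocking Claude.
body = json.dumps({"transcript_path": hook_input.get("transcript_path")}).encode()
req = urllib.request.Request(
    "http://localhost:20202/summarize",
    data=body,
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req, timeout=2)
```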
```
claude_code_tts_server/    # Python package
├── main.py                # FastAPI server entry point
├── config.py              # Configuration with Pydantic
├── api/                   # REST API endpoints
├── core/                  # Audio manager, playback, sounds
├── summarizers/           # LLM backends (Groq)
└── tts/                   # TTS backends (Kokoro)
claude-code-hooks/         # Shell script wrappers
├── summary-tts.sh         # Stop hook -> POST /summarize
└── permission-tts.sh      # PermissionRequest hook -> POST /permission
```
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/summarize` | Full pipeline: transcript -> summary -> TTS |
| POST | `/permission` | Permission announcement pipeline |
| POST | `/speak` | Direct TTS (skip summarization) |
| GET | `/queue` | Queue status |
| POST | `/queue/clear` | Clear all pending audio |
| POST | `/queue/skip` | Skip currently playing audio |
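The endpoints above can be scripted directly. A minimal Python smoke test, assuming `/speak` accepts a JSON body with a `text` field (check the `api/` package for the exact request schemas):

```python
import json
import urllib.request

BASE = "http://localhost:20202"

def post(path: str, payload: dict | None = None) -> str:
    """POST a JSON payload and return the raw response body."""
    req = urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload or {}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode()

def get(path: str) -> str:
    with urllib.request.urlopen(f"{BASE}{path}", timeout=5) as resp:
        return resp.read().decode()

print(get("/health"))                           # server up?
print(post("/speak", {"text": "Build done."}))  # direct TTS, no summarization
print(get("/queue"))                            # inspect pending audio
print(post("/queue/skip"))                      # skip whatever is playing
```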
Run the TTS server on your local machine, then SSH with a reverse tunnel:
```
# Local machine
➜ uv run tts-server

# SSH to remote (forward port 20202 back to local)
➜ ssh -R 20202:localhost:20202 user@remote-server
```

Make sure the hooks are copied to the remote server and that the server's `~/.claude/settings.json` is updated to use them.
The hooks on the remote server will send requests through the tunnel to your local TTS server for audio playback.
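To verify the tunnel from the remote host before wiring up the hooks, a quick check against `/health`:

```python
import urllib.request

# On the remote host: if the -R tunnel is up, localhost:20202 reaches
# the TTS server running on your local machine.
try:
    with urllib.request.urlopen("http://localhost:20202/health", timeout=3) as resp:
        print("tunnel OK:", resp.read().decode())
except OSError as exc:
    print("tunnel not reachable:", exc)
```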
All settings can be configured via environment variables in .env or CLI args. CLI args take precedence over env vars.
| Variable | Default | Description |
|---|---|---|
| `SUMMARY_BACKEND` | `groq` | Backend: `groq` or `ollama` |
| `SUMMARY_GROQ_API_KEY` | - | Groq API key (required for `groq`) |
| `SUMMARY_GROQ_MODEL_LARGE` | `openai/gpt-oss-120b` | Groq model for long responses |
| `SUMMARY_GROQ_MODEL_SMALL` | `llama-3.1-8b-instant` | Groq model for short responses |
| `SUMMARY_OLLAMA_URL` | `http://localhost:11434` | Ollama server URL |
| `SUMMARY_OLLAMA_MODEL_LARGE` | `qwen3:4b-instruct-2507-q4_K_M` | Ollama model for long responses |
| `SUMMARY_OLLAMA_MODEL_SMALL` | `qwen3:4b-instruct-2507-q4_K_M` | Ollama model for short responses |
| Variable | Default | Description |
|---|---|---|
| `TTS_SERVER_HOST` | `127.0.0.1` | Host to bind to |
| `SUMMARY_AUDIO_PORT` | `20202` | Port to listen on |
| `TTS_SERVER_LOG_LEVEL` | `INFO` | Log level |
| Variable | Default | Description |
|---|---|---|
| `AUDIO_INTERRUPT` | `true` | Allow new audio to interrupt |
| `AUDIO_MIN_DURATION` | `1.5` | Seconds before interrupt allowed |
| `AUDIO_QUEUE` | `true` | Queue messages to play in order |
| `AUDIO_MAX_QUEUE` | `10` | Maximum queue depth |
| `AUDIO_INTERRUPT_CHIME` | `true` | Play chime on interrupt |
| `AUDIO_DROP_SOUND` | `true` | Play sound when messages dropped |
| `AUDIO_SPEED` | `1.0` | Playback speed multiplier (requires rubberband, see below) |
| `AUDIO_VOLUME` | `1.0` | Playback volume (e.g. `2.0` for double volume) |
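For example, a `.env` that protects the current clip for 2 seconds before interrupts, keeps at most 5 queued messages, and plays faster and louder (values are illustrative; `AUDIO_SPEED` above `1.0` requires rubberband, see below):

```
AUDIO_INTERRUPT=true
AUDIO_MIN_DURATION=2.0
AUDIO_QUEUE=true
AUDIO_MAX_QUEUE=5
AUDIO_SPEED=1.3
AUDIO_VOLUME=1.5
```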
| Variable | Default | Description |
|---|---|---|
| `TTS_BACKEND` | `kokoro` | Backend: `kokoro` |
| `TTS_KOKORO_VOICE` | `af_heart` | Kokoro voice |
| `TTS_KOKORO_LANG` | `a` | Kokoro language code |
For local inference without API keys, install Ollama and configure via `.env`:

```
SUMMARY_BACKEND=ollama
SUMMARY_OLLAMA_MODEL_LARGE=llama3.1:8b
SUMMARY_OLLAMA_MODEL_SMALL=llama3.2:1b
```

Or use CLI args:

```
➜ uv run tts-server --summarizer ollama
```

Note: Local inference is slower than Groq, especially without a GPU. Expect 2-10+ seconds per summary depending on your hardware.
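If summaries stop coming through, it's worth confirming Ollama is running and the configured models are pulled. Ollama's standard `/api/tags` endpoint lists installed models; the model names below match the example config above:

```python
import json
import urllib.request

# /api/tags is Ollama's standard endpoint for listing installed models.
with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
    installed = {m["name"] for m in json.load(resp)["models"]}

for model in ("llama3.1:8b", "llama3.2:1b"):
    print(model, "ok" if model in installed else "missing -- try: ollama pull " + model)
```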
Response Summaries (Stop hook):
- Short responses (<300 chars): Cleaned for TTS (removes markdown)
- Long responses: Summarized to 1-3 sentences in first person
- Content-aware: Actions use past tense, explanations get summarized, questions kept as-is
Permission Announcements (PermissionRequest hook):
- Extracts tool name and parameters
- Generates a brief announcement like "Permission requested: Bash command to check disk space" (sketched after this list)
- Non-blocking to avoid delaying the permission dialog
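A rough sketch of that announcement step. The payload field names (`tool_name`, `tool_input`, `description`, `command`) are assumptions about the hook input, not a documented schema:

```python
def announce(tool_name: str, tool_input: dict) -> str:
    """Turn a permission request into a short spoken line (illustrative only)."""
    if tool_name == "Bash":
        # Prefer a human-readable description over the raw command text.
        detail = tool_input.get("description") or tool_input.get("command", "")
        return f"Permission requested: Bash command to {detail}"
    return f"Permission requested: {tool_name}"

print(announce("Bash", {"description": "check disk space"}))
# Permission requested: Bash command to check disk space
```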
TTS Server:
- FastAPI-based REST API
- Async architecture handles multiple connections
- Configurable interrupt and queue behavior (see the sketch after this list)
- Optional audio indicators (chime on interrupt, blip on skip)
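As an illustration of the queue policy (not the project's actual code), the bounded queue plus minimum-duration interrupt boils down to something like:

```python
from __future__ import annotations

import time
from collections import deque

class AudioQueueSketch:
    """Illustrative sketch of the interrupt/queue policy, not the real implementation."""

    def __init__(self, max_queue: int = 10, min_duration: float = 1.5):
        # deque(maxlen=...) silently drops the oldest clip when full --
        # the point where the real server would play its drop sound.
        self.pending: deque[str] = deque(maxlen=max_queue)
        self.min_duration = min_duration
        self.started_at = float("-inf")

    def submit(self, clip: str) -> None:
        self.pending.append(clip)

    def next_clip(self) -> str | None:
        if not self.pending:
            return None
        self.started_at = time.monotonic()
        return self.pending.popleft()

    def may_interrupt(self) -> bool:
        # Mirrors AUDIO_MIN_DURATION: the current clip is protected
        # for its first min_duration seconds.
        return time.monotonic() - self.started_at >= self.min_duration
```

The real server layers the interrupt chime and drop sound on top of these transitions.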
```
➜ uv run tts-server [options]
```

| Option | Default | Description |
|---|---|---|
| `--port` | `20202` | Port to listen on |
| `--host` | `127.0.0.1` | Host to bind to |
| `--voice` | `af_heart` | Kokoro voice to use |
| `--lang` | `a` | Language code (`a` = American English) |
| `--interrupt` | `true` | Allow new audio to interrupt playing audio |
| `--no-interrupt` | - | Disable interrupts (play to completion) |
| `--min-duration` | `1.5` | Seconds to play before allowing interrupt |
| `--queue` | `true` | Queue messages to play in order |
| `--no-queue` | - | Skip to latest message only |
| `--max-queue` | `10` | Maximum queue depth (oldest dropped) |
| `--interrupt-chime` | `true` | Play chime when interrupting audio |
| `--no-interrupt-chime` | - | Disable interrupt chime |
| `--drop-sound` | `true` | Play blip when messages are skipped |
| `--no-drop-sound` | - | Disable drop sound |
| `--speed` | `1.0` | Playback speed (`1.3` = 30% faster, requires rubberband) |
| `--volume` | `1.0` | Playback volume (`2.0` = double volume) |
| `--log-level` | `INFO` | Log level: DEBUG, INFO, WARNING, ERROR |
| `--summarizer` | `groq` | Summarizer backend: `groq` or `ollama` |
| `--ollama-model-large` | `llama3.1:8b` | Ollama model for long responses |
| `--ollama-model-small` | `llama3.2:1b` | Ollama model for short responses/permissions |
| `--ollama-url` | `http://localhost:11434` | Ollama server URL |
The `--interrupt` and `--queue` options are independent and combine as follows:
| interrupt | queue | Behavior |
|---|---|---|
| true | true | Default. Interrupt after min-duration to play next queued message. All messages eventually play. |
| true | false | Interrupt after min-duration to skip to latest. Intermediate messages are dropped. |
| false | true | Play all messages to completion in order. No interrupts. |
| false | false | Play current to completion, then skip to latest. Intermediate messages are dropped. |
| Sound | When | Option |
|---|---|---|
| Interrupt Chime (two-note G5 -> C6) | Playing audio is interrupted | --interrupt-chime / --no-interrupt-chime |
| Drop tone (short blip) | Message skipped without playing | --drop-sound / --no-drop-sound |
Speed up audio playback while preserving pitch using the `--speed` option:

```
➜ uv run tts-server --speed 1.3  # 30% faster
```

Requires rubberband (optional dependency):

```
# macOS
➜ brew install rubberband

# Linux (Debian/Ubuntu)
➜ sudo apt install rubberband-cli

# Then install Python bindings
➜ uv sync --extra speed
```

The rubberband library provides high-quality pitch-preserving time-stretching. If `AUDIO_SPEED` is left at the default (`1.0`), rubberband is not required.
The default voice (`af_heart`) sounds great. The other Kokoro voices aren't as good, but you can find them in VOICES.md.
Logs are written to:
- `~/.claude/hooks/summary-tts.output`
- `~/.claude/hooks/permission-tts.output`
Free tier (as of December 2025):
- ~1,000 long summaries/day (`openai/gpt-oss-120b`)
- ~14,400 short summaries/day (`llama-3.1-8b-instant`)
MIT