Releases: aperepel/claude-mlx-tts

v1.3.0

01 Jan 20:50

New Features

Streaming TTS (79% faster time-to-first-audio)
- Audio playback now starts almost immediately instead of waiting for full generation
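The idea can be sketched as follows. This is a minimal illustration, not the release's actual implementation: `generate_audio_chunks` and `play_streaming` are hypothetical stand-ins for a streaming TTS backend, with simulated synthesis delays.

```python
import time
from typing import Iterator, Optional

def generate_audio_chunks(text: str, n_chunks: int = 5) -> Iterator[bytes]:
    """Stand-in for a streaming TTS backend: yields audio as it is synthesized."""
    for _ in range(n_chunks):
        time.sleep(0.01)          # simulated per-chunk synthesis time
        yield b"\x00" * 1024      # placeholder PCM frame

def play_streaming(text: str) -> float:
    """Start playback on the first chunk instead of waiting for all of them.
    Returns the time-to-first-audio in seconds."""
    start = time.perf_counter()
    ttfa: Optional[float] = None
    for chunk in generate_audio_chunks(text):
        if ttfa is None:
            ttfa = time.perf_counter() - start  # first audio is ready here
        # feed chunk to the audio device (omitted)
    return ttfa
```

Because playback begins at the first yielded chunk, time-to-first-audio is one chunk's synthesis time rather than the whole utterance's.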

Dynamic Audio Compression
- Added professional-grade compressor/limiter for consistent volume levels
- Default "notification punch" preset for clear, punchy TTS output
- Prevents audio clipping and sudden volume spikes
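The release uses pedalboard for this; as a rough illustration of what a compressor/limiter chain does, here is a toy pure-Python version (simplified per-sample peak compression, not the actual preset or library code):

```python
def compress(samples, threshold=0.5, ratio=4.0, ceiling=0.98):
    """Toy peak compressor/limiter on float samples in [-1, 1]:
    magnitude above `threshold` is reduced by `ratio` (compressor),
    then hard-capped at `ceiling` (limiter) to prevent clipping."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio  # compress the overshoot
        mag = min(mag, ceiling)                          # limiter: never exceed ceiling
        out.append(mag if s >= 0 else -mag)
    return out
```

Quiet samples pass through unchanged; loud peaks are pulled down toward the threshold, which is what keeps volume levels consistent across notifications.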

TTFT Metrics
- Time-to-first-token (TTFT) measurements are now logged for performance monitoring

Breaking Changes

Dependency change: pyloudnorm → pedalboard
- If upgrading, run `uv sync --extra mlx` to install the new dependency

v1.2.0

01 Jan 20:49

- Voice embeddings caching — Voice cloning now runs once at server startup.
  Subsequent requests load cached embeddings from disk, reducing per-request
  overhead by ~99% (1.5s → <10ms)
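The caching pattern can be sketched like this. The names (`CACHE`, `compute_embedding`, `get_embedding`) and the pickle format are illustrative assumptions, not the project's actual API:

```python
import pickle
import time
from pathlib import Path

CACHE = Path("voice_embedding.pkl")  # hypothetical on-disk cache location

def compute_embedding(ref_audio: str):
    """Stand-in for the expensive voice-cloning pass run once at startup."""
    time.sleep(0.05)        # simulated ~1.5 s cloning cost
    return [0.1, 0.2, 0.3]  # placeholder embedding vector

def get_embedding(ref_audio: str):
    """Compute the voice embedding once; later calls load it from disk."""
    if CACHE.exists():
        return pickle.loads(CACHE.read_bytes())  # fast path: cache hit (<10 ms)
    emb = compute_embedding(ref_audio)           # slow path: run voice cloning
    CACHE.write_bytes(pickle.dumps(emb))
    return emb
```

The first call pays the full cloning cost and writes the cache; every subsequent request is a disk read, which is where the ~99% per-request saving comes from.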

- Permission prompt notifications — Get an audio alert when Claude needs
  tool permission approval, so you don't miss prompts while away from the terminal

- Separate logging for generation vs playback time for clearer performance metrics

v1.1.1

01 Jan 20:49

Version 1.1.1: Fix permission hook to use venv Python for MLX TTS

v1.1.0

01 Jan 20:49

Version 1.1.0: TTS notification for tool permission prompts

v1.0.0

01 Jan 20:49

Add YouTube demo video to README

v0.1.0

01 Jan 20:49

Working implementation using subprocess calls for TTS:
- macOS 'say' command as default TTS backend
- MLX voice cloning via 'python -m mlx_audio.tts.generate' subprocess
- Claude CLI subprocess for summarization
- Threshold-based triggering (duration, tool calls, thinking keywords)
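The triggering logic above could look something like this. The function name, threshold values, and keyword list are illustrative, not the script's actual configuration:

```python
def should_notify(duration_s: float, tool_calls: int, transcript: str,
                  min_duration: float = 30.0, min_tool_calls: int = 3,
                  keywords=("thinking", "analyzing")) -> bool:
    """Fire a TTS notification when any threshold is crossed
    (duration, tool-call count, or a thinking keyword in the transcript)."""
    if duration_s >= min_duration:
        return True
    if tool_calls >= min_tool_calls:
        return True
    text = transcript.lower()
    return any(k in text for k in keywords)
```

Any single criterion is enough to trigger, so short tasks stay silent while long or tool-heavy runs produce a notification.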

Architecture: scripts/tts-notify.py (single script, ~290 lines)
- Hook fires on Claude stop event
- Checks thresholds against transcript
- Summarizes via claude -p subprocess
- Speaks via say or mlx_audio subprocess
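The subprocess commands in that flow can be sketched as command-list builders (shown without executing them; the prompt wording and flag choices are assumptions, but `claude -p`, macOS `say`, and `python -m mlx_audio.tts.generate` are the invocations named above):

```python
from typing import List

def summarize_cmd(transcript: str) -> List[str]:
    """Claude CLI one-shot prompt mode; prompt text is illustrative."""
    return ["claude", "-p", f"Summarize in one sentence: {transcript}"]

def speak_cmd(text: str, backend: str = "say") -> List[str]:
    """Build the TTS command for the selected backend."""
    if backend == "say":
        return ["say", text]  # macOS built-in TTS (the default)
    # MLX voice-cloning path, invoked as a module subprocess
    return ["python", "-m", "mlx_audio.tts.generate", "--text", text]
```

Each command list would be handed to `subprocess.run`, which is why every MLX call pays the model-load cost noted below.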

Known limitation: each MLX TTS call loads the ~4 GB model from scratch (5-10 s of latency)
Next: Direct Python API integration with background daemon for sub-second response