feat(voice-server + installer): Google Cloud TTS + cross-platform audio #872
Open
fayerman-source wants to merge 4 commits into danielmiessler:main
Adds Google Cloud Text-to-Speech as alternative TTS provider and fixes audio playback on Linux (WSL2).
- Google Cloud TTS via REST API (no SDK), configurable in settings.json
- Cross-platform audio: afplay (macOS), mpv/ffplay/aplay (Linux)
- Cross-platform notifications: osascript (macOS), notify-send (Linux)
- Backwards compatible: defaults to ElevenLabs when ttsProvider unset
- Accepts GOOGLE_CLOUD_API_KEY or GOOGLE_API_KEY from ~/.env
Re-implementation of danielmiessler#687 with additional Linux/WSL2 support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
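For reviewers, the provider default and the per-platform player choice described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `resolveProvider`, `playerCandidates`, `pickAudioPlayer`, and the `isInstalled` callback are hypothetical names, and the Linux ordering simply follows the order the players are listed in the commit message.

```typescript
// Hypothetical sketch; names and structure are assumptions, not the PR's code.
type TtsProvider = "elevenlabs" | "google-cloud";

interface Settings {
  daidentity?: { ttsProvider?: string };
}

// Falls back to ElevenLabs when ttsProvider is unset or not "google-cloud".
function resolveProvider(settings: Settings): TtsProvider {
  return settings.daidentity?.ttsProvider === "google-cloud"
    ? "google-cloud"
    : "elevenlabs";
}

// Candidate players per platform, in the order given in the commit message.
function playerCandidates(platform: string): string[] {
  if (platform === "darwin") return ["afplay"];
  if (platform === "linux") return ["mpv", "ffplay", "aplay"];
  return [];
}

// Returns the first candidate the caller reports as installed, or null.
// The installed check is injected so the selection logic stays testable.
function pickAudioPlayer(
  platform: string,
  isInstalled: (cmd: string) => boolean,
): string | null {
  for (const cmd of playerCandidates(platform)) {
    if (isInstalled(cmd)) return cmd;
  }
  return null;
}
```

In a real server the `isInstalled` callback would probably wrap a PATH lookup done once at startup; a `null` result would mean no audio playback is available.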
Expands the installer's Step 7 (Voice Setup) to support multiple TTS providers instead of hardcoding ElevenLabs:
- New provider selection prompt: ElevenLabs / Google Cloud TTS / Skip
- Google Cloud TTS path: key search, validation, Neural2-D default
- ElevenLabs path: unchanged existing flow
- Settings.json gets ttsProvider + googleCloudVoice when Google selected
- .env saves the correct key for chosen provider
- Key validation for Google Cloud via texttospeech.googleapis.com
- Updated types, config-gen, detect, steps descriptions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
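The "key search" step could look something like the sketch below. It is an illustration under assumptions: `parseEnv` and `findGoogleKey` are hypothetical helpers, the parser is far simpler than a real dotenv implementation, and preferring GOOGLE_CLOUD_API_KEY over GOOGLE_API_KEY is an assumed precedence that the PR text does not state.

```typescript
// Hypothetical sketch of the installer's key search; not the PR's code.

// Parses KEY=VALUE lines, skipping blanks and # comments.
// Simplified on purpose: no multiline values, no variable expansion.
function parseEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=" or no "=" at all
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    out[key] = value;
  }
  return out;
}

// Assumed precedence: GOOGLE_CLOUD_API_KEY first, then GOOGLE_API_KEY.
function findGoogleKey(env: Record<string, string | undefined>): string | null {
  return env["GOOGLE_CLOUD_API_KEY"] || env["GOOGLE_API_KEY"] || null;
}
```

A caller would read `~/.env`, pass its contents through `parseEnv`, and only then move on to validating the key against texttospeech.googleapis.com.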
Ports the Linux service installation from danielmiessler#686 to v4.0.3:
- Platform detection at startup (Darwin/Linux)
- Linux: systemd user service instead of LaunchAgent
- Linux: checks for audio player (mpv/ffplay/aplay) and notify-send
- Detects both ElevenLabs and Google Cloud API keys
- Menu bar indicator prompt only on macOS
- Removed macOS-specific "say" fallback references
Re-implementation of danielmiessler#686 install.sh changes for current architecture.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
blu3dot pushed a commit to blu3dot/Personal_AI_Infrastructure that referenced this pull request (Mar 3, 2026)
When ELEVENLABS_API_KEY is missing or the ElevenLabs API call fails, the VoiceServer now falls back to macOS native `say` command instead of silently skipping voice output. Pronunciation rules from pronunciations.json are applied to the fallback too.
- Only triggers when ElevenLabs path didn't play (no double-speak)
- Reuses existing spawnSafe() and applyPronunciations() helpers
- Fails gracefully: logs error, doesn't crash server
- Uses error: unknown with instanceof type guard
- TODO: Linux equivalent (see danielmiessler#855, danielmiessler#872)
Co-Authored-By: Claude <noreply@anthropic.com>
4 tasks
blu3dot pushed a commit to blu3dot/Personal_AI_Infrastructure that referenced this pull request (Mar 3, 2026)
…ut device
New `voice.requireHeadphones` setting in settings.json (default: false). When enabled, VoiceServer checks the default audio output device via `system_profiler SPAudioDataType -json` and skips voice playback if the output is built-in laptop speakers. Voice plays normally through Bluetooth, USB, HDMI, AirPlay, and other external audio devices.
- Uses `-json` flag for reliable machine-parseable output
- Caches detection result for 30 seconds (system_profiler takes 140-250ms)
- 3-second timeout prevents hangs if system_profiler stalls
- Fails open: if detection fails, voice plays anyway (convenience, not security)
- Desktop notification banners display regardless of headphone state
- Config uses `=== true` (opt-in, missing key defaults to OFF)
- TODO: Linux equivalent (see danielmiessler#855, danielmiessler#872)
References danielmiessler#855
Co-Authored-By: Claude <noreply@anthropic.com>
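The 30-second cache and fail-open behavior listed above might be structured along these lines. This is a sketch with assumptions: `makeCachedProbe` is a hypothetical helper, the probe is synchronous (so the 3-second timeout is left out), and caching a failed detection for the same TTL is a design choice of the sketch, not something the commit message specifies.

```typescript
// Hypothetical helper: caches a boolean probe result for ttlMs and
// fails open (returns true) when the probe throws.
function makeCachedProbe(
  probe: () => boolean,
  ttlMs: number,
  now: () => number = Date.now,
): () => boolean {
  let cached: { value: boolean; at: number } | null = null;
  return () => {
    const t = now();
    // Reuse the cached answer while it is younger than the TTL.
    if (cached !== null && t - cached.at < ttlMs) return cached.value;
    let value: boolean;
    try {
      value = probe();
    } catch {
      value = true; // fail open: allow playback when detection fails
    }
    cached = { value, at: t };
    return value;
  };
}
```

Here the probe would answer "may voice play on the current output device?", and the injected `now` clock exists only so the TTL can be exercised without waiting.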
6 tasks
Merged origin/main into feat/google-cloud-tts-v2. Single conflict in actions.ts imports — kept both PAI_VERSION/ALGORITHM_VERSION from main and validateGoogleCloudKey from this branch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
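Summary
- Google Cloud TTS as an alternative provider, configurable via settings.json → daidentity.ttsProvider
- Cross-platform audio: afplay (macOS), mpv/ffplay/aplay (Linux/WSL2), detected at startup
- Cross-platform notifications: osascript (macOS), notify-send (Linux), with silent fallback
- Backwards compatible: defaults to ElevenLabs when ttsProvider is not set

Why
- The voice server depended on afplay and osascript, making it non-functional on Linux/WSL2
- Google Cloud TTS is called over its REST API with fetch, so no SDK is required

Configuration
Add GOOGLE_CLOUD_API_KEY (or GOOGLE_API_KEY) to ~/.env.

Add to ~/.claude/settings.json:

    {
      "daidentity": {
        "ttsProvider": "google-cloud",
        "googleCloudVoice": {
          "languageCode": "en-US",
          "voiceName": "en-US-Neural2-D",
          "voiceType": "NEURAL2",
          "speakingRate": 1.0,
          "pitch": 0.0
        }
      }
    }

Or keep using ElevenLabs by not setting ttsProvider (or setting it to "elevenlabs").

Test plan
- […] ttsProvider: "google-cloud" and logs correct provider
- […] voice_system: "google-cloud", google_cloud_configured: true, audio_player
- […] ttsProvider defaults to ElevenLabs
- […] afplay

Context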
Re-implementation of #687 (closed during v4.0 restructuring) with additional Linux/WSL2 cross-platform support. Tested live on WSL2 with Google Cloud Neural2-D voice.
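To make the mapping from the googleCloudVoice settings to a REST call concrete, here is a rough sketch of building a `text:synthesize` request for `fetch`. It is not taken from this PR: `buildSynthesizeRequest` and the `GoogleCloudVoice` interface are hypothetical, MP3 output and passing the key as a `key` query parameter are assumptions about how the server calls the API, and `voiceType` is left unused because the v1 endpoint selects a voice by name.

```typescript
// Hypothetical sketch; field names on the settings side mirror the
// googleCloudVoice block in the Configuration section.
interface GoogleCloudVoice {
  languageCode: string;
  voiceName: string;
  voiceType?: string; // not sent: the API selects the voice by name
  speakingRate?: number;
  pitch?: number;
}

// Builds the URL and fetch init for a Cloud Text-to-Speech v1 request.
function buildSynthesizeRequest(text: string, voice: GoogleCloudVoice, apiKey: string) {
  const url =
    "https://texttospeech.googleapis.com/v1/text:synthesize?key=" +
    encodeURIComponent(apiKey);
  const body = {
    input: { text },
    voice: { languageCode: voice.languageCode, name: voice.voiceName },
    audioConfig: {
      audioEncoding: "MP3", // assumption: any supported encoding would work
      speakingRate: voice.speakingRate ?? 1.0,
      pitch: voice.pitch ?? 0.0,
    },
  };
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

A caller would then run `fetch(req.url, req.init)` and base64-decode the `audioContent` field of the JSON response before handing the audio to the detected player.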
🤖 Generated with Claude Code