feat(tts): add Text-to-Speech via ElevenLabs #360
Open
guicheffer wants to merge 22 commits into main
✅ MegaLinter analysis: Success
See detailed reports in MegaLinter artifacts
@rodrigoluizs this is a feature @jeffujioka suggested we play around with. Please check it once you have some time. It is not yet FULLY ready for review, but it is ready to test. cc @jeffujioka
a2d84e4 to 18b268a
Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Add a third "Text to Speech" tab to the LlmPanel with ElevenLabs integration: a toggle to enable TTS, an API key input, a default-voice hint, and a test voice button. The TTS tab is always visible regardless of whether LLM enhancement is enabled. Also adds VolumeIcon, a SecretInput mock, and a tts mock to the test helpers.

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
… error handling

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Resolve three TTS issues:
- Fix the play button not appearing due to a focus race condition by making the frontmost-app PID query the primary path in getSelectedText(), with the system-wide AXFocusedUIElement as fallback. Also filter out Vox's own PID from getFrontmostPid().
- Make the Test Voice button actually play audio by returning the synthesized buffer from testConnection() and routing it through TtsManager's new testAndPlay() method. Show a "Playing test audio..." status during the test.
- Add a "Get your API key at elevenlabs.io" link below the API key field.

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
- Replace display:none with an opacity/pointer-events pattern using the .tts-available class, so the button follows the same visibility flow as the other hover buttons
- Add a buttonsVisible flag to guard the async IPC callback, preventing a stale promise from showing the button after hideSideButtons runs
- Preserve the tts-available class in the onTtsStateChanged handler when resetting className
- Check ttsEnabled and elevenLabsApiKey in the tts:has-selected-text IPC handler before returning true

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
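The stale-promise guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual HUD code: the PlayButton shape and the checkSelectedText parameter are hypothetical stand-ins (the latter for the hudCheckSelectedText bridge call), and only the names buttonsVisible, hideSideButtons, and tts-available come from the commit message.

```typescript
// Hypothetical stand-in for the HUD's Play button element.
interface PlayButton {
  classes: Set<string>;
}

const playButton: PlayButton = { classes: new Set<string>() };

// True only while the hover buttons are meant to be shown.
let buttonsVisible = false;

// checkSelectedText is an assumed stand-in for the preload bridge call;
// its real signature in the PR may differ.
function showSideButtons(checkSelectedText: () => Promise<boolean>): Promise<void> {
  buttonsVisible = true;
  return checkSelectedText().then((hasText) => {
    // Drop the result if the buttons were hidden while the call was pending.
    if (!buttonsVisible) return;
    if (hasText) {
      playButton.classes.add("tts-available");
    } else {
      playButton.classes.delete("tts-available");
    }
  });
}

function hideSideButtons(): void {
  buttonsVisible = false;
  playButton.classes.delete("tts-available");
}
```

Without the flag check, a check that resolves after hideSideButtons has run would re-add the class and leave the button visible with no hover in progress.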
…ents

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
When the user clicks the Play button on the HUD, the Vox/Electron process becomes frontmost, causing getSelectedText() to return empty or wrong text. Fix by caching the text during the hover check (hasSelectedText) and reusing it in play().

Also fix getFrontmostPid to use PID-based filtering instead of a hardcoded app name, so it works in both dev (Electron) and prod (Vox) modes.

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
The 1.5s polling interval called hasSelectedText(), which overwrote cachedSelectedText even when the result was empty (focus shifted to the HUD). Now the cache is only updated when text is found, preserving the last known selection for play().

Also adds diagnostic logging to trace the text selection flow.

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
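A minimal sketch of the "only update the cache when text is found" rule, assuming a synchronous reader for brevity. The names cachedSelectedText and hasSelectedText come from the commit message; readSelection and textForPlayback are hypothetical helpers introduced here for illustration, and the real functions may well be asynchronous.

```typescript
// Last non-empty selection seen by the poller.
let cachedSelectedText = "";

// readSelection is a hypothetical stand-in for getSelectedText().
function hasSelectedText(readSelection: () => string): boolean {
  const text = readSelection().trim();
  if (text.length > 0) {
    // Only overwrite the cache when something was actually found.
    cachedSelectedText = text;
    return true;
  }
  // Empty read (for example, focus moved to the HUD): keep the previous cache.
  return false;
}

// The text that play() would use in this sketch.
function textForPlayback(): string {
  return cachedSelectedText;
}
```

One trade-off of this approach is that the cache can outlive the selection it came from, so the real implementation may need some rule for expiring it; the commit message does not say whether it has one.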
The previous approach only tried the frontmost PID and one fallback, which failed when Electron became frontmost during HUD hover. Now uses a multi-tier strategy:
1. Frontmost non-Electron app
2. Last PID that had selected text (survives focus shifts)
3. Brute-force scan of all visible processes
4. System-wide AXFocusedUIElement fallback

Also prevents polling from clearing cached text when focus shifts away.

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
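The multi-tier lookup amounts to trying an ordered list of readers and stopping at the first one that yields text. The sketch below shows only that control flow; the Tier type, firstSelection, and the stubbed readers are hypothetical, and the actual Accessibility queries behind each tier are not reproduced here.

```typescript
// A reader returns the selected text, or null when it finds nothing.
type SelectionReader = () => string | null;

interface Tier {
  name: string;
  read: SelectionReader;
}

// Try each tier in order and return the first non-empty result.
function firstSelection(tiers: Tier[]): { tier: string; text: string } | null {
  for (const tier of tiers) {
    const text = tier.read();
    if (text !== null && text.trim().length > 0) {
      return { tier: tier.name, text };
    }
  }
  return null;
}

// Stubbed tiers in the order listed in the commit message.
const exampleSelection = firstSelection([
  { name: "frontmost-app", read: () => null },
  { name: "last-known-pid", read: () => "selected in previous app" },
  { name: "all-visible-processes", read: () => "never reached" },
  { name: "system-wide-focused-element", read: () => null },
]);
// exampleSelection is { tier: "last-known-pid", text: "selected in previous app" }
```

Ordering the tiers from cheapest and most specific to most expensive keeps the brute-force scan off the common path.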
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
- Add ttsConnectionTested and ttsConfigHash to track test state
- Require a successful test before enabling the TTS toggle
- Auto-disable the toggle when the API key or voice changes
- Add a shake animation on invalid toggle click (respects reduceAnimations)
- Show an error message when attempting to enable without testing
- Clear the warning banner after a successful test
- Auto-enable the HUD when TTS is enabled
- Update the info banner text to clarify that the HUD is always visible
- Improve error messages from the ElevenLabs API (payment_required, etc.)
- Return detailed error info from the test endpoint
- Add TTS test state to DevPanel
- Move the "Default voice" hint inline with the API key link
- Keep the TTS stop button visible and larger (24px) during playback
- Prevent tab switching after the TTS test completes
- Disable the toggle when the config changes (require re-test)
- Add i18n for all new messages (10 languages)

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
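One way the "re-test after config change" rule could work is to store a fingerprint of the tested configuration and compare it before allowing the toggle. The field names ttsConnectionTested and ttsConfigHash come from the commit message; everything else below, including which fields are hashed and the FNV-1a-style hash, is an assumption made for illustration.

```typescript
interface TtsTestState {
  ttsConnectionTested: boolean;
  ttsConfigHash: string;
}

// Small FNV-1a-style hash over UTF-16 code units (an assumed choice;
// the PR may use a different hash or compare the values directly).
function hashTtsConfig(apiKey: string, voiceId: string): string {
  const input = `${apiKey}\n${voiceId}`;
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Record that the current key/voice pair passed the connection test.
function recordSuccessfulTest(apiKey: string, voiceId: string): TtsTestState {
  return { ttsConnectionTested: true, ttsConfigHash: hashTtsConfig(apiKey, voiceId) };
}

// The toggle may be enabled only if the tested config is still the current one.
function canEnableTts(state: TtsTestState, apiKey: string, voiceId: string): boolean {
  return state.ttsConnectionTested && state.ttsConfigHash === hashTtsConfig(apiKey, voiceId);
}
```

Storing a hash rather than the raw values means the test state never holds a second copy of the API key.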
- Update the tts.test return type in preload to match the IPC handler
- Add an eslint exception for the bullet point separator character
- Add empty lines before rules in the shake animation keyframes

Co-Authored-By: Claude (global.anthropic.claude-sonnet-4-5-20250929-v1:0) <noreply@anthropic.com>
Co-Authored-By: jeffujioka <jeff.ujioka@gmail.com>
Update unit tests to expect { success, audio?, error? } instead of
ArrayBuffer | null from testConnection function.
Co-Authored-By: Claude (global.anthropic.claude-sonnet-4-5-20250929-v1:0) <noreply@anthropic.com>
Co-Authored-By: jeffujioka <jeff.ujioka@gmail.com>
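For readers following the return-type change, the new result shape might be consumed along these lines. The field names success, audio, and error come from the commit message; the concrete field types (ArrayBuffer, string) and the describeTestResult helper are assumptions for illustration.

```typescript
// Assumed typing of the { success, audio?, error? } shape.
type TtsTestResult = {
  success: boolean;
  audio?: ArrayBuffer;
  error?: string;
};

// Hypothetical helper that turns a result into a status line.
function describeTestResult(result: TtsTestResult): string {
  if (result.success) {
    return result.audio
      ? `ok: ${result.audio.byteLength} bytes of audio`
      : "ok: no audio returned";
  }
  return `failed: ${result.error ?? "unknown error"}`;
}
```

Compared with ArrayBuffer | null, this shape lets a caller show why a test failed instead of only knowing that it did.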
7cee403 to e2e0443

Summary
… AXSelectedText)

Closes https://github.com/app-vox/specs/issues/58
Changes
New files
- src/main/tts/elevenlabs.ts: ElevenLabs API client (synthesize, testConnection)
- src/main/tts/manager.ts: TTS orchestration (selection → API → play, with abort support)
- src/main/input/selection.ts: Read selected text via macOS Accessibility API
- src/shared/icons/svg/volume.svg: Volume icon for settings

Modified files
- src/shared/config.ts: ttsEnabled, elevenLabsApiKey, elevenLabsVoiceId fields
- src/main/config/manager.ts: SENSITIVE_CONFIG_FIELDS for API key encryption
- src/main/audio/recorder.ts: playMp3Buffer() and stopMp3Playback() methods
- src/main/ipc.ts: tts:play, tts:stop, tts:has-selected-text, tts:test handlers
- src/preload/index.ts: HUD bridge (hudPlayTts, hudStopTts, hudCheckSelectedText, onTtsStateChanged) + settings bridge (voxApi.tts.test)
- src/main/app.ts: TtsManager creation and wiring
- src/main/hud.ts: Transcriptions button moved to upper-left, Play button added to lower-left with loading/playing/stop states
- src/renderer/components/llm/LlmPanel.tsx: New "Text to Speech" tab with toggle, API key, test button
- … tts.* keys

Test coverage
- tests/main/config/manager.test.ts: TTS config round-trip, encryption, defaults (5 new tests)
- tests/main/input/selection.test.ts: Selection reader export + VITEST guard (2 tests)
- tests/main/tts/elevenlabs.test.ts: API contract, errors, abort signal, text limit (7 tests)
- tests/main/tts/manager.test.ts: Orchestration, stop, error recovery, validation (10 tests)

HUD Layout
In the app: [screenshot]

In the HUD: [screenshot]

The Play button appears on hover only when TTS is enabled, an API key is set, and text is selected.
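The visibility rule can be written as a single predicate. This is a sketch only: ttsEnabled and elevenLabsApiKey are the config field names listed under Modified files, while the TtsVisibilityConfig type and shouldOfferPlay are hypothetical names, and the real check is split between the tts:has-selected-text handler and the HUD hover logic.

```typescript
// Subset of the config relevant to the Play button (assumed shape).
interface TtsVisibilityConfig {
  ttsEnabled: boolean;
  elevenLabsApiKey: string;
}

// True only when all three conditions from the PR description hold.
function shouldOfferPlay(config: TtsVisibilityConfig, selectedText: string): boolean {
  return (
    config.ttsEnabled &&
    config.elevenLabsApiKey.trim().length > 0 &&
    selectedText.trim().length > 0
  );
}
```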
Test plan
- npm run typecheck && npm run lint && npx vitest run all pass