
feat(tts): add Text-to-Speech via ElevenLabs #360

Open
guicheffer wants to merge 22 commits into main from feature/tts-elevenlabs

Conversation

@guicheffer
Contributor

@guicheffer guicheffer commented Mar 5, 2026

Summary

  • Add Text-to-Speech feature using ElevenLabs API
  • Play button on HUD reads selected text aloud
  • New "Text to Speech" tab in AI Enhancement settings
  • Text selection detection via macOS Accessibility API (AXSelectedText)

Closes https://github.com/app-vox/specs/issues/58

Changes

New files

  • src/main/tts/elevenlabs.ts — ElevenLabs API client (synthesize, testConnection)
  • src/main/tts/manager.ts — TTS orchestration (selection → API → play, with abort support)
  • src/main/input/selection.ts — Read selected text via macOS Accessibility API
  • src/shared/icons/svg/volume.svg — Volume icon for settings
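The selection → API → play flow in src/main/tts/manager.ts could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the PR's code: `TtsManagerSketch`, the `TtsDeps` interface, and its four methods are hypothetical stand-ins for selection.ts, the ElevenLabs client, and the recorder's playMp3Buffer()/stopMp3Playback(). Abort support uses the standard `AbortController`.

```typescript
// Hypothetical dependencies; the real manager wires these to selection.ts,
// elevenlabs.ts, and the recorder's MP3 playback methods.
interface TtsDeps {
  readSelection(): Promise<string>;
  synthesize(text: string, signal: AbortSignal): Promise<ArrayBuffer>;
  playMp3(buffer: ArrayBuffer): Promise<void>;
  stopPlayback(): void;
}

type TtsState = "idle" | "loading" | "playing";

class TtsManagerSketch {
  state: TtsState = "idle";
  private controller: AbortController | null = null;
  private deps: TtsDeps;

  constructor(deps: TtsDeps) {
    this.deps = deps;
  }

  async play(): Promise<void> {
    this.stop(); // cancel any in-flight request or playback first
    const text = (await this.deps.readSelection()).trim();
    if (!text) return; // nothing selected: stay idle

    const controller = new AbortController();
    this.controller = controller;
    this.state = "loading"; // HUD shows the spinner
    try {
      const audio = await this.deps.synthesize(text, controller.signal);
      if (controller.signal.aborted) return; // stop() ran during the request
      this.state = "playing"; // HUD shows Stop
      await this.deps.playMp3(audio);
    } catch (err) {
      if (!controller.signal.aborted) throw err; // an abort is not an error
    } finally {
      // Reset only if no newer play()/stop() has replaced this controller.
      if (this.controller === controller) {
        this.controller = null;
        this.state = "idle";
      }
    }
  }

  stop(): void {
    this.controller?.abort();
    this.controller = null;
    this.deps.stopPlayback();
    this.state = "idle";
  }
}
```

Injecting the dependencies keeps the orchestration testable without Accessibility APIs, network access, or audio hardware.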

Modified files

  • src/shared/config.ts — ttsEnabled, elevenLabsApiKey, elevenLabsVoiceId fields
  • src/main/config/manager.ts — SENSITIVE_CONFIG_FIELDS for API key encryption
  • src/main/audio/recorder.ts — playMp3Buffer() and stopMp3Playback() methods
  • src/main/ipc.ts — tts:play, tts:stop, tts:has-selected-text, tts:test handlers
  • src/preload/index.ts — HUD bridge (hudPlayTts, hudStopTts, hudCheckSelectedText, onTtsStateChanged) + settings bridge (voxApi.tts.test)
  • src/main/app.ts — TtsManager creation and wiring
  • src/main/hud.ts — Transcriptions button moved to upper-left, Play button added to lower-left with loading/playing/stop states
  • src/renderer/components/llm/LlmPanel.tsx — New "Text to Speech" tab with toggle, API key, test button
  • All 10 i18n locale files — tts.* keys
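A minimal sketch of how the three new config fields and SENSITIVE_CONFIG_FIELDS might fit together, assuming sensitive values are encrypted only when written to disk. The field names come from this PR; `DEFAULT_TTS_CONFIG`, `Cipher`, `toDisk`, and `fromDisk` are hypothetical, and the cipher is injected so the sketch makes no claim about which encryption backend the app actually uses.

```typescript
// Field names from the PR; defaults and everything else are assumptions.
interface TtsConfig {
  ttsEnabled: boolean;
  elevenLabsApiKey: string;
  elevenLabsVoiceId: string;
}

const DEFAULT_TTS_CONFIG: TtsConfig = {
  ttsEnabled: false,
  elevenLabsApiKey: "",
  elevenLabsVoiceId: "", // empty = fall back to a default voice (assumption)
};

// Keys whose values are encrypted before being written to disk.
const SENSITIVE_CONFIG_FIELDS: ReadonlyArray<keyof TtsConfig> = ["elevenLabsApiKey"];

// Injected so the sketch does not assume a particular encryption backend.
interface Cipher {
  encrypt(plain: string): string;
  decrypt(stored: string): string;
}

function toDisk(config: TtsConfig, cipher: Cipher): Record<string, unknown> {
  const out: Record<string, unknown> = { ...config };
  for (const key of SENSITIVE_CONFIG_FIELDS) {
    const value = out[key];
    if (typeof value === "string" && value !== "") out[key] = cipher.encrypt(value);
  }
  return out;
}

function fromDisk(raw: Record<string, unknown>, cipher: Cipher): TtsConfig {
  // Missing keys fall back to defaults so older config files still load.
  const merged: Record<string, unknown> = { ...DEFAULT_TTS_CONFIG, ...raw };
  for (const key of SENSITIVE_CONFIG_FIELDS) {
    const value = merged[key];
    if (typeof value === "string" && value !== "") merged[key] = cipher.decrypt(value);
  }
  return merged as unknown as TtsConfig;
}
```

Driving encryption from a list of field names means a future secret only needs to be added to SENSITIVE_CONFIG_FIELDS rather than to every save/load path.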

Test coverage

  • tests/main/config/manager.test.ts — TTS config round-trip, encryption, defaults (5 new tests)
  • tests/main/input/selection.test.ts — Selection reader export + VITEST guard (2 tests)
  • tests/main/tts/elevenlabs.test.ts — API contract, errors, abort signal, text limit (7 tests)
  • tests/main/tts/manager.test.ts — Orchestration, stop, error recovery, validation (10 tests)
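For context on the API contract that elevenlabs.test.ts covers, here is a sketch of a request builder against ElevenLabs' public text-to-speech endpoint (POST /v1/text-to-speech/{voice_id}, authenticated with the xi-api-key header, returning MP3 audio). `buildSynthesisRequest`, `synthesize`, and especially `MAX_TTS_CHARS` are assumptions: the real per-request character limit depends on the model, and the PR's client may reject over-long text rather than truncate it.

```typescript
// Assumed public ElevenLabs REST shape: POST /v1/text-to-speech/{voice_id}
// with the xi-api-key header; the response body is MP3 audio.
const ELEVENLABS_BASE_URL = "https://api.elevenlabs.io/v1";
const MAX_TTS_CHARS = 5000; // hypothetical; real limits depend on the model

interface SynthesisRequest {
  url: string;
  init: RequestInit;
}

function buildSynthesisRequest(
  text: string,
  apiKey: string,
  voiceId: string,
  signal?: AbortSignal,
): SynthesisRequest {
  const trimmed = text.trim();
  if (!trimmed) throw new Error("No text to synthesize");
  if (!apiKey) throw new Error("Missing ElevenLabs API key");
  return {
    url: `${ELEVENLABS_BASE_URL}/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      // This sketch truncates; the real client may reject long text instead.
      body: JSON.stringify({ text: trimmed.slice(0, MAX_TTS_CHARS) }),
      signal, // lets the manager abort an in-flight request
    },
  };
}

async function synthesize(
  text: string,
  apiKey: string,
  voiceId: string,
  signal?: AbortSignal,
): Promise<ArrayBuffer> {
  const { url, init } = buildSynthesisRequest(text, apiKey, voiceId, signal);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`ElevenLabs request failed: HTTP ${res.status}`);
  return res.arrayBuffer();
}
```

Splitting request construction from the fetch call lets the contract (URL, headers, body, text limit) be asserted without network access.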

HUD Layout

In the app: [screenshot]

In the HUD: [screenshot]
The Play button only appears on hover when TTS is enabled, an API key is set, and text is selected.
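The HUD visibility rule reduces to a single predicate. The input names below are illustrative; the PR derives these values from config and the tts:has-selected-text IPC handler, which this sketch does not model.

```typescript
// Illustrative inputs; the PR derives these from config and the
// tts:has-selected-text IPC handler.
interface PlayButtonInputs {
  hovered: boolean;
  ttsEnabled: boolean;
  elevenLabsApiKey: string;
  hasSelectedText: boolean;
}

// Play is offered only while hovering, and only if every condition holds.
function shouldShowPlayButton(s: PlayButtonInputs): boolean {
  return (
    s.hovered &&
    s.ttsEnabled &&
    s.elevenLabsApiKey.trim() !== "" &&
    s.hasSelectedText
  );
}
```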

Test plan

  • Toggle TTS on/off in Settings → AI Enhancement → Text to Speech
  • Enter ElevenLabs API key and click "Test Voice"
  • Select text in any app, hover HUD → Play button appears (lower-left, green hover)
  • Click Play → spinner → text is spoken aloud → button becomes Stop (red hover)
  • Click Stop → audio stops immediately
  • No Play button when TTS disabled, no API key, or no text selected
  • Transcriptions button now in upper-left (symmetric with Settings)
  • npm run typecheck, npm run lint, and npx vitest run all pass

@guicheffer guicheffer requested a review from rodrigoluizs as a code owner March 5, 2026 23:01
@guicheffer guicheffer added the feature New feature implementation label Mar 5, 2026
@guicheffer guicheffer self-assigned this Mar 5, 2026
@github-actions
Contributor

github-actions bot commented Mar 5, 2026

CI Summary

| Check | Status |
|---|---|
| Typecheck | ✅ Passed |
| Lint | ✅ Passed |
| Lint CSS | ✅ Passed |
| Design Tokens | ✅ Passed |
| Test | ✅ Passed |
| Build | ✅ Passed |

Run #1073

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

MegaLinter analysis: Success

| Descriptor | Linter | Files | Fixed | Errors | Warnings | Elapsed time |
|---|---|---|---|---|---|---|
| ✅ JSON | jsonlint | 10 | | 0 | 0 | 0.76s |
| ✅ JSON | npm-package-json-lint | yes | | no | no | 0.4s |
| ✅ JSON | prettier | 10 | | 0 | 0 | 1.49s |
| ✅ JSON | v8r | 10 | | 0 | 0 | 4.05s |
| ✅ REPOSITORY | checkov | yes | | no | no | 26.34s |
| ✅ REPOSITORY | devskim | yes | | no | no | 2.72s |
| ✅ REPOSITORY | dustilock | yes | | no | no | 0.75s |
| ✅ REPOSITORY | gitleaks | yes | | no | no | 2.23s |
| ✅ REPOSITORY | git_diff | yes | | no | no | 0.12s |
| ✅ REPOSITORY | grype | yes | | no | no | 45.69s |
| ✅ REPOSITORY | kics | yes | | no | no | 2.62s |
| ✅ REPOSITORY | kingfisher | yes | | no | no | 4.77s |
| ✅ REPOSITORY | secretlint | yes | | no | no | 5.94s |
| ✅ REPOSITORY | syft | yes | | no | no | 2.31s |
| ✅ REPOSITORY | trivy | yes | | no | no | 19.38s |
| ✅ REPOSITORY | trivy-sbom | yes | | no | no | 3.28s |
| ✅ REPOSITORY | trufflehog | yes | | no | no | 3.66s |

See detailed reports in MegaLinter artifacts
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff


@guicheffer guicheffer marked this pull request as draft March 5, 2026 23:44
@guicheffer
Contributor Author

@rodrigoluizs this is a feature @jeffujioka came up with for us to play around with. Please check it once you have some time. It's not yet fully ready for review, but it is ready to test. cc @jeffujioka

@guicheffer guicheffer changed the title feat(tts): add Text-to-Speech via ElevenLabs feat(tts): add Text-to-Speech via _ElevenLabs_ Mar 6, 2026
@guicheffer guicheffer changed the title feat(tts): add Text-to-Speech via _ElevenLabs_ feat(tts): add Text-to-Speech via ElevenLabs Mar 6, 2026
@guicheffer guicheffer marked this pull request as ready for review March 10, 2026 20:38
@guicheffer guicheffer force-pushed the feature/tts-elevenlabs branch from a2d84e4 to 18b268a March 12, 2026 09:45
guicheffer and others added 12 commits March 19, 2026 18:14
Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Add a third "Text to Speech" tab to the LlmPanel with ElevenLabs
integration: toggle to enable TTS, API key input, voice default hint,
and test voice button. The TTS tab is always visible regardless of
whether LLM enhancement is enabled. Also adds VolumeIcon, SecretInput
mock, and tts mock to test helpers.

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
… error handling

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Resolve three TTS issues:
- Fix play button not appearing due to focus race condition by making
  frontmost-app PID query the primary path in getSelectedText(), with
  system-wide AXFocusedUIElement as fallback. Also filter out Vox's own
  PID from getFrontmostPid().
- Make Test Voice button actually play audio by returning the synthesized
  buffer from testConnection() and routing it through TtsManager's new
  testAndPlay() method. Show "Playing test audio..." status during test.
- Add "Get your API key at elevenlabs.io" link below the API key field.

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
- Replace display:none with opacity/pointer-events pattern using
  .tts-available class so the button follows the same visibility
  flow as other hover buttons
- Add buttonsVisible flag to guard async IPC callback, preventing
  stale promise from showing button after hideSideButtons runs
- Preserve tts-available class in onTtsStateChanged handler when
  resetting className
- Check ttsEnabled and elevenLabsApiKey in tts:has-selected-text
  IPC handler before returning true

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
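The buttonsVisible guard described in the commit above can be sketched without a DOM. `createHudButtons` and the `ClassListLike` stand-in are hypothetical; only the flag name, the tts-available class, and the show/hide roles come from the commit message.

```typescript
// Minimal stand-in for element.classList so the sketch runs outside a browser.
interface ClassListLike {
  add(name: string): void;
  remove(name: string): void;
}

function createHudButtons(
  playButton: { classList: ClassListLike },
  checkSelectedText: () => Promise<boolean>, // e.g. the hudCheckSelectedText bridge
) {
  let buttonsVisible = false;

  async function showSideButtons(): Promise<void> {
    buttonsVisible = true;
    const available = await checkSelectedText();
    // The hover may have ended while the IPC call was in flight; if so,
    // hideSideButtons() already ran and this stale result must be dropped.
    if (!buttonsVisible) return;
    if (available) playButton.classList.add("tts-available");
    else playButton.classList.remove("tts-available");
  }

  function hideSideButtons(): void {
    buttonsVisible = false;
    playButton.classList.remove("tts-available");
  }

  return { showSideButtons, hideSideButtons };
}
```

A boolean flag covers the show-then-hide race described in the commit. If the HUD can be re-hovered before the first IPC call settles, a per-call counter (generation token) would also discard the older result.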
guicheffer and others added 10 commits March 19, 2026 18:16
…ents

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
When the user clicks the Play button on the HUD, the Vox/Electron process
becomes frontmost, causing getSelectedText() to return empty or wrong text.
Fix by caching the text during the hover check (hasSelectedText) and reusing
it in play(). Also fix getFrontmostPid to use PID-based filtering instead of
hardcoded app name, so it works in both dev (Electron) and prod (Vox) modes.

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
The 1.5s polling interval called hasSelectedText() which overwrote
cachedSelectedText even when empty (focus shifted to HUD). Now the
cache is only updated when text is found, preserving the last known
selection for play().

Also adds diagnostic logging to trace text selection flow.

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
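A sketch of the caching rule from the two commits above: the hover check stores the selection, polling never overwrites it with an empty string, and play() reads from the cache. `SelectionCacheSketch`, `readSelection`, and `textForPlay` are hypothetical names; whether the real hasSelectedText() reports the cached value or only the fresh read is not stated, so this sketch reports the fresh read.

```typescript
// Hypothetical names; readSelection stands in for getSelectedText().
class SelectionCacheSketch {
  private cachedSelectedText = "";
  private readSelection: () => string;

  constructor(readSelection: () => string) {
    this.readSelection = readSelection;
  }

  // Called on HUD hover and by the polling interval.
  hasSelectedText(): boolean {
    const text = this.readSelection().trim();
    // Only a non-empty read updates the cache, so focus moving to the HUD
    // (which yields an empty read) cannot erase the last known selection.
    if (text) this.cachedSelectedText = text;
    return text !== "";
  }

  // play() uses the cache instead of re-reading after Vox takes focus.
  textForPlay(): string {
    return this.cachedSelectedText;
  }
}
```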
Previous approach only tried the frontmost PID and one fallback, which
failed when Electron became frontmost during HUD hover. Now uses a
multi-tier strategy:

1. Frontmost non-Electron app
2. Last PID that had selected text (survives focus shifts)
3. Brute-force scan of all visible processes
4. System-wide AXFocusedUIElement fallback

Also prevents polling from clearing cached text when focus shifts away.

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
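The four-tier lookup can be expressed as an ordered chain of probes. This is a sketch under stated assumptions: the `SelectionProbes` interface and its methods are hypothetical wrappers around the macOS Accessibility queries (AXSelectedText, AXFocusedUIElement), which are not reproduced here, and `ownPid` models the PID-based filtering of Vox's own process.

```typescript
// Hypothetical probes; the real code queries macOS Accessibility
// (AXSelectedText / AXFocusedUIElement) and is not reproduced here.
interface SelectionProbes {
  ownPid: number; // Vox/Electron's own PID, filtered by PID rather than app name
  frontmostPid(): number | null;
  visiblePids(): number[];
  readFromPid(pid: number): string;
  readSystemWideFocused(): string;
}

function makeSelectionReader(p: SelectionProbes): () => string {
  let lastPidWithSelection: number | null = null;

  // Reads one process, skipping our own, and remembers PIDs that had text.
  function tryPid(pid: number | null): string {
    if (pid === null || pid === p.ownPid) return "";
    const text = p.readFromPid(pid).trim();
    if (text) lastPidWithSelection = pid;
    return text;
  }

  return function getSelectedText(): string {
    // Tier 1: frontmost app that is not Vox itself.
    const front = tryPid(p.frontmostPid());
    if (front) return front;
    // Tier 2: last PID that had a selection (survives focus shifts).
    const last = tryPid(lastPidWithSelection);
    if (last) return last;
    // Tier 3: brute-force scan of visible processes.
    for (const pid of p.visiblePids()) {
      const text = tryPid(pid);
      if (text) return text;
    }
    // Tier 4: system-wide focused element.
    return p.readSystemWideFocused().trim();
  };
}
```

Ordering the tiers from cheapest to most expensive means the brute-force scan only runs when the frontmost app and the remembered PID both come up empty.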
- Add ttsConnectionTested and ttsConfigHash to track test state
- Require successful test before enabling TTS toggle
- Auto-disable toggle when API key/voice changes
- Add shake animation on invalid toggle click (respects reduceAnimations)
- Show error message when attempting to enable without testing
- Clear warning banner after successful test
- Auto-enable HUD when TTS is enabled
- Update info banner text to clarify HUD is always visible
- Improve error messages from the ElevenLabs API (payment_required, etc.)
- Return detailed error info from test endpoint
- Add TTS test state to DevPanel
- Move "Default voice" hint inline with API key link
- Keep TTS stop button visible and larger (24px) during playback
- Prevent tab switching after TTS test completes
- Disable toggle when config changes (require re-test)
- Add i18n for all new messages (10 languages)

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
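One way to implement the rule that changing the API key or voice requires a re-test is to store a hash of the tested configuration and compare it on each toggle attempt. The field names ttsConnectionTested and ttsConfigHash come from the commit above; `hashTtsConfig` (an FNV-1a-style 32-bit hash over code points), `onTestSucceeded`, and `canEnableTts` are assumptions, not the PR's implementation.

```typescript
interface TtsTestState {
  ttsConnectionTested: boolean;
  ttsConfigHash: string; // hash of the API key + voice that last passed the test
}

// FNV-1a-style 32-bit hash over code points; it only needs to detect
// changes, not resist tampering.
function hashTtsConfig(apiKey: string, voiceId: string): string {
  let h = 0x811c9dc5;
  for (const ch of `${apiKey}\u0000${voiceId}`) {
    h ^= ch.codePointAt(0)!;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

function onTestSucceeded(apiKey: string, voiceId: string): TtsTestState {
  return { ttsConnectionTested: true, ttsConfigHash: hashTtsConfig(apiKey, voiceId) };
}

// The toggle may turn on only if the current key/voice match the tested ones.
function canEnableTts(state: TtsTestState, apiKey: string, voiceId: string): boolean {
  return state.ttsConnectionTested && state.ttsConfigHash === hashTtsConfig(apiKey, voiceId);
}
```

When canEnableTts() returns false for an existing config, the UI can auto-disable the toggle and show the "test first" message described in the commit.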
- Update tts.test return type in preload to match IPC handler
- Add eslint exception for bullet point separator character
- Add empty lines before rules in shake animation keyframes

Co-Authored-By: Claude (global.anthropic.claude-sonnet-4-5-20250929-v1:0) <noreply@anthropic.com>
Co-Authored-By: jeffujioka <jeff.ujioka@gmail.com>
Update unit tests to expect { success, audio?, error? } instead of
ArrayBuffer | null from testConnection function.

Co-Authored-By: Claude (global.anthropic.claude-sonnet-4-5-20250929-v1:0) <noreply@anthropic.com>
Co-Authored-By: jeffujioka <jeff.ujioka@gmail.com>
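The new testConnection() contract can be modeled as a result object instead of ArrayBuffer | null. This sketch narrows { success, audio?, error? } into a discriminated union, a stricter variant of the optional-field shape the tests describe; here `testConnection` takes a hypothetical injected `synthesize` function, and the sample phrase is made up.

```typescript
// Stricter variant of the { success, audio?, error? } shape; an assumption.
type TestConnectionResult =
  | { success: true; audio: ArrayBuffer }
  | { success: false; error: string };

// `synthesize` is injected (hypothetical) so the sketch stays network-free.
async function testConnection(
  synthesize: (text: string) => Promise<ArrayBuffer>,
): Promise<TestConnectionResult> {
  try {
    const audio = await synthesize("This is a test of your ElevenLabs voice.");
    return { success: true, audio };
  } catch (err) {
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Callers narrow on `success` before reading `audio` (to play it, as testAndPlay() does) or `error` (to show a detailed message such as payment_required).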
@guicheffer guicheffer force-pushed the feature/tts-elevenlabs branch from 7cee403 to e2e0443 March 19, 2026 17:17

Labels

feature New feature implementation
