feat(tts): add Text-to-Speech via ElevenLabs by guicheffer · Pull Request #360 · app-vox/vox

guicheffer · 2026-03-05T23:01:32Z

Summary

Add Text-to-Speech feature using ElevenLabs API
Play button on HUD reads selected text aloud
New "Text to Speech" tab in AI Enhancement settings
Text selection detection via macOS Accessibility API (AXSelectedText)

Closes https://github.com/app-vox/specs/issues/58

Changes

New files

src/main/tts/elevenlabs.ts — ElevenLabs API client (synthesize, testConnection)
src/main/tts/manager.ts — TTS orchestration (selection → API → play, with abort support)
src/main/input/selection.ts — Read selected text via macOS Accessibility API
src/shared/icons/svg/volume.svg — Volume icon for settings
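
As a rough illustration of what the synthesize call in src/main/tts/elevenlabs.ts might look like, here is a minimal sketch against the public ElevenLabs text-to-speech endpoint. It is not the code from this PR: the FetchLike type, the SynthesizeOptions shape, the injected fetch parameter and the MAX_TEXT_LENGTH value are all assumptions made for the example (the PR tests a text limit but does not state its value).

```typescript
// Minimal shape of the fetch call this sketch needs, so a stub can be injected.
type FetchLike = (
  url: string,
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
    signal?: AbortSignal | undefined;
  },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Assumed limit, chosen only for illustration.
const MAX_TEXT_LENGTH = 5000;

interface SynthesizeOptions {
  apiKey: string;
  voiceId: string;
  text: string;
  signal?: AbortSignal | undefined;
}

// Sends the text to ElevenLabs and resolves with the returned audio bytes.
async function synthesize(fetchImpl: FetchLike, opts: SynthesizeOptions): Promise<ArrayBuffer> {
  const text = opts.text.trim();
  if (text.length === 0) throw new Error("No text to synthesize");
  if (text.length > MAX_TEXT_LENGTH) throw new Error("Text exceeds the length limit");

  const url = `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(opts.voiceId)}`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: {
      "xi-api-key": opts.apiKey,
      "Content-Type": "application/json",
      Accept: "audio/mpeg",
    },
    body: JSON.stringify({ text }),
    signal: opts.signal, // lets the caller abort an in-flight request
  });
  if (!res.ok) throw new Error(`ElevenLabs request failed with status ${res.status}`);
  return res.arrayBuffer();
}
```

Passing the fetch implementation in keeps the sketch testable without network access; in the app it would presumably be the runtime's own fetch.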

Modified files

src/shared/config.ts — ttsEnabled, elevenLabsApiKey, elevenLabsVoiceId fields
src/main/config/manager.ts — SENSITIVE_CONFIG_FIELDS for API key encryption
src/main/audio/recorder.ts — playMp3Buffer() and stopMp3Playback() methods
src/main/ipc.ts — tts:play, tts:stop, tts:has-selected-text, tts:test handlers
src/preload/index.ts — HUD bridge (hudPlayTts, hudStopTts, hudCheckSelectedText, onTtsStateChanged) + settings bridge (voxApi.tts.test)
src/main/app.ts — TtsManager creation and wiring
src/main/hud.ts — Transcriptions button moved to upper-left, Play button added to lower-left with loading/playing/stop states
src/renderer/components/llm/LlmPanel.tsx — New "Text to Speech" tab with toggle, API key, test button
All 10 i18n locale files — tts.* keys

Test coverage

tests/main/config/manager.test.ts — TTS config round-trip, encryption, defaults (5 new tests)
tests/main/input/selection.test.ts — Selection reader export + VITEST guard (2 tests)
tests/main/tts/elevenlabs.test.ts — API contract, errors, abort signal, text limit (7 tests)
tests/main/tts/manager.test.ts — Orchestration, stop, error recovery, validation (10 tests)
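
The orchestration that the manager tests exercise (selection → API → play, with stop and error recovery) can be sketched roughly as follows. This is an assumption-laden outline, not the PR's TtsManager: the TtsDeps interface, the TtsState names and the class name are hypothetical, and only playMp3Buffer and stopMp3Playback are names taken from the change list.

```typescript
type TtsState = "idle" | "loading" | "playing";

// Hypothetical dependency bundle so the flow can run without Electron.
interface TtsDeps {
  getSelectedText(): Promise<string>;
  synthesize(text: string, signal: AbortSignal): Promise<ArrayBuffer>;
  playMp3Buffer(audio: ArrayBuffer): Promise<void>;
  stopMp3Playback(): void;
  onStateChanged(state: TtsState): void;
}

class TtsManagerSketch {
  private readonly deps: TtsDeps;
  private controller: AbortController | null = null;
  private state: TtsState = "idle";

  constructor(deps: TtsDeps) {
    this.deps = deps;
  }

  async play(): Promise<void> {
    this.stop(); // cancel any earlier request or playback first
    const controller = new AbortController();
    this.controller = controller;
    try {
      const text = (await this.deps.getSelectedText()).trim();
      if (text.length === 0 || controller.signal.aborted) return;
      this.setState("loading");
      const audio = await this.deps.synthesize(text, controller.signal);
      if (controller.signal.aborted) return; // stop() was called while loading
      this.setState("playing");
      await this.deps.playMp3Buffer(audio);
    } catch (err) {
      // Errors caused by our own abort are expected; report the rest.
      if (!controller.signal.aborted) console.error("TTS failed:", err);
    } finally {
      // Only reset if no newer play() or stop() has taken over.
      if (this.controller === controller) {
        this.controller = null;
        this.setState("idle");
      }
    }
  }

  stop(): void {
    if (this.controller === null) return;
    this.controller.abort();
    this.controller = null;
    this.deps.stopMp3Playback();
    this.setState("idle");
  }

  private setState(next: TtsState): void {
    if (next === this.state) return;
    this.state = next;
    this.deps.onStateChanged(next);
  }
}
```

Keeping one AbortController per play() call means a late response from an aborted request cannot flip the HUD button back to "playing".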

HUD Layout

In the app: [screenshot]

In the HUD: [screenshot]

Play button only appears on hover when: TTS enabled + API key set + text selected.
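
The visibility rule above amounts to a simple conjunction. A minimal sketch, assuming the ttsEnabled and elevenLabsApiKey config fields listed under Modified files; the PlayButtonInput shape and the function name are hypothetical:

```typescript
// ttsEnabled and elevenLabsApiKey mirror the config fields named in this PR;
// selectedText and hovering are hypothetical inputs for the sketch.
interface PlayButtonInput {
  ttsEnabled: boolean;
  elevenLabsApiKey: string;
  selectedText: string;
  hovering: boolean;
}

// The Play button shows only when every condition holds at once.
function shouldShowPlayButton(input: PlayButtonInput): boolean {
  const hasKey = input.elevenLabsApiKey.trim().length > 0;
  const hasText = input.selectedText.trim().length > 0;
  return input.hovering && input.ttsEnabled && hasKey && hasText;
}
```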

Test plan

Toggle TTS on/off in Settings → AI Enhancement → Text to Speech
Enter ElevenLabs API key and click "Test Voice"
Select text in any app, hover HUD → Play button appears (lower-left, green hover)
Click Play → spinner → text is spoken aloud → button becomes Stop (red hover)
Click Stop → audio stops immediately
No Play button when TTS disabled, no API key, or no text selected
Transcriptions button now in upper-left (symmetric with Settings)
npm run typecheck && npm run lint && npx vitest run (all pass)

github-actions · 2026-03-05T23:12:07Z

CI Summary

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Lint CSS	✅ Passed
Design Tokens	✅ Passed
Test	✅ Passed
Build	✅ Passed

Run #1073

github-actions · 2026-03-05T23:14:56Z

✅MegaLinter analysis: Success

Descriptor	Linter	Files	Errors	Warnings	Elapsed time
✅ JSON	jsonlint	10	0	0	0.76s
✅ JSON	npm-package-json-lint	yes	no	no	0.4s
✅ JSON	prettier	10	0	0	1.49s
✅ JSON	v8r	10	0	0	4.05s
✅ REPOSITORY	checkov	yes	no	no	26.34s
✅ REPOSITORY	devskim	yes	no	no	2.72s
✅ REPOSITORY	dustilock	yes	no	no	0.75s
✅ REPOSITORY	gitleaks	yes	no	no	2.23s
✅ REPOSITORY	git_diff	yes	no	no	0.12s
✅ REPOSITORY	grype	yes	no	no	45.69s
✅ REPOSITORY	kics	yes	no	no	2.62s
✅ REPOSITORY	kingfisher	yes	no	no	4.77s
✅ REPOSITORY	secretlint	yes	no	no	5.94s
✅ REPOSITORY	syft	yes	no	no	2.31s
✅ REPOSITORY	trivy	yes	no	no	19.38s
✅ REPOSITORY	trivy-sbom	yes	no	no	3.28s
✅ REPOSITORY	trufflehog	yes	no	no	3.66s

See detailed reports in MegaLinter artifacts
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

guicheffer · 2026-03-06T00:32:27Z

@rodrigoluizs this is a feature @jeffujioka had the idea for, for us to play around with. Please check it once you have some time. It is not yet FULLY ready to review, but it is ready to test. cc @jeffujioka

codecov · 2026-03-06T00:50:58Z

Codecov Report

❌ Patch coverage is 31.29973% with 259 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/main/input/selection.ts	4.58%	103 Missing and 1 partial ⚠️
src/renderer/components/llm/LlmPanel.tsx	20.95%	72 Missing and 11 partials ⚠️
src/main/ipc.ts	0.00%	30 Missing ⚠️
src/main/tts/manager.ts	82.75%	8 Missing and 2 partials ⚠️
src/main/app.ts	0.00%	8 Missing ⚠️
src/main/audio/recorder.ts	0.00%	7 Missing ⚠️
src/main/tts/elevenlabs.ts	76.92%	4 Missing and 2 partials ⚠️
src/preload/index.ts	0.00%	5 Missing ⚠️
src/renderer/components/dev/DevPanel.tsx	0.00%	5 Missing ⚠️
src/main/config/manager.ts	95.45%	0 Missing and 1 partial ⚠️

Add a third "Text to Speech" tab to the LlmPanel with ElevenLabs integration: toggle to enable TTS, API key input, voice default hint, and test voice button. The TTS tab is always visible regardless of whether LLM enhancement is enabled. Also adds VolumeIcon, SecretInput mock, and tts mock to test helpers.

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

Resolve three TTS issues:
- Fix play button not appearing due to focus race condition by making frontmost-app PID query the primary path in getSelectedText(), with system-wide AXFocusedUIElement as fallback. Also filter out Vox's own PID from getFrontmostPid().
- Make Test Voice button actually play audio by returning the synthesized buffer from testConnection() and routing it through TtsManager's new testAndPlay() method. Show "Playing test audio..." status during test.
- Add "Get your API key at elevenlabs.io" link below the API key field.

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

- Replace display:none with opacity/pointer-events pattern using .tts-available class so the button follows the same visibility flow as other hover buttons
- Add buttonsVisible flag to guard async IPC callback, preventing stale promise from showing button after hideSideButtons runs
- Preserve tts-available class in onTtsStateChanged handler when resetting className
- Check ttsEnabled and elevenLabsApiKey in tts:has-selected-text IPC handler before returning true

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
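
The buttonsVisible guard mentioned in this commit addresses a common race: an async check started on hover may resolve after the hover has ended. A minimal sketch of that pattern, assuming a promise-returning selection check; the class and its fields are hypothetical stand-ins for the HUD script, with a boolean in place of the .tts-available class:

```typescript
class SideButtonsSketch {
  buttonsVisible = false;
  ttsAvailable = false; // stands in for the .tts-available class on the button
  private readonly checkSelectedText: () => Promise<boolean>;

  constructor(checkSelectedText: () => Promise<boolean>) {
    this.checkSelectedText = checkSelectedText;
  }

  async showSideButtons(): Promise<void> {
    this.buttonsVisible = true;
    const hasText = await this.checkSelectedText();
    // The hover may have ended while the IPC call was in flight;
    // in that case the result is stale and must be dropped.
    if (!this.buttonsVisible) return;
    this.ttsAvailable = hasText;
  }

  hideSideButtons(): void {
    this.buttonsVisible = false;
    this.ttsAvailable = false;
  }
}
```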

When the user clicks the Play button on the HUD, the Vox/Electron process becomes frontmost, causing getSelectedText() to return empty or wrong text. Fix by caching the text during the hover check (hasSelectedText) and reusing it in play(). Also fix getFrontmostPid to use PID-based filtering instead of hardcoded app name, so it works in both dev (Electron) and prod (Vox) modes.

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>
Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

The 1.5s polling interval called hasSelectedText() which overwrote cachedSelectedText even when empty (focus shifted to HUD). Now the cache is only updated when text is found, preserving the last known selection for play(). Also adds diagnostic logging to trace text selection flow.

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
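
The caching fix described in the two commits above boils down to "never overwrite the cache with an empty read". A minimal sketch under that reading; the class and method names are hypothetical, and only cachedSelectedText is a name from the commit message:

```typescript
class SelectionCacheSketch {
  private cachedSelectedText = "";

  // Called by the hover check and the polling loop with whatever the
  // selection reader returned (possibly empty once focus moves to the HUD).
  observe(current: string): void {
    const text = current.trim();
    if (text.length > 0) {
      this.cachedSelectedText = text; // only overwrite when text was found
    }
  }

  // play() reads from here instead of querying the selection again.
  textForPlay(): string {
    return this.cachedSelectedText;
  }
}
```

With this shape, an empty read caused by the HUD taking focus leaves the last known selection intact for play().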

Previous approach only tried the frontmost PID and one fallback, which failed when Electron became frontmost during HUD hover. Now uses a multi-tier strategy:
1. Frontmost non-Electron app
2. Last PID that had selected text (survives focus shifts)
3. Brute-force scan of all visible processes
4. System-wide AXFocusedUIElement fallback

Also prevents polling from clearing cached text when focus shifts away.

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
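
The multi-tier lookup can be pictured as an ordered list of readers tried until one yields text. The sketch below only shows that control flow; the SelectionStrategy interface and the function name are hypothetical, the readers are synchronous for brevity, and the real Accessibility queries are not reproduced here.

```typescript
// One tier of the lookup, e.g. "frontmost app" or "system-wide fallback".
interface SelectionStrategy {
  name: string;
  read(): string | null;
}

interface SelectionResult {
  text: string;
  source: string; // which tier produced the text, handy for diagnostic logs
}

// Tries each tier in order and returns the first non-empty selection.
function readSelectedText(strategies: SelectionStrategy[]): SelectionResult | null {
  for (const strategy of strategies) {
    let text: string | null = null;
    try {
      text = strategy.read();
    } catch {
      text = null; // a failing tier simply falls through to the next one
    }
    if (text !== null && text.trim().length > 0) {
      return { text: text.trim(), source: strategy.name };
    }
  }
  return null;
}
```

Recording which tier succeeded is one way to support the diagnostic logging mentioned in the earlier commit.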

- Add ttsConnectionTested and ttsConfigHash to track test state
- Require successful test before enabling TTS toggle
- Auto-disable toggle when API key/voice changes
- Add shake animation on invalid toggle click (respects reduceAnimations)
- Show error message when attempting to enable without testing
- Clear warning banner after successful test
- Auto-enable HUD when TTS is enabled
- Update info banner text to clarify HUD is always visible
- Improve error messages from ElevenLabs API (payment_required, etc)
- Return detailed error info from test endpoint
- Add TTS test state to DevPanel
- Move "Default voice" hint inline with API key link
- Keep TTS stop button visible and larger (24px) during playback
- Prevent tab switching after TTS test completes
- Disable toggle when config changes (require re-test)
- Add i18n for all new messages (10 languages)

Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>
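
The "require a successful test, and re-test when the config changes" rule can be sketched with a stored hash of the tested configuration. Everything below is an assumption except the field names ttsEnabled, ttsConnectionTested and ttsConfigHash: the PR does not say how the hash is computed, so a simple FNV-1a variant over UTF-16 code units stands in for it, and the helper names are hypothetical.

```typescript
interface TtsGateState {
  ttsEnabled: boolean;
  ttsConnectionTested: boolean;
  ttsConfigHash: string; // hash of the API key + voice that last passed the test
}

// Stand-in hash (FNV-1a, 32-bit, over UTF-16 code units); the real scheme is unknown.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function hashTtsConfig(apiKey: string, voiceId: string): string {
  return fnv1a(`${apiKey}\u0000${voiceId}`);
}

// A test result only counts for the exact config it was run against.
function isTestValid(state: TtsGateState, apiKey: string, voiceId: string): boolean {
  return state.ttsConnectionTested && state.ttsConfigHash === hashTtsConfig(apiKey, voiceId);
}

function recordSuccessfulTest(state: TtsGateState, apiKey: string, voiceId: string): TtsGateState {
  return { ...state, ttsConnectionTested: true, ttsConfigHash: hashTtsConfig(apiKey, voiceId) };
}

// Auto-disable the toggle when the key or voice no longer matches the tested config.
function onConfigChanged(state: TtsGateState, apiKey: string, voiceId: string): TtsGateState {
  return isTestValid(state, apiKey, voiceId) ? state : { ...state, ttsEnabled: false };
}

function tryEnable(
  state: TtsGateState,
  apiKey: string,
  voiceId: string,
): { state: TtsGateState; error?: string } {
  if (!isTestValid(state, apiKey, voiceId)) {
    return { state, error: "Test the connection before enabling Text to Speech" };
  }
  return { state: { ...state, ttsEnabled: true } };
}
```

Because validity is derived from the stored hash, changing the key or voice invalidates the earlier test without any separate reset step.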

- Update tts.test return type in preload to match IPC handler
- Add eslint exception for bullet point separator character
- Add empty lines before rules in shake animation keyframes

Co-Authored-By: Claude (global.anthropic.claude-sonnet-4-5-20250929-v1:0) <noreply@anthropic.com>
Co-Authored-By: jeffujioka <jeff.ujioka@gmail.com>

Update unit tests to expect { success, audio?, error? } instead of ArrayBuffer | null from testConnection function.

Co-Authored-By: Claude (global.anthropic.claude-sonnet-4-5-20250929-v1:0) <noreply@anthropic.com>
Co-Authored-By: jeffujioka <jeff.ujioka@gmail.com>
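
To make the new return shape concrete, here is a small sketch of a test function that reports { success, audio?, error? } rather than ArrayBuffer | null, plus a caller that plays the audio only on success. The TtsTestResult name, the injected synthesizeSample and play callbacks, and the error wording are assumptions; only the result fields and the testConnection/testAndPlay names come from the commit messages.

```typescript
interface TtsTestResult {
  success: boolean;
  audio?: ArrayBuffer;
  error?: string;
}

// Wraps a synthesis attempt so callers get a message instead of a bare null.
async function testConnection(
  synthesizeSample: () => Promise<ArrayBuffer>,
): Promise<TtsTestResult> {
  try {
    const audio = await synthesizeSample();
    if (audio.byteLength === 0) {
      return { success: false, error: "Empty audio response" };
    }
    return { success: true, audio };
  } catch (err) {
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Plays the sample only when the test succeeded, then hands the result back.
async function testAndPlay(
  synthesizeSample: () => Promise<ArrayBuffer>,
  play: (audio: ArrayBuffer) => Promise<void>,
): Promise<TtsTestResult> {
  const result = await testConnection(synthesizeSample);
  if (result.success && result.audio !== undefined) {
    await play(result.audio);
  }
  return result;
}
```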

guicheffer requested a review from rodrigoluizs as a code owner March 5, 2026 23:01

guicheffer added the feature New feature implementation label Mar 5, 2026

guicheffer self-assigned this Mar 5, 2026

github-code-quality bot found potential problems Mar 5, 2026

src/renderer/components/llm/LlmPanel.tsx (fixed)
src/renderer/components/llm/LlmPanel.tsx (fixed)
src/main/ipc.ts (fixed)

github-code-quality bot found potential problems Mar 5, 2026

src/main/ipc.ts (fixed)

github-code-quality bot found potential problems Mar 5, 2026

src/main/ipc.ts (fixed)

guicheffer marked this pull request as draft March 5, 2026 23:44

guicheffer changed the title from "feat(tts): add Text-to-Speech via ElevenLabs" to "feat(tts): add Text-to-Speech via _ElevenLabs_" Mar 6, 2026

guicheffer changed the title from "feat(tts): add Text-to-Speech via _ElevenLabs_" to "feat(tts): add Text-to-Speech via ElevenLabs" Mar 6, 2026

guicheffer marked this pull request as ready for review March 10, 2026 20:38

guicheffer force-pushed the feature/tts-elevenlabs branch from a2d84e4 to 18b268a Compare March 12, 2026 09:45

github-code-quality bot found potential problems Mar 12, 2026

src/main/ipc.ts (fixed)

github-code-quality bot found potential problems Mar 13, 2026

src/renderer/components/llm/LlmPanel.tsx (dismissed)
src/renderer/components/llm/LlmPanel.tsx (dismissed)

guicheffer and others added 12 commits March 19, 2026 18:14

feat(tts): add i18n translation keys for text-to-speech

94ccfda

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com> Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

feat(tts): add text selection reader via Accessibility API

f06863e

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com> Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

feat(tts): add ElevenLabs API client

89e610e

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com> Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

feat(tts): add TTS manager for orchestration

6a96fac

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com> Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

feat(tts): add MP3 buffer playback and stop methods to recorder

16a2a1d

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com> Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

feat(tts): add IPC handlers, preload bridge, and app wiring

0c1de49

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com>

feat(tts): add Play button to HUD with state management

d5e4c78

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com> Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

fix(tts): address code review findings — stop playback, abort signal,…

1dfdb90

… error handling Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com> Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

feat(tts): add TTS info card to DevPanel

e22cb61

Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com> Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

guicheffer and others added 10 commits March 19, 2026 18:16

fix(tts): fix audio playback, add selection polling, add analytics ev…

b708dab

…ents Co-Authored-By: Claude (global.anthropic.claude-opus-4-6-v1) <noreply@anthropic.com> Co-authored-by: Jefferson Masahiro Fujioka <jefferson.fujioka@gmail.com>

chore: add .superpowers/ to gitignore

ddbd2cc

Potential fix for pull request finding 'Missing await'

c35a84a

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

fix(tts): update import paths after input/ → platform/darwin/ rename

e2e0443

guicheffer force-pushed the feature/tts-elevenlabs branch from 7cee403 to e2e0443 Compare March 19, 2026 17:17

guicheffer wants to merge 22 commits into main from feature/tts-elevenlabs
