feat: add real-time streaming transcription with live overlay #217
Open
wrt54gl wants to merge 8 commits into OpenWhispr:main from
On GNOME Wayland, xdotool can only interact with XWayland windows. When OpenWhispr (an Electron/XWayland app) tries to paste, xdotool targets OpenWhispr's own window instead of the focused native Wayland window, silently reports success, and prevents fallback to ydotool. This commit reorders the paste tool candidates on GNOME Wayland to try ydotool first. It also uses Ctrl+Shift+V with ydotool since terminal detection via xdotool/kdotool fails for native Wayland windows. Ctrl+Shift+V works correctly in both terminals (paste) and other apps (paste without formatting). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On GNOME Wayland, xdotool always returns OpenWhispr's own XWayland window instead of the actual focused window, making terminal detection fail. This caused ydotool to always send Ctrl+Shift+V (which doesn't work in apps like GNOME Text Editor) or always Ctrl+V (which doesn't work in terminals). Use the AT-SPI2 accessibility API to detect the active application, which works for both native Wayland and XWayland windows. This lets ydotool send the correct keystroke: Ctrl+Shift+V for terminals, Ctrl+V for everything else. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
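The two commits above boil down to a tool-ordering rule and a keystroke rule. A minimal sketch of that logic follows; the names `SessionInfo`, `pasteToolOrder`, and `pasteKeystroke` are hypothetical and not taken from the PR, and the non-GNOME ordering (xdotool first) is an assumption inferred from the word "reorders".

```typescript
// Hypothetical types and helpers for illustration only; the PR's actual code may differ.
type PasteTool = "xdotool" | "ydotool";

interface SessionInfo {
  isWayland: boolean;
  isGnome: boolean;
}

// On GNOME Wayland, xdotool only sees XWayland windows and can report a false
// success, so ydotool is tried first. Elsewhere the assumed default order is kept.
function pasteToolOrder(session: SessionInfo): PasteTool[] {
  return session.isWayland && session.isGnome
    ? ["ydotool", "xdotool"]
    : ["xdotool", "ydotool"];
}

// Terminals need Ctrl+Shift+V; other apps need plain Ctrl+V.
// `activeAppIsTerminal` stands in for the result of the AT-SPI2 lookup.
function pasteKeystroke(activeAppIsTerminal: boolean): "ctrl+shift+v" | "ctrl+v" {
  return activeAppIsTerminal ? "ctrl+shift+v" : "ctrl+v";
}
```

Keeping the two decisions in separate pure functions makes each one easy to unit-test without a running desktop session.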
Add multi-provider streaming transcription that displays live text beside the floating mic icon during recording. Supports 4 backends:
- AssemblyAI (existing, now wired to UI overlay)
- Deepgram Nova-3 (new WebSocket client, ~$0.0043/min)
- OpenAI Realtime API (transcription-only mode, 16→24kHz resampling)
- Parakeet local (chunked re-transcription every 2s, fully offline)
Key changes:
- AudioWorklet (pcm-streaming-processor.js) converts mic float32→int16 PCM
- LiveTranscriptOverlay component with auto-scroll and click-through
- Deterministic window resize priority (transcript > menu > toast > base)
- Multi-provider dispatch in audioManager.js (getStreamingProvider)
- Deepgram API key persistence via PERSISTED_KEYS in environment.js
- Settings UI with provider dropdown and conditional API key input
- TypeScript declarations for all new IPC channels
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
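For readers unfamiliar with the float32→int16 conversion and the 16→24kHz resampling mentioned above, here is a rough sketch of both steps. The function names `floatTo16BitPCM` and `resampleLinear` are hypothetical, and linear interpolation is an assumption: the commit message does not say which resampling method the PR uses.

```typescript
// Hypothetical helpers for illustration; not the contents of pcm-streaming-processor.js.

// Convert Web Audio float samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Negative samples scale to -32768, positive samples to 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Resample int16 PCM by linear interpolation (assumed method), e.g. 16000 -> 24000 Hz.
function resampleLinear(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  if (input.length === 0) return new Int16Array(0);
  const outLen = Math.floor((input.length * toRate) / fromRate);
  const out = new Int16Array(outLen);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLen; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1); // hold the last sample at the edge
    const frac = pos - i0;
    out[i] = Math.round(input[i0] * (1 - frac) + input[i1] * frac);
  }
  return out;
}
```

Linear interpolation is cheap and has no lookahead, which suits low-latency streaming, but it does not band-limit the signal; a production path might prefer a filtered resampler.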
Add Chromium flags to fix two Linux-specific issues:
- --enable-transparent-visuals + --disable-gpu-compositing + 300ms startup delay to prevent transparent window flickering on X11/Wayland
- --gtk-version=3 to prevent Chromium from dlopen'ing libgtk-4, which crashes on GTK 4.18+ (refuses to coexist with GTK 3 in same process)
- Ozone platform hints for native Wayland rendering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
isStreaming and partialTranscript were referenced at line 125 before being destructured from useAudioRecording at line 149, causing a "Cannot access before initialization" error that crashed the entire React app and broke the dictation overlay rendering. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use transcription_session.update event type (not session.update)
- Use flat session fields (input_audio_format, input_audio_transcription) matching the ?intent=transcription endpoint schema
- Handle transcription_session.created/updated server events
- Remove input_audio_buffer.commit on disconnect (empty buffer error)
- Suppress non-critical "buffer too small" errors from user toast
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ge serialization
Two bugs prevented Parakeet live streaming from working:
1. useSettings stored streamingProvider via JSON.stringify, wrapping the
value in extra quotes ('"parakeet"' instead of 'parakeet'). audioManager's
getStreamingProvider() comparison never matched, silently disabling streaming.
2. parakeet-streaming-start handler checked status.ready but
ParakeetWsServer.getStatus() returns 'running', not 'ready', so it
always returned NO_SERVER and fell back to regular recording.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
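The first bug is easy to reproduce in isolation: `JSON.stringify` on a string adds literal quote characters, so a later plain-string comparison fails. The sketch below shows the mismatch and one defensive way to read such a value; `readStringSetting` is a hypothetical helper, not a function from the PR.

```typescript
// JSON.stringify wraps a string in literal double quotes.
const stored: string = JSON.stringify("parakeet"); // the 10-character string "parakeet" with quotes
const matchesProvider: boolean = stored === "parakeet"; // false: the quotes are part of the value

// Hypothetical reader that tolerates legacy JSON-quoted values as well as raw strings.
function readStringSetting(raw: string | null, fallback: string): string {
  if (raw === null) return fallback;
  const looksQuoted =
    raw.length >= 2 && raw.charAt(0) === '"' && raw.charAt(raw.length - 1) === '"';
  if (looksQuoted) {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (typeof parsed === "string") return parsed;
    } catch (_err) {
      // Not valid JSON after all; fall through and return the raw value.
    }
  }
  return raw;
}
```

In the app the raw value would come from something like `localStorage.getItem("streamingProvider")`; tolerating both forms on read means values written by the buggy version still resolve correctly after the fix.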
Merges upstream commits including inline AudioWorklet blob URL fix, AssemblyAI keep-alive pings, runtime .env support, auth hardening, and language selector positioning fix. Resolved conflicts in audioManager.js (multi-provider streaming stop, worklet blob URL adoption) and useSettings.ts (deepgramApiKey setter). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- useLocalStorage hook used JSON.stringify for string settings, causing double-quoted values that broke provider matching in audioManager.js
Dependencies
This PR includes commits from #202 and #203 for Linux compatibility. Those PRs should be merged first — the overlapping commits will be no-ops on merge.
Streaming architecture
- AudioWorklet (pcm-streaming-processor.js) captures 16kHz PCM and sends chunks via IPC
- LiveTranscriptOverlay component shows real-time text; main window auto-resizes during streaming
- getStreamingProvider() in audioManager.js reads streamingProvider from localStorage with fallback logic per mode (local/cloud/BYOK)
Commits
- 16ebeee
- 1b125d4
- 605c0a4
- 9ee03c3
- f8f1569
- 7a847cd
- 758e6d5
Test plan
🤖 Generated with Claude Code