feat: add real-time streaming transcription with live overlay by wrt54gl · Pull Request #217 · OpenWhispr/openwhispr

wrt54gl · 2026-02-07T21:37:30Z

Summary

Add real-time streaming transcription with a live text overlay during recording, supporting four backends: AssemblyAI (WebSocket), Deepgram, NVIDIA Parakeet (local via sherpa-onnx), and the OpenAI Realtime API
Add a streaming provider selector in Settings with auto-detection and per-provider configuration
Fix a localStorage serialization bug: the useLocalStorage hook used JSON.stringify for string settings, producing double-quoted values that broke provider matching in audioManager.js
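To make the serialization bug concrete, here is a minimal sketch of the difference between JSON-encoding every setting and storing strings verbatim. `MemoryStorage`, `serializeSetting`, and `serializeSettingBuggy` are hypothetical names standing in for `window.localStorage` and the hook's write path; the real hook's code may differ.

```typescript
// Minimal storage interface so the sketch runs outside a browser; in the
// renderer this role is played by window.localStorage.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

class MemoryStorage implements KeyValueStorage {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Buggy variant: JSON.stringify wraps strings in an extra pair of quotes.
function serializeSettingBuggy(value: unknown): string {
  return JSON.stringify(value);
}

// Hypothetical fix: strings are stored verbatim; other types stay JSON-encoded.
function serializeSetting(value: unknown): string {
  return typeof value === "string" ? value : JSON.stringify(value);
}

const storage = new MemoryStorage();
storage.setItem("streamingProvider", serializeSettingBuggy("parakeet"));
const buggy = storage.getItem("streamingProvider"); // '"parakeet"'
storage.setItem("streamingProvider", serializeSetting("parakeet"));
const fixed = storage.getItem("streamingProvider"); // 'parakeet'

// A plain equality check, like a provider comparison, only matches the fixed value.
console.log(buggy === "parakeet", fixed === "parakeet"); // false true
```

The trade-off is that the read side must know which keys hold plain strings, since a raw `"true"` and a JSON-encoded `true` are now indistinguishable.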

Dependencies

This PR includes commits from #202 and #203 for Linux compatibility. Those PRs should be merged first; the overlapping commits will then be no-ops when this PR is merged.

fix(linux): prefer ydotool over xdotool on GNOME Wayland (#202): GNOME Wayland paste fix (ydotool preference, AT-SPI2 terminal detection)
fix(linux): fix transparent window flickering on Wayland and X11 (#203): transparent window flicker fix and GTK 3/4 symbol crash prevention

Streaming architecture

Renderer: AudioWorklet (pcm-streaming-processor.js) captures 16kHz PCM and sends chunks via IPC
Main process: Provider-specific handlers manage WebSocket connections and broadcast partial transcripts back to renderer
UI: LiveTranscriptOverlay component shows real-time text; main window auto-resizes during streaming
Parakeet: chunked re-transcription every 2s against the local sherpa-onnx WebSocket server; the full audio buffer is accumulated and re-transcribed on each pass
Provider selection: getStreamingProvider() in audioManager.js reads streamingProvider from localStorage with fallback logic per mode (local/cloud/BYOK)
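The Parakeet path re-transcribes the whole recording so far rather than only the newest audio. A minimal sketch of that accumulation, assuming 16 kHz mono Int16 PCM from the worklet: `ChunkedRetranscriber` is a hypothetical name, and it triggers on sample count (2 s × 16,000 = 32,000 samples) instead of a wall-clock timer so it behaves deterministically. The PR's own scheduling may be timer-based.

```typescript
const SAMPLE_RATE = 16_000; // the worklet emits 16 kHz mono PCM
const INTERVAL_SECONDS = 2; // re-transcription cadence described in the PR

// Hypothetical accumulator: every chunk is kept, and once at least
// INTERVAL_SECONDS of new audio has arrived the *entire* buffer is returned
// so the caller can send it to the local server for a fresh transcription.
class ChunkedRetranscriber {
  private readonly chunks: Int16Array[] = [];
  private readonly intervalSamples: number;
  private totalSamples = 0;
  private samplesAtLastPass = 0;

  constructor(intervalSeconds: number = INTERVAL_SECONDS) {
    this.intervalSamples = Math.round(SAMPLE_RATE * intervalSeconds);
  }

  // Returns the full accumulated audio when a pass is due, otherwise null.
  push(chunk: Int16Array): Int16Array | null {
    this.chunks.push(chunk);
    this.totalSamples += chunk.length;
    if (this.totalSamples - this.samplesAtLastPass < this.intervalSamples) {
      return null;
    }
    this.samplesAtLastPass = this.totalSamples;
    return this.fullBuffer();
  }

  // Concatenates every chunk received so far into one contiguous buffer.
  fullBuffer(): Int16Array {
    const out = new Int16Array(this.totalSamples);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}
```

Feeding 16 chunks of 4,000 samples (4 s of audio) yields two passes, the second carrying all 64,000 samples. Because each pass re-decodes everything recorded so far, total work grows roughly quadratically with recording length; the benefit is that later passes can revise words an earlier pass got wrong.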

Commits

Commit	Description
`16ebeee`	fix(linux): prefer ydotool over xdotool on GNOME Wayland (from #202)
`1b125d4`	fix(linux): use AT-SPI2 for terminal detection on GNOME Wayland (from #202)
`605c0a4`	feat: add real-time streaming transcription with live overlay
`9ee03c3`	fix(linux): fix transparent window flicker and GTK 3/4 symbol crash (from #203)
`f8f1569`	fix: move useAudioRecording call before showTranscript to fix TDZ crash
`7a847cd`	fix: correct OpenAI Realtime transcription API schema and buffer error
`758e6d5`	fix: fix Parakeet streaming by correcting status check and localStorage serialization

Test plan

Verify streaming toggle appears in Settings → Transcription with provider selector
Test AssemblyAI streaming (requires OpenWhispr Cloud sign-in)
Test Deepgram streaming with BYOK API key
Test Parakeet local streaming (requires sherpa-onnx binary + downloaded model)
Test OpenAI Realtime streaming with BYOK API key
Verify live transcript overlay appears during streaming recording and disappears on stop
Verify the final transcription is correct after streaming stops
Verify non-streaming (regular) transcription still works when streaming is disabled
Test on Linux GNOME Wayland: transparent window, hotkey, paste

🤖 Generated with Claude Code

On GNOME Wayland, xdotool can only interact with XWayland windows. When OpenWhispr (an Electron/XWayland app) tries to paste, xdotool targets OpenWhispr's own window instead of the focused native Wayland window and silently reports success, which prevents the fallback to ydotool. This commit reorders the paste tool candidates on GNOME Wayland to try ydotool first.

It also uses Ctrl+Shift+V with ydotool, since terminal detection via xdotool/kdotool fails for native Wayland windows. Ctrl+Shift+V works correctly in both terminals (paste) and other apps (paste without formatting).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
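The reordering matters because the caller stops at the first tool that reports success. A minimal sketch, assuming session detection from the standard `XDG_SESSION_TYPE` and `XDG_CURRENT_DESKTOP` environment variables; `pasteToolCandidates` and `pasteWithFallback` are hypothetical names, and the order used for non-GNOME sessions is an assumption, not taken from the PR.

```typescript
type PasteTool = "ydotool" | "xdotool";

interface DesktopSession {
  sessionType: string; // e.g. XDG_SESSION_TYPE: "wayland" or "x11"
  currentDesktop: string; // e.g. XDG_CURRENT_DESKTOP, colon-separated like "ubuntu:GNOME"
}

function isGnomeWayland(session: DesktopSession): boolean {
  const desktops = session.currentDesktop.toLowerCase().split(":");
  return session.sessionType.toLowerCase() === "wayland" && desktops.includes("gnome");
}

// Hypothetical ordering: ydotool first on GNOME Wayland, because xdotool there
// only reaches XWayland windows. The fallback order elsewhere is assumed.
function pasteToolCandidates(session: DesktopSession): PasteTool[] {
  return isGnomeWayland(session) ? ["ydotool", "xdotool"] : ["xdotool", "ydotool"];
}

// Tries each tool in order and stops at the first that reports success; a
// false "success" from xdotool is exactly what blocked the ydotool fallback.
function pasteWithFallback(
  candidates: PasteTool[],
  run: (tool: PasteTool) => boolean,
): PasteTool | null {
  for (const tool of candidates) {
    if (run(tool)) return tool;
  }
  return null;
}
```

With `{ sessionType: "wayland", currentDesktop: "ubuntu:GNOME" }` the first candidate is ydotool, so xdotool's misleading success is never reached.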

On GNOME Wayland, xdotool always returns OpenWhispr's own XWayland window instead of the actual focused window, making terminal detection fail. This caused ydotool to always send Ctrl+Shift+V (which doesn't work in apps like GNOME Text Editor) or always Ctrl+V (which doesn't work in terminals).

Use the AT-SPI2 accessibility API to detect the active application, which works for both native Wayland and XWayland windows. This lets ydotool send the correct keystroke: Ctrl+Shift+V for terminals, Ctrl+V for everything else.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
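Once the active application is known, choosing the keystroke is a small mapping. A minimal sketch, assuming ydotool 1.x's `<keycode>:<state>` syntax for `ydotool key` and the Linux input event codes from `linux/input-event-codes.h`. The AT-SPI2 query itself is not shown, and the terminal name list is an illustrative guess at what AT-SPI2 might report, not taken from the PR.

```typescript
// Linux input event codes (linux/input-event-codes.h).
const KEY_LEFTCTRL = 29;
const KEY_LEFTSHIFT = 42;
const KEY_V = 47;

// Illustrative only: the application names AT-SPI2 reports for terminals
// are assumed here.
const TERMINAL_APP_NAMES = new Set([
  "gnome-terminal-server",
  "kgx",
  "ptyxis",
  "konsole",
  "kitty",
  "alacritty",
  "xterm",
]);

function isTerminalApp(appName: string): boolean {
  return TERMINAL_APP_NAMES.has(appName.trim().toLowerCase());
}

// Builds arguments for `ydotool key` (1 = press, 0 = release):
// Ctrl+Shift+V for terminals, Ctrl+V for everything else.
function ydotoolPasteArgs(terminal: boolean): string[] {
  const modifiers = terminal ? [KEY_LEFTCTRL, KEY_LEFTSHIFT] : [KEY_LEFTCTRL];
  const presses = [...modifiers, KEY_V].map((code) => `${code}:1`);
  // Release keys in the reverse order they were pressed.
  const releases = [KEY_V, ...[...modifiers].reverse()].map((code) => `${code}:0`);
  return ["key", ...presses, ...releases];
}
```

For a non-terminal this produces `key 29:1 47:1 47:0 29:0`, which a Node main process could pass to `child_process.execFile("ydotool", args)`.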

Add multi-provider streaming transcription that displays live text beside the floating mic icon during recording.

Supports 4 backends:
- AssemblyAI (existing, now wired to UI overlay)
- Deepgram Nova-3 (new WebSocket client, ~$0.0043/min)
- OpenAI Realtime API (transcription-only mode, 16→24kHz resampling)
- Parakeet local (chunked re-transcription every 2s, fully offline)

Key changes:
- AudioWorklet (pcm-streaming-processor.js) converts mic float32→int16 PCM
- LiveTranscriptOverlay component with auto-scroll and click-through
- Deterministic window resize priority (transcript > menu > toast > base)
- Multi-provider dispatch in audioManager.js (getStreamingProvider)
- Deepgram API key persistence via PERSISTED_KEYS in environment.js
- Settings UI with provider dropdown and conditional API key input
- TypeScript declarations for all new IPC channels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
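Two PCM steps in this commit are easy to get subtly wrong: the float32→int16 conversion done in the worklet, and the 16→24 kHz upsampling for OpenAI Realtime. A minimal sketch of both, assuming mono audio; `floatToInt16` and `resampleLinear` are hypothetical names, and linear interpolation is one simple choice of resampler, not necessarily the one the PR uses.

```typescript
// Converts Web Audio float samples in [-1, 1] to 16-bit signed PCM.
// Out-of-range input is clamped; negative and positive halves are scaled
// separately because int16 spans -32768..32767.
function floatToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Linear-interpolation resampler for 16-bit mono PCM.
function resampleLinear(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Int16Array(outLength);
  const step = fromRate / toRate; // source samples advanced per output sample
  for (let j = 0; j < outLength; j++) {
    const pos = j * step;
    const i = Math.floor(pos);
    const frac = pos - i;
    const a = input[i];
    const b = input[Math.min(i + 1, input.length - 1)]; // hold the last sample at the edge
    out[j] = Math.round(a + (b - a) * frac);
  }
  return out;
}
```

Going from 16 kHz to 24 kHz produces 3 output samples for every 2 input samples, so `[0, 300, 600, 900]` becomes `[0, 200, 400, 600, 800, 900]`. Linear interpolation adds no anti-aliasing filter, which is acceptable for upsampling speech but not for downsampling.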

Add Chromium flags to fix two Linux-specific issues:
- --enable-transparent-visuals + --disable-gpu-compositing + a 300ms startup delay to prevent transparent window flickering on X11/Wayland
- --gtk-version=3 to prevent Chromium from dlopen'ing libgtk-4, which crashes on GTK 4.18+ (GTK 4 refuses to coexist with GTK 3 in the same process)
- Ozone platform hints for native Wayland rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
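The flags above can be kept in one place as data. A minimal sketch, where `linuxChromiumSwitches` is a hypothetical helper and the `ozone-platform-hint` value `auto` is an assumption, since the commit only says "Ozone platform hints".

```typescript
// [switch name without leading dashes, optional value]
type ChromiumSwitch = [name: string, value?: string];

// Hypothetical helper returning the switches described in the commit.
function linuxChromiumSwitches(platform: string): ChromiumSwitch[] {
  if (platform !== "linux") return [];
  return [
    ["enable-transparent-visuals"], // transparent window flicker fix
    ["disable-gpu-compositing"], // transparent window flicker fix
    ["gtk-version", "3"], // keep Chromium from loading libgtk-4
    ["ozone-platform-hint", "auto"], // assumed value for the Ozone hint
  ];
}
```

In an Electron main process these would be applied with `app.commandLine.appendSwitch(name, value)` before the app's `ready` event. The 300ms startup delay is a timing change rather than a switch, so it is not modeled here.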

isStreaming and partialTranscript were referenced at line 125 before being destructured from useAudioRecording at line 149, causing a "Cannot access before initialization" error that crashed the entire React app and broke the dictation overlay rendering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
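The crash is a temporal dead zone (TDZ) error: a `const` binding exists from the start of its scope but cannot be read until its declaration has executed. A minimal reproduction, where `useAudioRecording` is a hypothetical stub rather than the real hook and the component logic is simplified:

```typescript
// Hypothetical stand-in for the real hook.
function useAudioRecording() {
  return { isStreaming: true, partialTranscript: "hello wor" };
}

function componentBroken(): string {
  // Reads isStreaming/partialTranscript while they are still in the TDZ.
  const showTranscript = computeShowTranscript();
  const { isStreaming, partialTranscript } = useAudioRecording();
  function computeShowTranscript(): boolean {
    return isStreaming && partialTranscript.length > 0;
  }
  return showTranscript ? partialTranscript : "";
}

function componentFixed(): string {
  // Destructure first, as the fix does, so later reads see initialized bindings.
  const { isStreaming, partialTranscript } = useAudioRecording();
  const showTranscript = isStreaming && partialTranscript.length > 0;
  return showTranscript ? partialTranscript : "";
}
```

When `const` semantics are preserved (ES2015+ output, as modern bundlers produce), calling `componentBroken()` in V8 throws `ReferenceError: Cannot access 'isStreaming' before initialization`. TypeScript does not flag this case because the read happens inside a hoisted function declaration.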

- Use transcription_session.update event type (not session.update)
- Use flat session fields (input_audio_format, input_audio_transcription) matching the ?intent=transcription endpoint schema
- Handle transcription_session.created/updated server events
- Remove input_audio_buffer.commit on disconnect (empty buffer error)
- Suppress non-critical "buffer too small" errors from user toast

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
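A minimal sketch of the client side of these fixes. The event names and field names come from the commit; nesting the fields under a `session` key, the optional `language` field, and the model name are assumptions based on OpenAI's beta Realtime transcription reference and should be checked against the current docs. `buildTranscriptionSessionUpdate` and `shouldToastError` are hypothetical names.

```typescript
// Assumed shape of the session-update event (verify against OpenAI's docs).
interface TranscriptionSessionUpdate {
  type: "transcription_session.update";
  session: {
    input_audio_format: "pcm16";
    input_audio_transcription: { model: string; language?: string };
  };
}

function buildTranscriptionSessionUpdate(model: string, language?: string): TranscriptionSessionUpdate {
  return {
    type: "transcription_session.update",
    session: {
      input_audio_format: "pcm16",
      input_audio_transcription: language ? { model, language } : { model },
    },
  };
}

// Server events treated as acknowledgements rather than errors.
const SESSION_ACK_EVENTS = new Set(["transcription_session.created", "transcription_session.updated"]);

// Mirrors the commit's choice to keep "buffer too small" errors out of the user toast.
function shouldToastError(message: string): boolean {
  return !message.toLowerCase().includes("buffer too small");
}

// Example payload; "gpt-4o-transcribe" is an assumed model name.
const payload = JSON.stringify(buildTranscriptionSessionUpdate("gpt-4o-transcribe", "en"));
```

Dropping the `input_audio_buffer.commit` on disconnect removes the main source of the empty-buffer error; the toast filter only covers any that still arrive.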

Two bugs prevented Parakeet live streaming from working:

1. useSettings stored streamingProvider via JSON.stringify, wrapping the value in extra quotes ('"parakeet"' instead of 'parakeet'). The comparison in audioManager's getStreamingProvider() never matched, silently disabling streaming.
2. The parakeet-streaming-start handler checked status.ready, but ParakeetWsServer.getStatus() reports 'running', not 'ready', so the handler always returned NO_SERVER and fell back to regular recording.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
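The commit fixes the writer, but an install that already saved '"parakeet"' under the old behavior would still hold the quoted value. One possible way to tolerate that, not something the PR is stated to do, is a reader that unwraps a single layer of JSON string quoting. `readStringSetting` and `matchesProvider` are hypothetical names.

```typescript
// Hypothetical reader: returns '"parakeet"' and 'parakeet' both as 'parakeet'.
function readStringSetting(raw: string | null): string | null {
  if (raw === null) return null;
  // Only values that look like a JSON string literal are candidates for unwrapping.
  if (raw.length >= 2 && raw.startsWith('"') && raw.endsWith('"')) {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (typeof parsed === "string") return parsed;
    } catch {
      // Not valid JSON (e.g. an unescaped inner quote); keep the raw value.
    }
  }
  return raw;
}

// Provider comparison that works for both legacy and current stored values.
function matchesProvider(raw: string | null, provider: string): boolean {
  return readStringSetting(raw) === provider;
}
```

Rewriting the normalized value back to storage on first read would let the workaround be removed in a later release.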

Merges upstream commits including inline AudioWorklet blob URL fix, AssemblyAI keep-alive pings, runtime .env support, auth hardening, and language selector positioning fix. Resolved conflicts in audioManager.js (multi-provider streaming stop, worklet blob URL adoption) and useSettings.ts (deepgramApiKey setter).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Wendel Toews and others added 8 commits February 6, 2026 17:29

wrt54gl wants to merge 8 commits into OpenWhispr:main from wrt54gl:socket-transcribe
