feat(desktop): listening-state toggle with persistence and global hotkey (#6649) #7199
mvanhorn wants to merge 3 commits into BasedHardware:main from
Conversation
…key (BasedHardware#6649)

Adds a one-tap toggle to pause/resume conversation listening on the desktop app, with an obvious state indicator, persistence across restarts, and a user-rebindable global hotkey.

- AppState publishes isConversationListening, persisted via @AppStorage
- Audio forwarding gates on the flag at the AppState callback for both the BLE conversation handler (BleAudioService) and the local mic capture (AudioCaptureService); the mic device and BLE socket stay open
- Push-to-talk continues to work while listening is paused
- Sidebar pill button toggles state (ear.fill / ear.slash.fill, green/orange)
- Floating control bar gains a status button + tap-to-toggle compact dot
- New global shortcut Toggle Listening (default Cmd-Shift-L), rebindable via Shortcuts settings, posts toggleListeningShortcutPressed
- AnalyticsManager.listeningToggled emits Sentry breadcrumb + PostHog listening_toggled event with state and source

Default on first launch: on (preserves existing behavior).

Closes BasedHardware#6649
When listening is paused, gracefully end the in-flight transcription session via stopTranscription() instead of silently dropping frames. The /v4/listen backend times out sessions after about 90 seconds without client activity, so a long pause would otherwise close the WebSocket while AppState still treated the same recording as active. Adds a guard in startTranscription() so auto-start triggers (BLE reconnect, user action) decline to start while paused. The frame-drop guards at the audio callbacks stay in place as a defensive backstop. Addresses self-review P1.
Greptile Summary

This PR adds a listening-state toggle for the desktop app: …
Confidence Score: 3/5

The pause direction works correctly, but resuming listening leaves the transcription pipeline stopped while the UI reports "Listening", so users silently lose recordings after every pause/resume cycle. The resume path in setListening sets the flag but never calls startTranscription(), meaning the device appears active to the user but captures nothing. Both audio-delivery closures also read a @MainActor-isolated property from background audio threads without actor hopping. The rest of the change — shortcut wiring, persistence, analytics, UI surfaces — is well-structured and follows existing patterns.

Important Files Changed: AppState.swift — specifically setListening (resume branch) and the two audio-callback closures that bypass main-actor isolation.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    U([User action: UI / hotkey / sidebar]) --> TL[toggleListening]
    TL --> SL[setListening on/off]
    SL -->|on = false| ST[stopTranscription]
    ST --> SAC[stopAudioCapture + clearTranscriptionState]
    SAC --> FLAG_OFF[isConversationListening = false\npersisted to UserDefaults]
    SL -->|on = true| FLAG_ON[isConversationListening = true\npersisted to UserDefaults]
    FLAG_ON -. missing call .-> NOSTART[startTranscription NOT called - recording stays stopped]
    FLAG_OFF --> MIC_GUARD[audioMixer callback - guard isConversationListening from background thread]
    FLAG_OFF --> BLE_GUARD[conversationAudioHandler - guard isConversationListening from background thread]
    MIC_GUARD -->|blocked| DROP1[audio dropped]
    BLE_GUARD -->|blocked| DROP2[audio dropped]
    GSM[GlobalShortcutManager Carbon hotkey] --> NC[NotificationCenter toggleListeningShortcutPressed]
    NC --> TL
```
Reviews (1): Last reviewed commit: "fix(desktop): stop transcription on paus..."
```swift
func setListening(_ on: Bool, source: String = "ui") {
    let previous = isConversationListening
    guard previous != on else { return }

    if !on && isTranscribing {
        stopTranscription()
    }

    isConversationListening = on
    persistedConversationListening = on
    UserDefaults.standard.set(on, forKey: Self.conversationListeningDefaultsKey)

    AnalyticsManager.shared.listeningToggled(isListening: on, source: source)
    log("listening: \(previous ? "on" : "off") -> \(on ? "on" : "off") (source=\(source))")
}
```
Resume doesn't restart transcription
setListening(false) calls stopTranscription(), which calls stopAudioCapture() and clearTranscriptionState() — fully tearing down the pipeline and setting isTranscribing = false. But setListening(true) only flips the flag; it never calls startTranscription(). After the user resumes, isConversationListening is true and the sidebar shows "Listening," yet no recording is happening. The user must manually press play to restart, which silently contradicts the visual affordance of the toggle.
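A minimal, self-contained sketch of the symmetric fix — a toy stand-in for AppState, so the class shape here is assumed, while the flag names and start/stopTranscription come from the PR:

```swift
import Foundation

// Toy stand-in for AppState: demonstrates that resume must restart the
// pipeline, not just flip the flag.
final class ListeningModel {
    private(set) var isConversationListening = true
    private(set) var isTranscribing = true

    func startTranscription() {
        // Mirrors the PR's guard: auto-start triggers decline while paused.
        guard isConversationListening else { return }
        isTranscribing = true
    }

    func stopTranscription() { isTranscribing = false }

    func setListening(_ on: Bool) {
        guard isConversationListening != on else { return }
        if !on && isTranscribing {
            stopTranscription()          // pause tears the pipeline down...
        }
        isConversationListening = on     // set the flag first so the guard passes
        if on && !isTranscribing {
            startTranscription()         // ...so resume must build it back up
        }
    }
}

let model = ListeningModel()
model.setListening(false)
assert(model.isTranscribing == false)    // paused: pipeline stopped
model.setListening(true)
assert(model.isTranscribing == true)     // resumed: pipeline restarted, matching the UI
```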
```swift
// Start the mixer — it sums mic + system into a mono stream and forwards it to
// the transcription WebSocket.
audioMixer?.start { [weak self] monoMixed in
    guard self?.isConversationListening == true else { return }
    self?.transcriptionService?.sendAudio(monoMixed)
}
```
Cross-actor read of @MainActor-isolated property from audio thread
AppState is @MainActor-isolated, so isConversationListening must only be accessed from the main thread. The closure passed to audioMixer?.start runs on whichever background thread the audio subsystem uses, making self?.isConversationListening a cross-actor access. A matching problem exists in the conversationAudioHandler closure at the BLE path (line 1566). Under Swift 6 strict concurrency this is a compile error; under Swift 5.x it is a runtime data race that can yield stale reads. Dispatch the check to the main actor, or promote both closures to @MainActor to keep the isolation correct.
```swift
isConversationListening = on
persistedConversationListening = on
UserDefaults.standard.set(on, forKey: Self.conversationListeningDefaultsKey)
```
Triple-write to the same UserDefaults key
@AppStorage already synchronises persistedConversationListening to UserDefaults automatically. The explicit UserDefaults.standard.set(on, forKey:) call on the line below is redundant and could mislead future readers into thinking @AppStorage doesn't update the store itself.
Suggested change:

```diff
 isConversationListening = on
 persistedConversationListening = on
-UserDefaults.standard.set(on, forKey: Self.conversationListeningDefaultsKey)
```
…or isolation, UserDefaults)

- P1 (resume doesn't restart): setListening(true) now calls startTranscription() when not already transcribing. The existing startTranscription guards (isConversationListening + device/mic) keep the no-op cases safe.
- P1 (cross-actor read on audio thread): added a nonisolated NSLock-guarded conversationListeningSnapshot, updated from setListening on the MainActor. Both the audio mixer closure (~line 1483) and the BLE conversationAudioHandler (~line 1582) now read snapshotIsConversationListening() instead of the @MainActor-isolated isConversationListening, removing the runtime data race under Swift 5 and the strict-concurrency error under Swift 6 without paying a main-actor hop per audio chunk.
- P2 (triple-write to UserDefaults): dropped the explicit UserDefaults.standard.set call; @AppStorage already syncs persistedConversationListening, and the new helper updates the in-memory snapshot only.
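The lock-guarded snapshot described in the cross-actor bullet can be sketched like this — the names mirror the fix's description, but the exact class shape is an assumption:

```swift
import Foundation

// Sketch of a nonisolated, NSLock-guarded listening snapshot. The MainActor
// setter writes it; audio-callback threads read it without a main-actor hop.
final class ListeningSnapshot: @unchecked Sendable {
    private let lock = NSLock()
    private var value = true

    // Called from setListening on the MainActor.
    func update(_ on: Bool) {
        lock.lock()
        value = on
        lock.unlock()
    }

    // Safe to call from any audio callback thread.
    func isListening() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

let snapshot = ListeningSnapshot()
snapshot.update(false)
// An audio callback would then guard per chunk:
// guard snapshot.isListening() else { return }   // drop the frame while paused
assert(snapshot.isListening() == false)
```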
Addressed in 7609650.

Verified locally.
Summary
Adds a one-tap pause/resume toggle for conversation listening on the desktop app, persisted across restarts and bound to a rebindable global hotkey (default ⌘⇧L).
Why this matters
Issue #6649 asked for granular control over when the app is actively listening, with an obvious indicator and a hotkey. Today users have to power off the device or quit the app to stop conversation capture. This adds three discoverable surfaces (sidebar pill, floating control bar, hotkey) so a user can pause sensitive conversations without losing the device session.
Demo
Simulated demo (Remotion) of the toggle states. An actual desktop app build is blocked in my workspace by a missing Apple Developer ID signing identity and SwiftPM dependency clone failures, so this is a programmatic mock against the real Omi color palette and layout, not a live capture.
Changes
- `AppState.isConversationListening` (`@Published` + `@AppStorage("omi.listening.enabled")`), `toggleListening(source:)`, `setListening(_:source:)`. Default on first launch is `true` (preserves existing behavior).
- Pausing ends the in-flight session via the `stopTranscription()` path. Microphone and BLE remain open. A guard in `startTranscription()` declines to start while paused, so auto-start triggers (BLE reconnect, user action) are no-ops until the user resumes.
- New global shortcut `Toggle Listening` (default `⌘⇧L`), rebindable via Shortcuts settings. `GlobalShortcutManager` registers the hotkey and posts `Notification.Name.toggleListeningShortcutPressed`; `AppState` listens and calls `toggleListening(source: "hotkey")`.
- Sidebar pill button (`ear.fill`/`ear.slash.fill`, success/warning colors), floating control bar status pill, and tap-to-toggle dot on the compact bar.
- `AnalyticsManager.listeningToggled` emits a Sentry breadcrumb (category: `"listening"`) and a PostHog `listening_toggled` event with `state` and `source` properties.
- Push-to-talk (`PushToTalkManager`) is independent and continues to work while listening is paused.
- Tests in `desktop/Desktop/Tests/AppStateListeningTests.swift` covering toggle + persistence across `AppState` reloads.

Note
Touches `FloatingControlBarWindow.swift`, which my open #6770 also modifies. The change here is additive (a new init parameter); rebase from whichever lands first.

Fixes #6649
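For reviewers, the hotkey path (Carbon hotkey → NotificationCenter → `toggleListening`) can be sketched as below. This is a simplified, assumption-laden sketch, not the actual `GlobalShortcutManager` implementation: error handling, unregistration, and rebinding are omitted, and the signature constant is arbitrary.

```swift
import Carbon.HIToolbox

// Sketch only: registers ⌘⇧L as a global hotkey and posts the notification
// that AppState observes to call toggleListening(source: "hotkey").
final class HotkeySketch {
    private var hotKeyRef: EventHotKeyRef?

    func register() {
        var eventType = EventTypeSpec(eventClass: OSType(kEventClassKeyboard),
                                      eventKind: UInt32(kEventHotKeyPressed))
        // Install a handler for hotkey-pressed events (no captures allowed in
        // a C-function-pointer closure, so it posts a notification directly).
        InstallEventHandler(GetEventDispatcherTarget(), { _, _, _ in
            NotificationCenter.default.post(
                name: Notification.Name("toggleListeningShortcutPressed"),
                object: nil)
            return noErr
        }, 1, &eventType, nil, nil)

        // ⌘⇧L; the four-char signature is a made-up placeholder.
        let hotKeyID = EventHotKeyID(signature: OSType(0x4F4D_4921), id: 1)
        RegisterEventHotKey(UInt32(kVK_ANSI_L),
                            UInt32(cmdKey | shiftKey),
                            hotKeyID,
                            GetEventDispatcherTarget(),
                            0,
                            &hotKeyRef)
    }
}
```

Carbon's `RegisterEventHotKey` remains the standard way to get system-wide hotkeys without the Accessibility permission that a `CGEventTap` would require.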