Verify: Desktop migration PRs #5374 + #5395 + #5413 (combined)#5506
Verify: Desktop migration PRs #5374 + #5395 + #5413 (combined)#5506
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WebSocket client that connects to /v4/listen with Bearer auth and sends screen_frame JSON messages. Routes focus_result responses back to callers via async continuations with frame_id correlation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#5396) Replace direct Gemini API calls with backend WebSocket screen_frame messages. Context building (goals, tasks, memories, AI profile) moves server-side. Client becomes thin: encode JPEG→base64, send screen_frame, receive focus_result. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…#5396) Start WS connection when monitoring starts, disconnect on stop. Pass service to FocusAssistant (shared for future assistant types). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vision handlers: analyzeFocus, extractTasks, extractMemories, generateAdvice (send screen_frame with analyze type, receive typed result via frame_id) Text handlers: generateLiveNote, requestProfile, rerankTasks, deduplicateTasks (send typed JSON message, receive result via single-slot continuation) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient tool-calling loop with backendService.extractTasks(). Remove extractTaskSingleStage, refreshContext, vector/keyword search, validateTaskTitle — all LLM logic now server-side. -550 lines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient.sendRequest with backendService.extractMemories(). Remove prompt/schema building — all LLM logic now server-side. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-phase Gemini tool-calling loop (execute_sql + vision) with backendService.generateAdvice(). Remove compressForGemini, getUserLanguage, buildActivitySummary, buildPhase1/2Tools — all LLM logic server-side. -560 lines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.deduplicateTasks(). Remove prompt/schema building, local dedup logic — server handles everything. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.rerankTasks(). Remove prompt/ schema building, context fetching — server handles reranking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-stage Gemini profile generation with backendService.requestProfile(). Remove fetchDataSources, buildPrompt, buildConsolidationPrompt — server fetches user data from Firestore and generates profile server-side. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…y uses created_at, idempotent delete
…boundary caps (50 total)
…ions Path updates (5 endpoints): - v2/chat/initial-message → v2/initial-message - v2/agent/provision → v1/agent/vm-ensure - v2/agent/status → v1/agent/vm-status - v1/personas/check-username → v1/apps/check-username - v1/personas/generate-prompt → v1/app/generate-prompts (POST→GET) Decoder hardening: - ServerConversation.createdAt: use decodeIfPresent with Date() fallback - ActionItemsListResponse: try "action_items" then "items" key (Python vs staged-tasks) - AgentProvisionResponse/AgentStatusResponse: make fields optional, add hasVm - UsernameAvailableResponse: support both is_taken (Python) and available (Rust) Graceful no-ops: - recordLlmUsage(): no-op with log (endpoint removed) - fetchTotalOmiAICost(): return nil immediately (endpoint removed) - getChatMessageCount(): return 0 immediately (endpoint removed) Remove staged-tasks migration: - Remove migrateStagedTasks() and migrateConversationItemsToStaged() from APIClient - Remove migration callers and functions from TasksStore Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
C1: Replace unsafe base64 JWT decode with firebase_admin.auth.verify_id_token() which verifies signature against Google public keys before trusting claims. C2: Wrap email in sanitize_pii() per CLAUDE.md logging rules. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ough /v4/listen (kai+ren)
Greptile SummaryThis combined verification PR merges three desktop migration branches (#5374 Rust→Python backend, #5395 STT via Key findings:
Confidence Score: 3/5
Sequence DiagramsequenceDiagram
participant Desktop as Desktop App
participant BTS as BackendTranscriptionService
participant BPS as BackendProactiveService
participant BE as Backend /v4/listen (WebSocket)
participant LLM as Gemini Flash (server-side)
participant FS as Firestore
Note over Desktop,BE: STT Connection (BackendTranscriptionService)
Desktop->>BTS: start(onTranscript:)
BTS->>BE: WS connect /v4/listen?codec=pcm16&source=desktop
BTS-->>Desktop: onConnected() [after 500ms heuristic]
loop Audio streaming
Desktop->>BTS: sendAudio(PCMData)
BTS->>BE: binary audio frame (buffered ~100ms)
BE-->>BTS: [TranscriptSegment JSON]
BTS-->>Desktop: onTranscript(segment)
end
Note over Desktop,BE: Proactive AI Connection (BackendProactiveService)
Desktop->>BPS: connect()
BPS->>BE: WS connect /v4/listen?source=desktop_proactive
BPS-->>Desktop: isConnected = true
Desktop->>BPS: analyzeFocus(imageBase64, appName, windowTitle)
BPS->>BE: {type:"screen_frame", frame_id:"uuid", analyze:["focus"], image_b64:"..."}
BE->>LLM: analyze_focus(uid, image_b64)
LLM-->>BE: FocusResult
BE-->>BPS: {type:"focus_result", frame_id:"uuid", status:..., app_or_site:...}
BPS-->>Desktop: ScreenAnalysis (continuation resumed)
Desktop->>BPS: rerankTasks()
BPS->>BE: {type:"task_rerank"}
BE->>FS: get staged_tasks(uid)
FS-->>BE: tasks list
BE->>LLM: rerank tasks
LLM-->>BE: updated scores
BE->>FS: batch_update_scores(uid, scores)
BE-->>BPS: {type:"rerank_complete", updated_tasks:[...]}
BPS-->>Desktop: RerankExtractedResult
Last reviewed commit: 0841bd3 |
| return try await withCheckedThrowingContinuation { continuation in | ||
| requestLock.lock() | ||
| pendingLiveNote = continuation | ||
| requestLock.unlock() | ||
| sendAndTimeoutSingle(jsonString: jsonString, timeout: textRequestTimeout, | ||
| remove: { let c = self.pendingLiveNote; self.pendingLiveNote = nil; return c }) | ||
| } | ||
| } | ||
|
|
||
| /// Request profile generation (server fetches user data from Firestore). | ||
| func requestProfile() async throws -> String { | ||
| guard isConnected else { throw ServiceError.notConnected } | ||
| let jsonString = try buildJSON(["type": "profile_request"]) | ||
|
|
||
| return try await withCheckedThrowingContinuation { continuation in | ||
| requestLock.lock() | ||
| pendingProfile = continuation | ||
| requestLock.unlock() | ||
| sendAndTimeoutSingle(jsonString: jsonString, timeout: textRequestTimeout, | ||
| remove: { let c = self.pendingProfile; self.pendingProfile = nil; return c }) | ||
| } | ||
| } | ||
|
|
||
| /// Request task reranking (server fetches tasks from Firestore). | ||
| func rerankTasks() async throws -> RerankExtractedResult { | ||
| guard isConnected else { throw ServiceError.notConnected } | ||
| let jsonString = try buildJSON(["type": "task_rerank"]) | ||
|
|
||
| return try await withCheckedThrowingContinuation { continuation in | ||
| requestLock.lock() | ||
| pendingRerank = continuation | ||
| requestLock.unlock() | ||
| sendAndTimeoutSingle(jsonString: jsonString, timeout: textRequestTimeout, | ||
| remove: { let c = self.pendingRerank; self.pendingRerank = nil; return c }) | ||
| } | ||
| } | ||
|
|
||
| /// Request task deduplication (server fetches tasks from Firestore). | ||
| func deduplicateTasks() async throws -> DedupExtractedResult { | ||
| guard isConnected else { throw ServiceError.notConnected } | ||
| let jsonString = try buildJSON(["type": "task_dedup"]) | ||
|
|
||
| return try await withCheckedThrowingContinuation { continuation in | ||
| requestLock.lock() | ||
| pendingDedup = continuation | ||
| requestLock.unlock() | ||
| sendAndTimeoutSingle(jsonString: jsonString, timeout: textRequestTimeout, | ||
| remove: { let c = self.pendingDedup; self.pendingDedup = nil; return c }) | ||
| } |
There was a problem hiding this comment.
Single-slot continuation overwrite — concurrent callers silently abandoned
pendingLiveNote, pendingProfile, pendingRerank, and pendingDedup are each a single optional continuation. If two callers invoke the same text-only method concurrently (e.g., two generateLiveNote calls), the second call overwrites the first continuation before it has been resumed:
requestLock.lock()
pendingLiveNote = continuation // overwrites any previous value
requestLock.unlock()The overwritten continuation is now orphaned — it can never be resumed by the response handler (handleMessage), nor by the timeout task (which calls remove() which may return nil since the slot was already replaced). The first caller will suspend forever until the 60-second textRequestTimeout fires. More critically, checked continuations in Swift must be resumed exactly once; leaking one without eventually resuming it is a concurrency bug.
A safe minimal fix is to fail fast if a request of the same type is already in-flight:
requestLock.lock()
guard pendingLiveNote == nil else {
requestLock.unlock()
continuation.resume(throwing: ServiceError.serverError("Request already in-flight"))
return
}
pendingLiveNote = continuation
requestLock.unlock()This applies identically to pendingProfile, pendingRerank, and pendingDedup.
| ref = db.collection('users').document(uid).collection(COLLECTION) | ||
|
|
||
| # Dedup: check for existing task with same description (case-insensitive) | ||
| normalized = description.lower() | ||
| for doc in ref.stream(): | ||
| existing = doc.to_dict() | ||
| if existing.get('deleted'): | ||
| continue | ||
| if existing.get('description', '').strip().lower() == normalized: | ||
| existing['id'] = doc.id | ||
| return _prepare_for_read(existing) |
There was a problem hiding this comment.
O(N) full-collection scan on every task create
The deduplication check calls ref.stream() without any filter, which reads every document in the user's staged_tasks collection on every call to create_staged_task. For a user with hundreds of accumulated staged tasks, this becomes expensive in both latency and Firestore read cost.
for doc in ref.stream(): # reads ALL staged tasks unconditionally
existing = doc.to_dict()
if existing.get('deleted'):
continue
if existing.get('description', '').strip().lower() == normalized:
...A more targeted approach is to store a description_normalized field and query against it directly:
query = ref.where(filter=firestore.FieldFilter('description_normalized', '==', normalized)) \
.where(filter=firestore.FieldFilter('deleted', '==', False)) \
.limit(1)
existing_docs = list(query.stream())
if existing_docs:
existing = existing_docs[0].to_dict()
existing['id'] = existing_docs[0].id
return _prepare_for_read(existing)This requires: (1) writing description_normalized on create, and (2) a composite Firestore index on (description_normalized, deleted) — but it reduces the read from O(N) to O(1) per call. At minimum, the deleted != true filter should be pushed into the query to cut document reads even without the normalized field.
| private func startConnect() { | ||
| guard let baseURL = Self.getBaseURL() else { | ||
| log("BackendProactiveService: OMI_API_URL not set") | ||
| return | ||
| } |
There was a problem hiding this comment.
Inconsistent URL resolution vs BackendTranscriptionService
BackendProactiveService resolves the backend URL via a private Self.getBaseURL() that reads the OMI_API_URL environment variable, while BackendTranscriptionService (introduced in the same PR) uses await APIClient.shared.baseURL:
// BackendProactiveService
private func startConnect() {
guard let baseURL = Self.getBaseURL() else { ... } // reads OMI_API_URL env var
// BackendTranscriptionService
private func connect() {
let baseURL = await APIClient.shared.baseURL // reads from app settingsIf a user has configured a custom backend URL through the app's settings UI (which APIClient.shared.baseURL respects), the proactive service will ignore it and fall back to the compile-time environment variable. This means the two WebSocket connections will point to different backends under a custom URL configuration.
Consider unifying both services to use APIClient.shared.baseURL for consistency.
| def get_action_items_for_overall_score(uid: str) -> Tuple[int, int]: | ||
| """Count completed vs total action items (all time, not deleted). | ||
|
|
||
| Returns (completed_count, total_count). | ||
| """ | ||
| ref = db.collection('users').document(uid).collection('action_items') | ||
|
|
||
| completed = 0 | ||
| total = 0 | ||
| for doc in ref.stream(): | ||
| data = doc.to_dict() | ||
| if data.get('deleted'): | ||
| continue | ||
| total += 1 | ||
| if data.get('completed'): | ||
| completed += 1 | ||
| return completed, total |
There was a problem hiding this comment.
O(N) full-collection scan for overall score — unbounded for active users
get_action_items_for_overall_score streams the entire action_items collection with no filter:
for doc in ref.stream(): # no filter, no limit — scans everything
data = doc.to_dict()
if data.get('deleted'):
continue
total += 1
if data.get('completed'):
completed += 1Unlike the daily/weekly score queries (which are bounded by a time range), the overall score query will grow without bound as a user accumulates action items over months. A counter document pattern, or at minimum a Firestore count() aggregation query split by completed=True/False, would eliminate the full scan. If the raw list is not needed beyond the counts, the Firestore count() aggregation API is the cleanest solution.
| // Mark as connected after a short delay (backend doesn't send a connect confirmation) | ||
| DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) { [weak self] in | ||
| guard let self = self, self.webSocketTask?.state == .running else { return } | ||
| self.isConnected = true | ||
| self.reconnectAttempts = 0 | ||
| self.lastDataReceivedAt = Date() | ||
| self.lastKeepaliveSuccessAt = Date() | ||
| log("BackendTranscriptionService: Connected") | ||
| self.startKeepalive() | ||
| self.startWatchdog() | ||
| self.onConnected?() | ||
| } |
There was a problem hiding this comment.
Heuristic connection confirmation may send audio before WebSocket is ready
isConnected is set to true via a 500ms asyncAfter delay rather than on a real server acknowledgement:
DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) { [weak self] in
guard let self = self, self.webSocketTask?.state == .running else { return }
self.isConnected = true
...
}URLSessionWebSocketTask.state == .running is set the moment resume() is called on the task (after the TCP handshake begins, not after it completes). On a slow or congested network, the 500ms window may elapse before the TLS+WebSocket handshake finishes, causing the service to declare itself connected and start buffering/sending audio data before the underlying connection is established. Listening for the first pong from the server's keepalive, or using a longer delay with additional guards, would make this materially more robust.
Live E2E Verification Report — Audio Recording TestVerifier: noa (independent) Test Environment
Permissions Verified
Audio Recording Tests30-second test (10:04:26 – 10:05:00 UTC):
5-minute sustained test (10:05:14 – 10:10:41 UTC):
Key Endpoints Verified (all 200 OK)
Audio Pipeline
Non-blocking Issues
Evidence FilesVerdictPASS — Audio recording pipeline fully functional. Desktop app authenticated, communicating with backend, Deepgram STT active, conversations created over 5-minute sustained test.
|
Screen Capture E2E Verification — PASS ✅Following up on the audio-only E2E test. Screen capture pipeline has now been verified end-to-end. Test Setup
Pipeline VerifiedEvidence: Backend Received Screen FramesEvidence: Full Round-Trip ConfirmedEvidence: Local PipelineKnown Issues
Evidence Files
Combined Verdict Update
Overall: PASS — Both audio and screen capture pipelines verified end-to-end on Mac Mini. |
Combined Re-Verification (v2) — Rebased PRsVerifier: noa | Date: 2026-03-10 | Branch: Locked SHAs (all rebased on main
|
| PR | Branch | SHA | Status |
|---|---|---|---|
| #5374 | collab/5302-integration | 78d15d27 |
✅ PASS |
| #5395 | fix/desktop-stt-backend-5393 | e2a88573 |
✅ PASS |
| #5413 | collab/5396-integration | 15bf1ec6 |
✅ PASS |
Merge Order: #5374 → #5395 → #5413
- test.sh conflict resolved (kept all entries from both PRs)
- AppState.swift auto-merged cleanly
Test Results
| Surface | Main (baseline) | Combined | Delta |
|---|---|---|---|
| Passed | 591 | 761 | +170 (new PR tests) |
| Failed | 139 | 105 | -34 (PR fixes) |
| Errors | 0 | 39 | +39 (GCP creds needed) |
| Regressions | — | 0 | — |
134 PR-specific tests all pass: auth_routes, from_segments, desktop_chat, chat_generate_title, conversations_count, focus_sessions, advice.
Architecture Review (Codex Audit)
- 0 CRITICAL, 4 WARNING (non-blocking)
- W1: Dead code
_verify_apple_id_token(auth.py:496-534) — never called - W2: Inconsistent URL resolution (BackendTranscriptionService vs BackendProactiveService)
- W3: Pre-existing in-function imports (not from these PRs)
- W4: Pre-existing test failures (not from these PRs)
Mac Mini E2E (agent-swift v0.1.0 + cliclick)
agent-swift connect --bundle-id com.omi.desktop-dev— full accessibility treeagent-swift press— interactive element pressing works (Back button)agent-swift snapshot -i— interactive element discovery- cliclick sidebar navigation: Dashboard → Chat → Memories → Tasks → Settings
- Screenshots via
screencapture -l <WID>for each page - All pages render correctly with proper UI elements
Remote Branch Sync
git merge-base --is-ancestor origin/collab/5302-integration origin/verify/noa-combined-5374-5395-5413-v2 → OK
git merge-base --is-ancestor origin/fix/desktop-stt-backend-5393 origin/verify/noa-combined-5374-5395-5413-v2 → OK
git merge-base --is-ancestor origin/collab/5396-integration origin/verify/noa-combined-5374-5395-5413-v2 → OK
Overall Verdict: ✅ PASS
All 3 rebased PRs verified with zero regressions, clean architecture, and Mac Mini E2E confirmation.
Onboarding + Recording E2E Verification (agent-swift v0.2.1)Mac Mini: beastoin-agents-f1-mac-mini | App: Omi Computer (me.omi.computer) PID 68352/68782 Onboarding Flow (5 screens)Triggered by deleting
Post-completion: "Transcription Error: DEEPGRAM_API_KEY not set" — correct behavior without Rust backend. Recording Flow
Evidence (accessibility tree assertions)Onboarding Step 1 (Notifications): Onboarding Step 4 (Tasks): Dashboard (post-onboarding): Key FindingApp bundle ID is VerdictPASS — Onboarding renders all expected screens with correct UI elements. Recording flow handles missing backend gracefully. No crashes or rendering issues. |


Combined Verification — Desktop Migration PRs
Verifier: noa (independent, did not author any of this code)
Authors: kai + ren
Merge Order
94c9130ff) — Desktop migration Rust → Python backend71a20c06e) — Desktop route STT through backend /v4/listen8b79e013f) — Desktop remove GEMINI_API_KEY, route proactive AI through /v4/listenCombined UAT Summary
Test Results
Codex Audit: 0 CRITICAL, 10 WARNING
ininstead of==(not introduced by these PRs)Auth Fix Verified
Commit
94c9130ffreplaces unsafe base64 JWT decode withfirebase_admin.auth.verify_id_token(). Fix is sound.Verification Steps Completed
Remote Sync
All 3 PR branches verified as ancestors of this combined branch.
Overall Verdict: PASS
Blockers: none
Ready for merge in order: #5374 → #5395 → #5413
🤖 Generated with Claude Code