prerelease: desktop migration #5374 #5395 #5413 #5537 #5538
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WebSocket client that connects to /v4/listen with Bearer auth and sends screen_frame JSON messages. Routes focus_result responses back to callers via async continuations with frame_id correlation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
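A sketch of this frame_id correlation pattern, in Python with asyncio futures rather than Swift continuations; `send_json` and the exact message fields are assumptions for illustration:

```python
import asyncio
import json
import uuid

class FrameCorrelator:
    """Route focus_result responses back to callers by frame_id
    (a Python analogue of the Swift async-continuation pattern)."""

    def __init__(self, send_json):
        self.send_json = send_json  # coroutine that writes one JSON message
        self.pending = {}           # frame_id -> Future awaiting a result

    async def request_focus(self, image_b64, timeout=30.0):
        frame_id = str(uuid.uuid4())
        fut = asyncio.get_running_loop().create_future()
        self.pending[frame_id] = fut
        await self.send_json({"type": "screen_frame", "frame_id": frame_id,
                              "image": image_b64, "analyze": ["focus"]})
        try:
            return await asyncio.wait_for(fut, timeout)
        finally:
            self.pending.pop(frame_id, None)

    def on_message(self, raw):
        msg = json.loads(raw)
        if msg.get("type") == "focus_result":
            fut = self.pending.get(msg.get("frame_id"))
            if fut and not fut.done():
                fut.set_result(msg)

async def demo():
    sent = []
    async def send_json(msg):
        sent.append(msg)
    c = FrameCorrelator(send_json)
    task = asyncio.create_task(c.request_focus("aGVsbG8="))
    await asyncio.sleep(0)  # let the request register and suspend
    c.on_message(json.dumps({"type": "focus_result",
                             "frame_id": sent[0]["frame_id"],
                             "state": "focused"}))
    return await task

result = asyncio.run(demo())
print(result["state"])  # focused
```

The map from frame_id to future lets several frames be in flight at once, which is why vision requests use it rather than a single pending slot.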
#5396) Replace direct Gemini API calls with backend WebSocket screen_frame messages. Context building (goals, tasks, memories, AI profile) moves server-side. Client becomes thin: encode JPEG→base64, send screen_frame, receive focus_result. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
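A minimal sketch of the thin-client encoding step, assuming the `screen_frame`/`frame_id` fields named above; the remaining field names are hypothetical:

```python
import base64
import json
import uuid

def build_screen_frame(jpeg_bytes: bytes, app_name: str, window_title: str) -> str:
    """Encode a captured JPEG as base64 and wrap it in a screen_frame message."""
    return json.dumps({
        "type": "screen_frame",
        "frame_id": str(uuid.uuid4()),
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
        "app_name": app_name,
        "window_title": window_title,
        "analyze": ["focus"],
    })

msg = json.loads(build_screen_frame(b"\xff\xd8\xff", "Xcode", "main.swift"))
print(msg["type"])                     # screen_frame
print(base64.b64decode(msg["image"]))  # b'\xff\xd8\xff'
```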
…#5396) Start WS connection when monitoring starts, disconnect on stop. Pass service to FocusAssistant (shared for future assistant types). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vision handlers: analyzeFocus, extractTasks, extractMemories, generateAdvice (send screen_frame with analyze type, receive typed result via frame_id) Text handlers: generateLiveNote, requestProfile, rerankTasks, deduplicateTasks (send typed JSON message, receive result via single-slot continuation) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
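The single-slot continuation differs from the frame_id map: only one text request may be outstanding at a time, and a second caller fails fast instead of silently overwriting the first. A Python sketch under that assumption, with hypothetical names:

```python
import asyncio

class SingleSlot:
    """One outstanding text request at a time."""

    def __init__(self):
        self._slot = None  # Future for the in-flight request, if any

    async def request(self, send, payload, timeout=30.0):
        if self._slot is not None:
            raise RuntimeError("request already in flight")
        self._slot = asyncio.get_running_loop().create_future()
        await send(payload)
        try:
            return await asyncio.wait_for(self._slot, timeout)
        finally:
            self._slot = None

    def resolve(self, result):
        if self._slot is not None and not self._slot.done():
            self._slot.set_result(result)

async def demo():
    async def send(payload):
        pass  # stand-in for the WebSocket write
    s = SingleSlot()
    task = asyncio.create_task(s.request(send, {"type": "live_notes_text", "text": "hi"}))
    await asyncio.sleep(0)  # let the request register and suspend
    s.resolve({"type": "live_note", "text": "note"})
    return await task

print(asyncio.run(demo())["text"])  # note
```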
Replace GeminiClient tool-calling loop with backendService.extractTasks(). Remove extractTaskSingleStage, refreshContext, vector/keyword search, validateTaskTitle — all LLM logic now server-side. -550 lines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient.sendRequest with backendService.extractMemories(). Remove prompt/schema building — all LLM logic now server-side. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-phase Gemini tool-calling loop (execute_sql + vision) with backendService.generateAdvice(). Remove compressForGemini, getUserLanguage, buildActivitySummary, buildPhase1/2Tools — all LLM logic server-side. -560 lines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.deduplicateTasks(). Remove prompt/schema building, local dedup logic — server handles everything. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.rerankTasks(). Remove prompt/ schema building, context fetching — server handles reranking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-stage Gemini profile generation with backendService.requestProfile(). Remove fetchDataSources, buildPrompt, buildConsolidationPrompt — server fetches user data from Firestore and generates profile server-side. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ts (#5396) Pass shared BackendProactiveService to all 4 assistants and 3 text-only services. Remove do/catch since inits no longer throw. Update AdviceTestRunnerWindow fallback creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace direct GeminiClient usage with BackendProactiveService. Uses configure(backendService:) singleton pattern matching other text-based services. Prompt logic moves server-side. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add configure(backendService:) call for LiveNotesMonitor alongside other singleton text-based services. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update GoogleService-Info-Dev.plist with dev Firebase values: API_KEY, PROJECT_ID, STORAGE_BUCKET, GCM_SENDER_ID, GOOGLE_APP_ID. Fixes #5536 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dev builds load GoogleService-Info-Dev.plist (via run.sh), prod builds load GoogleService-Info.plist. AuthService now reads API_KEY from whichever plist is in the bundle, with prod key as fallback. Fixes #5536 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dev.sh builds Omi Dev (com.omi.desktop-dev) but was copying the prod GoogleService-Info.plist. Now uses the same dev plist logic as run.sh. Fixes #5536 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reset-and-run.sh builds Omi Dev (com.omi.desktop-dev) but was copying the prod GoogleService-Info.plist. Now uses the same dev plist logic as run.sh. Fixes #5536 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codex review: dev builds should not silently use prod credentials. Now logs a FATAL warning if GoogleService-Info.plist is missing or has no API_KEY in a dev build (bundle ID ending in -dev). Fixes #5536 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ck to prod Codex review round 2: logging is not fail-fast. Dev builds now crash with fatalError if GoogleService-Info.plist has no API_KEY, preventing silent use of prod credentials. Prod builds still fall back safely. Fixes #5536 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e DEEPGRAM_API_KEY
…hrough /v4/listen (#5396) Resolved test.sh conflict: kept all entries from both sides.
Resolved conflicts: test.sh (kept all entries), transcribe.py (formatting only), test files (took #5537 versions).
Greptile Summary

This PR combines four desktop migration sub-PRs, migrating the macOS Omi desktop app from a standalone Rust backend to the shared Python backend. The core changes are:
Key issues found:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Desktop as Desktop App (Swift)
    participant Auth as AuthService
    participant Backend as Python Backend
    participant Firebase as Firebase Auth
    note over Desktop,Firebase: Auth Flow (auth.py)
    Desktop->>Backend: GET /v1/auth/authorize (provider, redirect_uri)
    Backend->>Backend: Validate redirect_uri scheme (omi://, omi-computer://)
    Backend->>Firebase: OAuth redirect (Google/Apple)
    Firebase-->>Backend: OAuth callback with id_token
    Backend->>Firebase: REST signInWithIdp (FIREBASE_API_KEY)
    alt REST succeeds
        Firebase-->>Backend: localId (Firebase UID)
    else REST fails — fallback (BROKEN)
        Backend->>Firebase: verify_id_token(oauth_token) ← always throws
        note over Backend: InvalidIdTokenError: wrong aud claim
    end
    Backend->>Firebase: create_custom_token(uid)
    Backend-->>Desktop: auth_code via redirect_uri
    note over Desktop,Backend: STT / Proactive AI Flow (transcribe.py)
    Desktop->>Backend: WS /v4/listen?language=en&source=desktop
    loop Audio streaming
        Desktop->>Backend: Binary PCM16 frames (mono, 16kHz)
        Backend-->>Desktop: transcript segment JSON
    end
    Desktop->>Backend: JSON {type: screen_frame, analyze: [focus,tasks,...]}
    par Fan-out handlers
        Backend->>Backend: analyze_focus(uid, image_b64)
        Backend-->>Desktop: {type: focus_result, ...}
    and
        Backend->>Backend: extract_tasks(uid, image_b64)
        Backend-->>Desktop: {type: tasks_extracted, ...}
    end
    Desktop->>Backend: JSON {type: live_notes_text, text: ...}
    Backend->>Backend: generate_live_note(text)
    Backend-->>Desktop: {type: live_note, text: ...}
```
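The screen_frame fan-out shown above can be sketched server-side as follows; the handler names follow the diagram, while message fields and the dispatch shape are assumptions:

```python
import asyncio
import json

# hypothetical analysis handlers, keyed by the analyze types in the diagram
async def analyze_focus(uid, image_b64):
    return {"type": "focus_result", "state": "focused"}

async def extract_tasks(uid, image_b64):
    return {"type": "tasks_extracted", "tasks": []}

HANDLERS = {"focus": analyze_focus, "tasks": extract_tasks}

async def handle_text_message(uid, raw, send):
    """Dispatch one JSON message from the /v4/listen socket; binary frames
    (PCM16 audio) would be routed to STT before reaching this function."""
    msg = json.loads(raw)
    if msg["type"] == "screen_frame":
        # fan out to every requested analysis concurrently
        coros = [HANDLERS[a](uid, msg["image"]) for a in msg["analyze"] if a in HANDLERS]
        for result in await asyncio.gather(*coros):
            await send(result)

async def demo():
    out = []
    async def send(m):
        out.append(m)
    raw = json.dumps({"type": "screen_frame", "image": "...",
                      "analyze": ["focus", "tasks"]})
    await handle_text_message("uid123", raw, send)
    return out

results = asyncio.run(demo())
print([r["type"] for r in results])  # ['focus_result', 'tasks_extracted']
```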
```python
due_start = f'{date_str}T00:00:00Z'
due_end = f'{date_str}T23:59:59.999Z'
completed, total = staged_tasks_db.get_action_items_for_daily_score(uid, due_start, due_end)
```
datetime.now() without timezone causes UTC mismatch
datetime.now() returns the server's local time with no timezone info. The date string is then combined with hardcoded UTC markers (T00:00:00Z), so if the server ever runs in a non-UTC timezone the query window will be silently wrong. The same issue exists on line 263 in get_scores. Use datetime.now(timezone.utc) consistently:
```python
parsed = datetime.now(timezone.utc).date()
```
And on line 263:
```python
parsed = datetime.now(timezone.utc).date()
```

```python
if len(status_list) > 10:
    raise HTTPException(status_code=400, detail="Too many status values (max 10)")
try:
    count = conversations_db.count_conversations(uid, statuses=status_list)
```
O(n) fallback for conversation count can be very slow
When count_conversations raises (e.g., if the aggregation query isn't supported), the fallback calls stream_conversations and materialises every document in Python just to count them. For users with thousands of conversations this is extremely slow and will hold a thread for seconds.
Consider either surfacing the exception directly (so it's visible and can be fixed at the DB layer), or implementing an explicit count with a CollectionReference.count() / AggregateQuery at the Firestore level, which is O(1) and avoids the full scan:
```python
except Exception as e:
    logger.warning(f'count_conversations aggregation fallback: {e}')
    # Return 0 rather than scanning all documents; the caller can treat
    # this as "unknown" and retry when the DB layer supports aggregates.
    count = 0
```

```swift
/// Send a screen_frame for task extraction.
func extractTasks(imageBase64: String, appName: String, windowTitle: String) async throws -> TasksExtractedResult {
    guard isConnected else { throw ServiceError.notConnected }
    let frameId = UUID().uuidString
    let jsonString = try buildScreenFrameJSON(frameId: frameId, analyzeTypes: ["tasks"], imageBase64: imageBase64, appName: appName, windowTitle: windowTitle)

    return try await withCheckedThrowingContinuation { continuation in
        requestLock.lock()
        pendingTasksRequests[frameId] = continuation
        requestLock.unlock()
```
isConnected checked outside the lock — potential TOCTOU race
isConnected is read without holding requestLock at the top of each handler (e.g. guard isConnected else { throw ... }), while disconnect() sets isConnected = false and calls cancelAllPending under the lock. There is a window where:
1. Thread A passes the `guard isConnected` check.
2. Thread B calls `disconnect()` → `cancelAllPending()`, which clears all pending continuations.
3. Thread A registers its new continuation after the lock-protected removal loop is done — the continuation is never cancelled and will eventually time out instead of failing immediately.
For a proactive-AI service the impact is bounded (timeout fires after 30s), but for correctness the guard isConnected check should be inside the lock together with the continuation insertion, or isConnected should be accessed via an actor.
```swift
return try await withCheckedThrowingContinuation { continuation in
    requestLock.lock()
    guard isConnected else {
        requestLock.unlock()
        continuation.resume(throwing: ServiceError.notConnected)
        return
    }
    pendingFocusRequests[frameId] = continuation
    requestLock.unlock()
    sendAndTimeout(...)
}
```

The same pattern applies to extractTasks, extractMemories, generateAdvice, generateLiveNote, requestProfile, rerankTasks, and deduplicateTasks.
Closing — manager needs to verify noa's combined verification branch first before a prerelease PR. Will reopen or create a new one after verification is confirmed. by AI for @beastoin
Hey @beastoin 👋 Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request. After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:
Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out. Thank you for being part of the Omi community! 💜
Prerelease: Desktop Migration — Rust Backend → Python Backend
Combines 4 verified desktop PRs into a single merge-ready branch. All independently verified by noa (PR #5506).
Merge Order (preserved in branch)
Verification Evidence
Conflict Resolution
- `backend/test.sh`: kept all test entries from both sides
- `backend/routers/transcribe.py`: formatting-only conflicts (same logic, different whitespace) — took black-formatted version

Ancestry Check
All 4 sub-PR HEADs confirmed as ancestors:
- `collab/5302-integration` (78d15d2) ✓
- `fix/desktop-stt-backend-5393` (e2a8857) ✓
- `collab/5396-integration` (15bf1ec) ✓
- `collab/5396-ren-focus` (6d8b57e) ✓

For manager
This PR is ready to merge. Regular merge (no squash).