
prerelease: desktop migration #5374 #5395 #5413 #5537 #5538

Closed
beastoin wants to merge 160 commits into main from
prerelease/5374-5395-5413-5537

Conversation

@beastoin
Collaborator

Prerelease: Desktop Migration — Rust Backend → Python Backend

Combines 4 verified desktop PRs into a single merge-ready branch. All independently verified by noa (PR #5506).

Merge Order (preserved in branch)

  1. PR #5374 — Desktop migration: Rust backend → Python backend (#5302)
  2. PR #5395 — Desktop: route STT through backend /v4/listen, remove DEEPGRAM_API_KEY
  3. PR #5413 — Desktop: remove GEMINI_API_KEY, route proactive AI through /v4/listen (#5396)
  4. PR #5537 — Desktop: use dev Firebase config for dev builds

Verification Evidence

Conflict Resolution

  • backend/test.sh: kept all test entries from both sides
  • backend/routers/transcribe.py: formatting-only conflicts (same logic, different whitespace) — took black-formatted version
  • 6 test files (add/add): took #5537 versions (latest)

Ancestry Check

All 4 sub-PR HEADs confirmed as ancestors:

  • collab/5302-integration (78d15d2) ✓
  • fix/desktop-stt-backend-5393 (e2a8857) ✓
  • collab/5396-integration (15bf1ec) ✓
  • collab/5396-ren-focus (6d8b57e) ✓

For manager

This PR is ready to merge. Regular merge (no squash).

beastoin and others added 30 commits March 7, 2026 05:13
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WebSocket client that connects to /v4/listen with Bearer auth and
sends screen_frame JSON messages. Routes focus_result responses back
to callers via async continuations with frame_id correlation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#5396)

Replace direct Gemini API calls with backend WebSocket screen_frame messages.
Context building (goals, tasks, memories, AI profile) moves server-side.
Client becomes thin: encode JPEG→base64, send screen_frame, receive focus_result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…#5396)

Start WS connection when monitoring starts, disconnect on stop.
Pass service to FocusAssistant (shared for future assistant types).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vision handlers: analyzeFocus, extractTasks, extractMemories, generateAdvice
(send screen_frame with analyze type, receive typed result via frame_id)

Text handlers: generateLiveNote, requestProfile, rerankTasks, deduplicateTasks
(send typed JSON message, receive result via single-slot continuation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient tool-calling loop with backendService.extractTasks().
Remove extractTaskSingleStage, refreshContext, vector/keyword search,
validateTaskTitle — all LLM logic now server-side. -550 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient.sendRequest with backendService.extractMemories().
Remove prompt/schema building — all LLM logic now server-side.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-phase Gemini tool-calling loop (execute_sql + vision) with
backendService.generateAdvice(). Remove compressForGemini, getUserLanguage,
buildActivitySummary, buildPhase1/2Tools — all LLM logic server-side. -560 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.deduplicateTasks(). Remove
prompt/schema building, local dedup logic — server handles everything.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.rerankTasks(). Remove prompt/
schema building, context fetching — server handles reranking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
beastoin and others added 25 commits March 10, 2026 03:15
WebSocket client that connects to /v4/listen with Bearer auth and
sends screen_frame JSON messages. Routes focus_result responses back
to callers via async continuations with frame_id correlation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#5396)

Replace direct Gemini API calls with backend WebSocket screen_frame messages.
Context building (goals, tasks, memories, AI profile) moves server-side.
Client becomes thin: encode JPEG→base64, send screen_frame, receive focus_result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…#5396)

Start WS connection when monitoring starts, disconnect on stop.
Pass service to FocusAssistant (shared for future assistant types).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vision handlers: analyzeFocus, extractTasks, extractMemories, generateAdvice
(send screen_frame with analyze type, receive typed result via frame_id)

Text handlers: generateLiveNote, requestProfile, rerankTasks, deduplicateTasks
(send typed JSON message, receive result via single-slot continuation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient tool-calling loop with backendService.extractTasks().
Remove extractTaskSingleStage, refreshContext, vector/keyword search,
validateTaskTitle — all LLM logic now server-side. -550 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient.sendRequest with backendService.extractMemories().
Remove prompt/schema building — all LLM logic now server-side.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-phase Gemini tool-calling loop (execute_sql + vision) with
backendService.generateAdvice(). Remove compressForGemini, getUserLanguage,
buildActivitySummary, buildPhase1/2Tools — all LLM logic server-side. -560 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.deduplicateTasks(). Remove
prompt/schema building, local dedup logic — server handles everything.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.rerankTasks(). Remove prompt/
schema building, context fetching — server handles reranking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-stage Gemini profile generation with backendService.requestProfile().
Remove fetchDataSources, buildPrompt, buildConsolidationPrompt — server
fetches user data from Firestore and generates profile server-side.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ts (#5396)

Pass shared BackendProactiveService to all 4 assistants and 3 text-only
services. Remove do/catch since inits no longer throw. Update
AdviceTestRunnerWindow fallback creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace direct GeminiClient usage with BackendProactiveService.
Uses configure(backendService:) singleton pattern matching other
text-based services. Prompt logic moves server-side.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add configure(backendService:) call for LiveNotesMonitor alongside
other singleton text-based services.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update GoogleService-Info-Dev.plist with dev Firebase values:
API_KEY, PROJECT_ID, STORAGE_BUCKET, GCM_SENDER_ID, GOOGLE_APP_ID.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dev builds load GoogleService-Info-Dev.plist (via run.sh), prod builds
load GoogleService-Info.plist. AuthService now reads API_KEY from
whichever plist is in the bundle, with prod key as fallback.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dev.sh builds Omi Dev (com.omi.desktop-dev) but was copying the prod
GoogleService-Info.plist. Now uses the same dev plist logic as run.sh.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reset-and-run.sh builds Omi Dev (com.omi.desktop-dev) but was copying
the prod GoogleService-Info.plist. Now uses the same dev plist logic
as run.sh.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CODEx review: dev builds should not silently use prod credentials.
Now logs a FATAL warning if GoogleService-Info.plist is missing or
has no API_KEY in a dev build (bundle ID ending in -dev).

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ck to prod

CODEx review round 2: logging is not fail-fast. Dev builds now crash
with fatalError if GoogleService-Info.plist has no API_KEY, preventing
silent use of prod credentials. Prod builds still fall back safely.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hrough /v4/listen (#5396)

Resolved test.sh conflict: kept all entries from both sides.
Resolved conflicts: test.sh (kept all entries), transcribe.py (formatting only),
test files (took #5537 versions).
@greptile-apps
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR combines four desktop migration sub-PRs, migrating the macOS Omi desktop app from a standalone Rust backend to the shared Python backend. The core changes are:

  • STT routing: Direct Deepgram WebSocket connections are replaced by BackendTranscriptionService, which streams mono PCM audio to the backend's /v4/listen endpoint. The backend now owns conversation creation, so the client no longer calls createConversationFromSegments().
  • Proactive AI routing: All Gemini calls (focus, task extraction, memory, advice, live notes, profile, reranking, dedup) are replaced by BackendProactiveService, which sends typed JSON messages over the same /v4/listen WebSocket and receives structured results.
  • New Python backend endpoints: Desktop-specific routers are added for chat sessions (/v2/chat-sessions), staged tasks (/v1/staged-tasks), screen activity (/v1/screen-activity), focus sessions, advice, and a /v1/conversations/from-segments fallback endpoint.
  • Auth consolidation: AuthService now resolves OMI_API_URL at runtime (same as APIClient) instead of a hardcoded Cloud Run URL, and reads API_KEY from the active GoogleService-Info.plist so dev builds use dev Firebase credentials. An omi-computer:// / omi-computer-dev:// redirect URI allowlist is enforced server-side.
  • Env variable cleanup: DEEPGRAM_API_KEY and GEMINI_API_KEY are removed from the desktop .env.example and all call sites.
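The STT path above streams raw binary PCM frames over the WebSocket. As a rough illustration of the buffering involved, here is a minimal chunker; the 16 kHz / mono / PCM16 / ~100 ms figures are taken from the review notes below, not from confirmed constants in the code:

```python
# Assumed from the review notes: 16 kHz mono PCM16, ~100 ms chunks.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2          # PCM16
CHUNK_MS = 100
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 3200 bytes

def chunk_pcm(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Yield fixed-size binary frames suitable for streaming over a WebSocket."""
    for i in range(0, len(pcm), chunk_bytes):
        yield pcm[i:i + chunk_bytes]

one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # 1 s of silence
frames = list(chunk_pcm(one_second))
print(len(frames), len(frames[0]))  # 10 3200
```

One second of audio yields ten ~100 ms frames; the real client additionally handles partial trailing chunks and backpressure.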

Key issues found:

  • Critical: The Admin SDK fallback in _generate_custom_token (auth.py:151) calls firebase_admin.auth.verify_id_token() with a raw Google/Apple OAuth token, which will always raise an InvalidIdTokenError. This makes the fallback path non-functional and will surface as a 500 for any user whose auth triggers that branch.
  • Bug: datetime.now() (local time, no timezone) is used in staged_tasks.py score endpoints while the resulting date strings use UTC Z suffixes — scores will be incorrect on non-UTC servers.
  • Concurrency: BackendProactiveService checks isConnected outside requestLock before inserting continuations, creating a TOCTOU window where a racing disconnect() could leave orphaned continuations that only time out after 30–60 seconds.
  • Performance: The O(n) stream_conversations fallback in the /v1/conversations/count endpoint will be very slow for users with large conversation histories.

Confidence Score: 2/5

  • Not safe to merge — the auth fallback bug can silently break sign-in for users whose Firebase API key has restrictions.
  • The _generate_custom_token fallback path in auth.py is demonstrably broken: firebase_admin.auth.verify_id_token cannot verify raw Google or Apple OAuth tokens. Depending on deployment config (API key restrictions), this can block all desktop sign-ins. The datetime.now() timezone issue in staged tasks can cause incorrect daily score calculations. These two issues lower confidence significantly despite the overall migration being well-structured and thoroughly tested.
  • Pay close attention to backend/routers/auth.py (broken fallback) and backend/routers/staged_tasks.py (datetime timezone).

Important Files Changed

Filename Overview
backend/routers/auth.py Adds omi-computer:// and omi-computer-dev:// redirect URI validation (good) and a two-path Firebase token strategy, but the Admin SDK fallback calls verify_id_token with raw OAuth tokens, which will always fail.
backend/routers/staged_tasks.py New router for desktop staged tasks with CRUD, batch score updates, and promote logic. datetime.now() used instead of datetime.now(timezone.utc) in daily/weekly score endpoints, risking off-by-one date windows on non-UTC servers.
backend/routers/transcribe.py Adds screen_frame fan-out handlers (focus, tasks, memories, advice) and text-only message types (live_notes_text, profile_request, task_rerank, task_dedup) into the WebSocket stream handler. Explicit parameter passing prevents closure-capture bugs.
backend/routers/conversations.py Adds /v1/conversations/from-segments and /v1/conversations/count endpoints. Route ordering is correct (count before {conversation_id}). The count_conversations O(n) fallback could be slow for large datasets.
desktop/Desktop/Sources/BackendTranscriptionService.swift New service replacing direct Deepgram WebSocket connection with the OMI backend /v4/listen. Well-structured with exponential backoff, keepalive, watchdog, and audio buffering (~100ms chunks). isConnected is read without synchronization in some paths.
desktop/Desktop/Sources/ProactiveAssistants/Core/BackendProactiveService.swift New WebSocket client routing all proactive AI (focus, tasks, memories, advice, live notes, profile, rerank, dedup) through the backend instead of calling Gemini directly. isConnected guard is checked outside the lock in all handler methods, creating a TOCTOU window.
desktop/Desktop/Sources/AppState.swift Switches transcriptionService type from TranscriptionService to BackendTranscriptionService, removes Deepgram key dependency, sets backendOwnsConversation = true to skip client-side conversation upload. Error handling migrated from try/catch to onError callback.
desktop/Desktop/Sources/AuthService.swift Auth service now reads OMI_API_URL (same as APIClient) instead of a hardcoded Cloud Run URL, and reads API_KEY from the active GoogleService-Info.plist at runtime so dev builds use dev Firebase credentials.
desktop/Desktop/Sources/APIClient.swift Migrates multiple endpoint URLs from Rust backend paths to Python backend paths (v2/messages → v2/messages/save, agent routes, persona routes). Removes LLM usage tracking and chat message count as no-ops, and makes AgentProvisionResponse fields optional to match the new schema.
backend/routers/chat.py Adds desktop-specific chat session management endpoints (/v2/chat-sessions CRUD), message persistence without LLM pipeline (/v2/messages/save), message rating, and a chat title generation endpoint. Logic is clean; uses proper UUID generation and Firestore timestamps.

Sequence Diagram

sequenceDiagram
    participant Desktop as Desktop App (Swift)
    participant Auth as AuthService
    participant Backend as Python Backend
    participant Firebase as Firebase Auth

    note over Desktop,Firebase: Auth Flow (auth.py)
    Desktop->>Backend: GET /v1/auth/authorize (provider, redirect_uri)
    Backend->>Backend: Validate redirect_uri scheme (omi://, omi-computer://)
    Backend->>Firebase: OAuth redirect (Google/Apple)
    Firebase-->>Backend: OAuth callback with id_token
    Backend->>Firebase: REST signInWithIdp (FIREBASE_API_KEY)
    alt REST succeeds
        Firebase-->>Backend: localId (Firebase UID)
    else REST fails — fallback (BROKEN)
        Backend->>Firebase: verify_id_token(oauth_token) ← always throws
        note over Backend: InvalidIdTokenError: wrong aud claim
    end
    Backend->>Firebase: create_custom_token(uid)
    Backend-->>Desktop: auth_code via redirect_uri

    note over Desktop,Backend: STT / Proactive AI Flow (transcribe.py)
    Desktop->>Backend: WS /v4/listen?language=en&source=desktop
    loop Audio streaming
        Desktop->>Backend: Binary PCM16 frames (mono, 16kHz)
        Backend-->>Desktop: transcript segment JSON
    end
    Desktop->>Backend: JSON {type: screen_frame, analyze: [focus,tasks,...]}
    par Fan-out handlers
        Backend->>Backend: analyze_focus(uid, image_b64)
        Backend-->>Desktop: {type: focus_result, ...}
    and
        Backend->>Backend: extract_tasks(uid, image_b64)
        Backend-->>Desktop: {type: tasks_extracted, ...}
    end
    Desktop->>Backend: JSON {type: live_notes_text, text: ...}
    Backend->>Backend: generate_live_note(text)
    Backend-->>Desktop: {type: live_note, text: ...}

Comments Outside Diff (1)

  1. backend/routers/auth.py, line 151-154 (link)

    Admin SDK fallback will always fail for OAuth tokens

    firebase_admin.auth.verify_id_token(id_token) is designed to verify Firebase ID tokens (JWTs with aud equal to the Firebase project ID, signed by Firebase). However, id_token here is a raw Google or Apple OAuth token, which has a different aud (the Google client ID) and is signed by Google/Apple — not Firebase. This call will always throw a ValueError / InvalidIdTokenError like "Firebase ID token has incorrect 'aud' (audience) claim", making the entire fallback path non-functional.

    When FIREBASE_API_KEY is set but the REST signInWithIdp call fails (e.g., because the key has domain/IP restrictions), the user will see an auth failure instead of a graceful fallback.

    The correct approach for extracting an identity from a raw Google OAuth id_token is to use google.oauth2.id_token.verify_oauth2_token (from google-auth) with the appropriate CLIENT_ID as the audience, or for Apple tokens, use a dedicated Apple JWT verifier. Alternatively, if only the REST API path is reliable, remove the misleading fallback or gate it on the token issuer:

    # For Google: use google-auth to verify the raw OAuth id_token
    from google.oauth2 import id_token as google_id_token
    from google.auth.transport import requests as google_requests
    
    if provider == 'google':
        id_info = google_id_token.verify_oauth2_token(
            id_token, google_requests.Request(), audience=None  # skip aud check or pass client_id
        )
        email = id_info.get('email')

    Until fixed, the fallback branch (if not firebase_uid:) will raise an exception and surface as a 500 error to the client whenever the REST API fails.
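The aud mismatch the review describes can be seen by inspecting the token's unverified claims; a sketch of gating the fallback on the issuer before picking a verifier (the token contents here are hypothetical, and unverified claims must only ever drive routing, never authentication):

```python
import base64
import json

def unverified_claims(jwt_token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature.

    Safe only for routing decisions (choosing which verifier to call);
    never trust unverified claims for authentication itself.
    """
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def _b64(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).decode().rstrip("=")

# Hypothetical token shaped like a raw Google OAuth id_token: its aud is a
# Google client ID, not the Firebase project ID, which is why
# firebase_admin.auth.verify_id_token rejects it with an incorrect-aud error.
demo_token = ".".join([
    _b64({"alg": "RS256", "typ": "JWT"}),
    _b64({"iss": "https://accounts.google.com", "aud": "my-google-client-id"}),
    "unsigned-demo",
])

claims = unverified_claims(demo_token)
if claims.get("iss") == "https://accounts.google.com":
    pass  # route to google.oauth2.id_token.verify_oauth2_token(...)
else:
    pass  # route to firebase_admin.auth.verify_id_token(...)
print(claims["aud"])  # my-google-client-id
```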

Last reviewed commit: d4fede5

due_start = f'{date_str}T00:00:00Z'
due_end = f'{date_str}T23:59:59.999Z'

completed, total = staged_tasks_db.get_action_items_for_daily_score(uid, due_start, due_end)

datetime.now() without timezone causes UTC mismatch

datetime.now() returns the server's local time with no timezone info. The date string is then combined with hardcoded UTC markers (T00:00:00Z), so if the server ever runs in a non-UTC timezone the query window will be silently wrong. The same issue exists on line 263 in get_scores. Use datetime.now(timezone.utc) consistently:

Suggested change:
- completed, total = staged_tasks_db.get_action_items_for_daily_score(uid, due_start, due_end)
+ parsed = datetime.now(timezone.utc).date()

And on line 263:

parsed = datetime.now(timezone.utc).date()
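The off-by-one window can be reproduced in isolation. A self-contained sketch with a frozen clock (the +11:00 offset and the timestamp are illustrative values, not from the code): a naive local date combined with Z-suffixed bounds drifts by a day near local midnight on non-UTC servers.

```python
from datetime import datetime, timezone, timedelta

# A server "local clock" 11 hours ahead of UTC, frozen just after
# local midnight — illustrative values only.
local_now = datetime(2026, 3, 10, 0, 30, tzinfo=timezone(timedelta(hours=11)))
utc_now = local_now.astimezone(timezone.utc)

naive_date = local_now.date()   # what datetime.now().date() would yield locally
aware_date = utc_now.date()     # what datetime.now(timezone.utc).date() yields

# Both dates would feed the same UTC-marked window, but only one is UTC.
due_start = f"{aware_date}T00:00:00Z"
due_end = f"{aware_date}T23:59:59.999Z"

print(naive_date, aware_date)  # 2026-03-10 2026-03-09 — off by one day
```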

Comment on lines +259 to +262
if len(status_list) > 10:
raise HTTPException(status_code=400, detail="Too many status values (max 10)")
try:
count = conversations_db.count_conversations(uid, statuses=status_list)

O(n) fallback for conversation count can be very slow

When count_conversations raises (e.g., if the aggregation query isn't supported), the fallback calls stream_conversations and materialises every document in Python just to count them. For users with thousands of conversations this is extremely slow and will hold a thread for seconds.

Consider either surfacing the exception directly (so it's visible and can be fixed at the DB layer), or implementing an explicit count with a CollectionReference.count() / AggregateQuery at the Firestore level, which is O(1) and avoids the full scan:

except Exception as e:
    logger.warning(f'count_conversations aggregation fallback: {e}')
    # Return 0 rather than scanning all documents; the caller can treat
    # this as "unknown" and retry when the DB layer supports aggregates.
    count = 0

Comment on lines +104 to +113
/// Send a screen_frame for task extraction.
func extractTasks(imageBase64: String, appName: String, windowTitle: String) async throws -> TasksExtractedResult {
guard isConnected else { throw ServiceError.notConnected }
let frameId = UUID().uuidString
let jsonString = try buildScreenFrameJSON(frameId: frameId, analyzeTypes: ["tasks"], imageBase64: imageBase64, appName: appName, windowTitle: windowTitle)

return try await withCheckedThrowingContinuation { continuation in
requestLock.lock()
pendingTasksRequests[frameId] = continuation
requestLock.unlock()

isConnected checked outside the lock — potential TOCTOU race

isConnected is read without holding requestLock at the top of each handler (e.g. guard isConnected else { throw ... }), while disconnect() sets isConnected = false and calls cancelAllPending under the lock. There is a window where:

  1. Thread A passes the guard isConnected check.
  2. Thread B calls disconnect(); cancelAllPending() clears all pending continuations.
  3. Thread A registers its new continuation after the lock-protected removal loop is done — the continuation is never cancelled and will eventually time out instead of failing immediately.

For a proactive-AI service the impact is bounded (timeout fires after 30s), but for correctness the guard isConnected check should be inside the lock together with the continuation insertion, or isConnected should be accessed via an actor.

return try await withCheckedThrowingContinuation { continuation in
    requestLock.lock()
    guard isConnected else {
        requestLock.unlock()
        continuation.resume(throwing: ServiceError.notConnected)
        return
    }
    pendingFocusRequests[frameId] = continuation
    requestLock.unlock()
    sendAndTimeout(...)
}

The same pattern applies to extractTasks, extractMemories, generateAdvice, generateLiveNote, requestProfile, rerankTasks, and deduplicateTasks.
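The invariant behind the Swift suggestion — connection check and continuation insertion under one lock — can be mirrored in Python form (class and method names are illustrative, not from the codebase):

```python
import threading

class ProactiveClient:
    """Sketch: the connected check and the pending-request insertion happen
    under a single lock, so a concurrent disconnect() can never strand a
    freshly registered request."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connected = True
        self._pending = {}

    def register(self, frame_id: str, continuation) -> bool:
        with self._lock:
            if not self._connected:          # checked INSIDE the lock
                return False                 # caller fails fast, no orphan
            self._pending[frame_id] = continuation
            return True

    def disconnect(self):
        with self._lock:
            self._connected = False
            cancelled = list(self._pending.values())
            self._pending.clear()
        return cancelled  # resume/cancel these outside the lock

c = ProactiveClient()
assert c.register("f1", "cont-1")
assert c.disconnect() == ["cont-1"]
assert not c.register("f2", "cont-2")  # rejected immediately, not orphaned
```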

@beastoin
Collaborator Author

Closing — the manager needs to verify noa's combined verification branch before a prerelease PR. Will reopen or create a new one once verification is confirmed.


by AI for @beastoin

@beastoin closed this Mar 10, 2026
@github-actions
Contributor

Hey @beastoin 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

  • Project standards — Ensuring consistency across the codebase
  • User needs — Making sure changes align with what our users need
  • Code best practices — Maintaining code quality and maintainability
  • Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜
