
prerelease: desktop migration #5374 #5395 #5413 #5537 #5538

Closed
beastoin wants to merge 160 commits into main from
prerelease/5374-5395-5413-5537

Conversation

@beastoin
Collaborator

Prerelease: Desktop Migration — Rust Backend → Python Backend

Combines 4 verified desktop PRs into a single merge-ready branch. All independently verified by noa (PR #5506).

Merge Order (preserved in branch)

  1. PR #5374 — Desktop migration: Rust backend → Python backend (#5302)
  2. PR #5395 — Desktop: route STT through backend /v4/listen, remove DEEPGRAM_API_KEY
  3. PR #5413 — Desktop: remove GEMINI_API_KEY, route proactive AI through /v4/listen (#5396)
  4. PR #5537 — Desktop: use dev Firebase config for dev builds

Verification Evidence

Conflict Resolution

  • backend/test.sh: kept all test entries from both sides
  • backend/routers/transcribe.py: formatting-only conflicts (same logic, different whitespace) — took black-formatted version
  • 6 test files (add/add): took #5537 versions (latest)

Ancestry Check

All 4 sub-PR HEADs confirmed as ancestors:

  • collab/5302-integration (78d15d2) ✓
  • fix/desktop-stt-backend-5393 (e2a8857) ✓
  • collab/5396-integration (15bf1ec) ✓
  • collab/5396-ren-focus (6d8b57e) ✓

For manager

This PR is ready to merge. Regular merge (no squash).

beastoin and others added 30 commits March 7, 2026 05:13
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WebSocket client that connects to /v4/listen with Bearer auth and
sends screen_frame JSON messages. Routes focus_result responses back
to callers via async continuations with frame_id correlation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#5396)

Replace direct Gemini API calls with backend WebSocket screen_frame messages.
Context building (goals, tasks, memories, AI profile) moves server-side.
Client becomes thin: encode JPEG→base64, send screen_frame, receive focus_result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…#5396)

Start WS connection when monitoring starts, disconnect on stop.
Pass service to FocusAssistant (shared for future assistant types).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vision handlers: analyzeFocus, extractTasks, extractMemories, generateAdvice
(send screen_frame with analyze type, receive typed result via frame_id)

Text handlers: generateLiveNote, requestProfile, rerankTasks, deduplicateTasks
(send typed JSON message, receive result via single-slot continuation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient tool-calling loop with backendService.extractTasks().
Remove extractTaskSingleStage, refreshContext, vector/keyword search,
validateTaskTitle — all LLM logic now server-side. -550 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient.sendRequest with backendService.extractMemories().
Remove prompt/schema building — all LLM logic now server-side.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-phase Gemini tool-calling loop (execute_sql + vision) with
backendService.generateAdvice(). Remove compressForGemini, getUserLanguage,
buildActivitySummary, buildPhase1/2Tools — all LLM logic server-side. -560 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.deduplicateTasks(). Remove
prompt/schema building, local dedup logic — server handles everything.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.rerankTasks(). Remove prompt/
schema building, context fetching — server handles reranking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
beastoin and others added 25 commits March 10, 2026 03:15
WebSocket client that connects to /v4/listen with Bearer auth and
sends screen_frame JSON messages. Routes focus_result responses back
to callers via async continuations with frame_id correlation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#5396)

Replace direct Gemini API calls with backend WebSocket screen_frame messages.
Context building (goals, tasks, memories, AI profile) moves server-side.
Client becomes thin: encode JPEG→base64, send screen_frame, receive focus_result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…#5396)

Start WS connection when monitoring starts, disconnect on stop.
Pass service to FocusAssistant (shared for future assistant types).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…5396)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vision handlers: analyzeFocus, extractTasks, extractMemories, generateAdvice
(send screen_frame with analyze type, receive typed result via frame_id)

Text handlers: generateLiveNote, requestProfile, rerankTasks, deduplicateTasks
(send typed JSON message, receive result via single-slot continuation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient tool-calling loop with backendService.extractTasks().
Remove extractTaskSingleStage, refreshContext, vector/keyword search,
validateTaskTitle — all LLM logic now server-side. -550 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient.sendRequest with backendService.extractMemories().
Remove prompt/schema building — all LLM logic now server-side.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-phase Gemini tool-calling loop (execute_sql + vision) with
backendService.generateAdvice(). Remove compressForGemini, getUserLanguage,
buildActivitySummary, buildPhase1/2Tools — all LLM logic server-side. -560 lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.deduplicateTasks(). Remove
prompt/schema building, local dedup logic — server handles everything.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace GeminiClient with backendService.rerankTasks(). Remove prompt/
schema building, context fetching — server handles reranking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 2-stage Gemini profile generation with backendService.requestProfile().
Remove fetchDataSources, buildPrompt, buildConsolidationPrompt — server
fetches user data from Firestore and generates profile server-side.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ts (#5396)

Pass shared BackendProactiveService to all 4 assistants and 3 text-only
services. Remove do/catch since inits no longer throw. Update
AdviceTestRunnerWindow fallback creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace direct GeminiClient usage with BackendProactiveService.
Uses configure(backendService:) singleton pattern matching other
text-based services. Prompt logic moves server-side.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add configure(backendService:) call for LiveNotesMonitor alongside
other singleton text-based services.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update GoogleService-Info-Dev.plist with dev Firebase values:
API_KEY, PROJECT_ID, STORAGE_BUCKET, GCM_SENDER_ID, GOOGLE_APP_ID.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dev builds load GoogleService-Info-Dev.plist (via run.sh), prod builds
load GoogleService-Info.plist. AuthService now reads API_KEY from
whichever plist is in the bundle, with prod key as fallback.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dev.sh builds Omi Dev (com.omi.desktop-dev) but was copying the prod
GoogleService-Info.plist. Now uses the same dev plist logic as run.sh.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reset-and-run.sh builds Omi Dev (com.omi.desktop-dev) but was copying
the prod GoogleService-Info.plist. Now uses the same dev plist logic
as run.sh.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CODEx review: dev builds should not silently use prod credentials.
Now logs a FATAL warning if GoogleService-Info.plist is missing or
has no API_KEY in a dev build (bundle ID ending in -dev).

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ck to prod

CODEx review round 2: logging is not fail-fast. Dev builds now crash
with fatalError if GoogleService-Info.plist has no API_KEY, preventing
silent use of prod credentials. Prod builds still fall back safely.

Fixes #5536

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hrough /v4/listen (#5396)

Resolved test.sh conflict: kept all entries from both sides.
Resolved conflicts: test.sh (kept all entries), transcribe.py (formatting only),
test files (took #5537 versions).
@greptile-apps
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR combines four desktop migration sub-PRs, migrating the macOS Omi desktop app from a standalone Rust backend to the shared Python backend. The core changes are:

  • STT routing: Direct Deepgram WebSocket connections are replaced by BackendTranscriptionService, which streams mono PCM audio to the backend's /v4/listen endpoint. The backend now owns conversation creation, so the client no longer calls createConversationFromSegments().
  • Proactive AI routing: All Gemini calls (focus, task extraction, memory, advice, live notes, profile, reranking, dedup) are replaced by BackendProactiveService, which sends typed JSON messages over the same /v4/listen WebSocket and receives structured results.
  • New Python backend endpoints: Desktop-specific routers are added for chat sessions (/v2/chat-sessions), staged tasks (/v1/staged-tasks), screen activity (/v1/screen-activity), focus sessions, advice, and a /v1/conversations/from-segments fallback endpoint.
  • Auth consolidation: AuthService now resolves OMI_API_URL at runtime (same as APIClient) instead of a hardcoded Cloud Run URL, and reads API_KEY from the active GoogleService-Info.plist so dev builds use dev Firebase credentials. An omi-computer:// / omi-computer-dev:// redirect URI allowlist is enforced server-side.
  • Env variable cleanup: DEEPGRAM_API_KEY and GEMINI_API_KEY are removed from the desktop .env.example and all call sites.
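The STT path above streams raw binary PCM frames over the WebSocket. As a rough illustration of the buffering involved, here is a minimal chunker; the 16 kHz / mono / PCM16 / ~100 ms figures are taken from the review notes below, not from confirmed constants in the code:

```python
# Assumed from the review notes: 16 kHz mono PCM16, ~100 ms chunks.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2          # PCM16
CHUNK_MS = 100
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 3200 bytes

def chunk_pcm(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Yield fixed-size binary frames suitable for streaming over a WebSocket."""
    for i in range(0, len(pcm), chunk_bytes):
        yield pcm[i:i + chunk_bytes]

one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # 1 s of silence
frames = list(chunk_pcm(one_second))
print(len(frames), len(frames[0]))  # 10 3200
```

One second of audio yields ten ~100 ms frames; the real client additionally handles partial trailing chunks and backpressure.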

Key issues found:

  • Critical: The Admin SDK fallback in _generate_custom_token (auth.py:151) calls firebase_admin.auth.verify_id_token() with a raw Google/Apple OAuth token, which will always raise an InvalidIdTokenError. This makes the fallback path non-functional and will surface as a 500 for any user whose auth triggers that branch.
  • Bug: datetime.now() (local time, no timezone) is used in staged_tasks.py score endpoints while the resulting date strings use UTC Z suffixes — scores will be incorrect on non-UTC servers.
  • Concurrency: BackendProactiveService checks isConnected outside requestLock before inserting continuations, creating a TOCTOU window where a racing disconnect() could leave orphaned continuations that only time out after 30–60 seconds.
  • Performance: The O(n) stream_conversations fallback in the /v1/conversations/count endpoint will be very slow for users with large conversation histories.

Confidence Score: 2/5

  • Not safe to merge — the auth fallback bug can silently break sign-in for users whose Firebase API key has restrictions.
  • The _generate_custom_token fallback path in auth.py is demonstrably broken: firebase_admin.auth.verify_id_token cannot verify raw Google or Apple OAuth tokens. Depending on deployment config (API key restrictions), this can block all desktop sign-ins. The datetime.now() timezone issue in staged tasks can cause incorrect daily score calculations. These two issues lower confidence significantly despite the overall migration being well-structured and thoroughly tested.
  • Pay close attention to backend/routers/auth.py (broken fallback) and backend/routers/staged_tasks.py (datetime timezone).

Important Files Changed

Filename Overview
backend/routers/auth.py Adds omi-computer:// and omi-computer-dev:// redirect URI validation (good) and a two-path Firebase token strategy, but the Admin SDK fallback calls verify_id_token with raw OAuth tokens, which will always fail.
backend/routers/staged_tasks.py New router for desktop staged tasks with CRUD, batch score updates, and promote logic. datetime.now() used instead of datetime.now(timezone.utc) in daily/weekly score endpoints, risking off-by-one date windows on non-UTC servers.
backend/routers/transcribe.py Adds screen_frame fan-out handlers (focus, tasks, memories, advice) and text-only message types (live_notes_text, profile_request, task_rerank, task_dedup) into the WebSocket stream handler. Explicit parameter passing prevents closure-capture bugs.
backend/routers/conversations.py Adds /v1/conversations/from-segments and /v1/conversations/count endpoints. Route ordering is correct (count before {conversation_id}). The count_conversations O(n) fallback could be slow for large datasets.
desktop/Desktop/Sources/BackendTranscriptionService.swift New service replacing direct Deepgram WebSocket connection with the OMI backend /v4/listen. Well-structured with exponential backoff, keepalive, watchdog, and audio buffering (~100ms chunks). isConnected is read without synchronization in some paths.
desktop/Desktop/Sources/ProactiveAssistants/Core/BackendProactiveService.swift New WebSocket client routing all proactive AI (focus, tasks, memories, advice, live notes, profile, rerank, dedup) through the backend instead of calling Gemini directly. isConnected guard is checked outside the lock in all handler methods, creating a TOCTOU window.
desktop/Desktop/Sources/AppState.swift Switches transcriptionService type from TranscriptionService to BackendTranscriptionService, removes Deepgram key dependency, sets backendOwnsConversation = true to skip client-side conversation upload. Error handling migrated from try/catch to onError callback.
desktop/Desktop/Sources/AuthService.swift Auth service now reads OMI_API_URL (same as APIClient) instead of a hardcoded Cloud Run URL, and reads API_KEY from the active GoogleService-Info.plist at runtime so dev builds use dev Firebase credentials.
desktop/Desktop/Sources/APIClient.swift Migrates multiple endpoint URLs from Rust backend paths to Python backend paths (v2/messages → v2/messages/save, agent routes, persona routes). Removes LLM usage tracking and chat message count as no-ops, and makes AgentProvisionResponse fields optional to match the new schema.
backend/routers/chat.py Adds desktop-specific chat session management endpoints (/v2/chat-sessions CRUD), message persistence without LLM pipeline (/v2/messages/save), message rating, and a chat title generation endpoint. Logic is clean; uses proper UUID generation and Firestore timestamps.

Sequence Diagram

sequenceDiagram
    participant Desktop as Desktop App (Swift)
    participant Auth as AuthService
    participant Backend as Python Backend
    participant Firebase as Firebase Auth

    note over Desktop,Firebase: Auth Flow (auth.py)
    Desktop->>Backend: GET /v1/auth/authorize (provider, redirect_uri)
    Backend->>Backend: Validate redirect_uri scheme (omi://, omi-computer://)
    Backend->>Firebase: OAuth redirect (Google/Apple)
    Firebase-->>Backend: OAuth callback with id_token
    Backend->>Firebase: REST signInWithIdp (FIREBASE_API_KEY)
    alt REST succeeds
        Firebase-->>Backend: localId (Firebase UID)
    else REST fails — fallback (BROKEN)
        Backend->>Firebase: verify_id_token(oauth_token) ← always throws
        note over Backend: InvalidIdTokenError: wrong aud claim
    end
    Backend->>Firebase: create_custom_token(uid)
    Backend-->>Desktop: auth_code via redirect_uri

    note over Desktop,Backend: STT / Proactive AI Flow (transcribe.py)
    Desktop->>Backend: WS /v4/listen?language=en&source=desktop
    loop Audio streaming
        Desktop->>Backend: Binary PCM16 frames (mono, 16kHz)
        Backend-->>Desktop: transcript segment JSON
    end
    Desktop->>Backend: JSON {type: screen_frame, analyze: [focus,tasks,...]}
    par Fan-out handlers
        Backend->>Backend: analyze_focus(uid, image_b64)
        Backend-->>Desktop: {type: focus_result, ...}
    and
        Backend->>Backend: extract_tasks(uid, image_b64)
        Backend-->>Desktop: {type: tasks_extracted, ...}
    end
    Desktop->>Backend: JSON {type: live_notes_text, text: ...}
    Backend->>Backend: generate_live_note(text)
    Backend-->>Desktop: {type: live_note, text: ...}

Comments Outside Diff (1)

  1. backend/routers/auth.py, line 151-154 (link)

    Admin SDK fallback will always fail for OAuth tokens

    firebase_admin.auth.verify_id_token(id_token) is designed to verify Firebase ID tokens (JWTs with aud equal to the Firebase project ID, signed by Firebase). However, id_token here is a raw Google or Apple OAuth token, which has a different aud (the Google client ID) and is signed by Google/Apple — not Firebase. This call will always throw a ValueError / InvalidIdTokenError like "Firebase ID token has incorrect 'aud' (audience) claim", making the entire fallback path non-functional.

    When FIREBASE_API_KEY is set but the REST signInWithIdp call fails (e.g., because the key has domain/IP restrictions), the user will see an auth failure instead of a graceful fallback.

    The correct approach for extracting an identity from a raw Google OAuth id_token is to use google.oauth2.id_token.verify_oauth2_token (from google-auth) with the appropriate CLIENT_ID as the audience, or for Apple tokens, use a dedicated Apple JWT verifier. Alternatively, if only the REST API path is reliable, remove the misleading fallback or gate it on the token issuer:

    # For Google: use google-auth to verify the raw OAuth id_token
    from google.oauth2 import id_token as google_id_token
    from google.auth.transport import requests as google_requests
    
    if provider == 'google':
        id_info = google_id_token.verify_oauth2_token(
            id_token, google_requests.Request(), audience=None  # skip aud check or pass client_id
        )
        email = id_info.get('email')

    Until fixed, the fallback branch (if not firebase_uid:) will raise an exception and surface as a 500 error to the client whenever the REST API fails.
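The aud mismatch the review describes can be seen by inspecting the token's unverified claims; a sketch of gating the fallback on the issuer before picking a verifier (the token contents here are hypothetical, and unverified claims must only ever drive routing, never authentication):

```python
import base64
import json

def unverified_claims(jwt_token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature.

    Safe only for routing decisions (choosing which verifier to call);
    never trust unverified claims for authentication itself.
    """
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def _b64(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).decode().rstrip("=")

# Hypothetical token shaped like a raw Google OAuth id_token: its aud is a
# Google client ID, not the Firebase project ID, which is why
# firebase_admin.auth.verify_id_token rejects it with an incorrect-aud error.
demo_token = ".".join([
    _b64({"alg": "RS256", "typ": "JWT"}),
    _b64({"iss": "https://accounts.google.com", "aud": "my-google-client-id"}),
    "unsigned-demo",
])

claims = unverified_claims(demo_token)
if claims.get("iss") == "https://accounts.google.com":
    pass  # route to google.oauth2.id_token.verify_oauth2_token(...)
else:
    pass  # route to firebase_admin.auth.verify_id_token(...)
print(claims["aud"])  # my-google-client-id
```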

Last reviewed commit: d4fede5

due_start = f'{date_str}T00:00:00Z'
due_end = f'{date_str}T23:59:59.999Z'

completed, total = staged_tasks_db.get_action_items_for_daily_score(uid, due_start, due_end)

datetime.now() without timezone causes UTC mismatch

datetime.now() returns the server's local time with no timezone info. The date string is then combined with hardcoded UTC markers (T00:00:00Z), so if the server ever runs in a non-UTC timezone the query window will be silently wrong. The same issue exists on line 263 in get_scores. Use datetime.now(timezone.utc) consistently:

Suggested change:
- completed, total = staged_tasks_db.get_action_items_for_daily_score(uid, due_start, due_end)
+ parsed = datetime.now(timezone.utc).date()

And on line 263:

parsed = datetime.now(timezone.utc).date()
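The off-by-one window can be reproduced in isolation. A self-contained sketch with a frozen clock (the +11:00 offset and the timestamp are illustrative values, not from the code): a naive local date combined with Z-suffixed bounds drifts by a day near local midnight on non-UTC servers.

```python
from datetime import datetime, timezone, timedelta

# A server "local clock" 11 hours ahead of UTC, frozen just after
# local midnight — illustrative values only.
local_now = datetime(2026, 3, 10, 0, 30, tzinfo=timezone(timedelta(hours=11)))
utc_now = local_now.astimezone(timezone.utc)

naive_date = local_now.date()   # what datetime.now().date() would yield locally
aware_date = utc_now.date()     # what datetime.now(timezone.utc).date() yields

# Both dates would feed the same UTC-marked window, but only one is UTC.
due_start = f"{aware_date}T00:00:00Z"
due_end = f"{aware_date}T23:59:59.999Z"

print(naive_date, aware_date)  # 2026-03-10 2026-03-09 — off by one day
```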

Comment on lines +259 to +262
if len(status_list) > 10:
raise HTTPException(status_code=400, detail="Too many status values (max 10)")
try:
count = conversations_db.count_conversations(uid, statuses=status_list)

O(n) fallback for conversation count can be very slow

When count_conversations raises (e.g., if the aggregation query isn't supported), the fallback calls stream_conversations and materialises every document in Python just to count them. For users with thousands of conversations this is extremely slow and will hold a thread for seconds.

Consider either surfacing the exception directly (so it's visible and can be fixed at the DB layer), or implementing an explicit count with a CollectionReference.count() / AggregateQuery at the Firestore level, which is O(1) and avoids the full scan:

except Exception as e:
    logger.warning(f'count_conversations aggregation fallback: {e}')
    # Return 0 rather than scanning all documents; the caller can treat
    # this as "unknown" and retry when the DB layer supports aggregates.
    count = 0

Comment on lines +104 to +113
/// Send a screen_frame for task extraction.
func extractTasks(imageBase64: String, appName: String, windowTitle: String) async throws -> TasksExtractedResult {
guard isConnected else { throw ServiceError.notConnected }
let frameId = UUID().uuidString
let jsonString = try buildScreenFrameJSON(frameId: frameId, analyzeTypes: ["tasks"], imageBase64: imageBase64, appName: appName, windowTitle: windowTitle)

return try await withCheckedThrowingContinuation { continuation in
requestLock.lock()
pendingTasksRequests[frameId] = continuation
requestLock.unlock()

isConnected checked outside the lock — potential TOCTOU race

isConnected is read without holding requestLock at the top of each handler (e.g. guard isConnected else { throw ... }), while disconnect() sets isConnected = false and calls cancelAllPending under the lock. There is a window where:

  1. Thread A passes the guard isConnected check.
  2. Thread B calls disconnect(); cancelAllPending() clears all pending continuations.
  3. Thread A registers its new continuation after the lock-protected removal loop is done — the continuation is never cancelled and will eventually time out instead of failing immediately.

For a proactive-AI service the impact is bounded (timeout fires after 30s), but for correctness the guard isConnected check should be inside the lock together with the continuation insertion, or isConnected should be accessed via an actor.

return try await withCheckedThrowingContinuation { continuation in
    requestLock.lock()
    guard isConnected else {
        requestLock.unlock()
        continuation.resume(throwing: ServiceError.notConnected)
        return
    }
    pendingFocusRequests[frameId] = continuation
    requestLock.unlock()
    sendAndTimeout(...)
}

The same pattern applies to extractTasks, extractMemories, generateAdvice, generateLiveNote, requestProfile, rerankTasks, and deduplicateTasks.
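The invariant behind the Swift suggestion — connection check and continuation insertion under one lock — can be mirrored in Python form (class and method names are illustrative, not from the codebase):

```python
import threading

class ProactiveClient:
    """Sketch: the connected check and the pending-request insertion happen
    under a single lock, so a concurrent disconnect() can never strand a
    freshly registered request."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connected = True
        self._pending = {}

    def register(self, frame_id: str, continuation) -> bool:
        with self._lock:
            if not self._connected:          # checked INSIDE the lock
                return False                 # caller fails fast, no orphan
            self._pending[frame_id] = continuation
            return True

    def disconnect(self):
        with self._lock:
            self._connected = False
            cancelled = list(self._pending.values())
            self._pending.clear()
        return cancelled  # resume/cancel these outside the lock

c = ProactiveClient()
assert c.register("f1", "cont-1")
assert c.disconnect() == ["cont-1"]
assert not c.register("f2", "cont-2")  # rejected immediately, not orphaned
```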

@beastoin
Collaborator Author

Closing — the manager needs to verify noa's combined verification branch before a prerelease PR. Will reopen or create a new one once verification is confirmed.


by AI for @beastoin

@beastoin closed this Mar 10, 2026
@github-actions
Contributor

Hey @beastoin 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

  • Project standards — Ensuring consistency across the codebase
  • User needs — Making sure changes align with what our users need
  • Code best practices — Maintaining code quality and maintainability
  • Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜
