Bug Description
GET /v1/conversations regularly takes 30-70s (worst: 91.6s), causing 504 gateway timeouts at the Cloud Run 30s boundary. 20+ slow requests observed in a 12h window.
Root Cause
The list endpoint runs a full hydration pipeline instead of a lightweight list projection. Multiple compounding bottlenecks:
1. N+1 Firestore Photo Queries (biggest impact)
`get_conversations()` at `backend/database/conversations.py:176` is decorated with `@with_photos` (:175), which fires a separate Firestore subcollection query per conversation via `get_conversation_photos()` (:139-143).
For limit=100: 1 main query + 100 photo subqueries = 101 serial Firestore round-trips.
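The latency arithmetic behind that round-trip count can be sketched directly (the 60 ms per-round-trip figure below is an assumption for illustration, not a measurement):

```python
def estimated_list_latency_s(num_conversations: int, rtt_ms: float = 60.0) -> float:
    """Back-of-envelope cost of the N+1 pattern: one main query plus one
    serial photo subquery per conversation, at an assumed Firestore RTT."""
    round_trips = 1 + num_conversations
    return round_trips * rtt_ms / 1000.0

print(estimated_list_latency_s(100))  # 101 round-trips at 60 ms each -> 6.06
```

Even before decryption and serialization, the serial subqueries alone account for several seconds at `limit=100`.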
2. Full Transcript Decrypt/Decompress on List View
The `@prepare_for_read` decorator (:174) runs `_prepare_conversation_for_read()` on every item:
- `copy.deepcopy()` always (:96)
- AES decrypt + zlib decompress + JSON parse for enhanced-protection conversations (:45-55, :99-106)

Full transcript data is processed even though the list view does not need it.
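A list-safe variant of the prepare step could look roughly like the sketch below. This is illustrative only: the field name `transcript_blob`, the `hydrate_transcript` flag, and the function name are all hypothetical, and the AES decryption layer is elided (only the decompress/parse step is shown).

```python
import copy
import json
import zlib

def prepare_conversation(conversation: dict, hydrate_transcript: bool = True) -> dict:
    """Hypothetical list-safe prepare step. List views pass
    hydrate_transcript=False: no deepcopy, no decrypt, no decompress -
    the compressed blob is simply dropped from the projection."""
    if not hydrate_transcript:
        return {k: v for k, v in conversation.items() if k != "transcript_blob"}
    out = copy.deepcopy(conversation)
    blob = out.pop("transcript_blob", None)
    if blob is not None:
        # Real code would AES-decrypt before this decompress step.
        out["transcript_segments"] = json.loads(zlib.decompress(blob))
    return out
```

The detail endpoint keeps the default hydrating path; only the list endpoint opts out.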
3. Offset-Based Pagination - O(offset)
`.limit(limit).offset(offset)` at :213 - Firestore's `offset` reads (and bills for) the skipped documents before discarding them. Page 50 (`offset=5000`) reads ~5,100 docs but returns only 100.
No cursor-based pagination exists anywhere in the codebase.
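The cursor contract can be modeled in memory as below; in Firestore the equivalent is an `order_by(...).start_after(...)` query, which seeks via the index instead of scanning. Everything here is an illustrative sketch, not the project's code, and the document shape is assumed.

```python
from typing import Optional, Tuple

# Composite cursor: (created_at ISO timestamp, doc id), newest-first ordering.
Cursor = Tuple[str, str]

def list_page(docs: list, limit: int, cursor: Optional[Cursor] = None):
    """In-memory model of cursor pagination. Each page costs O(limit) reads
    once an index provides the ordering; offset pagination must read and
    discard every skipped document."""
    ordered = sorted(docs, key=lambda d: (d["created_at"], d["id"]), reverse=True)
    if cursor is not None:
        # Strictly after the cursor in descending order.
        ordered = [d for d in ordered if (d["created_at"], d["id"]) < cursor]
    page = ordered[:limit]
    next_cursor = (page[-1]["created_at"], page[-1]["id"]) if page else None
    return page, next_cursor
```

The doc ID tiebreaker keeps the cursor stable when multiple conversations share a `created_at` value.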
4. Heavy Response Serialization
`response_model=List[Conversation]` (:116) serializes full conversation objects, including `transcript_segments` and `photos` - large payloads for a list endpoint.
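A trimmed list projection might look like this. The field names are guesses for illustration, not the real schema, and it is shown as a stdlib dataclass (FastAPI accepts dataclasses as response models; a Pydantic model would work the same way):

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ConversationListItem:
    """Hypothetical list-endpoint projection: transcript_segments and
    photos are intentionally absent and served only by the detail endpoint."""
    id: str
    status: str
    created_at: datetime
    title: Optional[str] = None
```

Serializing only these fields keeps list responses small regardless of transcript length.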
Execution Trace
```
GET /v1/conversations?limit=100&offset=0&statuses=processing,completed
  -> auth (Firebase token verify)
  -> conversations_db.get_conversations()            # 1 Firestore query
  -> @with_photos -> 100x get_conversation_photos()  # 100 Firestore queries (N+1)
  -> @prepare_for_read -> 100x deepcopy + conditional decrypt/decompress
  -> post-process locked conversations
  -> FastAPI response_model validation (List[Conversation])
```
Estimated cost: 6.5-14s minimum for 100 conversations at offset=0. Compounds to 30-90s with large datasets, deep offsets, or enhanced-protection users.
Existing Optimization (Unused)
`get_conversations_without_photos()` exists at :220 but is only used for Wrapped generation (`backend/utils/wrapped/generate_2025.py:750`). The main list endpoint always takes the slow path.
Suggested Fixes (Priority Order)
- Switch the list endpoint to `get_conversations_without_photos()`, or create a lightweight list variant that skips photos and transcript decryption
- Implement cursor-based pagination using `created_at` + doc ID as a composite cursor (replaces `.offset()`)
- Create a list-specific response model that excludes `transcript_segments` and `photos` (move them to the detail endpoint)
- Add missing composite indexes for common filter combinations (`discarded + status + created_at DESC`, etc.)
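The first index suggested above would look roughly like this in a `firestore.indexes.json` deploy file (the collection group name is assumed):

```json
{
  "indexes": [
    {
      "collectionGroup": "conversations",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "discarded", "order": "ASCENDING" },
        { "fieldPath": "status", "order": "ASCENDING" },
        { "fieldPath": "created_at", "order": "DESCENDING" }
      ]
    }
  ]
}
```

The same ordered `created_at DESC` component is what makes the cursor-based pagination seek cheap.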
Severity
CRITICAL - Causes 504 timeouts for users with moderate conversation counts. Affects the primary conversation listing used by the mobile app, web app, and all API consumers.