GET /v1/conversations extreme latency (30-90s) from N+1 queries and full hydration #4904

@beastoin

Bug Description

GET /v1/conversations regularly takes 30-70s (worst: 91.6s), causing 504 gateway timeouts at the Cloud Run 30s boundary. 20+ slow requests observed in a 12h window.

Root Cause

The list endpoint runs a full hydration pipeline instead of a lightweight list projection. Multiple compounding bottlenecks:

1. N+1 Firestore Photo Queries (biggest impact)

get_conversations() at backend/database/conversations.py:176 is decorated with @with_photos (:175), which fires a separate Firestore subcollection query per conversation via get_conversation_photos() (:139-143).

For limit=100: 1 main query + 100 photo subqueries = 101 serial Firestore round-trips.
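If photos must stay in the list response, one stopgap is to issue the per-conversation lookups concurrently rather than one after another. The sketch below is illustrative only: `fetch_photos` is a hypothetical stand-in for the real subcollection query, and `attach_photos_concurrently` is not a function in the codebase. It does not reduce the number of queries, only the wall-clock time spent waiting on them.

```python
from concurrent.futures import ThreadPoolExecutor

# Records which conversation IDs were queried (for illustration only).
calls = []


def fetch_photos(conversation_id: str) -> list[dict]:
    """Hypothetical stand-in for one Firestore subcollection query."""
    calls.append(conversation_id)
    return [{"conversation_id": conversation_id, "id": f"{conversation_id}-photo"}]


def attach_photos_concurrently(conversations: list[dict], max_workers: int = 16) -> list[dict]:
    """Fetch photos for all conversations in parallel instead of serially."""
    ids = [c["id"] for c in conversations]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with ids.
        photos_per_conversation = list(pool.map(fetch_photos, ids))
    # Build new dicts rather than mutating the inputs.
    return [{**c, "photos": p} for c, p in zip(conversations, photos_per_conversation)]


conversations = [{"id": f"conv-{i}"} for i in range(100)]
hydrated = attach_photos_concurrently(conversations)
```

Skipping photos entirely on the list path remains the cheaper option, since it removes the 100 extra queries instead of overlapping them.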

2. Full Transcript Decrypt/Decompress on List View

@prepare_for_read decorator (:174) runs _prepare_conversation_for_read() on every item:

  • copy.deepcopy() always (:96)
  • AES decrypt + zlib decompress + JSON parse for enhanced-protection conversations (:45-55, :99-106)

Full transcript data is processed even though the list view does not need it.
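A minimal sketch of how the read path could skip transcript decoding for list requests. Everything here is an assumption for illustration: the `transcript_blob` field name, the `include_transcript` flag, and `decode_transcript`, which stands in for the real decrypt + decompress + parse step (the AES step is omitted so the example stays self-contained).

```python
import json
import zlib

# Counts how often the expensive decode step runs (for illustration only).
decode_calls = 0


def decode_transcript(blob: bytes) -> list[dict]:
    """Stand-in for decrypt + decompress + JSON parse; AES is omitted here."""
    global decode_calls
    decode_calls += 1
    return json.loads(zlib.decompress(blob))


def prepare_for_read(doc: dict, include_transcript: bool) -> dict:
    """Build a read view; decode the transcript only when it is requested."""
    # A shallow copy is enough because only top-level keys are added or dropped.
    out = {k: v for k, v in doc.items() if k != "transcript_blob"}
    if include_transcript:
        out["transcript_segments"] = decode_transcript(doc["transcript_blob"])
    return out


segments = [{"speaker": "A", "text": "hello"}]
stored = {
    "id": "conv-1",
    "status": "completed",
    "transcript_blob": zlib.compress(json.dumps(segments).encode()),
}

list_item = prepare_for_read(stored, include_transcript=False)  # list view
calls_after_list = decode_calls
detail = prepare_for_read(stored, include_transcript=True)  # detail view
```

The list path never touches the compressed blob, and the shallow copy avoids the per-item copy.deepcopy() cost noted above.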

3. Offset-Based Pagination - O(offset)

.limit(limit).offset(offset) at :213 - Firestore still reads every document skipped by offset before discarding it, so cost grows linearly with page depth. A request with offset=5000 and limit=100 reads ~5,100 docs but returns only 100.

No cursor-based pagination exists anywhere in the codebase.

4. Heavy Response Serialization

response_model=List[Conversation] (:116) serializes full conversation objects including transcript_segments and photos - large payloads for a list endpoint.
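A rough sketch of a slimmer list item, to show the size difference such a model could make. It uses a stdlib dataclass to stay self-contained; in the actual app this would more likely be a Pydantic model passed as response_model. `ConversationListItem` and the `title` field are assumptions, not names from the codebase, and the sample document is synthetic.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ConversationListItem:
    """Hypothetical list-view shape: metadata only, no transcript or photos."""

    id: str
    created_at: str
    status: str
    title: str  # assumed summary field, for illustration

    @classmethod
    def from_doc(cls, doc: dict) -> "ConversationListItem":
        return cls(
            id=doc["id"],
            created_at=doc["created_at"],
            status=doc["status"],
            title=doc.get("title", ""),
        )


# Synthetic full conversation with a long transcript and a few photos.
full_doc = {
    "id": "conv-1",
    "created_at": "2025-01-01T00:00:00Z",
    "status": "completed",
    "title": "Weekly sync",
    "transcript_segments": [{"speaker": "A", "text": "word " * 20} for _ in range(500)],
    "photos": [{"id": f"p{i}", "base64": "x" * 1000} for i in range(5)],
}

list_item = asdict(ConversationListItem.from_doc(full_doc))
full_bytes = len(json.dumps(full_doc))
list_bytes = len(json.dumps(list_item))
```

Comparing `full_bytes` with `list_bytes` on real documents would give a concrete estimate of the payload saved per list request.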

Execution Trace

GET /v1/conversations?limit=100&offset=0&statuses=processing,completed
  -> auth (Firebase token verify)
  -> conversations_db.get_conversations()         # 1 Firestore query
    -> @with_photos -> 100x get_conversation_photos()  # 100 Firestore queries (N+1)
    -> @prepare_for_read -> 100x deepcopy + conditional decrypt/decompress
  -> post-process locked conversations
  -> FastAPI response_model validation (List[Conversation])

Estimated cost: 6.5-14s minimum for 100 conversations at offset=0. Compounds to 30-90s with large datasets, deep offsets, or enhanced-protection users.

Existing Optimization (Unused)

get_conversations_without_photos() exists at :220 but is only used for Wrapped generation (backend/utils/wrapped/generate_2025.py:750). The main list endpoint always takes the slow path.

Suggested Fixes (Priority Order)

  1. Switch list endpoint to get_conversations_without_photos() or create a lightweight list variant that skips photos and transcript decryption
  2. Implement cursor-based pagination using created_at + doc ID as composite cursor (replaces .offset())
  3. Create a list-specific response model that excludes transcript_segments and photos (move to detail endpoint)
  4. Add missing composite indexes for common filter combinations (discarded + status + created_at DESC, etc.)

Severity

CRITICAL - Causes 504 timeouts for users with moderate conversation counts. Affects the primary conversation listing used by the mobile app, web app, and all API consumers.

Labels: maintainer (Lane: High-risk, cross-system changes), p1 (Priority: Critical (score 22-29)), retrieval-action (Layer: Search, questions, tasks, exports)
