Skip to content

Performance: /api/v1/workflows/runs endpoint takes 1.5-2+ minutes to respond #231

@fmzchao

Description

@fmzchao

Problem

The GET /api/v1/workflows/runs?workflowId=xxx&limit=50 endpoint takes 1.5-2+ minutes to respond, making the UI feel unresponsive when loading workflow run lists.

Root Cause Analysis

After investigating the code in backend/src/workflows/workflows.service.ts, I found the issue is in the listRuns method:

async listRuns(...) {
  const runs = await this.runRepository.list({...});
  const summaries = await Promise.all(
    runs.map((run) => this.buildRunSummary(run, organizationId)),
  );
  ...
}

The buildRunSummary method performs multiple operations for each run:

  1. Query workflow info from database
  2. Query version info from database
  3. Query trace event count from database
  4. Query event time range from database
  5. Call Temporal API to get workflow status ← This is the slowest part

With 7 workflows and 50+ runs total, each run triggers a separate Temporal describeWorkflow gRPC call. This creates an N+1 query problem with expensive external API calls.

Evidence from Logs

Backend logs show 53 sequential Temporal API calls:

[TemporalService] Describing workflow shipsec-run-2ee002f6-4472-489c-b506-2c88dc739ca4
[TemporalService] Describing workflow shipsec-run-a342035f-f0f4-4ce2-976a-e756838900dc
[TemporalService] Describing workflow shipsec-run-a8f32bc4-6801-4b3c-9794-bda14d215a90
... (50+ more)

Suggested Solutions

  1. Cache Temporal status for completed workflows - Once a workflow reaches a terminal state (COMPLETED, FAILED, CANCELLED, TERMINATED, TIMED_OUT), its status won't change. Store this in the database and skip Temporal API calls for these runs.

  2. Use Temporal's batch query API - Use listWorkflows with a filter instead of individual describeWorkflow calls.

  3. Async status updates - Return database-cached status immediately, then update Temporal status asynchronously in the background.

  4. Selective Temporal queries - Only query Temporal for runs with RUNNING status; use cached status for terminal states.

Environment

  • macOS
  • Local development setup
  • 7 workflows, 50+ total runs

Expected Behavior

The endpoint should respond within a few hundred milliseconds, not minutes.

Actual Behavior

Response time: 1.5-2.3 minutes per request

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions