Description
Problem
The GET /api/v1/workflows/runs?workflowId=xxx&limit=50 endpoint takes 1.5-2+ minutes to respond, making the UI feel unresponsive when loading workflow run lists.
Root Cause Analysis
After investigating the code in backend/src/workflows/workflows.service.ts, I found the issue is in the listRuns method:
```ts
async listRuns(...) {
  const runs = await this.runRepository.list({...});
  const summaries = await Promise.all(
    runs.map((run) => this.buildRunSummary(run, organizationId)),
  );
  ...
}
```

The buildRunSummary method performs multiple operations for each run:
- Query workflow info from database
- Query version info from database
- Query trace event count from database
- Query event time range from database
- Call Temporal API to get workflow status ← This is the slowest part
With 7 workflows and 50+ runs in total, every run in the list triggers its own Temporal describeWorkflow gRPC call on top of the database queries above. This is an N+1 query problem, made worse because the per-run calls go to an expensive external API.
Evidence from Logs
Backend logs show 53 sequential Temporal API calls:
```
[TemporalService] Describing workflow shipsec-run-2ee002f6-4472-489c-b506-2c88dc739ca4
[TemporalService] Describing workflow shipsec-run-a342035f-f0f4-4ce2-976a-e756838900dc
[TemporalService] Describing workflow shipsec-run-a8f32bc4-6801-4b3c-9794-bda14d215a90
... (50+ more)
```
Suggested Solutions
- Cache Temporal status for finished workflows - Once a workflow reaches a terminal state (COMPLETED, FAILED, CANCELLED, TERMINATED, TIMED_OUT), its status won't change. Store this in the database and skip Temporal API calls for these runs.
- Use Temporal's batch query API - Use `listWorkflows` with a filter instead of individual `describeWorkflow` calls.
- Async status updates - Return the database-cached status immediately, then update the Temporal status asynchronously in the background.
- Selective Temporal queries - Only query Temporal for runs with RUNNING status; use the cached status for terminal states.
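A minimal sketch of the first and fourth suggestions combined, assuming each run row already stores its last known status: runs whose stored status is terminal are answered from the database, and only the remaining runs go to Temporal. `RunRow`, `resolveStatuses`, and the `describeStatus`/`persistStatus` callbacks are hypothetical names standing in for the real repository and TemporalService calls; the status strings are the terminal states listed above.

```typescript
// Hypothetical row shape; the real run entity in the backend may differ.
type RunStatus =
  | 'RUNNING'
  | 'COMPLETED'
  | 'FAILED'
  | 'CANCELLED'
  | 'TERMINATED'
  | 'TIMED_OUT';

interface RunRow {
  runId: string;
  workflowId: string; // Temporal workflow ID, e.g. "shipsec-run-<uuid>"
  status: RunStatus; // last status persisted in the database
}

// Once a run reaches one of these, Temporal will not report anything else.
const TERMINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  'COMPLETED',
  'FAILED',
  'CANCELLED',
  'TERMINATED',
  'TIMED_OUT',
]);

function isTerminal(status: RunStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// describeStatus stands in for a per-run describeWorkflow call and
// persistStatus for a database update; both are hypothetical callbacks.
async function resolveStatuses(
  runs: RunRow[],
  describeStatus: (workflowId: string) => Promise<RunStatus>,
  persistStatus: (runId: string, status: RunStatus) => Promise<void>,
): Promise<Map<string, RunStatus>> {
  const result = new Map<string, RunStatus>();
  const live: RunRow[] = [];

  for (const run of runs) {
    if (isTerminal(run.status)) {
      // Cached terminal status: no Temporal round trip needed.
      result.set(run.runId, run.status);
    } else {
      live.push(run);
    }
  }

  // Only non-terminal runs reach Temporal.
  await Promise.all(
    live.map(async (run) => {
      const status = await describeStatus(run.workflowId);
      result.set(run.runId, status);
      if (isTerminal(status)) {
        // Persist once so later requests skip Temporal for this run.
        await persistStatus(run.runId, status);
      }
    }),
  );

  return result;
}
```

With this shape, the number of Temporal calls per request would track the number of runs still in progress rather than the total number of runs in the list.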
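For the second suggestion, a rough sketch of replacing per-run describe calls with one filtered list call per chunk of workflow IDs. `WorkflowLister` is a structural stand-in assumed to mirror `client.workflow.list({ query })` in the Temporal TypeScript SDK, which yields executions exposing `workflowId` and `status.name`; `listStatusesInBatches` and the chunk size of 50 are hypothetical. Whether OR-chained `WorkflowId` filters are accepted depends on the visibility store configured for the Temporal cluster.

```typescript
// Structural stand-in for the slice of the Temporal client used here.
// Assumption: mirrors client.workflow.list({ query }) from @temporalio/client.
interface ExecutionInfo {
  workflowId: string;
  status: { name: string };
}

interface WorkflowLister {
  list(options: { query: string }): AsyncIterable<ExecutionInfo>;
}

// Build a visibility query matching any of the given workflow IDs.
// IDs like "shipsec-run-<uuid>" contain no quotes; an ID that does is
// rejected rather than escaped, since no escaping rules are assumed here.
function buildWorkflowIdQuery(workflowIds: string[]): string {
  return workflowIds
    .map((id) => {
      if (id.includes("'")) {
        throw new Error(`Unsupported workflow ID: ${id}`);
      }
      return `WorkflowId = '${id}'`;
    })
    .join(' OR ');
}

// One list request per chunk instead of one describeWorkflow per run.
// Assumes each workflow ID has a single execution, so entries never
// overwrite the status of a different run.
async function listStatusesInBatches(
  client: WorkflowLister,
  workflowIds: string[],
  chunkSize = 50, // hypothetical cap to keep each query string short
): Promise<Map<string, string>> {
  const statuses = new Map<string, string>();
  for (let i = 0; i < workflowIds.length; i += chunkSize) {
    const chunk = workflowIds.slice(i, i + chunkSize);
    const query = buildWorkflowIdQuery(chunk);
    for await (const info of client.list({ query })) {
      statuses.set(info.workflowId, info.status.name);
    }
  }
  return statuses;
}
```

For the 53 runs in the logs, this would mean two list requests with a chunk size of 50, instead of 53 describe calls.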
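The third suggestion can be sketched as a stale-while-revalidate pattern: the endpoint returns the summaries already stored in the database, and runs still marked RUNNING are refreshed from Temporal without the response waiting on them. `RunSummary`, `listRunsWithBackgroundRefresh`, and the `refreshFromTemporal`/`onError` callbacks are hypothetical; in the real service the refresh callback would presumably call describeWorkflow and write the new status back.

```typescript
// Hypothetical summary shape; the real buildRunSummary output has more fields.
interface RunSummary {
  runId: string;
  workflowId: string;
  status: string; // status as last stored in the database
}

// Return cached summaries immediately; refresh RUNNING runs in the background.
function listRunsWithBackgroundRefresh(
  cached: RunSummary[],
  refreshFromTemporal: (run: RunSummary) => Promise<void>,
  onError: (run: RunSummary, err: unknown) => void = () => {},
): { summaries: RunSummary[]; refreshDone: Promise<void> } {
  const running = cached.filter((run) => run.status === 'RUNNING');
  // Temporal failures are routed to onError instead of rejecting
  // refreshDone, so a caller that never awaits it stays safe.
  const refreshDone = Promise.all(
    running.map((run) =>
      refreshFromTemporal(run).catch((err: unknown) => onError(run, err)),
    ),
  ).then(() => undefined);
  return { summaries: cached, refreshDone };
}
```

A controller would respond with `summaries` and not await `refreshDone`; the next request would then see whatever statuses the background refresh managed to persist.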
Environment
- macOS
- Local development setup
- 7 workflows, 50+ total runs
Expected Behavior
The endpoint should respond within a few hundred milliseconds, not minutes.
Actual Behavior
Response time: 1.5-2.3 minutes per request