# feat: global FIFO queue for Evals runs (#7966) #7981
## Conversation
### Pull Request Overview
This PR implements a Redis-backed global FIFO queue system for Evals runs to ensure only one evaluation executes at a time while automatically queuing and processing additional runs in order.
- Replaces direct controller spawning with a queuing mechanism using Redis data structures
- Adds queue status display and cancellation functionality to the web UI
- Implements auto-advance logic that dispatches the next queued run when the active run completes
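For readers unfamiliar with the pattern, here is a minimal sketch of how a Redis-backed FIFO of this shape can work. This is illustrative only, not the PR's exact code: the key names match the constants quoted later in this review, node-redis v4 is assumed, and the `*Sketch` function names are hypothetical.

```ts
import { createClient } from "redis"

type RedisClient = ReturnType<typeof createClient>

const RUN_QUEUE_KEY = "evals:run-queue"
const ACTIVE_RUN_KEY = "evals:active-run"
const ACTIVE_RUN_TTL_SECONDS = 60 * 60 * 12

// Enqueue: append the run id to the tail of the FIFO list.
async function enqueueRunSketch(redis: RedisClient, runId: number) {
	await redis.rPush(RUN_QUEUE_KEY, String(runId))
}

// Dispatch: start the next run only if nothing is currently active.
async function dispatchNextRunSketch(redis: RedisClient) {
	// SET with NX acts as a mutex: it succeeds only when no active-run marker exists.
	const acquired = await redis.set(ACTIVE_RUN_KEY, "pending", { NX: true, EX: ACTIVE_RUN_TTL_SECONDS })
	if (!acquired) return // a run is already active; its completion triggers the next dispatch

	const nextId = await redis.lPop(RUN_QUEUE_KEY)
	if (nextId === null) {
		await redis.del(ACTIVE_RUN_KEY) // queue empty; release the marker
		return
	}
	await redis.set(ACTIVE_RUN_KEY, nextId, { EX: ACTIVE_RUN_TTL_SECONDS })
	// ...spawn the controller process for Number(nextId) here...
}
```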
**Reviewed Changes**
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pr-body.md | PR description documentation |
| packages/evals/src/cli/runEvals.ts | Adds queue cleanup call after run completion |
| packages/evals/src/cli/queue.ts | Implements CLI-side queue management and controller spawning |
| apps/web-evals/src/components/home/runs.tsx | Adds Status column to runs table |
| apps/web-evals/src/components/home/run.tsx | Adds queue status display and cancel functionality |
| apps/web-evals/src/actions/runs.ts | Replaces direct spawning with queue enqueue and dispatch |
| apps/web-evals/src/actions/queue.ts | Implements web-side queue operations and Redis interactions |
---

```ts
const isCompleted = !!run.taskMetricsId
const isRunning = !!heartbeat
const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
```
**Copilot AI commented on Sep 14, 2025:**
The condition `queuePosition !== null && queuePosition !== undefined` can be simplified to `queuePosition != null`, which checks for both `null` and `undefined` in a single comparison.
Suggested change:

```diff
- const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
+ const isQueued = !isCompleted && !isRunning && queuePosition != null
```
---

```ts
	await spawnController(Number(nextId))
} finally {
	await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
```
**Copilot AI commented on Sep 14, 2025:**
Using an empty catch block silently ignores all errors. Consider adding a comment explaining why errors are being ignored, or logging the error for debugging purposes.
Suggested change:

```diff
- await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
+ await redis.del(DISPATCH_LOCK_KEY).catch((err) => {
+ 	console.error("Failed to delete dispatcher lock key:", err)
+ })
```
---
Thank you for implementing the global FIFO queue for Evals runs! The implementation looks solid overall and successfully addresses the requirements from issue #7966. I've reviewed the changes and found some areas that need attention, particularly around race condition handling and error management.
---

```ts
const isCompleted = !!run.taskMetricsId
const isRunning = !!heartbeat
const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
```
I agree with Copilot's suggestion here. Could we simplify this to `queuePosition != null`, which checks for both `null` and `undefined`?
Suggested change:

```diff
- const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
+ const isQueued = !isCompleted && !isRunning && queuePosition != null
```
---

```ts
	await spawnController(Number(nextId))
} finally {
	await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
```
I agree with Copilot - could we add error logging here for debugging purposes? Silent failures make troubleshooting difficult in production.
Suggested change:

```diff
- await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
+ await redis.del(DISPATCH_LOCK_KEY).catch((err) => {
+ 	console.error("Failed to delete dispatcher lock:", err)
+ })
```
---

```ts
const redis = await redisClient()

// Try to acquire dispatcher lock
const locked = await redis.set(DISPATCH_LOCK_KEY, "1", { NX: true, EX: DISPATCH_LOCK_TTL_SECONDS })
```
Is there a potential race condition here? If the dispatcher lock expires (30s) while we're still processing, another dispatcher could start processing the same queue. Consider one of the following (a lock-refresh sketch follows the list):
- Extending the lock TTL if processing takes longer
- Adding a heartbeat to refresh the lock
- Using a longer initial TTL
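If the heartbeat route is preferred, a rough sketch of a lock-refreshing wrapper could look like this. The `withDispatchLock` helper name is hypothetical; the key name and TTL match the constants quoted below, and node-redis v4 is assumed.

```ts
import { createClient } from "redis"

type RedisClient = ReturnType<typeof createClient>

const DISPATCH_LOCK_KEY = "evals:dispatcher:lock"
const DISPATCH_LOCK_TTL_SECONDS = 30

// Hypothetical wrapper: keep the dispatcher lock alive while the dispatch work runs.
async function withDispatchLock<T>(redis: RedisClient, work: () => Promise<T>): Promise<T | undefined> {
	const acquired = await redis.set(DISPATCH_LOCK_KEY, "1", { NX: true, EX: DISPATCH_LOCK_TTL_SECONDS })
	if (!acquired) return undefined // another dispatcher holds the lock

	// Refresh the TTL at half its lifetime so the lock cannot expire mid-dispatch.
	const refresh = setInterval(() => {
		redis.expire(DISPATCH_LOCK_KEY, DISPATCH_LOCK_TTL_SECONDS).catch(() => {})
	}, (DISPATCH_LOCK_TTL_SECONDS * 1000) / 2)

	try {
		return await work()
	} finally {
		clearInterval(refresh)
		await redis.del(DISPATCH_LOCK_KEY).catch((err) => console.error("Failed to release dispatch lock:", err))
	}
}
```

Note that a plain refresh like this does not verify ownership before extending; a production-grade lock would store a unique token and check it (Redlock-style) before refreshing or deleting.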
---

```ts
	await enqueueRun(run.id)
	await dispatchNextRun()
} catch (error) {
	console.error(error)
```
This error handling only logs to the console. Should we consider:
- Throwing the error to surface it to the UI?
- Adding telemetry/monitoring?
- Returning an error status to the caller (sketched below)?

Silent failures could leave users confused about why their run isn't queued.
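One possible shape for the third option. The result type, the `enqueueAndDispatch` name, and its handling in the UI are assumptions, not the PR's code; `enqueueRun` and `dispatchNextRun` are taken from the snippet above.

```ts
// Assumed imports from the PR's queue module (names taken from the snippet above).
declare function enqueueRun(runId: number): Promise<void>
declare function dispatchNextRun(): Promise<void>

// Hypothetical: return a discriminated result instead of swallowing the error,
// so the caller (e.g., a toast in the UI) can tell the user the enqueue failed.
type EnqueueResult = { ok: true } | { ok: false; error: string }

async function enqueueAndDispatch(runId: number): Promise<EnqueueResult> {
	try {
		await enqueueRun(runId)
		await dispatchNextRun()
		return { ok: true }
	} catch (error) {
		console.error(error) // keep the server-side log
		return { ok: false, error: error instanceof Error ? error.message : String(error) }
	}
}
```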
---

```ts
const RUN_QUEUE_KEY = "evals:run-queue"
const ACTIVE_RUN_KEY = "evals:active-run"
const DISPATCH_LOCK_KEY = "evals:dispatcher:lock"
const ACTIVE_RUN_TTL_SECONDS = 60 * 60 * 12 // 12 hours
```
The 12-hour TTL seems quite generous. If a run crashes without clearing the active marker, the queue could be blocked for up to 12 hours. Consider:
- A shorter TTL (e.g., 2-4 hours)
- Implementing a heartbeat mechanism to refresh the TTL periodically
- Adding a manual "unlock" admin action for stuck queues (sketched below)
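A sketch of the third option. The `forceUnlockQueue` name is hypothetical, and `dispatchNextRun` is assumed to come from the PR's queue module:

```ts
import { createClient } from "redis"

type RedisClient = ReturnType<typeof createClient>

declare function dispatchNextRun(): Promise<void> // assumed to come from the PR's queue module

// Hypothetical admin escape hatch: clear a stuck active-run marker and re-dispatch.
async function forceUnlockQueue(redis: RedisClient): Promise<void> {
	await redis.del("evals:active-run") // drop the stale marker
	await dispatchNextRun() // immediately try to start the next queued run
}
```

Alternatively, a controller-side heartbeat that calls `expire` on the active-run key would let a much shorter TTL (minutes rather than hours) work safely.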
---

```ts
const ACTIVE_RUN_TTL_SECONDS = 60 * 60 * 12 // 12 hours
const DISPATCH_LOCK_TTL_SECONDS = 30

async function spawnController(runId: number) {
```
This function is duplicated in `packages/evals/src/cli/queue.ts`. Could we extract it to a shared utility to maintain DRY principles and ensure consistency? One possible shape is sketched below.
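A sketch of what the extraction could look like. The module path, the spawn command, and its arguments are illustrative placeholders, not the PR's actual invocation:

```ts
// packages/evals/src/shared/spawnController.ts (hypothetical path)
// Shared by the web action and the CLI so the spawn logic stays in one place.
import { spawn } from "node:child_process"

export function spawnController(runId: number) {
	// The binary and arguments below are illustrative placeholders.
	const child = spawn("pnpm", ["evals:controller", String(runId)], {
		detached: true, // let the controller outlive the caller
		stdio: "ignore",
	})
	child.unref()
	return child
}
```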
---

```ts
	const logStream = fs.createWriteStream("/tmp/roo-code-evals.log", { flags: "a" })
	childProcess.stdout?.pipe(logStream)
	childProcess.stderr?.pipe(logStream)
} catch {
```
There are empty catch blocks here and at line 85. For consistency with the web implementation, should we at least add a comment explaining why errors are ignored, or consider logging them?
---

```ts
const { data: heartbeat } = useQuery({
	queryKey: ["getHeartbeat", run.id],
	queryFn: () => getHeartbeat(run.id),
	refetchInterval: 10_000,
```
The 10-second polling interval might be excessive for long-running queues. Could we make this configurable, or use a progressive interval (e.g., start at 10s and back off to 30s after a few polls)? A possible shape is sketched below.
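For the progressive variant, assuming TanStack Query v5 (where `refetchInterval` may be a function of the query), something like this could work; the threshold of five refetches is an arbitrary illustration:

```ts
import { useQuery } from "@tanstack/react-query"

// `run` and `getHeartbeat` come from the surrounding component (see the snippet above).
const { data: heartbeat } = useQuery({
	queryKey: ["getHeartbeat", run.id],
	queryFn: () => getHeartbeat(run.id),
	// Hypothetical back-off: poll every 10s for the first few refetches, then relax to 30s.
	refetchInterval: (query) => (query.state.dataUpdateCount < 5 ? 10_000 : 30_000),
})
```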
---

### Related GitHub Issue

Closes: #7966
### Roo Code Task Context (Optional)

No Roo Code task context for this PR.
### Description
This PR implements a Redis-backed global FIFO queue for Evals runs. It ensures that only one run executes at a time, queues additional runs automatically, and auto-advances when the active run completes; the Web UI is minimally updated to display status and allow canceling queued runs.
Key design points:
Files changed:
This PR supersedes and replaces the approach in PR #7971 by ensuring re-queue-on-failure after dequeue and providing a clearer separation of concerns between web and CLI sides.
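For reference, the re-queue-on-failure behavior mentioned above could look roughly like the following. This is an illustrative sketch, not the PR's exact code; the key name matches the constants quoted in the review comments, and `spawnController` is the helper discussed there.

```ts
import { createClient } from "redis"

type RedisClient = ReturnType<typeof createClient>

const RUN_QUEUE_KEY = "evals:run-queue"
declare function spawnController(runId: number): Promise<void> // assumed helper from the PR

async function dispatchWithRequeue(redis: RedisClient): Promise<void> {
	const nextId = await redis.lPop(RUN_QUEUE_KEY)
	if (nextId === null) return
	try {
		await spawnController(Number(nextId))
	} catch (error) {
		// Spawn failed after dequeue: push the id back to the head so FIFO order is preserved.
		await redis.lPush(RUN_QUEUE_KEY, nextId)
		throw error
	}
}
```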
### Test Procedure
### Pre-Submission Checklist
### Screenshots / Videos
No UI screenshots included — changes are minimal (Status column and Cancel action).
### Documentation Updates
### Additional Notes
### Get in Touch
@hannesrudolph
**Important**
Implements a Redis-backed global FIFO queue for Evals runs, ensuring single execution, auto-advancing, and UI updates for status and cancellation.
- Adds `queue.ts` and modifies `runs.ts` to enqueue runs and trigger dispatch.
- Updates `run.tsx` and `runs.tsx` to display run status, queue position, and allow cancellation.
- Adds `queue.ts` and modifies `runEvals.ts` to auto-advance the queue on completion.

This description was created automatically for ef4b530 and will update as commits are pushed.