@hannesrudolph (Collaborator) commented Sep 14, 2025

Related GitHub Issue

Closes: #7966

Roo Code Task Context (Optional)

No Roo Code task context for this PR

Description

This PR implements a Redis-backed global FIFO queue for Evals runs. It ensures that only one run executes at a time, queues additional runs automatically, auto-advances when the active run completes, and minimally updates the Web UI to display status and allow canceling queued runs.

Key design points:

  • Redis keys
    • evals:run-queue (LIST) — FIFO of run IDs
    • evals:active-run (STRING with TTL) — currently executing run
    • evals:dispatcher:lock (STRING with TTL) — serializes dispatchers to avoid races
  • Separation of concerns
    • Web enqueue/dispatch helpers live in apps/web-evals/src/actions/queue.ts
    • CLI completion dispatch lives in packages/evals/src/cli/queue.ts
    • apps/web-evals/src/actions/runs.ts enqueues instead of spawning directly, then triggers dispatch
  • Race safety
    • After dequeue, if setting evals:active-run fails (rare race), the popped id is LPUSH’d back to preserve FIFO ordering
  • Auto-advance
    • On completion, the CLI clears the active marker and dispatches the next run
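
The dequeue, mark-active, and re-queue steps above can be sketched with a small in-memory model. This is not the PR's code and not a Redis client: `QueueModel` and its methods are hypothetical names, and the RPUSH-to-tail / LPOP-from-head direction is an assumption inferred from the LPUSH re-queue described above. The model always pops before trying to set the active marker so that the re-queue path is visible; the actual dispatcher may check the marker first.

```typescript
// In-memory stand-in for the three Redis keys; a model, not a real client.
class QueueModel {
  queue: number[] = [] // evals:run-queue (index 0 is the head)
  activeRun: number | null = null // evals:active-run
  private lockHeld = false // evals:dispatcher:lock

  // Models RPUSH: append to the tail of the list (assumed direction).
  enqueue(runId: number): void {
    this.queue.push(runId)
  }

  // Models SET ... NX: succeeds only when no active run is recorded.
  private setActiveIfAbsent(runId: number): boolean {
    if (this.activeRun !== null) return false
    this.activeRun = runId
    return true
  }

  // Returns the run id that was started, or null if nothing was started.
  dispatchNext(): number | null {
    if (this.lockHeld) return null // another dispatcher holds the lock
    this.lockHeld = true
    try {
      const next = this.queue.shift() // models LPOP from the head
      if (next === undefined) return null // queue is empty
      if (!this.setActiveIfAbsent(next)) {
        this.queue.unshift(next) // models LPUSH: put it back at the head
        return null
      }
      return next
    } finally {
      this.lockHeld = false // release the dispatcher lock
    }
  }

  // Auto-advance: clear the active marker, then dispatch the next run.
  complete(runId: number): number | null {
    if (this.activeRun === runId) this.activeRun = null
    return this.dispatchNext()
  }
}
```

Pushing the popped id back to the head rather than the tail is what keeps the original order intact when the active marker is already taken.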

Files changed:

  • Added queue actions and dispatcher (web): apps/web-evals/src/actions/queue.ts
  • Enqueue on createRun + trigger dispatch: apps/web-evals/src/actions/runs.ts
  • UI Status column + queued position + cancel:
    • apps/web-evals/src/components/home/runs.tsx
    • apps/web-evals/src/components/home/run.tsx
  • Auto-advance on completion (CLI) + queue helpers (CLI):
    • packages/evals/src/cli/runEvals.ts
    • packages/evals/src/cli/queue.ts

This PR supersedes and replaces the approach in PR #7971 by ensuring re-queue-on-failure after dequeue and providing a clearer separation of concerns between web and CLI sides.

Test Procedure

  • Unit tests (extension workspace):
    • cd src && npx vitest run
    • Result: 291 files, 3,804 tests passed; 48 skipped (baseline unchanged)
  • Manual verification (recommended):
    1. Launch web evals UI, create multiple runs quickly
    2. Observe:
      • First run shows “Running”
      • Subsequent runs show “Queued (#N)” with correct positions
      • Only one run executes at any time
    3. Cancel a queued run via the row menu — it should be removed from the queue and deleted
    4. Wait for a run to complete — next run should auto-dispatch
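
For reference while checking steps 2 and 3, the expected position and cancel behavior can be modeled with two pure functions over an array. This is only a sketch: `queuePosition` and `cancelQueued` are hypothetical names, and 1-based numbering for "Queued (#N)" is an assumption. Against Redis, the equivalent list operations would be along the lines of LPOS and LREM, but the PR text does not say which commands it uses.

```typescript
// Position counted from the head of the queue.
// 1-based numbering is an assumption; returns null when the run is not queued.
function queuePosition(queue: readonly number[], runId: number): number | null {
  const index = queue.indexOf(runId)
  return index === -1 ? null : index + 1
}

// Remove a queued run while keeping the order of the remaining runs.
function cancelQueued(queue: readonly number[], runId: number): number[] {
  return queue.filter((id) => id !== runId)
}
```

After a cancel, every run behind the removed one should move up by one position.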

Pre-Submission Checklist

  • Issue Linked: Closes [ENHANCEMENT] Global FIFO queue for Evals runs (1 at a time) #7966
  • Scope: Changes are focused on global FIFO queue feature
  • Self-Review: Code reviewed and race conditions considered
  • Testing: Existing tests pass; manual verification steps included
  • Documentation Impact: No external docs required for minimal UI changes
  • Contribution Guidelines: Followed project conventions

Screenshots / Videos

No UI screenshots included — changes are minimal (Status column and Cancel action).

Documentation Updates

  • No documentation updates are required.

Additional Notes

  • TTL choices:
    • Dispatcher lock TTL set to 30s for stability on slower hosts
    • Active-run TTL is generous to reduce accidental expiry during long runs
  • Future improvement:
    • Refresh evals:active-run TTL alongside heartbeat ticks to reduce worst-case stall after crashes
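
The future improvement could look roughly like the sketch below, written against an in-memory marker instead of Redis. Everything here is hypothetical: the `ActiveMarker` shape, the `refreshOnHeartbeat` name, and the five-minute TTL are assumptions chosen for illustration. The idea is that a short TTL, extended on every heartbeat tick, bounds the stall after a crash to roughly one TTL.

```typescript
// Hypothetical in-memory marker with an absolute expiry time in milliseconds.
interface ActiveMarker {
  runId: number
  expiresAtMs: number
}

// Assumed value for illustration only; not taken from the PR.
const REFRESHED_TTL_MS = 5 * 60 * 1000

// Called on each heartbeat tick of the run identified by runId.
function refreshOnHeartbeat(
  marker: ActiveMarker | null,
  runId: number,
  nowMs: number,
): ActiveMarker | null {
  // The key has already expired (or never existed): nothing to refresh.
  if (marker === null || nowMs >= marker.expiresAtMs) return null
  // Another run owns the marker: leave it untouched.
  if (marker.runId !== runId) return marker
  // Still the owner: push the expiry forward from the current time.
  return { runId, expiresAtMs: nowMs + REFRESHED_TTL_MS }
}
```

Checking ownership before extending avoids a late heartbeat from a finished run keeping a newer run's marker alive by mistake.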

Get in Touch

@hannesrudolph


Important

Implements a Redis-backed global FIFO queue for Evals runs, ensuring single execution, auto-advancing, and UI updates for status and cancellation.

  • Behavior:
    • Implements Redis-backed global FIFO queue for Evals runs, ensuring single execution at a time.
    • Auto-advances queue on run completion, with status updates and cancellation options in UI.
  • Web:
    • Adds queue actions in queue.ts and modifies runs.ts to enqueue runs and trigger dispatch.
    • Updates run.tsx and runs.tsx to display run status, queue position, and allow cancellation.
  • CLI:
    • Adds queue management in queue.ts and modifies runEvals.ts to auto-advance queue on completion.
  • Race Safety:
    • Ensures re-queue-on-failure after dequeue to maintain FIFO order.

This description was created by Ellipsis for ef4b530.

Copilot AI review requested due to automatic review settings September 14, 2025 19:27
@dosubot (bot) added the size:L and enhancement labels Sep 14, 2025

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements a Redis-backed global FIFO queue system for Evals runs to ensure only one evaluation executes at a time while automatically queuing and processing additional runs in order.

  • Replaces direct controller spawning with a queuing mechanism using Redis data structures
  • Adds queue status display and cancellation functionality to the web UI
  • Implements auto-advance logic that dispatches the next queued run when the active run completes

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pr-body.md | PR description documentation |
| packages/evals/src/cli/runEvals.ts | Adds queue cleanup call after run completion |
| packages/evals/src/cli/queue.ts | Implements CLI-side queue management and controller spawning |
| apps/web-evals/src/components/home/runs.tsx | Adds Status column to runs table |
| apps/web-evals/src/components/home/run.tsx | Adds queue status display and cancel functionality |
| apps/web-evals/src/actions/runs.ts | Replaces direct spawning with queue enqueue and dispatch |
| apps/web-evals/src/actions/queue.ts | Implements web-side queue operations and Redis interactions |


const isCompleted = !!run.taskMetricsId
const isRunning = !!heartbeat
const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined

Copilot AI Sep 14, 2025

The condition queuePosition !== null && queuePosition !== undefined can be simplified to queuePosition != null which checks for both null and undefined in a single comparison.

Suggested change
- const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
+ const isQueued = !isCompleted && !isRunning && queuePosition != null
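
To illustrate why the loose comparison is safe here, the sketch below derives a status label from the three flags. `statusLabel` and its parameter types are hypothetical, and the "Completed" and "Unknown" labels are assumptions; only "Running" and "Queued (#N)" appear in the PR description.

```typescript
function statusLabel(
  taskMetricsId: number | null,
  heartbeat: string | null | undefined,
  queuePosition: number | null | undefined,
): string {
  const isCompleted = !!taskMetricsId
  const isRunning = !!heartbeat
  // `!= null` is false for both null and undefined, but true for 0,
  // so a position of 0 is not mistaken for "not queued".
  const isQueued = !isCompleted && !isRunning && queuePosition != null
  if (isCompleted) return "Completed"
  if (isRunning) return "Running"
  if (isQueued) return `Queued (#${queuePosition})`
  return "Unknown"
}
```

A plain truthiness check (`!!queuePosition`) would not be equivalent, because it would also reject a position of 0.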

    await spawnController(Number(nextId))
} finally {
    await redis.del(DISPATCH_LOCK_KEY).catch(() => {})

Copilot AI Sep 14, 2025

Using an empty catch block silently ignores all errors. Consider adding a comment explaining why errors are being ignored or log the error for debugging purposes.

Suggested change
- await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
+ await redis.del(DISPATCH_LOCK_KEY).catch((err) => {
+     console.error("Failed to delete dispatcher lock key:", err)
+ })

@roomote (bot, Contributor) left a comment

Thank you for implementing the global FIFO queue for Evals runs! The implementation looks solid overall and successfully addresses the requirements from issue #7966. I've reviewed the changes and found some areas that need attention, particularly around race condition handling and error management.


const isCompleted = !!run.taskMetricsId
const isRunning = !!heartbeat
const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined

I agree with Copilot's suggestion here. Could we simplify this to use queuePosition != null which checks for both null and undefined?

Suggested change
- const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
+ const isQueued = !isCompleted && !isRunning && queuePosition != null


    await spawnController(Number(nextId))
} finally {
    await redis.del(DISPATCH_LOCK_KEY).catch(() => {})

I agree with Copilot - could we add error logging here for debugging purposes? Silent failures make troubleshooting difficult in production.

Suggested change
- await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
+ await redis.del(DISPATCH_LOCK_KEY).catch((err) => {
+     console.error("Failed to delete dispatcher lock:", err)
+ })

const redis = await redisClient()

// Try to acquire dispatcher lock
const locked = await redis.set(DISPATCH_LOCK_KEY, "1", { NX: true, EX: DISPATCH_LOCK_TTL_SECONDS })

Is there a potential race condition here? If the dispatcher lock expires (30s) while we're still processing, another dispatcher could start processing the same queue. Consider either:

  1. Extending the lock TTL if processing takes longer
  2. Adding a heartbeat to refresh the lock
  3. Using a longer initial TTL
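
Options 1 and 2 both need a way to tell whether the caller still owns the lock. One common variation, which is not what the PR does (it stores the constant "1"), is to store a per-dispatcher token and only extend or release when the token matches. The sketch below models that with an in-memory `LockModel`; the class and its method names are hypothetical. Against real Redis, the compare-and-delete and compare-and-extend steps would need to be atomic, for example via a server-side script.

```typescript
// In-memory stand-in for a single Redis string key with a TTL.
class LockModel {
  private owner: string | null = null
  private expiresAtMs = 0

  // Models SET key token NX with an expiry: fails while an unexpired lock exists.
  acquire(token: string, ttlMs: number, nowMs: number): boolean {
    if (this.owner !== null && nowMs < this.expiresAtMs) return false
    this.owner = token
    this.expiresAtMs = nowMs + ttlMs
    return true
  }

  // Extend only while the caller still owns an unexpired lock.
  extend(token: string, ttlMs: number, nowMs: number): boolean {
    if (this.owner !== token || nowMs >= this.expiresAtMs) return false
    this.expiresAtMs = nowMs + ttlMs
    return true
  }

  // Release only if the caller is still the owner, so a slow dispatcher
  // cannot delete a lock that a later dispatcher has since acquired.
  release(token: string, nowMs: number): boolean {
    if (this.owner !== token || nowMs >= this.expiresAtMs) return false
    this.owner = null
    return true
  }
}
```

With an unconditional delete in the `finally` block, a dispatcher whose lock expired mid-processing would remove whichever lock is present at that moment, including one held by a different dispatcher.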

    await enqueueRun(run.id)
    await dispatchNextRun()
} catch (error) {
    console.error(error)

This error handling only logs to console. Should we consider:

  1. Throwing the error to surface it to the UI?
  2. Adding telemetry/monitoring?
  3. Returning an error status to the caller?

Silent failures could leave users confused about why their run isn't queued.
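
Option 3 could be sketched as a small result type returned to the caller. This is a hypothetical shape, not the PR's code: `EnqueueResult` and `enqueueAndDispatch` are illustrative names, and the two helpers are passed in as parameters only so the example stays self-contained.

```typescript
type EnqueueResult =
  | { ok: true; runId: number }
  | { ok: false; runId: number; message: string }

async function enqueueAndDispatch(
  runId: number,
  enqueueRun: (id: number) => Promise<void>,
  dispatchNextRun: () => Promise<void>,
): Promise<EnqueueResult> {
  try {
    await enqueueRun(runId)
    await dispatchNextRun()
    return { ok: true, runId }
  } catch (error) {
    // Keep the existing log, but also report the failure to the caller.
    console.error(error)
    const message = error instanceof Error ? error.message : String(error)
    return { ok: false, runId, message }
  }
}
```

The UI can then branch on `ok` and show `message` instead of leaving the run in an unexplained state.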

const RUN_QUEUE_KEY = "evals:run-queue"
const ACTIVE_RUN_KEY = "evals:active-run"
const DISPATCH_LOCK_KEY = "evals:dispatcher:lock"
const ACTIVE_RUN_TTL_SECONDS = 60 * 60 * 12 // 12 hours

The 12-hour TTL seems quite generous. If a run crashes without clearing the active marker, the queue could be blocked for up to 12 hours. Consider:

  1. A shorter TTL (e.g., 2-4 hours)
  2. Implementing a heartbeat mechanism to refresh the TTL periodically
  3. Adding a manual "unlock" admin action for stuck queues

const ACTIVE_RUN_TTL_SECONDS = 60 * 60 * 12 // 12 hours
const DISPATCH_LOCK_TTL_SECONDS = 30

async function spawnController(runId: number) {

This function is duplicated in packages/evals/src/cli/queue.ts. Could we extract this to a shared utility to maintain DRY principles and ensure consistency?

    const logStream = fs.createWriteStream("/tmp/roo-code-evals.log", { flags: "a" })
    childProcess.stdout?.pipe(logStream)
    childProcess.stderr?.pipe(logStream)
} catch {

Empty catch blocks here and at line 85. For consistency with the web implementation, should we at least add a comment explaining why errors are ignored, or consider logging them?

const { data: heartbeat } = useQuery({
    queryKey: ["getHeartbeat", run.id],
    queryFn: () => getHeartbeat(run.id),
    refetchInterval: 10_000,

The 10-second polling interval might be excessive for long-running queues. Consider making this configurable or using a progressive interval (e.g., start at 10s, increase to 30s after a few polls)?
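
A progressive interval along these lines could be expressed as a pure function of the poll count. This is a sketch under assumptions: `nextPollInterval` is a hypothetical helper, the threshold of three polls is arbitrary, and how the poll count would be tracked and wired into the query's `refetchInterval` is left out.

```typescript
const BASE_INTERVAL_MS = 10_000 // current interval from the code under review
const SLOW_INTERVAL_MS = 30_000 // slower interval suggested in the comment
const POLLS_BEFORE_SLOWDOWN = 3 // assumed threshold for "a few polls"

// pollCount is the number of polls already made for this run.
function nextPollInterval(pollCount: number): number {
  return pollCount < POLLS_BEFORE_SLOWDOWN ? BASE_INTERVAL_MS : SLOW_INTERVAL_MS
}
```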

@hannesrudolph added the Issue/PR - Triage label Sep 14, 2025
@daniel-lxs moved this from Triage to Issue [In Progress] in Roo Code Roadmap Sep 15, 2025
@daniel-lxs moved this from Issue [In Progress] to Triage in Roo Code Roadmap Sep 15, 2025
@github-project-automation (bot) moved this from Triage to Done in Roo Code Roadmap Sep 15, 2025
@github-project-automation (bot) moved this from New to Done in Roo Code Roadmap Sep 15, 2025
Labels

enhancement, Issue/PR - Triage, size:L

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] Global FIFO queue for Evals runs (1 at a time)

2 participants