# feat: global FIFO queue for Evals runs (#7966) #7981
## Conversation
### Pull Request Overview
This PR implements a Redis-backed global FIFO queue system for Evals runs to ensure only one evaluation executes at a time while automatically queuing and processing additional runs in order.
- Replaces direct controller spawning with a queuing mechanism using Redis data structures
- Adds queue status display and cancellation functionality to the web UI
- Implements auto-advance logic that dispatches the next queued run when the active run completes
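For readers unfamiliar with the pattern, here is a minimal sketch of how a Redis-backed FIFO of this shape can work. This is illustrative only, not the PR's exact code: the key names match the constants quoted later in this review, node-redis v4 is assumed, and the `*Sketch` function names are hypothetical.

```ts
import { createClient } from "redis"

type RedisClient = ReturnType<typeof createClient>

const RUN_QUEUE_KEY = "evals:run-queue"
const ACTIVE_RUN_KEY = "evals:active-run"
const ACTIVE_RUN_TTL_SECONDS = 60 * 60 * 12

// Enqueue: append the run id to the tail of the FIFO list.
async function enqueueRunSketch(redis: RedisClient, runId: number) {
	await redis.rPush(RUN_QUEUE_KEY, String(runId))
}

// Dispatch: start the next run only if nothing is currently active.
async function dispatchNextRunSketch(redis: RedisClient) {
	// SET with NX acts as a mutex: it succeeds only when no active-run marker exists.
	const acquired = await redis.set(ACTIVE_RUN_KEY, "pending", { NX: true, EX: ACTIVE_RUN_TTL_SECONDS })
	if (!acquired) return // a run is already active; its completion triggers the next dispatch

	const nextId = await redis.lPop(RUN_QUEUE_KEY)
	if (nextId === null) {
		await redis.del(ACTIVE_RUN_KEY) // queue empty; release the marker
		return
	}
	await redis.set(ACTIVE_RUN_KEY, nextId, { EX: ACTIVE_RUN_TTL_SECONDS })
	// ...spawn the controller process for Number(nextId) here...
}
```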
**Reviewed Changes**
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pr-body.md | PR description documentation |
| packages/evals/src/cli/runEvals.ts | Adds queue cleanup call after run completion |
| packages/evals/src/cli/queue.ts | Implements CLI-side queue management and controller spawning |
| apps/web-evals/src/components/home/runs.tsx | Adds Status column to runs table |
| apps/web-evals/src/components/home/run.tsx | Adds queue status display and cancel functionality |
| apps/web-evals/src/actions/runs.ts | Replaces direct spawning with queue enqueue and dispatch |
| apps/web-evals/src/actions/queue.ts | Implements web-side queue operations and Redis interactions |
---

```ts
const isCompleted = !!run.taskMetricsId
const isRunning = !!heartbeat
const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
```
**Copilot AI commented on Sep 14, 2025:**
The condition `queuePosition !== null && queuePosition !== undefined` can be simplified to `queuePosition != null`, which checks for both `null` and `undefined` in a single comparison.
Suggested change:

```diff
- const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
+ const isQueued = !isCompleted && !isRunning && queuePosition != null
```
---

```ts
	await spawnController(Number(nextId))
} finally {
	await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
```
**Copilot AI commented on Sep 14, 2025:**
Using an empty catch block silently ignores all errors. Consider adding a comment explaining why errors are being ignored, or logging the error for debugging purposes.
Suggested change:

```diff
- await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
+ await redis.del(DISPATCH_LOCK_KEY).catch((err) => {
+ 	console.error("Failed to delete dispatcher lock key:", err)
+ })
```
---
Thank you for implementing the global FIFO queue for Evals runs! The implementation looks solid overall and successfully addresses the requirements from issue #7966. I've reviewed the changes and found some areas that need attention, particularly around race condition handling and error management.
---

```ts
const isCompleted = !!run.taskMetricsId
const isRunning = !!heartbeat
const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
```
I agree with Copilot's suggestion here. Could we simplify this to `queuePosition != null`, which checks for both `null` and `undefined`?
Suggested change:

```diff
- const isQueued = !isCompleted && !isRunning && queuePosition !== null && queuePosition !== undefined
+ const isQueued = !isCompleted && !isRunning && queuePosition != null
```
---

```ts
	await spawnController(Number(nextId))
} finally {
	await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
```
I agree with Copilot - could we add error logging here for debugging purposes? Silent failures make troubleshooting difficult in production.
Suggested change:

```diff
- await redis.del(DISPATCH_LOCK_KEY).catch(() => {})
+ await redis.del(DISPATCH_LOCK_KEY).catch((err) => {
+ 	console.error("Failed to delete dispatcher lock:", err)
+ })
```
---

```ts
const redis = await redisClient()

// Try to acquire dispatcher lock
const locked = await redis.set(DISPATCH_LOCK_KEY, "1", { NX: true, EX: DISPATCH_LOCK_TTL_SECONDS })
```
Is there a potential race condition here? If the dispatcher lock expires (30s) while we're still processing, another dispatcher could start processing the same queue. Consider one of the following (a lock-refresh sketch follows the list):
- Extending the lock TTL if processing takes longer
- Adding a heartbeat to refresh the lock
- Using a longer initial TTL
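If the heartbeat route is preferred, a rough sketch of a lock-refreshing wrapper could look like this. The `withDispatchLock` helper name is hypothetical; the key name and TTL match the constants quoted below, and node-redis v4 is assumed.

```ts
import { createClient } from "redis"

type RedisClient = ReturnType<typeof createClient>

const DISPATCH_LOCK_KEY = "evals:dispatcher:lock"
const DISPATCH_LOCK_TTL_SECONDS = 30

// Hypothetical wrapper: keep the dispatcher lock alive while the dispatch work runs.
async function withDispatchLock<T>(redis: RedisClient, work: () => Promise<T>): Promise<T | undefined> {
	const acquired = await redis.set(DISPATCH_LOCK_KEY, "1", { NX: true, EX: DISPATCH_LOCK_TTL_SECONDS })
	if (!acquired) return undefined // another dispatcher holds the lock

	// Refresh the TTL at half its lifetime so the lock cannot expire mid-dispatch.
	const refresh = setInterval(() => {
		redis.expire(DISPATCH_LOCK_KEY, DISPATCH_LOCK_TTL_SECONDS).catch(() => {})
	}, (DISPATCH_LOCK_TTL_SECONDS * 1000) / 2)

	try {
		return await work()
	} finally {
		clearInterval(refresh)
		await redis.del(DISPATCH_LOCK_KEY).catch((err) => console.error("Failed to release dispatch lock:", err))
	}
}
```

Note that a plain refresh like this does not verify ownership before extending; a production-grade lock would store a unique token and check it (Redlock-style) before refreshing or deleting.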
---

```ts
	await enqueueRun(run.id)
	await dispatchNextRun()
} catch (error) {
	console.error(error)
```
This error handling only logs to the console. Should we consider:
- Throwing the error to surface it to the UI?
- Adding telemetry/monitoring?
- Returning an error status to the caller (sketched below)?

Silent failures could leave users confused about why their run isn't queued.
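One possible shape for the third option. The result type, the `enqueueAndDispatch` name, and its handling in the UI are assumptions, not the PR's code; `enqueueRun` and `dispatchNextRun` are taken from the snippet above.

```ts
// Assumed imports from the PR's queue module (names taken from the snippet above).
declare function enqueueRun(runId: number): Promise<void>
declare function dispatchNextRun(): Promise<void>

// Hypothetical: return a discriminated result instead of swallowing the error,
// so the caller (e.g., a toast in the UI) can tell the user the enqueue failed.
type EnqueueResult = { ok: true } | { ok: false; error: string }

async function enqueueAndDispatch(runId: number): Promise<EnqueueResult> {
	try {
		await enqueueRun(runId)
		await dispatchNextRun()
		return { ok: true }
	} catch (error) {
		console.error(error) // keep the server-side log
		return { ok: false, error: error instanceof Error ? error.message : String(error) }
	}
}
```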
---

```ts
const RUN_QUEUE_KEY = "evals:run-queue"
const ACTIVE_RUN_KEY = "evals:active-run"
const DISPATCH_LOCK_KEY = "evals:dispatcher:lock"
const ACTIVE_RUN_TTL_SECONDS = 60 * 60 * 12 // 12 hours
```
The 12-hour TTL seems quite generous. If a run crashes without clearing the active marker, the queue could be blocked for up to 12 hours. Consider:
- A shorter TTL (e.g., 2-4 hours)
- Implementing a heartbeat mechanism to refresh the TTL periodically
- Adding a manual "unlock" admin action for stuck queues (sketched below)
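A sketch of the third option. The `forceUnlockQueue` name is hypothetical, and `dispatchNextRun` is assumed to come from the PR's queue module:

```ts
import { createClient } from "redis"

type RedisClient = ReturnType<typeof createClient>

declare function dispatchNextRun(): Promise<void> // assumed to come from the PR's queue module

// Hypothetical admin escape hatch: clear a stuck active-run marker and re-dispatch.
async function forceUnlockQueue(redis: RedisClient): Promise<void> {
	await redis.del("evals:active-run") // drop the stale marker
	await dispatchNextRun() // immediately try to start the next queued run
}
```

Alternatively, a controller-side heartbeat that calls `expire` on the active-run key would let a much shorter TTL (minutes rather than hours) work safely.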
---

```ts
const ACTIVE_RUN_TTL_SECONDS = 60 * 60 * 12 // 12 hours
const DISPATCH_LOCK_TTL_SECONDS = 30

async function spawnController(runId: number) {
```
This function is duplicated in `packages/evals/src/cli/queue.ts`. Could we extract it to a shared utility to maintain DRY principles and ensure consistency? One possible shape is sketched below.
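A sketch of what the extraction could look like. The module path, the spawn command, and its arguments are illustrative placeholders, not the PR's actual invocation:

```ts
// packages/evals/src/shared/spawnController.ts (hypothetical path)
// Shared by the web action and the CLI so the spawn logic stays in one place.
import { spawn } from "node:child_process"

export function spawnController(runId: number) {
	// The binary and arguments below are illustrative placeholders.
	const child = spawn("pnpm", ["evals:controller", String(runId)], {
		detached: true, // let the controller outlive the caller
		stdio: "ignore",
	})
	child.unref()
	return child
}
```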
---

```ts
	const logStream = fs.createWriteStream("/tmp/roo-code-evals.log", { flags: "a" })
	childProcess.stdout?.pipe(logStream)
	childProcess.stderr?.pipe(logStream)
} catch {
```
There are empty catch blocks here and at line 85. For consistency with the web implementation, should we at least add a comment explaining why errors are ignored, or consider logging them?
---

```ts
const { data: heartbeat } = useQuery({
	queryKey: ["getHeartbeat", run.id],
	queryFn: () => getHeartbeat(run.id),
	refetchInterval: 10_000,
```
The 10-second polling interval might be excessive for long-running queues. Could we make this configurable, or use a progressive interval (e.g., start at 10s and back off to 30s after a few polls)? A possible shape is sketched below.
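For the progressive variant, assuming TanStack Query v5 (where `refetchInterval` may be a function of the query), something like this could work; the threshold of five refetches is an arbitrary illustration:

```ts
import { useQuery } from "@tanstack/react-query"

// `run` and `getHeartbeat` come from the surrounding component (see the snippet above).
const { data: heartbeat } = useQuery({
	queryKey: ["getHeartbeat", run.id],
	queryFn: () => getHeartbeat(run.id),
	// Hypothetical back-off: poll every 10s for the first few refetches, then relax to 30s.
	refetchInterval: (query) => (query.state.dataUpdateCount < 5 ? 10_000 : 30_000),
})
```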
---

### Related GitHub Issue

Closes: #7966
### Roo Code Task Context (Optional)

No Roo Code task context for this PR.
### Description
This PR implements a Redis-backed global FIFO queue for Evals runs. It ensures that only one run executes at a time, queues additional runs automatically, and auto-advances when the active run completes; the Web UI is minimally updated to display status and allow canceling queued runs.
Key design points:
Files changed:
This PR supersedes and replaces the approach in PR #7971 by ensuring re-queue-on-failure after dequeue and providing a clearer separation of concerns between web and CLI sides.
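For reference, the re-queue-on-failure behavior mentioned above could look roughly like the following. This is an illustrative sketch, not the PR's exact code; the key name matches the constants quoted in the review comments, and `spawnController` is the helper discussed there.

```ts
import { createClient } from "redis"

type RedisClient = ReturnType<typeof createClient>

const RUN_QUEUE_KEY = "evals:run-queue"
declare function spawnController(runId: number): Promise<void> // assumed helper from the PR

async function dispatchWithRequeue(redis: RedisClient): Promise<void> {
	const nextId = await redis.lPop(RUN_QUEUE_KEY)
	if (nextId === null) return
	try {
		await spawnController(Number(nextId))
	} catch (error) {
		// Spawn failed after dequeue: push the id back to the head so FIFO order is preserved.
		await redis.lPush(RUN_QUEUE_KEY, nextId)
		throw error
	}
}
```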
### Test Procedure
### Pre-Submission Checklist
### Screenshots / Videos
No UI screenshots included — changes are minimal (Status column and Cancel action).
### Documentation Updates
### Additional Notes
### Get in Touch
@hannesrudolph
**Important**
Implements a Redis-backed global FIFO queue for Evals runs, ensuring single execution, auto-advancing, and UI updates for status and cancellation.
- Adds `queue.ts` and modifies `runs.ts` to enqueue runs and trigger dispatch.
- Updates `run.tsx` and `runs.tsx` to display run status, queue position, and allow cancellation.
- Adds `queue.ts` and modifies `runEvals.ts` to auto-advance the queue on completion.

This description was created automatically for ef4b530 and will update as commits are pushed.