GHA evals #4472
- Add comprehensive evals.yml workflow for full testing and evaluation runs
- Add evals-quick-test.yml for fast Docker Compose networking validation
- Include documentation in GITHUB_ACTIONS.md
- Workflows trigger on PRs and support manual dispatch for testing
    # expose:
    #   - 5432
    ports:
      - "${EVALS_DB_PORT:-5432}:5432"
Turns out that we still need this if you want to run the evals natively instead of inside Docker.
Right, forgot about that case.
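To illustrate why the published port matters for native runs, here is a minimal sketch of how a process running outside Docker might resolve the host-side port. It mirrors the `${EVALS_DB_PORT:-5432}` default from the compose snippet above. The helper names `resolveDbPort` and `dbAddress` are hypothetical and not part of this PR; only the `EVALS_DB_PORT` variable and the 5432 default come from the diff.

```typescript
// Hypothetical helpers, not taken from the PR: they only mirror the
// compose default `${EVALS_DB_PORT:-5432}` for a process running natively.
type Env = Record<string, string | undefined>;

function resolveDbPort(env: Env): number {
  const raw = env.EVALS_DB_PORT;
  // Like the shell `:-` operator, fall back when the variable is unset OR empty.
  if (raw === undefined || raw === "") {
    return 5432;
  }
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid EVALS_DB_PORT: ${raw}`);
  }
  return port;
}

// Assumes the db service is reachable on localhost through the published port.
function dbAddress(env: Env): string {
  return `localhost:${resolveDbPort(env)}`;
}
```

With only `expose`, the port would be reachable from other containers on the compose network but not from the host, which is the case the review comment describes.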
packages/evals/src/cli/runCi.ts
Outdated
    export const runCi = async ({
      concurrency = 1,
      exercisesPerLanguage = 1,
I'll expand this once I've switched to a more powerful GHA runner via Blacksmith.
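As a rough sketch of what an `exercisesPerLanguage` limit could mean in practice, the function below takes the first N exercises for each language. The function `pickExercises`, the `SelectedExercise` type, and the input shape are assumptions made for illustration; the actual selection logic in `runCi.ts` is not shown in this conversation and may differ.

```typescript
// Hypothetical illustration only: the real runCi may select exercises differently.
interface SelectedExercise {
  language: string;
  exercise: string;
}

function pickExercises(
  exercisesByLanguage: Record<string, string[]>,
  exercisesPerLanguage = 1, // same default as the runCi parameter in the diff
): SelectedExercise[] {
  const selected: SelectedExercise[] = [];
  for (const language of Object.keys(exercisesByLanguage)) {
    // Take at most `exercisesPerLanguage` exercises from each language.
    const subset = exercisesByLanguage[language].slice(0, exercisesPerLanguage);
    for (const exercise of subset) {
      selected.push({ language, exercise });
    }
  }
  return selected;
}
```

Keeping both defaults at 1 keeps a CI run small on a standard runner; raising them multiplies the number of tasks by the number of languages.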
Description
Run evals in CI.
Triggered by adding the evals label to your PR:

Example run: https://github.com/RooCodeInc/Roo-Code/actions/runs/15563240184/job/43820736051?pr=4472
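The trigger rule described here (the evals label being added to a PR, or a manual run) can be modeled as a small predicate. This is a sketch of the condition only, not the workflow file itself: the `TriggerEvent` type and `shouldRunEvals` function are hypothetical, and the real `evals.yml` may express the check differently. The event names `pull_request` and `workflow_dispatch` and the `labeled` action are standard GitHub Actions terms.

```typescript
// Hypothetical model of the trigger condition; not the contents of evals.yml.
type TriggerEvent =
  | { eventName: "workflow_dispatch" }
  | { eventName: "pull_request"; action: string; labelName?: string };

function shouldRunEvals(event: TriggerEvent): boolean {
  // Manual dispatch always runs.
  if (event.eventName === "workflow_dispatch") {
    return true;
  }
  // For pull requests, run only when the label just added is "evals".
  return event.action === "labeled" && event.labelName === "evals";
}
```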
Important
Adds CI workflow for running evaluations with Docker support and refactors task processing scripts for improved logging and execution.
- .github/workflows/evals.yml to run evaluations in CI when the 'evals' label is added to a PR or triggered manually.
- docker compose.
- Dockerfile.runner and Dockerfile.web to include necessary packages and configurations for running evaluations.
- docker-compose.yml to expose necessary ports and configure services for db, redis, web, and runner.
- processTask and runTask in runTask.ts to handle task execution and logging.
- runCi in runCi.ts to manage CI-specific evaluation runs.
- runEvals.ts to handle task processing in a queue with logging.
- Logger class in utils.ts for improved logging capabilities.
- route.ts to report service status.
- package.json scripts to include new Docker and evaluation commands.

This description was created for 41df128.
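The summary mentions queue-based task processing and a `concurrency` setting. One common way to combine the two is a fixed pool of workers draining a shared list, sketched below. This is an in-memory illustration under assumed names (`processQueue`, `worker`); the PR's `runEvals.ts` is not shown here and may rely on a different mechanism, for example the redis service listed above.

```typescript
// Hypothetical in-memory sketch; not the implementation in runEvals.ts.
async function processQueue<T, R>(
  tasks: T[],
  concurrency: number,
  worker: (task: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task

  const runWorker = async (): Promise<void> => {
    while (next < tasks.length) {
      const index = next++; // claim a task before awaiting, so no two workers share it
      results[index] = await worker(tasks[index]);
    }
  };

  // Never start more workers than there are tasks, and always at least one.
  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(runWorker());
  }
  await Promise.all(workers);
  return results;
}
```

Results are stored by task index, so the output order matches the input order regardless of which worker finishes first.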