Collaborator

@cte cte commented Jun 9, 2025

Description

Run evals in CI.

Triggered by adding the evals label to your PR:
[Screenshot: the evals label on a PR, captured 2025-06-10]

Example run: https://github.com/RooCodeInc/Roo-Code/actions/runs/15563240184/job/43820736051?pr=4472


Important

Adds CI workflow for running evaluations with Docker support and refactors task processing scripts for improved logging and execution.

  • CI Workflow:
    • Adds .github/workflows/evals.yml to run evaluations in CI when the 'evals' label is added to a PR or triggered manually.
    • Sets up Docker environment and runs evaluations using docker compose.
  • Docker:
    • Updates Dockerfile.runner and Dockerfile.web to include necessary packages and configurations for running evaluations.
    • Modifies docker-compose.yml to expose necessary ports and configure services for db, redis, web, and runner.
  • Scripts and Utilities:
    • Refactors processTask and runTask in runTask.ts to handle task execution and logging.
    • Introduces runCi in runCi.ts to manage CI-specific evaluation runs.
    • Updates runEvals.ts to handle task processing in a queue with logging.
    • Adds Logger class in utils.ts for improved logging capabilities.
  • Miscellaneous:
    • Adds health check endpoint in route.ts to report service status.
    • Updates package.json scripts to include new Docker and evaluation commands.
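
The queue-with-logging idea from the summary above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `Logger`, `runQueue`, and `Task` are hypothetical names, and the real `Logger` in utils.ts and the queue in runEvals.ts may look quite different.

```typescript
// Hypothetical names throughout: Logger, runQueue, and Task are illustrative
// and are not copied from the PR's utils.ts or runEvals.ts.

type Sink = (line: string) => void

class Logger {
	private readonly tag: string
	private readonly sink: Sink

	// The sink is injectable so output can be captured in tests; it defaults to stdout.
	constructor(tag: string, sink: Sink = (line) => console.log(line)) {
		this.tag = tag
		this.sink = sink
	}

	info(message: string): void {
		this.sink(`[${this.tag}] INFO ${message}`)
	}

	error(message: string): void {
		this.sink(`[${this.tag}] ERROR ${message}`)
	}
}

type Task = { id: number; run: () => Promise<void> }

// Runs tasks with at most `concurrency` in flight and returns the ids of failed tasks.
async function runQueue(tasks: Task[], concurrency: number, logger: Logger): Promise<number[]> {
	const pending = [...tasks]
	const failed: number[] = []

	const worker = async (): Promise<void> => {
		// Each worker keeps pulling the next task until the queue is drained.
		for (let task = pending.shift(); task !== undefined; task = pending.shift()) {
			logger.info(`task ${task.id} started`)
			try {
				await task.run()
				logger.info(`task ${task.id} finished`)
			} catch (err) {
				// A failing task is recorded and logged; the remaining tasks still run.
				failed.push(task.id)
				logger.error(`task ${task.id} failed: ${err instanceof Error ? err.message : String(err)}`)
			}
		}
	}

	const workerCount = Math.max(1, Math.min(concurrency, tasks.length))
	await Promise.all(Array.from({ length: workerCount }, () => worker()))
	return failed
}
```

Recording failures instead of rethrowing lets one broken exercise be reported without aborting the rest of the run, which is usually what a CI evaluation job wants.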

This description was created by Ellipsis for 41df128.

cte added 6 commits June 9, 2025 11:09
- Add comprehensive evals.yml workflow for full testing and evaluation runs
- Add evals-quick-test.yml for fast Docker Compose networking validation
- Include documentation in GITHUB_ACTIONS.md
- Workflows trigger on PRs and support manual dispatch for testing
@daniel-lxs daniel-lxs added the Issue/PR - Triage label Jun 9, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Jun 10, 2025
@cte cte marked this pull request as ready for review June 10, 2025 15:45
@cte cte requested review from jr and mrubens as code owners June 10, 2025 15:45
@dosubot dosubot bot added the size:XL label Jun 10, 2025
@cte cte added the evals label Jun 10, 2025
# expose:
#   - 5432
ports:
  - "${EVALS_DB_PORT:-5432}:5432"
Collaborator Author

@cte cte Jun 10, 2025

Turns out that we still need this if you want to run the evals natively instead of inside Docker.

Collaborator

Right, forgot about that case.
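
For a native (non-Docker) run, the host side of the port mapping discussed above has to be resolved by whatever connects to the database. A minimal sketch of that resolution, assuming the native process reads the same `EVALS_DB_PORT` variable; `resolveDbPort` is a hypothetical helper and not part of the PR:

```typescript
// Hypothetical helper: mirrors the `${EVALS_DB_PORT:-5432}` default from the compose file.
function resolveDbPort(env: Record<string, string | undefined>, fallback = 5432): number {
	const raw = env["EVALS_DB_PORT"]

	// Like `:-` in Compose interpolation, fall back when the variable is unset or empty.
	if (raw === undefined || raw === "") {
		return fallback
	}

	const port = Number(raw)
	if (!Number.isInteger(port) || port < 1 || port > 65535) {
		throw new Error(`EVALS_DB_PORT must be an integer between 1 and 65535, got "${raw}"`)
	}
	return port
}
```

In a Node.js process this would typically be called as `resolveDbPort(process.env)`. Rejecting malformed values early gives a clearer failure than a connection timeout against the wrong port.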


export const runCi = async ({
	concurrency = 1,
	exercisesPerLanguage = 1,
Collaborator Author

I'll expand this once I've switched to a more powerful GHA runner via Blacksmith.
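
To illustrate why a default of 1 keeps the CI run small, here is a sketch of how an `exercisesPerLanguage` cap could bound the workload. It is an assumption about the intent, not the PR's implementation: `selectExercises`, `ExerciseCatalog`, and the sample catalog in the usage note are all hypothetical.

```typescript
// Hypothetical sketch: caps the number of exercises taken from each language.
type ExerciseCatalog = Record<string, string[]>
type SelectedExercise = { language: string; exercise: string }

function selectExercises(catalog: ExerciseCatalog, exercisesPerLanguage = 1): SelectedExercise[] {
	if (!Number.isInteger(exercisesPerLanguage) || exercisesPerLanguage < 0) {
		throw new RangeError("exercisesPerLanguage must be a non-negative integer")
	}

	const selected: SelectedExercise[] = []
	for (const language of Object.keys(catalog)) {
		// Take the first N exercises so repeated runs pick the same subset.
		for (const exercise of catalog[language].slice(0, exercisesPerLanguage)) {
			selected.push({ language, exercise })
		}
	}
	return selected
}
```

With a made-up catalog such as `{ go: ["a", "b"], python: ["c"] }`, the default selects one exercise per language, so the total number of tasks grows with the number of languages rather than with the size of the catalog. Raising the cap on a larger runner then scales the workload without other changes.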

@cte cte removed the evals label Jun 10, 2025
@cte cte added the evals label Jun 10, 2025
@cte cte added and removed the evals label Jun 10, 2025 (4 times)
@cte cte merged commit 8d5dab3 into main Jun 10, 2025
12 of 13 checks passed
@cte cte deleted the cte/gha-evals branch June 10, 2025 18:01
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Jun 10, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jun 10, 2025

Labels

  • evals: Run evals on this PR
  • Issue/PR - Triage: New issue. Needs quick review to confirm validity and assign labels.
  • size:XL: This PR changes 500-999 lines, ignoring generated files.

4 participants