Evaluations cloud run jobs runner #547

Merged: warmbowski merged 44 commits into main from olmo-eval_cloudrun on Mar 23, 2026

warmbowski (Contributor) commented Mar 10, 2026

Sets up the evaluations application, which maintains a configuration of model evaluations grouped by "tier" (evals that run at the same frequency). The code builds a Docker image and deploys it to Google Cloud Run, where the evaluations run on a schedule. The only dependency is the olmo-eval-internal CLI tool, which runs the evaluations. See Readme.md for details.
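
As an illustration of the tier idea only, here is a minimal Python sketch of a configuration that groups evals by run frequency. The `Tier` class, the tier names, the cron strings, and the task names are all hypothetical and are not taken from this PR; the actual configuration format is described in Readme.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One evaluation tier: a schedule plus the evals that run on it."""

    name: str
    cron: str  # assumed to be a standard 5-field cron expression
    tasks: tuple[str, ...] = ()


# Hypothetical tiers; the real names, schedules, and tasks live in the repo.
TIERS = (
    Tier(name="daily", cron="0 6 * * *", tasks=("task_a", "task_b")),
    Tier(name="weekly", cron="0 6 * * 1", tasks=("task_c",)),
)


def tasks_for(tier_name: str) -> tuple[str, ...]:
    """Return the tasks configured for a tier, or raise if it is unknown."""
    for tier in TIERS:
        if tier.name == tier_name:
            return tier.tasks
    raise KeyError(f"unknown tier: {tier_name}")
```

Keeping the schedule next to the task list means a scheduled job only needs a tier name to know what to run.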

  • The Docker image can be built and evals run locally with the run_local.py utility.
  • The Docker image can be deployed manually with a standard docker push command.
  • Jobs and scheduling can be deployed manually with terraform apply.
  • Skiff2 runs setup/build/deploy on push to main.
  • Ad-hoc evals can be executed from the gcloud command line by passing --update-env-vars.
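
The ad-hoc path in the last bullet could be scripted roughly as follows. This is a sketch under stated assumptions: the job name `evals-job`, the region, and the `EVAL_TIER` variable are hypothetical placeholders, not values from this PR. It relies only on `gcloud run jobs execute` with its `--region` and `--update-env-vars` flags, and it builds the command rather than running it.

```python
import shlex


def build_execute_command(job: str, region: str, env: dict[str, str]) -> list[str]:
    """Build the argv for an ad-hoc Cloud Run job execution with env overrides."""
    if not env:
        raise ValueError("pass at least one environment variable override")
    for key, value in env.items():
        # gcloud splits the override list on commas, so reject values that
        # contain one instead of silently producing a malformed flag.
        if "," in value:
            raise ValueError(f"value for {key} contains a comma")
    overrides = ",".join(f"{key}={value}" for key, value in env.items())
    return [
        "gcloud", "run", "jobs", "execute", job,
        f"--region={region}",
        f"--update-env-vars={overrides}",
    ]


# Hypothetical job name, region, and variable, for illustration only.
cmd = build_execute_command("evals-job", "us-central1", {"EVAL_TIER": "daily"})
print(shlex.join(cmd))
```

Returning a list instead of a shell string lets the caller pass it to `subprocess.run(cmd, check=True)` without shell quoting concerns.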

TODOs:

  • Switch the setup action to the shared GitHub Actions when they are publicly available.
  • Switch the build action to the shared GitHub Actions when they are publicly available AND olmo-eval-internal is publicly available.
    • Remove GitHub token access to olmo-eval-internal.
  • Add storage configurations when the shared database is accessible from GCP.

closes https://github.com/allenai/playground-issues-repo/issues/994
closes https://github.com/allenai/playground-issues-repo/issues/990

4 participants