Evaluations cloud run jobs runner #547
Merged
44 commits
7814d4a initial evaluations app (warmbowski)
8b0060b set up tiered eval config and docker image build/run locally (warmbowski)
c9b9ec5 get cloud run working, start adding skiff2 configs (warmbowski)
98f0e92 change tier name (warmbowski)
a4d1f41 configure skiff2 build, push, and deploy (warmbowski)
a179b7c cleanup unused code and add .env support (warmbowski)
98f80df replace shell scripts with python (warmbowski)
69de4ed remove scheule prop (warmbowski)
2170148 fix registry path and comment out storage from jobs for now (warmbowski)
f253e7e refactor to use local model provider config instead of model presets (warmbowski)
3e7d640 remove click dep, and remove tier-list and job-list args (warmbowski)
b769fed remove some logger formatting (warmbowski)
528cc75 refactor harness overrides (warmbowski)
516febb make ci deploy and local deploy use the same script (warmbowski)
9ac16f7 remove unused method (warmbowski)
7758de7 test ci build and deploy (warmbowski)
7256d6f fix lint issues (warmbowski)
d73f798 workaround for skiff2 setup action.yaml (warmbowski)
501753b fix path (warmbowski)
075110b fix action path (warmbowski)
fc9272a troubleshooting actions.yaml (warmbowski)
7211936 fix filename (warmbowski)
742d5a9 troubleshooting (warmbowski)
5c01dcc fix running skiff2 action (warmbowski)
3676ddf test ci actions (warmbowski)
cc1b3af trigger gcr build and deploy (warmbowski)
0d85973 fix ci to only run on main (warmbowski)
b404573 fix some of the lint rules (warmbowski)
70eddae generate jobs from templaet and deploy (warmbowski)
52a0dbf convert to using terraform and pydantic settings (warmbowski)
d0b5420 add standard logger to replace print statements (warmbowski)
7380688 add uv setup to ci build (warmbowski)
c9a0628 add github token to ci so that it can load private repo (warmbowski)
07efd97 remove test branch from ci build on push (warmbowski)
8530bab refactor to one cloud job that takes args for tier, and add support f… (warmbowski)
84fd8d8 fix adhoc api_base (warmbowski)
1782d06 refactor updated-env-vars format and add validation to parsing (warmbowski)
42da390 fix lint (warmbowski)
7e40dc4 refactor build-and-push-evals action to use docker actions (warmbowski)
2f417f4 fix delimiter in adhoc overriedes (warmbowski)
0b4a597 add parser unit tests (warmbowski)
fd40615 fix tests in ci (warmbowski)
b52959a use pydantic settings, + small fixes (warmbowski)
8e1221b move run-local into app and update settings defaults (warmbowski)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| name: Setup GCP and Docker | ||
| description: Sets up Google Cloud authentication, Cloud SDK, and Docker for GCR | ||
| author: Skiff | ||
| # This is a temp workaround until skiff2 shared actions are accessible from public repos | ||
| # Please remove this file and switch to shared action when available. | ||
| | ||
| inputs: | ||
| workload_identity_provider: | ||
| description: "Workload Identity Provider resource name (e.g. projects/123/locations/global/workloadIdentityPools/my-pool/providers/my-provider)" | ||
| required: true | ||
| service_account: | ||
| description: "Service account email to impersonate" | ||
| required: true | ||
| project_id: | ||
| description: "GCP project ID" | ||
| required: true | ||
| | ||
| runs: | ||
| using: composite | ||
| steps: | ||
| - name: Check branch is main | ||
| shell: bash | ||
| run: | | ||
| if [ "${{ github.ref }}" != "refs/heads/main" ]; then | ||
| echo "This action can only run on the main branch. Current ref: ${{ github.ref }}" | ||
| exit 1 | ||
| fi | ||
| | ||
| - name: Checkout calling repository | ||
| uses: actions/checkout@v4 | ||
| | ||
| - name: Authenticate to Google Cloud | ||
| uses: google-github-actions/auth@v2 | ||
| with: | ||
| workload_identity_provider: ${{ inputs.workload_identity_provider }} | ||
| service_account: ${{ inputs.service_account }} | ||
| | ||
| - name: Set up Cloud SDK | ||
| uses: google-github-actions/setup-gcloud@v2 | ||
| with: | ||
| project_id: ${{ inputs.project_id }} | ||
| | ||
| - name: Configure Docker for GCR | ||
| shell: bash | ||
| run: gcloud auth configure-docker | ||
| | ||
| - name: Set up Docker Buildx | ||
| uses: docker/setup-buildx-action@v3 | ||
| | ||
| branding: | ||
| icon: cloud | ||
| color: blue | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,114 @@ | ||
| name: Build and Deploy Evaluations Cloud Run Jobs | ||
| | ||
| on: | ||
| push: | ||
| branches: | ||
| - main | ||
| paths: | ||
| - 'apps/evaluations/**' | ||
| - '.github/workflows/build-and-push-evals.yml' | ||
| pull_request: | ||
| paths: | ||
| - 'apps/evaluations/**' | ||
| - '.github/workflows/build-and-push-evals.yml' | ||
| workflow_dispatch: | ||
| | ||
| permissions: | ||
| contents: read | ||
| id-token: write | ||
| | ||
| env: | ||
| SERVICE_NAME: evaluations | ||
| REGISTRY: us-west1-docker.pkg.dev | ||
| REPO: model-evals | ||
| | ||
| jobs: | ||
| test: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: actions/checkout@v6 | ||
| | ||
| - name: Setup uv | ||
| uses: astral-sh/setup-uv@v7 | ||
| | ||
| - name: Run Tests | ||
| working-directory: apps/evaluations | ||
| run: uv run --only-group dev pytest -v | ||
| | ||
| build-and-deploy: | ||
| needs: test | ||
| if: github.event_name == 'push' || github.event_name == 'workflow_dispatch' | ||
| runs-on: ubuntu-latest | ||
| environment: | ||
| name: ${{ github.ref_name }} | ||
| steps: | ||
| - uses: actions/checkout@v6 # remove this when switching back to shared action | ||
| | ||
| - name: Skiff2 Setup | ||
| id: setup | ||
| uses: ./.github/actions/skiff2/setup # temporary workaround until shared action is available | ||
| with: | ||
| workload_identity_provider: ${{ vars.SKIFF2_WORKLOAD_IDENTITY_PROVIDER }} | ||
| service_account: ${{ vars.SKIFF2_SERVICE_ACCOUNT }} | ||
| project_id: ${{ vars.SKIFF2_PROJECT_ID }} | ||
| | ||
| # Configure Docker for Artifact Registry | ||
| - name: Configure Docker | ||
| run: gcloud auth configure-docker ${REGISTRY} --quiet | ||
| | ||
| - name: Set up Docker Buildx | ||
| uses: docker/setup-buildx-action@v3 | ||
| | ||
| # Custom build step for evaluations (handles GITHUB_TOKEN for private repo) | ||
| # Once olmo-eval-internal is public, this can be replaced with Skiff2 Build | ||
| - name: Build and Push Evaluations Image | ||
| id: build | ||
| uses: docker/build-push-action@v6 | ||
| with: | ||
| context: apps/evaluations | ||
| file: apps/evaluations/Dockerfile | ||
| platforms: linux/amd64 | ||
| push: true | ||
| tags: | | ||
| ${{ env.REGISTRY }}/${{ vars.SKIFF2_PROJECT_ID }}/${{ env.REPO }}/${{ env.SERVICE_NAME }}:latest | ||
| ${{ env.REGISTRY }}/${{ vars.SKIFF2_PROJECT_ID }}/${{ env.REPO }}/${{ env.SERVICE_NAME }}:${{ github.sha }} | ||
| cache-from: type=gha | ||
| cache-to: type=gha,mode=max | ||
| secrets: | | ||
| GITHUB_TOKEN=${{ secrets.OLMO_EVAL_INTERNAL_TOKEN }} | ||
| | ||
| # Setup uv for Python package management | ||
| - name: Setup uv | ||
| uses: astral-sh/setup-uv@v7 | ||
| | ||
| # Configure git to use token for private repo access | ||
| - name: Configure Git for Private Repos | ||
| run: git config --global url."https://${{ secrets.OLMO_EVAL_INTERNAL_TOKEN }}@github.com/".insteadOf "https://github.com/" | ||
| | ||
| # Generate Terraform variables from Python tier configs | ||
| - name: Generate Terraform Variables | ||
| working-directory: apps/evaluations | ||
| run: uv run generate-tfvars -o terraform/terraform.tfvars.json | ||
| | ||
| # Setup Terraform | ||
| - name: Setup Terraform | ||
| uses: hashicorp/setup-terraform@v3 | ||
| with: | ||
| terraform_version: "1.5" | ||
| | ||
| # Deploy with Terraform | ||
| - name: Terraform Init | ||
| working-directory: apps/evaluations/terraform | ||
| run: terraform init | ||
| | ||
| - name: Terraform Plan | ||
| working-directory: apps/evaluations/terraform | ||
| run: | | ||
| terraform plan \ | ||
| -var="project_id=${{ vars.SKIFF2_PROJECT_ID }}" \ | ||
| -var="image_tag=${{ github.sha }}" \ | ||
| -out=tfplan | ||
| | ||
| - name: Terraform Apply | ||
| working-directory: apps/evaluations/terraform | ||
| run: terraform apply -auto-approve tfplan | ||
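The build step pushes the image under two tags that share one naming pattern, and the plan step passes the commit SHA to Terraform as `image_tag`. A minimal Python sketch of that pattern follows. The helper `image_refs` is hypothetical, `example-project` stands in for the value of `vars.SKIFF2_PROJECT_ID`, and the short hash stands in for the full `github.sha`.

```python
def image_refs(registry: str, project_id: str, repo: str, service: str, sha: str) -> list[str]:
    """Build the two references the workflow tags: a moving `latest` and a per-commit tag."""
    base = f"{registry}/{project_id}/{repo}/{service}"
    return [f"{base}:latest", f"{base}:{sha}"]


# Registry, repo, and service names come from the workflow's `env` block;
# the project ID and SHA below are placeholders (assumptions).
refs = image_refs(
    registry="us-west1-docker.pkg.dev",
    project_id="example-project",
    repo="model-evals",
    service="evaluations",
    sha="8e1221b",
)
for ref in refs:
    print(ref)
```

Because `image_tag` is set to the commit SHA rather than `latest`, the Terraform plan presumably pins the deployed job to the image built in the same workflow run.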
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # Local environment variables for evaluations | ||
| # Copy to .env.local and fill in values | ||
| | ||
| # Required for Docker build (private repo access) | ||
| GITHUB_TOKEN= | ||
| | ||
| # Required for running evaluations | ||
| LITELLM_PROXY_API_KEY= | ||
| | ||
| # Required for storage (Postgres) | ||
| PGHOST= | ||
| PGPASSWORD= | ||
| | ||
| # Required for storage (S3) | ||
| AWS_ACCESS_KEY_ID= | ||
| AWS_SECRET_ACCESS_KEY= | ||
| | ||
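A minimal sketch of how a script could read a file in this format and report which variables are still empty. It uses only the standard library. `parse_env` and `missing_vars` are hypothetical helpers, not code from this PR (the commit list indicates the app itself uses pydantic settings), and treating all six variables as required at once is an assumption, since the storage variables matter only when storage is enabled.

```python
# Variable names taken from the example file above.
REQUIRED = [
    "GITHUB_TOKEN",
    "LITELLM_PROXY_API_KEY",
    "PGHOST",
    "PGPASSWORD",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(values: dict[str, str], required: list[str] = REQUIRED) -> list[str]:
    """Return required names that are absent or set to an empty string."""
    return [name for name in required if not values.get(name)]


# Illustrative file contents; the key value is a placeholder.
example = """
# Required for running evaluations
LITELLM_PROXY_API_KEY=sk-example
PGHOST=
"""
missing = missing_vars(parse_env(example))
print(missing)
```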
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| # Python | ||
| __pycache__/ | ||
| *.py[cod] | ||
| *.egg-info/ | ||
| | ||
| # Build artifacts | ||
| dist/ | ||
| build/ | ||
| | ||
| # Local environment | ||
| .env.local | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| # Evaluations Docker Image for Cloud Run Jobs | ||
| # | ||
| # Docker image that runs a list of model evals based on tier configuration, | ||
| # running each individual model eval as its own Google Cloud Job | ||
| # | ||
| # Build (requires GitHub token for private repo access): | ||
| # docker build --platform linux/amd64 --secret id=GITHUB_TOKEN \ | ||
| # -t evaluations -f apps/evaluations/Dockerfile apps/evaluations | ||
| # | ||
| # Once olmo-eval-internal is public, remove --secret GITHUB_TOKEN | ||
| # | ||
| # Run tier (local mode, no storage): | ||
| # docker run -e EVAL_TIER=standard -e CLOUD_RUN_TASK_INDEX=0 -e LOCAL=true \ | ||
| # -e LITELLM_PROXY_API_KEY=$LITELLM_PROXY_API_KEY evaluations | ||
| # | ||
| # Run builds/evals locally with helper script: | ||
| # uv run run-local --tier standard --build | ||
| # uv run run-local --build-only | ||
| # | ||
| | ||
| # ============================================================================ | ||
| # Stage 1: Builder | ||
| # ============================================================================ | ||
| FROM --platform=linux/amd64 ghcr.io/astral-sh/uv:python3.14-bookworm-slim AS builder | ||
| | ||
| ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy | ||
| ENV UV_PYTHON_DOWNLOADS=0 | ||
| | ||
| # Install git for cloning olmo-eval-internal | ||
| RUN apt-get update -qq && \ | ||
| apt-get install -y --no-install-recommends git && \ | ||
| rm -rf /var/lib/apt/lists/* | ||
| | ||
| # GitHub token for private repo access (mounted as secret) | ||
| RUN --mount=type=secret,id=GITHUB_TOKEN \ | ||
| git config --global url."https://$(cat /run/secrets/GITHUB_TOKEN)@github.com/".insteadOf "https://github.com/" | ||
| | ||
| WORKDIR /app | ||
| | ||
| # Copy evaluations package | ||
| COPY src /app/src | ||
| COPY pyproject.toml /app/pyproject.toml | ||
| | ||
| # Install evaluations package (pulls olmo-eval-internal from git) | ||
| RUN --mount=type=cache,target=/root/.cache/uv \ | ||
| uv pip install --system /app | ||
| | ||
| # ============================================================================ | ||
| # Stage 2: Runtime | ||
| # ============================================================================ | ||
| FROM --platform=linux/amd64 python:3.14-slim-bookworm AS runner | ||
| | ||
| # Install runtime dependencies | ||
| RUN apt-get update -qq && \ | ||
| apt-get install -y --no-install-recommends ca-certificates && \ | ||
| rm -rf /var/lib/apt/lists/* | ||
| | ||
| # Setup non-root user | ||
| RUN groupadd --system --gid 999 nonroot \ | ||
| && useradd --system --gid 999 --uid 999 --create-home nonroot | ||
| | ||
| # Copy installed packages from builder | ||
| COPY --from=builder /usr/local/lib/python3.14/site-packages /usr/local/lib/python3.14/site-packages | ||
| COPY --from=builder /usr/local/bin/olmo-eval /usr/local/bin/olmo-eval | ||
| COPY --from=builder /usr/local/bin/evaluations /usr/local/bin/evaluations | ||
| | ||
| WORKDIR /app | ||
| | ||
| # Use non-root user | ||
| USER nonroot | ||
| | ||
| ENV PYTHONUNBUFFERED=1 | ||
| ENV TERM=dumb | ||
| ENV NO_COLOR=1 | ||
| | ||
| # Use Python CLI as entrypoint | ||
| ENTRYPOINT ["python", "-m", "evaluations.cli"] | ||
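Both the workflow and the builder stage rely on git's `url.<base>.insteadOf` setting to inject a token into GitHub URLs. As an illustration of what that setting does, here is a small Python model of the rewrite. It is not git's implementation. `rewrite_url` is a hypothetical helper, and `TOKEN` and the repository URL are placeholders.

```python
def rewrite_url(url: str, rules: dict[str, str]) -> str:
    """Model of git's `url.<base>.insteadOf` rewriting.

    `rules` maps each insteadOf prefix to its replacement base.
    When several prefixes match, the longest one is used, as git documents.
    """
    matches = [prefix for prefix in rules if url.startswith(prefix)]
    if not matches:
        return url  # no rule applies, so the URL is left unchanged
    prefix = max(matches, key=len)
    return rules[prefix] + url[len(prefix):]


# Equivalent of:
#   git config --global url."https://TOKEN@github.com/".insteadOf "https://github.com/"
rules = {"https://github.com/": "https://TOKEN@github.com/"}
rewritten = rewrite_url("https://github.com/example-org/example-repo.git", rules)
print(rewritten)  # → https://TOKEN@github.com/example-org/example-repo.git
```

In the Dockerfile the rewrite is written to the builder stage's global git config. The runtime stage copies only `site-packages` and the two entry points from the builder.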
Skiff2 shared actions aren't available to public repos. This will be removed when @CalebOuellette gets a fix in.