Automate ECR image mirroring from ghcr.io/coder/coder-preview #7

@blink-so

Problem

The Sept 30 workshop revealed that the private ECR repository can fall out of sync with the upstream ghcr.io/coder/coder-preview image, leading to image version mismatches between the control plane (us-east-2) and proxy clusters (us-west-2, eu-west-2).

Context

  • The platform uses ghcr.io/coder/coder-preview (not stable coder/coder) to access beta AI features
  • This image is mirrored to a private AWS ECR repository
  • During the workshop, the ECR mirror was out of sync, causing subdomain routing failures
  • Manual sync process is error-prone and doesn't scale

Current Manual Process

# Pull from GitHub Container Registry
docker pull ghcr.io/coder/coder-preview:latest

# Tag for ECR
docker tag ghcr.io/coder/coder-preview:latest <aws-account-id>.dkr.ecr.us-east-2.amazonaws.com/coder-preview:latest

# Authenticate with ECR
aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.us-east-2.amazonaws.com

# Push to ECR
docker push <aws-account-id>.dkr.ecr.us-east-2.amazonaws.com/coder-preview:latest

# Restart Coder pods in all regions
kubectl rollout restart deployment/coder -n coder --context=us-east-2
kubectl rollout restart deployment/coder -n coder --context=us-west-2
kubectl rollout restart deployment/coder -n coder --context=eu-west-2

Requirements

Automated Image Mirroring

  • Implement automated job to sync ghcr.io/coder/coder-preview to ECR
  • Run sync on a schedule (daily) or whenever a new upstream image is pushed
  • Use GitHub Actions, AWS Lambda, or similar automation
  • After each sync, verify that every mirrored tag in ECR resolves to the same digest as in GHCR
  • Notify team on sync failures
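
The sync-and-verify requirement above could be sketched as follows. This is a minimal sketch, not the project's implementation: it assumes the `crane` CLI (not mentioned in this issue) is installed and already authenticated to both registries, and `mirror_tag` is a hypothetical helper name. A registry-to-registry copy is used instead of `docker pull`/`docker push` because it should keep the manifest digest unchanged, which reduces verification to a string comparison.

```shell
# Sketch: mirror one tag from GHCR to ECR, then verify the digests match.
# Assumption: crane is installed and logged in to both registries.

SRC_REPO="ghcr.io/coder/coder-preview"
# <aws-account-id> is a placeholder, as in the manual process above.
DST_REPO="<aws-account-id>.dkr.ecr.us-east-2.amazonaws.com/coder-preview"

# mirror_tag TAG: copy SRC_REPO:TAG to DST_REPO:TAG and compare digests.
mirror_tag() {
  local tag="$1" src_digest dst_digest

  src_digest="$(crane digest "${SRC_REPO}:${tag}")" || {
    echo "failed to read source digest for ${tag}" >&2
    return 1
  }

  crane copy "${SRC_REPO}:${tag}" "${DST_REPO}:${tag}" || {
    echo "copy failed for ${tag}" >&2
    return 1
  }

  dst_digest="$(crane digest "${DST_REPO}:${tag}")" || {
    echo "failed to read mirror digest for ${tag}" >&2
    return 1
  }

  if [ "$src_digest" != "$dst_digest" ]; then
    echo "digest mismatch for ${tag}: ${src_digest} vs ${dst_digest}" >&2
    return 1
  fi
  echo "synced ${tag} at ${src_digest}"
}
```

A scheduled job would call `mirror_tag latest` and treat a non-zero exit status as a sync failure to report.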

Image Consistency Validation

  • Add pre-deployment validation to verify image digests match across:
    • GHCR source
    • Private ECR mirror
    • us-east-2 control plane
    • us-west-2 proxy cluster
    • eu-west-2 proxy cluster
  • Block deployments if image inconsistencies are detected
  • Add to pre-workshop checklist (Create pre-workshop validation checklist and runbook #4)
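
A pre-deployment check along these lines might collect one digest per source and fail unless they are all identical. The sketch below rests on several assumptions: the `crane` CLI is available for the GHCR lookup, the Coder pods carry the label `app.kubernetes.io/name=coder`, and the helper names (`running_digests`, `collect_digests`, `digests_consistent`) are hypothetical. How a container runtime reports `imageID` for multi-architecture images can vary, so the cluster-side values should be checked against a known-good deployment before relying on this.

```shell
# running_digests CONTEXT: print the image digest of each Coder pod in a cluster.
# Assumption: pods are labelled app.kubernetes.io/name=coder.
running_digests() {
  kubectl get pods -n coder --context="$1" \
    -l app.kubernetes.io/name=coder \
    -o jsonpath='{range .items[*]}{.status.containerStatuses[0].imageID}{"\n"}{end}' \
    | sed 's/.*@//'   # keep only the sha256:... part after the "@"
}

# collect_digests: print "source digest" lines for GHCR, ECR, and each cluster.
collect_digests() {
  local ctx digests d
  echo "ghcr $(crane digest ghcr.io/coder/coder-preview:latest)"
  echo "ecr $(aws ecr describe-images --region us-east-2 \
    --repository-name coder-preview --image-ids imageTag=latest \
    --query 'imageDetails[0].imageDigest' --output text)"
  for ctx in us-east-2 us-west-2 eu-west-2; do
    digests="$(running_digests "$ctx")"
    # Emit the bare context name when no pods were found, so the check fails.
    [ -n "$digests" ] || echo "$ctx"
    for d in $digests; do echo "$ctx $d"; done
  done
}

# digests_consistent: read "source digest" lines on stdin; succeed only if
# every line has a digest and all digests are identical.
digests_consistent() {
  awk '
    NF < 2 { bad = 1; next }   # a source without a digest counts as a failure
    { seen[$2] = 1 }
    END {
      n = 0
      for (d in seen) n++
      exit (bad || n != 1)
    }'
}
```

A deployment gate could then run `collect_digests | digests_consistent` and stop on a non-zero exit status.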

Workspace Image Management

  • Document which workspace template images are stored in ECR:
    • Build from Scratch w/ Claude
    • Build from Scratch w/ Goose
  • Document which use public registries:
    • Real World App w/ Claude (uses codercom/example-universal:ubuntu from DockerHub)
  • Consider mirroring workspace images to ECR for consistency

Rollback Strategy

  • Document rollback procedure if bad image is mirrored
  • Implement image tagging strategy (not just latest)
  • Consider using immutable tags or digests in deployment

Success Criteria

  • ECR mirror automatically syncs with GHCR without manual intervention
  • Image consistency validated before every workshop
  • All clusters always run identical image digests
  • Zero subdomain routing failures due to image mismatch
  • Clear documentation for emergency manual sync if automation fails

Implementation Options

Option 1: GitHub Actions

  • Trigger on new release of coder/coder-preview
  • Pull, tag, push to ECR
  • Create PR to update image references in Terraform

Option 2: AWS Lambda + EventBridge

  • Scheduled Lambda function (daily)
  • Pull latest from GHCR, push to ECR
  • Send SNS notification on failure
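
The failure notification could be sketched as a small wrapper around whatever command performs the sync. This sketch assumes a shell-based job with the AWS CLI available and an existing SNS topic; a Lambda function would make the equivalent publish call through an AWS SDK instead. `run_with_alert` is a hypothetical helper name and the subject text is illustrative.

```shell
# run_with_alert TOPIC_ARN CMD [ARGS...]: run CMD; if it fails, publish a
# message to the SNS topic and return CMD's exit status.
run_with_alert() {
  local topic="$1" status=0
  shift

  "$@" || status=$?

  if [ "$status" -ne 0 ]; then
    aws sns publish \
      --topic-arn "$topic" \
      --subject "coder-preview ECR sync failed" \
      --message "Command '$*' exited with status ${status}" >/dev/null
  fi
  return "$status"
}
```

The wrapper passes the original exit status through, so a scheduler or CI system still sees the job as failed after the alert is sent.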

Option 3: Kubernetes CronJob

  • Run in us-east-2 cluster
  • Use service account with ECR push permissions
  • Monitor via existing Kubernetes alerting

Related

Sept 30 Workshop Postmortem
#2 (Image management standardization)
Incident Runbook - Subdomain Routing Failures
Incident Runbook - Image Pull Failures
