
How to Build, Test and Deploy a New Windows AMI

Andrey Talman edited this page Mar 30, 2026 · 1 revision

Overview

Windows CI runners use custom AMIs (Amazon Machine Images) that are built in
the LF account and shared across both the PyTorch and LF AWS accounts. The
deployment process has three stages: building the AMI, testing it on canary,
and then rolling it out to production.

Workflow Diagram

┌─────────────────────────┐                                                   
│  1. Build AMI           │                                     
│  (LF account)           │
│  AMI built once and     │
│  shared publicly        │                                                   
└───────────┬─────────────┘
            │                                                                 
            ▼                                                   
┌─────────────────────────┐                                                   
│  2. Deploy to Canary    │                                     
│  (account 391835788720) │
│  Verify AMI available   │                                                   
│  in account 308535385114│
└───────────┬─────────────┘                                                   
            │                                                   
            ▼                                                                 
┌─────────────────────────┐                                     
│  3. Test AMI            │
│  Run trunk & binaries   │
│  workflows on test PR   │                                                   
└───────────┬─────────────┘
            │                                                                 
            ▼                                                   
┌─────────────────────────┐                                                   
│  4. Deploy to Prod      │
│  Land ci-infra and      │                                                   
│  gha-infra PRs          │                                     
└─────────────────────────┘                                                   

Step-by-Step Instructions

Step 1: Build the AMI

Run the https://github.com/pytorch/test-infra/blob/main/.github/workflows/build-windows-ami.yml workflow.

Once the workflow completes, its output shows the AMI ID and AMI Name.
Note both down; you will need them in the following steps.
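
One way to keep the two values together is a small record that rejects a mistyped AMI ID early. The following is a minimal sketch, not part of the official process. The class name `BuiltAmi` is hypothetical, the AMI ID shown is a placeholder, and the code assumes that the numeric suffix of the AMI name is a build timestamp in `YYYYMMDDHHMMSS` form, which is inferred from the example name in this guide rather than documented.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# AMI IDs are "ami-" followed by 8 (legacy) or 17 lowercase hex characters.
AMI_ID_RE = re.compile(r"^ami-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


@dataclass(frozen=True)
class BuiltAmi:
    """Hypothetical record of the values printed by the build workflow."""

    ami_id: str
    name: str

    def __post_init__(self):
        # Fail fast on a truncated or mistyped ID.
        if not AMI_ID_RE.match(self.ami_id):
            raise ValueError(f"not a valid AMI ID: {self.ami_id!r}")

    def build_time(self) -> datetime:
        # Assumption: the name ends with " - YYYYMMDDHHMMSS".
        suffix = self.name.rsplit(" - ", 1)[-1]
        return datetime.strptime(suffix, "%Y%m%d%H%M%S")


# Placeholder ID; substitute the one from the workflow output.
ami = BuiltAmi("ami-0123456789abcdef0", "Windows 2019 GHA CI - 20260325213408")
print(f"{ami.ami_id} ({ami.name}) built at {ami.build_time().isoformat()}")
```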

Step 2: Verify AMI availability in AWS

Log in to AWS account 308535385114 and confirm the new AMI is available.

Look for the AMI with:

  • Owner: 391835788720
  • Name pattern: e.g. Windows 2019 GHA CI - 20260325213408
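
The same check can be done from a terminal instead of the AWS console. The sketch below only builds and prints an AWS CLI `describe-images` command filtered by the owner and name given above; it does not run it. It assumes the AWS CLI is installed and that your credentials are for account 308535385114. The helper name `describe_images_cmd` is hypothetical, and the region is left as an optional parameter because this guide does not state which region the AMI lives in.

```python
import shlex

# Owner account taken from the list above.
OWNER_ACCOUNT = "391835788720"


def describe_images_cmd(name_pattern, owner=OWNER_ACCOUNT, region=None):
    """Build an `aws ec2 describe-images` command for the given AMI name."""
    cmd = [
        "aws", "ec2", "describe-images",
        "--owners", owner,
        # The name filter accepts "*" wildcards, e.g. "Windows 2019 GHA CI - *".
        "--filters", f"Name=name,Values={name_pattern}",
        "--query", "Images[].[ImageId,Name,State]",
        "--output", "table",
    ]
    if region:
        # Assumption: pass the region explicitly if it is not your CLI default.
        cmd += ["--region", region]
    return cmd


# Print a copy-pasteable command line for the example AMI name.
print(shlex.join(describe_images_cmd("Windows 2019 GHA CI - 20260325213408")))
```

If the AMI is visible to the account, the resulting table should list it with state `available`.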

Step 3: Land the test-infra PR

Land a PR in https://github.com/pytorch/test-infra that references the new AMI and enables wincanarylf testing.

Example: https://github.com/pytorch/test-infra/commit/241f90abba732b5be91590f3

Step 4: Create a ci-infra PR on a separate branch

Create a PR in https://github.com/pytorch/ci-infra on a dedicated branch (e.g. atalman-win-20260325) with the new AMI configuration.

Example PR: https://github.com/pytorch/ci-infra/pull/413
Example branch: https://github.com/pytorch/ci-infra/tree/atalman-win-20260325

Step 5: Deploy to canary

Run the https://github.com/pytorch/ci-infra/actions/workflows/ali-deploy-canary.yml workflow on the branch from Step 4 to deploy the new AMI to the canary environment for testing.
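
The workflow can also be started from the command line with the GitHub CLI rather than the Actions web page. The sketch below builds a `gh workflow run` command for the branch from Step 4 and prints it. It assumes that `gh` is installed and authenticated, that the workflow can be triggered manually, and that it needs no extra inputs; none of this is confirmed by this guide. The helper name `deploy_canary_cmd` is hypothetical.

```python
import shlex


def deploy_canary_cmd(branch, repo="pytorch/ci-infra",
                      workflow="ali-deploy-canary.yml"):
    """Build a `gh workflow run` command targeting the given branch."""
    if not branch:
        raise ValueError("branch must be the dedicated branch from Step 4")
    # --ref selects the branch whose version of the workflow and config is used.
    return ["gh", "workflow", "run", workflow, "--repo", repo, "--ref", branch]


# Example with the branch name used elsewhere in this guide.
print(shlex.join(deploy_canary_cmd("atalman-win-20260325")))
```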

Step 6: Enable canary experiments

Edit the experiments list in https://github.com/pytorch/test-infra/issues/5132 and add:

@youruser,wincanarylf,lf

Replace youruser with your GitHub username.
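
To avoid typos in the opt-in entry, the line can be generated from your username. This is a minimal sketch that reproduces only the format of the example line above (`@user` followed by comma-separated experiment names); the helper name `opt_in_line` is hypothetical, and any further rules of the experiments list are not covered.

```python
def opt_in_line(user, experiments=("wincanarylf", "lf")):
    """Format an opt-in entry like '@youruser,wincanarylf,lf'."""
    # Accept the username with or without a leading "@".
    user = user.strip().lstrip("@")
    if not user:
        raise ValueError("a GitHub username is required")
    return "@" + user + "," + ",".join(experiments)


print(opt_in_line("youruser"))
```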

Step 7: Test on a PyTorch PR

Open a test PR in https://github.com/pytorch/pytorch (example: https://github.com/pytorch/pytorch/pull/178531) and:

  1. Verify the Windows runner is using the new AMI (check the runner info in the job logs).
  2. Assign the labels ciflow/trunk and ciflow/binaries to trigger the relevant CI workflows.
  3. Confirm that both trunk and binaries jobs pass on Windows.
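
The labels from item 2 can be added with the GitHub CLI instead of the web UI. The sketch below builds a `gh pr edit` command with one `--add-label` flag per label and prints it. It assumes `gh` is installed and authenticated and that your account is allowed to apply these labels; the helper name `add_ciflow_labels_cmd` is hypothetical.

```python
import shlex


def add_ciflow_labels_cmd(pr_number,
                          labels=("ciflow/trunk", "ciflow/binaries"),
                          repo="pytorch/pytorch"):
    """Build a `gh pr edit` command that adds the given labels to a PR."""
    cmd = ["gh", "pr", "edit", str(pr_number), "--repo", repo]
    for label in labels:
        cmd += ["--add-label", label]
    return cmd


# Example using the PR number cited in this step.
print(shlex.join(add_ciflow_labels_cmd(178531)))
```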

Step 8: Deploy to production

Once testing is successful:

  1. Land the ci-infra PR from Step 4 (example: https://github.com/pytorch/ci-infra/pull/413) and deploy it.
  2. Land the corresponding pytorch-gha-infra PR (example: https://github.com/meta-pytorch/pytorch-gha-infra/pull/1027) and deploy it.
