|
| 1 | +# VMR Build Topology and Staleness Diagnosis |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +When backflow PRs are missing across multiple repositories simultaneously, the root cause |
| 6 | +is usually not Maestro — it's that the VMR can't build successfully, so no new channel |
| 7 | +builds are produced, and subscriptions have nothing to trigger on. |
| 8 | + |
| 9 | +This reference explains how to diagnose that situation using publicly available signals. |
| 10 | + |
| 11 | +## Build Pipeline Structure |
| 12 | + |
| 13 | +The VMR (`dotnet/dotnet`) has two tiers of builds: |
| 14 | + |
| 15 | +### Public CI (validation only) |
| 16 | +- **AzDO org**: `dnceng-public` |
| 17 | +- **Project**: `public` (ID: `cbb18261-c48f-4abb-8651-8cdcb5474649`) |
| 18 | +- **Pipeline**: `dotnet-unified-build` (definition 278) |
| 19 | +- **Purpose**: Validates PRs and runs scheduled CI on `refs/heads/main` and release branches |
| 20 | +- **Does NOT publish** to Maestro channels — cannot trigger subscriptions |
| 21 | + |
| 22 | +### Official builds (channel publishing) |
| 23 | +- **AzDO org**: `dnceng` (internal, requires auth) |
| 24 | +- **Purpose**: Produces signed builds that publish to Maestro channels (e.g., `.NET 11.0.1xx SDK`) |
| 25 | +- **These are the builds that trigger Maestro subscriptions and create backflow PRs** |
| 26 | +- Not queryable without internal access |
| 27 | + |
| 28 | +### Key insight |
| 29 | +When investigating stale backflow, the **public CI builds are a useful proxy**. If the public |
| 30 | +scheduled build on `refs/heads/main` is failing, the official build is almost certainly |
| 31 | +failing too (they build the same source). A string of failed public builds strongly suggests |
| 32 | +the official pipeline is also broken. |
| 33 | + |
| 34 | +## Checking Official Build Freshness (aka.ms) |
| 35 | + |
| 36 | +The most direct way to check if official VMR builds are producing output is to query |
| 37 | +the SDK blob storage via `aka.ms` shortlinks. When official builds succeed, they publish |
| 38 | +SDK artifacts to `ci.dot.net`. We can check when the latest build was published. |
| 39 | + |
| 40 | +### How it works |
| 41 | + |
| 42 | +1. Resolve the aka.ms redirect (returns 301 with the blob URL): |
| 43 | + ``` |
| 44 | + https://aka.ms/dotnet/{channel}/daily/dotnet-sdk-win-x64.zip |
| 45 | + ``` |
| 46 | + Example channels: `11.0.1xx`, `11.0.1xx-preview1`, `10.0.3xx`, `10.0.1xx` |
| 47 | + |
| 48 | +2. The 301 Location header gives the actual blob URL on `ci.dot.net`, which includes |
| 49 | + the version number in the path. |
| 50 | + |
| 51 | +3. HEAD the blob URL — the `Last-Modified` header tells you exactly when the build was |
| 52 | + published. |
| 53 | + |
| 54 | +### Example (PowerShell) |
| 55 | + |
| 56 | +```powershell |
| 57 | +Add-Type -AssemblyName System.Net.Http |
| 58 | +$handler = [System.Net.Http.HttpClientHandler]::new() |
| 59 | +$handler.AllowAutoRedirect = $false |
| 60 | +$client = [System.Net.Http.HttpClient]::new($handler) |
| 61 | +
|
| 62 | +# Step 1: Resolve aka.ms → ci.dot.net blob URL |
| 63 | +$resp = $client.GetAsync("https://aka.ms/dotnet/11.0.1xx/daily/dotnet-sdk-win-x64.zip").Result |
| 64 | +$blobUrl = if ([int]$resp.StatusCode -eq 301) { $resp.Headers.Location.ToString() } else { throw "Expected 301 from aka.ms, got $([int]$resp.StatusCode)" } |
| 65 | +$resp.Dispose() |
| 66 | + |
| 67 | +# Step 2: HEAD the blob for Last-Modified |
| 68 | +$head = Invoke-WebRequest -Uri $blobUrl -Method Head -UseBasicParsing |
| 69 | +$published = [DateTimeOffset]::Parse($head.Headers['Last-Modified']).UtcDateTime |
| 70 | +$age = [DateTime]::UtcNow - $published |
| 71 | +
|
| 72 | +$client.Dispose() |
| 73 | +$handler.Dispose() |
| 74 | +``` |
| 75 | + |
| 76 | +### Interpreting results |
| 77 | +- **< 1 day old**: Official builds are healthy for this channel |
| 78 | +- **1-2 days old**: Normal for daily builds, especially over weekends |
| 79 | +- **3+ days old**: Official builds are likely failing — investigate further |
| 80 | +- **Multiple channels stale simultaneously**: Strong signal of a systemic VMR build problem |
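
The thresholds above can be turned into a small triage helper. This is a minimal sketch, not part of any existing tooling: `classify_build_age` and `age_in_days` are hypothetical names, the age is assumed to be rounded down to whole days, and `age_in_days` assumes GNU `date` (for `-d`).

```bash
# classify_build_age DAYS
# Hypothetical helper: maps a blob's age in whole days (rounded down)
# to the interpretation buckets listed above.
classify_build_age() {
  local days=$1
  if (( days < 1 )); then
    echo "healthy"   # < 1 day old
  elif (( days <= 2 )); then
    echo "normal"    # 1-2 days old, e.g. over a weekend
  else
    echo "stale"     # 3+ days old, investigate further
  fi
}

# age_in_days LAST_MODIFIED
# Converts a Last-Modified header value into whole days elapsed.
# Assumes GNU date, which accepts HTTP-style dates via -d.
age_in_days() {
  local published now
  published=$(date -u -d "$1" +%s)
  now=$(date -u +%s)
  echo $(( (now - published) / 86400 ))
}
```

For example, `classify_build_age "$(age_in_days "$last_modified")"` run once per channel makes the "multiple channels stale simultaneously" case easy to spot.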
| 81 | + |
| 82 | +### Validating with darc (when auth is available) |
| 83 | + |
| 84 | +The aka.ms approach is an auth-free proxy. When `darc` is installed and authenticated, |
| 85 | +you can get the authoritative answer directly from Maestro: |
| 86 | + |
| 87 | +```bash |
| 88 | +# Latest build on a channel (exact match for what triggers subscriptions) |
| 89 | +darc get-latest-build --repo dotnet/dotnet --channel ".NET 11.0.1xx SDK" |
| 90 | + |
| 91 | +# Check what build a subscription last acted on |
| 92 | +darc get-subscriptions --source-repo dotnet/dotnet --target-repo dotnet/aspnetcore |
| 93 | +``` |
| 94 | + |
| 95 | +The `Date Produced` from `darc get-latest-build` will be ~6 hours earlier than the |
| 96 | +aka.ms blob `Last-Modified` (due to signing/publishing delay), but they refer to the |
| 97 | +same build. If the subscription's `Last Build` SHA matches the channel's latest build, |
| 98 | +then Maestro already fired — no newer builds exist. |
| 99 | + |
| 100 | +### Channel-to-branch mapping |
| 101 | + |
| 102 | +Channel names follow the pattern `{major}.0.{band}xx` where `{band}` is typically `1`, `2`, or `3`. |
| 103 | +The major version is the .NET version under development, which is typically the current year minus 2015 (e.g., 2026 → .NET 11). |
| 104 | + |
| 105 | +| Channel Pattern | VMR branch | Notes | |
| 106 | +|---------|-----------|-----------------| |
| 107 | +| `{N}.0.1xx` | `main` (during development) | Primary SDK band, runtime + sdk + aspnetcore | |
| 108 | +| `{N}.0.1xx-preview{X}` | `release/{N}.0.1xx-preview{X}` | Preview branches | |
| 109 | +| `{N}.0.{band}xx` | `release/{N}.0.{band}xx` | Servicing bands (2xx, 3xx) | |
| 110 | +| `{N-1}.0.{band}xx` | `release/{N-1}.0.{band}xx` | Previous major servicing | |
| 111 | + |
| 112 | +Example (2026): `11.0.1xx` → `main`, `10.0.3xx` → `release/10.0.3xx` |
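
The mapping table can be sketched as a lookup function. This is an illustration only: `channel_to_branch` is a hypothetical name, and the major version currently developed on `main` is passed in explicitly rather than derived from the date.

```bash
# channel_to_branch CHANNEL DEV_MAJOR
# Hypothetical helper: returns the VMR branch for an aka.ms channel name.
# DEV_MAJOR is the major version currently developed on main (e.g. 11).
channel_to_branch() {
  local channel=$1 dev_major=$2
  if [[ $channel == "${dev_major}.0.1xx" ]]; then
    # Primary SDK band of the in-development major version
    echo "main"
  else
    # Preview branches, servicing bands, and previous majors
    echo "release/${channel}"
  fi
}
```

With `DEV_MAJOR=11`, `channel_to_branch 11.0.1xx 11` yields `main` and `channel_to_branch 10.0.3xx 11` yields `release/10.0.3xx`, matching the 2026 example.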
| 113 | + |
| 114 | +### Cross-referencing with Version.Details.xml and PR metadata |
| 115 | + |
| 116 | +There are two sources of truth for what VMR build a repo is synced to: |
| 117 | + |
| 118 | +**1. `eng/Version.Details.xml` in the target repo (authoritative):** |
| 119 | +```xml |
| 120 | +<Source Uri="https://github.com/dotnet/dotnet" Mapping="sdk" |
| 121 | + Sha="ec846aee7f12180381c444dfeeba0c5022e1d110" BarId="297974" /> |
| 122 | +``` |
| 123 | +- `Sha` = the exact VMR commit the repo is synced to |
| 124 | +- `BarId` = the Maestro build ID (queryable via `darc get-build --id 297974` for date/channel) |
| 125 | +- Dependency version strings encode build dates using Arcade's `YY*1000 + MM*50 + DD` short date (e.g., `26069.105` → year 26, date code 069 = month 1, day 19, revision 105) |
| 126 | + |
| 127 | +**2. Backflow PR body (when a PR is open):** |
| 128 | +``` |
| 129 | +- **Date Produced**: February 4, 2026 11:05:10 AM UTC |
| 130 | +- **Build**: [20260203.11](...) ([300217](https://maestro.dot.net/channel/8298/.../build/300217)) |
| 131 | +``` |
| 132 | + |
| 133 | +**Comparing against aka.ms build date:** |
| 134 | +- If they match and the date is recent → the backflow PR is based on the latest successful build |
| 135 | +- If the aka.ms build is newer → a newer build succeeded but hasn't triggered backflow yet |
| 136 | +- If they match but the date is old → no official build has succeeded since then |
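
To compare a dependency version string against the aka.ms date, the five-digit date code has to be decoded first. The sketch below assumes the Arcade short-date scheme `YY*1000 + MM*50 + DD` and a year in the 2000s; `decode_short_date` is a hypothetical helper name.

```bash
# decode_short_date CODE[.REVISION]
# Hypothetical helper: decodes a short date such as 26069 (or 26069.105)
# into YYYY-MM-DD, assuming the YY*1000 + MM*50 + DD encoding.
decode_short_date() {
  local code=$((10#${1%%.*}))   # drop ".revision", force base 10
  local yy=$(( code / 1000 ))
  local rest=$(( code % 1000 ))
  # rest = MM*50 + DD, so integer division and remainder recover both
  printf '20%02d-%02d-%02d\n' "$yy" $(( rest / 50 )) $(( rest % 50 ))
}
```

Under that assumption, `decode_short_date 26069.105` gives `2026-01-19`.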
| 137 | + |
| 138 | +## Querying Public VMR CI Builds |
| 139 | + |
| 140 | +Public CI builds (separate from official builds) can confirm whether the VMR source is |
| 141 | +buildable. These don't publish to channels but use the same source. |
| 142 | + |
| 143 | +### AzDO REST API endpoints |
| 144 | + |
| 145 | +Recent scheduled builds on a branch: |
| 146 | +``` |
| 147 | +GET https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=278&branchName=refs/heads/main&$top=5&api-version=7.0 |
| 148 | +``` |
| 149 | + |
| 150 | +Last successful build: |
| 151 | +``` |
| 152 | +GET https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=278&branchName=refs/heads/main&resultFilter=succeeded&$top=1&api-version=7.0 |
| 153 | +``` |
| 154 | + |
| 155 | +Build timeline (to find failing jobs): |
| 156 | +``` |
| 157 | +GET https://dev.azure.com/dnceng-public/public/_apis/build/builds/{buildId}/timeline?api-version=7.0 |
| 158 | +``` |
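
When calling these endpoints from a shell, the `$top` parameter must be protected from variable expansion. The following sketch builds the first two URLs above; `vmr_builds_url` is a hypothetical helper name, and only the query parameters already shown in this section are used.

```bash
# vmr_builds_url BRANCH TOP [RESULT]
# Hypothetical helper: builds the dnceng-public builds query for the
# dotnet-unified-build pipeline (definition 278).
vmr_builds_url() {
  local branch=$1 top=$2 result=${3:-}
  local url='https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=278'
  url+="&branchName=refs/heads/${branch}"
  if [[ -n $result ]]; then
    url+="&resultFilter=${result}"
  fi
  # \$top keeps the literal "$top" query parameter in the URL
  url+="&\$top=${top}&api-version=7.0"
  echo "$url"
}

# Usage (requires network access):
#   curl -s "$(vmr_builds_url main 5)"
#   curl -s "$(vmr_builds_url main 1 succeeded)"
```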
| 159 | + |
| 160 | +### Interpreting results |
| 161 | +- **`reason: schedule`** — Scheduled daily builds, closest proxy to official builds |
| 162 | +- **`reason: pullRequest`** — PR validation only |
| 163 | +- **`result: failed`** on several consecutive scheduled builds is a strong signal of a broken VMR |
| 164 | +- Check the timeline for which jobs/stages failed to understand the root cause |
| 165 | + |
| 166 | +## Diagnosing Widespread Backflow Staleness |
| 167 | + |
| 168 | +### Pattern: Multiple repos missing backflow simultaneously |
| 169 | + |
| 170 | +When `CheckMissing` shows missing backflow across 3+ repos (e.g., runtime, SDK, aspnetcore |
| 171 | +all stale), this is almost always a VMR build problem, not a Maestro problem. |
| 172 | + |
| 173 | +**Diagnosis steps:** |
| 174 | + |
| 175 | +1. **Check public VMR builds**: Query the last 5 scheduled builds on the affected branch. |
| 176 | + If all are failing, the VMR build is broken. |
| 177 | + |
| 178 | +2. **Find the failure**: Get the timeline of the most recent failed build. Look for failed |
| 179 | + stages/jobs — common failures include: |
| 180 | + - **macOS signing** (SignTool crashes on non-PE files) |
| 181 | + - **Windows build** (individual repo build failures within the VMR) |
| 182 | + - **Source-build validation** (packaging or dependency issues) |
| 183 | + |
| 184 | +3. **Check for known issues**: Search `dotnet/dotnet` issues with label `[Operational Issue]` |
| 185 | + or search for the error message. |
| 186 | + |
| 187 | +4. **Check the last successful build date**: A gap of days or weeks confirms the VMR has been |
| 188 | + broken for an extended period. |
| 189 | + |
| 190 | +### Pattern: Single repo missing backflow |
| 191 | + |
| 192 | +When only one repo is missing backflow but others are healthy, the cause is more likely one of: |
| 193 | +- Maestro subscription disabled or misconfigured |
| 194 | +- The specific repo's forward flow is blocking (conflict or staleness) |
| 195 | +- Channel mismatch |
| 196 | + |
| 197 | +Use `darc get-subscriptions --source-repo dotnet/dotnet --target-repo dotnet/<repo>` to check. |
| 198 | + |
| 199 | +## The Bootstrap / Chicken-and-Egg Problem |
| 200 | + |
| 201 | +The VMR builds arcade and other infrastructure from source. When an infrastructure fix |
| 202 | +(e.g., in `dotnet/arcade`) is needed to unblock the VMR build itself, a circular dependency |
| 203 | +can occur: |
| 204 | + |
| 205 | +1. Arcade fix merges in `dotnet/arcade` |
| 206 | +2. Arcade forward-flows to VMR (`dotnet/dotnet`) |
| 207 | +3. VMR now has the fix **in source**, but the build tooling used to build may still be the |
| 208 | + old version (from a previous successful bootstrap) |
| 209 | +4. The build fails because the **bootstrap SDK** (cached from a prior build) doesn't have |
| 210 | + the fix yet |
| 211 | + |
| 212 | +**Resolution** (by VMR maintainers): |
| 213 | +- Re-bootstrap: Build a new `source-built-sdks` package from a working state |
| 214 | +- Manual intervention: Patch the bootstrap or skip the failing step |
| 215 | +- Wait for a full re-bootstrap cycle after a milestone release |
| 216 | + |
| 217 | +This is not something that can be fixed by triggering subscriptions or resolving conflicts. |
| 218 | +When you see this pattern, flag it as needing VMR infrastructure team intervention. |
| 219 | + |
| 220 | +## Channels and Subscription Flow |
| 221 | + |
| 222 | +``` |
| 223 | +dotnet/arcade ──forward flow──► dotnet/dotnet (VMR) |
| 224 | +dotnet/runtime ─forward flow──► dotnet/dotnet (VMR) |
| 225 | +dotnet/sdk ────forward flow──► dotnet/dotnet (VMR) |
| 226 | + ...other repos... |
| 227 | + |
| 228 | +dotnet/dotnet (VMR) |
| 229 | + │ |
| 230 | + ├── official build succeeds |
| 231 | + │ │ |
| 232 | + │ ▼ |
| 233 | + │ publishes to channel (e.g., ".NET 11.0.1xx SDK") |
| 234 | + │ │ |
| 235 | + │ ▼ |
| 236 | + │ Maestro fires subscriptions |
| 237 | + │ │ |
| 238 | + │ ├──► dotnet/runtime backflow PR |
| 239 | + │ ├──► dotnet/sdk backflow PR |
| 240 | + │ ├──► dotnet/aspnetcore backflow PR |
| 241 | + │ └──► ...etc |
| 242 | + │ |
| 243 | + └── official build FAILS |
| 244 | + │ |
| 245 | + ▼ |
| 246 | + nothing publishes → no subscriptions fire → all backflow stalls |
| 247 | +``` |
| 248 | + |
| 249 | +## Quick Reference: Common VMR Build Failures |
| 250 | + |
| 251 | +| Failure | Symptom | Root cause | |
| 252 | +|---------|---------|------------| |
| 253 | +| SignTool crash | `Unknown file format` in Sign.proj on macOS | Non-PE file in signing input (e.g., tar.gz) | |
| 254 | +| Repo build failure | `error MSB...` in a specific repo's build | Source incompatibility within VMR | |
| 255 | +| Source-build validation | Packaging or prebuilt detection errors | New prebuilt dependency introduced | |
| 256 | +| Infrastructure timeout | Build exceeds time limit | Resource contention or build perf regression | |