Commit 40e915e

Update 3 skills: ci-analysis-flow-analysis-flow-tracing
Synced from copilot-skills
1 parent 022d89a commit 40e915e

12 files changed: +4606 -4 lines changed

plugins/dotnet-dnceng/skills/ci-analysis/scripts/Get-CIStatus.ps1

Lines changed: 3 additions & 4 deletions
@@ -1,4 +1,4 @@
-<#
+<#
 .SYNOPSIS
 Retrieves test failures from Azure DevOps builds and Helix test runs.

@@ -1584,7 +1584,7 @@ try {
 # Handle direct Helix job query
 if ($PSCmdlet.ParameterSetName -eq 'HelixJob') {
 Write-Host "`n=== Helix Job $HelixJob ===" -ForegroundColor Yellow
-Write-Host "URL: https://helix.dot.net/api/2019-06-17/jobs/$HelixJob" -ForegroundColor Gray
+Write-Host "URL: https://helix.dot.net/api/jobs/$HelixJob" -ForegroundColor Gray

 # Get job details
 $jobDetails = Get-HelixJobDetails -JobId $HelixJob
@@ -1625,8 +1625,7 @@ try {
 }

 # Fetch console log
-$escapedWorkItem = [uri]::EscapeDataString($WorkItem)
-$consoleUrl = "https://helix.dot.net/api/2019-06-17/jobs/$HelixJob/workitems/$escapedWorkItem/console"
+$consoleUrl = "https://helix.dot.net/api/2019-06-17/jobs/$HelixJob/workitems/$WorkItem/console"
 Write-Host "`n Console Log: $consoleUrl" -ForegroundColor Yellow

 $consoleLog = Get-HelixConsoleLog -Url $consoleUrl

plugins/dotnet-dnceng/skills/flow-analysis/SKILL.md

Lines changed: 301 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 256 additions & 0 deletions
@@ -0,0 +1,256 @@
# VMR Build Topology and Staleness Diagnosis

## Overview

When backflow PRs are missing across multiple repositories simultaneously, the root cause
is usually not Maestro: the VMR can't build successfully, so no new channel
builds are produced, and subscriptions have nothing to trigger on.

This reference explains how to diagnose that situation using publicly available signals.

## Build Pipeline Structure

The VMR (`dotnet/dotnet`) has two tiers of builds:

### Public CI (validation only)
- **AzDO org**: `dnceng-public`
- **Project**: `public` (ID: `cbb18261-c48f-4abb-8651-8cdcb5474649`)
- **Pipeline**: `dotnet-unified-build` (definition 278)
- **Purpose**: Validates PRs and runs scheduled CI on `refs/heads/main` and release branches
- **Does NOT publish** to Maestro channels, so it cannot trigger subscriptions

### Official builds (channel publishing)
- **AzDO org**: `dnceng` (internal, requires auth)
- **Purpose**: Produces signed builds that publish to Maestro channels (e.g., `.NET 11.0.1xx SDK`)
- **These are the builds that trigger Maestro subscriptions and create backflow PRs**
- Not queryable without internal access

### Key insight
When investigating stale backflow, the **public CI builds are a useful proxy**. If the public
scheduled build on `refs/heads/main` is failing, the official build is almost certainly
failing too (they build the same source). A string of failed public builds strongly suggests
the official pipeline is also broken.
34+
## Checking Official Build Freshness (aka.ms)
35+
36+
The most direct way to check if official VMR builds are producing output is to query
37+
the SDK blob storage via `aka.ms` shortlinks. When official builds succeed, they publish
38+
SDK artifacts to `ci.dot.net`. We can check when the latest build was published.
39+
40+
### How it works
41+
42+
1. Resolve the aka.ms redirect (returns 301 with the blob URL):
43+
```
44+
https://aka.ms/dotnet/{channel}/daily/dotnet-sdk-win-x64.zip
45+
```
46+
Example channels: `11.0.1xx`, `11.0.1xx-preview1`, `10.0.3xx`, `10.0.1xx`
47+
48+
2. The 301 Location header gives the actual blob URL on `ci.dot.net`, which includes
49+
the version number in the path.
50+
51+
3. HEAD the blob URL — the `Last-Modified` header tells you exactly when the build was
52+
published.

### Example (PowerShell)

```powershell
Add-Type -AssemblyName System.Net.Http
$handler = [System.Net.Http.HttpClientHandler]::new()
$handler.AllowAutoRedirect = $false   # we want the redirect itself, not the zip
$client = [System.Net.Http.HttpClient]::new($handler)

# Step 1: Resolve aka.ms → ci.dot.net blob URL
$resp = $client.GetAsync("https://aka.ms/dotnet/11.0.1xx/daily/dotnet-sdk-win-x64.zip").Result
if ([int]$resp.StatusCode -ne 301) {
    # Location is only meaningful on the expected 301 redirect
    throw "Expected a 301 redirect from aka.ms, got $([int]$resp.StatusCode)"
}
$blobUrl = $resp.Headers.Location.ToString()
$resp.Dispose()

# Step 2: HEAD the blob for Last-Modified
$head = Invoke-WebRequest -Uri $blobUrl -Method Head -UseBasicParsing
# Header values are string[] in PowerShell 7 and string in Windows PowerShell 5.1
$lastModified = @($head.Headers['Last-Modified'])[0]
$published = [DateTimeOffset]::Parse($lastModified).UtcDateTime
$age = [DateTime]::UtcNow - $published

$client.Dispose()
$handler.Dispose()
```

### Interpreting results
- **< 1 day old**: Official builds are healthy for this channel
- **1-2 days old**: Normal for daily builds, especially over weekends
- **3+ days old**: Official builds are likely failing; investigate further
- **Multiple channels stale simultaneously**: Strong signal of a systemic VMR build problem
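
The buckets above can be sketched as a small shell helper. This is only an illustration: the function names `age_hours` and `classify_build_age` are hypothetical, the 2-to-3-day gap that the list leaves open is treated here as "normal" (an assumption), and `age_hours` relies on GNU `date -d` to parse the HTTP `Last-Modified` value.

```bash
#!/usr/bin/env bash

# age_hours HTTP_DATE [NOW_EPOCH]
# Prints the whole hours elapsed since HTTP_DATE (e.g. a Last-Modified header).
# Assumes GNU date; NOW_EPOCH defaults to the current time.
age_hours() {
  local published_epoch now_epoch
  published_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch=${2:-$(date -u +%s)}
  echo $(( (now_epoch - published_epoch) / 3600 ))
}

# classify_build_age AGE_HOURS
# Maps an age in hours to the rule-of-thumb buckets from the list above.
# Assumption: anything from 1 day up to (but not including) 3 days is "normal".
classify_build_age() {
  local hours=$1
  if (( hours < 24 )); then
    echo "healthy"
  elif (( hours < 72 )); then
    echo "normal-for-daily"
  else
    echo "likely-failing"
  fi
}
```

For example, `classify_build_age "$(age_hours "$last_modified")"` would label one channel. Running it for several channels and counting the `likely-failing` results approximates the "multiple channels stale" signal.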

### Validating with darc (when auth is available)

The aka.ms approach is an auth-free proxy. When `darc` is installed and authenticated,
you can get the authoritative answer directly from Maestro:

```bash
# Latest build on a channel (exact match for what triggers subscriptions)
darc get-latest-build --repo dotnet/dotnet --channel ".NET 11.0.1xx SDK"

# Check what build a subscription last acted on
darc get-subscriptions --source-repo dotnet/dotnet --target-repo dotnet/aspnetcore
```

The `Date Produced` from `darc get-latest-build` will be ~6 hours earlier than the
aka.ms blob `Last-Modified` (due to signing/publishing delay), but they refer to the
same build. If the subscription's `Last Build` SHA matches the channel's latest build,
then Maestro already fired and no newer builds exist.

### Channel-to-branch mapping

Channel names follow the pattern `{major}.0.{band}xx` where `{band}` is typically `1`, `2`, or `3`.
The major version tracks .NET version = current year - 2015 (e.g., 2026 → .NET 11).

| Channel Pattern | VMR branch | Notes |
|---------|-----------|-----------------|
| `{N}.0.1xx` | `main` (during development) | Primary SDK band, runtime + sdk + aspnetcore |
| `{N}.0.1xx-preview{X}` | `release/{N}.0.1xx-preview{X}` | Preview branches |
| `{N}.0.{band}xx` | `release/{N}.0.{band}xx` | Servicing bands (2xx, 3xx) |
| `{N-1}.0.{band}xx` | `release/{N-1}.0.{band}xx` | Previous major servicing |

Example (2026): `11.0.1xx` → `main`, `10.0.3xx` → `release/10.0.3xx`
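
The table can be read as a simple lookup. The sketch below is one way to express it in shell; the function name `channel_to_branch` and its second argument (the major version currently under development on `main`) are hypothetical and not part of any darc or Maestro tooling.

```bash
#!/usr/bin/env bash

# channel_to_branch CHANNEL DEV_MAJOR
# Prints the VMR branch a channel is expected to build from, per the table above.
# DEV_MAJOR is the major version in development on main (e.g. 11 in 2026);
# the caller supplies it, since it cannot be derived from the channel name alone.
channel_to_branch() {
  local channel=$1 dev_major=$2
  local re='^[0-9]+\.0\.[0-9]xx(-preview[0-9]+)?$'
  if [[ $channel == "${dev_major}.0.1xx" ]]; then
    echo "main"
  elif [[ $channel =~ $re ]]; then
    # Preview and servicing channels map to a release branch of the same name
    echo "release/${channel}"
  else
    echo "unrecognized channel: ${channel}" >&2
    return 1
  fi
}

channel_to_branch 11.0.1xx 11   # prints: main
channel_to_branch 10.0.3xx 11   # prints: release/10.0.3xx
```

Once `{N}.0.1xx` leaves active development the first branch no longer applies, so the result is only as good as the `DEV_MAJOR` value passed in.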

### Cross-referencing with Version.Details.xml and PR metadata

There are two sources of truth for what VMR build a repo is synced to:

**1. `eng/Version.Details.xml` in the target repo (authoritative):**
```xml
<Source Uri="https://github.com/dotnet/dotnet" Mapping="sdk"
        Sha="ec846aee7f12180381c444dfeeba0c5022e1d110" BarId="297974" />
```
- `Sha` = the exact VMR commit the repo is synced to
- `BarId` = the Maestro build ID (queryable via `darc get-build --id 297974` for date/channel)
- Dependency version strings encode build dates (e.g., `26069.105` → year 26, day-code 069)
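
A day-code such as `069` can be turned back into a calendar date. The sketch below assumes these version strings use Arcade's short-date encoding, `yy * 1000 + MM * 50 + dd`. The text above does not spell this out, so treat it as an assumption to verify. The function name `decode_short_date` is hypothetical.

```bash
#!/usr/bin/env bash

# decode_short_date CODE
# CODE is the date part of a version suffix, with or without the trailing
# revision (e.g. "26069" or "26069.105").
# Assumption: the value is yy*1000 + MM*50 + dd (Arcade-style short date).
decode_short_date() {
  local code=$(( 10#${1%%.*} ))   # drop ".105"; force base 10
  local yy=$(( code / 1000 ))
  local rest=$(( code % 1000 ))
  local mm=$(( rest / 50 ))
  local dd=$(( rest % 50 ))
  printf '20%02d-%02d-%02d\n' "$yy" "$mm" "$dd"
}

decode_short_date 26069.105   # prints: 2026-01-19
```

Under the same assumption, a build numbered `20260203.11` would carry the short date `26103`.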

**2. Backflow PR body (when a PR is open):**
```
- **Date Produced**: February 4, 2026 11:05:10 AM UTC
- **Build**: [20260203.11](...) ([300217](https://maestro.dot.net/channel/8298/.../build/300217))
```

**Comparing against aka.ms build date:**
- If they match → the backflow PR is based on the latest successful build
- If the aka.ms build is newer → a newer build succeeded but hasn't triggered backflow yet
- If the aka.ms build matches the PR but is old → no new successful builds since

## Querying Public VMR CI Builds

Public CI builds (separate from official builds) can confirm whether the VMR source is
buildable. These don't publish to channels but use the same source.

### AzDO REST API endpoints

Recent scheduled builds on a branch:
```
GET https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=278&branchName=refs/heads/main&$top=5&api-version=7.0
```

Last successful build:
```
GET https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=278&branchName=refs/heads/main&resultFilter=succeeded&$top=1&api-version=7.0
```

Build timeline (to find failing jobs):
```
GET https://dev.azure.com/dnceng-public/public/_apis/build/builds/{buildId}/timeline?api-version=7.0
```

### Interpreting results
- **`reason: schedule`**: Scheduled daily builds, closest proxy to official builds
- **`reason: pullRequest`**: PR validation only
- **`result: failed`** with consecutive scheduled builds: strong signal of broken VMR
- Check the timeline for which jobs/stages failed to understand the root cause
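
The "consecutive failed scheduled builds" signal can be counted mechanically. The helper below is a sketch: the name `count_failed_streak` is hypothetical, it expects `reason result` pairs with the newest build first, and the commented `curl | jq` pipeline assumes `jq` is installed and that the API returns builds newest-first.

```bash
#!/usr/bin/env bash

# count_failed_streak
# Reads "reason result" pairs from stdin, newest build first, and prints how
# many scheduled builds in a row have failed. Non-scheduled builds are skipped.
count_failed_streak() {
  local streak=0 reason result
  while read -r reason result; do
    [[ $reason == schedule ]] || continue                  # ignore PR/manual builds
    [[ -z $result || $result == null ]] && continue        # still running, no result yet
    [[ $result == failed ]] || break                       # streak ends at first non-failure
    streak=$(( streak + 1 ))
  done
  echo "$streak"
}

# Possible usage against the first endpoint above (single quotes keep $top literal):
#   curl -s 'https://dev.azure.com/dnceng-public/public/_apis/build/builds?definitions=278&branchName=refs/heads/main&$top=5&api-version=7.0' \
#     | jq -r '.value[] | "\(.reason) \(.result)"' \
#     | count_failed_streak
```

A streak that covers every scheduled build in the window matches the "all are failing" condition used in the diagnosis steps.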

## Diagnosing Widespread Backflow Staleness

### Pattern: Multiple repos missing backflow simultaneously

When `CheckMissing` shows missing backflow across 3+ repos (e.g., runtime, SDK, aspnetcore
all stale), this is almost always a VMR build problem, not a Maestro problem.

**Diagnosis steps:**

1. **Check public VMR builds**: Query the last 5 scheduled builds on the affected branch.
   If all are failing, the VMR build is broken.

2. **Find the failure**: Get the timeline of the most recent failed build. Look for failed
   stages/jobs. Common failures include:
   - **macOS signing** (SignTool crashes on non-PE files)
   - **Windows build** (individual repo build failures within the VMR)
   - **Source-build validation** (packaging or dependency issues)

3. **Check for known issues**: Search `dotnet/dotnet` issues with label `[Operational Issue]`
   or search for the error message.

4. **Check the last successful build date**: A gap of days or weeks confirms the VMR has been
   broken for an extended period.

### Pattern: Single repo missing backflow

When only one repo is missing backflow but others are healthy, the issue is more likely:
- Maestro subscription disabled or misconfigured
- The specific repo's forward flow is blocking (conflict or staleness)
- Channel mismatch

Use `darc get-subscriptions --source-repo dotnet/dotnet --target-repo dotnet/<repo>` to check.

## The Bootstrap / Chicken-and-Egg Problem

The VMR builds arcade and other infrastructure from source. When an infrastructure fix
(e.g., in `dotnet/arcade`) is needed to unblock the VMR build itself, a circular dependency
can occur:

1. Arcade fix merges in `dotnet/arcade`
2. Arcade forward-flows to VMR (`dotnet/dotnet`)
3. VMR now has the fix **in source**, but the build tooling used to build may still be the
   old version (from a previous successful bootstrap)
4. The build fails because the **bootstrap SDK** (cached from a prior build) doesn't have
   the fix yet

**Resolution** (by VMR maintainers):
- Re-bootstrap: Build a new `source-built-sdks` package from a working state
- Manual intervention: Patch the bootstrap or skip the failing step
- Wait for a full re-bootstrap cycle after a milestone release

This is not something that can be fixed by triggering subscriptions or resolving conflicts.
When you see this pattern, flag it as needing VMR infrastructure team intervention.

## Channels and Subscription Flow

```
dotnet/arcade ──forward flow──► dotnet/dotnet (VMR)
dotnet/runtime ─forward flow──► dotnet/dotnet (VMR)
dotnet/sdk ────forward flow──► dotnet/dotnet (VMR)
    ...other repos...

dotnet/dotnet (VMR)
    │
    ├── official build succeeds
    │        │
    │        ▼
    │   publishes to channel (e.g., ".NET 11.0.1xx SDK")
    │        │
    │        ▼
    │   Maestro fires subscriptions
    │        │
    │        ├──► dotnet/runtime backflow PR
    │        ├──► dotnet/sdk backflow PR
    │        ├──► dotnet/aspnetcore backflow PR
    │        └──► ...etc
    │
    └── official build FAILS
             │
             ▼
        nothing publishes → no subscriptions fire → all backflow stalls
```

## Quick Reference: Common VMR Build Failures

| Failure | Symptom | Root cause |
|---------|---------|------------|
| SignTool crash | `Unknown file format` in Sign.proj on macOS | Non-PE file in signing input (e.g., tar.gz) |
| Repo build failure | `error MSB...` in a specific repo's build | Source incompatibility within VMR |
| Source-build validation | Packaging or prebuilt detection errors | New prebuilt dependency introduced |
| Infrastructure timeout | Build exceeds time limit | Resource contention or build perf regression |
