---
name: macios-ci-failure-inspector
description: Investigate and triage CI failures for dotnet/macios from Azure DevOps build URLs. Use this skill whenever the user shares a DevOps build link, asks about CI failures, wants to understand why a build failed, or asks to investigate test failures on any platform (iOS, tvOS, macOS, Mac Catalyst). Also use when the user says things like "CI is red", "tests are failing", "build broke", or "what happened in CI".
---

# macios CI Failure Inspector

Investigate Azure DevOps CI failures for the dotnet/macios repository, extract root causes, and report findings.

## References

Read these as needed during investigation:

- `references/azure-devops-cli.md` — az CLI commands, artifact naming conventions, and JSON parsing caveats. Read this when you need to construct `az` commands or download artifacts.

## Inputs

Collect from the user:

- **Build URL** — an Azure DevOps build results link, e.g. `https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=<ID>&view=results`
- **Scope** — whether to investigate only, or also attempt fixes (default: investigate only)

Extract the `buildId` from the URL query parameter.
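
The extraction can be sketched with the standard library (the helper name is illustrative):

```python
from urllib.parse import urlparse, parse_qs

def extract_build_id(url: str) -> str:
    """Pull the buildId query parameter out of a DevOps build results URL."""
    qs = parse_qs(urlparse(url).query)
    return qs["buildId"][0]

# Example with a placeholder build id:
print(extract_build_id(
    "https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=1234567&view=results"
))  # → 1234567
```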

## Investigation workflow

### Phase 1: Build overview

Fetch the build metadata to understand the big picture:

```bash
az pipelines build show --id <buildId> --org https://devdiv.visualstudio.com --project DevDiv -o json > /tmp/build_show.json
```

Extract from the output:
- `result` (succeeded, failed, partiallySucceeded, canceled)
- `sourceBranch` — what branch triggered the build
- `definition.name` — which pipeline ran
- `triggerInfo` or `reason` — what triggered it (PR, push, schedule)

If the build succeeded, tell the user and stop.

### Phase 2: Timeline — identify failing jobs and tasks

The timeline gives you every job and task in the build with its result:

```bash
az devops invoke --area build --resource timeline --route-parameters project=DevDiv buildId=<buildId> --org https://devdiv.visualstudio.com -o json > /tmp/build_timeline.json
```

Parse the timeline to find failed records. Use Python for robust JSON parsing because `az devops invoke` output can include trailing non-JSON text:

```python
import json

with open('/tmp/build_timeline.json', 'r') as f:
    content = f.read()
data = json.JSONDecoder().raw_decode(content)[0]

failed = [r for r in data.get('records', []) if r.get('result') == 'failed']
for r in failed:
    print(f"  [{r['type']}] {r['name']} (id={r['id']}, logId={r.get('log', {}).get('id', 'N/A')})")
```

Group failures into categories:
- **Test failures** — tasks named "Run tests" or jobs like `T: monotouch_ios`, `T: monotouch_tvos`, `macOS tests`
- **Infrastructure failures** — tasks like "Provision Xcode", "Reserve bot", setup tasks
- **Build failures** — compilation or packaging tasks

### Phase 3: Download TestSummary artifacts (primary failure source)

The xharness test runner logs are 40K+ lines and don't contain standard NUnit failure patterns inline. **TestSummary artifacts are the fastest and most reliable way to identify failures.** Always start with these before digging into raw logs.

List all artifacts:

```bash
az pipelines runs artifact list --run-id <buildId> --org https://devdiv.visualstudio.com --project DevDiv -o json
```

Download TestSummary artifacts for each failing job. **Each artifact must go to a separate directory** to avoid overwriting (they all contain a file named `TestSummary.md`):

```bash
artifact="TestSummary-simulator_testsmonotouch_macos-1"
mkdir -p "/tmp/ci-artifacts/${artifact}"
az pipelines runs artifact download \
  --artifact-name "$artifact" \
  --path "/tmp/ci-artifacts/${artifact}" \
  --run-id <buildId> \
  --org https://devdiv.visualstudio.com --project DevDiv
cat "/tmp/ci-artifacts/${artifact}/TestSummary.md"
```

The TestSummary.md file contains a structured markdown report with:
- Count of passed/failed tests
- For each failure: test configuration name, failure type (BuildFailure, Failed, Crashed, TimedOut), and brief error message
- Build failures show the configuration variant (e.g. "monotouch-test/macOS/Debug (ARM64): BuildFailure")
- Test failures may include the failing test class and assertion message

Common artifact names map to timeline jobs:
- `TestSummary-simulator_testsmonotouch_ios-1` → monotouch_ios
- `TestSummary-simulator_testsmonotouch_tvos-1` → monotouch_tvos
- `TestSummary-simulator_testsmonotouch_macos-1` → monotouch_macos
- `TestSummary-simulator_testsmonotouch_maccatalyst-1` → monotouch_maccatalyst
- `TestSummary-simulator_testsdotnettests_ios-1` → dotnettests_ios
- `TestSummary-simulator_testsdotnettests_tvos-1` → dotnettests_tvos
- `TestSummary-simulator_testsdotnettests_macos-1` → dotnettests_macos
- `TestSummary-simulator_testsdotnettests_maccatalyst-1` → dotnettests_maccatalyst

Download these in parallel for all failing jobs to save time.
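
The background-job pattern can be sketched like this; the `echo` stands in for the real `az pipelines runs artifact download` invocation shown above, and the artifact names are examples:

```shell
buildId=12345678   # placeholder build id

for artifact in TestSummary-simulator_testsmonotouch_ios-1 \
                TestSummary-simulator_testsmonotouch_tvos-1; do
  mkdir -p "/tmp/ci-artifacts/${artifact}"
  # In a real run, replace 'echo' with the az download command;
  # '&' sends each download to the background.
  echo "downloading ${artifact}" &
done
wait   # block until all background jobs finish
echo "all downloads finished"
```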

### Phase 4: Get detailed test failure info from HtmlReport artifacts

For test failures (not build failures), download the corresponding HtmlReport artifact to get NUnit XML with exact test names, assertion messages, and stack traces:

```bash
artifact="HtmlReport-simulator_testsmonotouch_tvos-1"
mkdir -p "/tmp/ci-artifacts/${artifact}"
az pipelines runs artifact download \
  --artifact-name "$artifact" \
  --path "/tmp/ci-artifacts/${artifact}" \
  --run-id <buildId> \
  --org https://devdiv.visualstudio.com --project DevDiv
cd "/tmp/ci-artifacts/${artifact}" && unzip -o HtmlReport.zip -d htmlreport
```

Parse the NUnit XML files inside for specific test failures:

```python
import xml.etree.ElementTree as ET
import glob

xml_files = glob.glob('htmlreport/tests/monotouch-test/*/test-ios-*.xml')
for xf in sorted(xml_files):
    tree = ET.parse(xf)
    for tc in tree.getroot().iter('test-case'):
        if tc.get('result') == 'Failed':
            fullname = tc.get('fullname', 'unknown')
            msg_el = tc.find('.//message')
            msg = msg_el.text[:200] if msg_el is not None and msg_el.text else ''
            trace_el = tc.find('.//stack-trace')
            trace = trace_el.text[:300] if trace_el is not None and trace_el.text else ''
            print(f"FAILED: {fullname}")
            print(f"  Message: {msg}")
            if trace:
                print(f"  Stack: {trace}")
```

### Phase 5: Extract build error details from raw logs (for BuildFailure cases)

Only use raw task logs when TestSummary shows BuildFailure and you need the specific compiler/build error. The logs are typically 40K+ lines — search narrowly:

```bash
az devops invoke --area build --resource logs \
  --route-parameters project=DevDiv buildId=<buildId> logId=<logId> \
  --org https://devdiv.visualstudio.com -o json > /tmp/build_log.json
```

Parse and search for build errors only:

```python
import json

with open('/tmp/build_log.json', 'r') as f:
    data = json.JSONDecoder().raw_decode(f.read())[0]
lines = data.get('value', [])

for i, line in enumerate(lines):
    s = line.strip()
    if ': error MSB' in s or ': error CS' in s or ': error NU' in s:
        print(f"L{i}: {s[:300]}")
    elif 'cannot execute tool' in s or 'MetalToolchain' in s:
        print(f"L{i}: {s[:300]}")
    elif 'Build FAILED' in s:
        print(f"L{i}: {s[:300]}")
```

The xharness summary section near the end of the log also provides a task-level overview. Search backwards from the end for `Summary:`:

```python
for i in range(len(lines) - 1, -1, -1):
    if 'Summary:' in lines[i]:
        for j in range(i, min(len(lines), i + 10)):
            print(lines[j].strip())
        break
```

This shows `Executed N tasks`, `Succeeded: N`, `Failed: N`, `Crashed: N`, etc.

### Phase 6: Categorize and report

Group all findings by category and severity. Use this report structure:

```
## CI Failure Report — Build <buildId>

**Pipeline:** <pipeline-name>
**Branch:** <branch>
**Result:** <result>

### Failing Jobs

#### <Job Name> (logId: <id>)
- **Category:** Test failure | Infrastructure | Build error
- **Failing tests:**
- `<TestClass.TestMethod>` — <assertion message>
- ...
- **Root cause:** <concise explanation>

### Infrastructure Issues
- <any provisioning/bot/setup failures>

### Summary
- Total failing jobs: N
- Test failures: N (list unique test names)
- Infrastructure failures: N
- Recommended actions: ...
```

## Troubleshooting the investigation

### `az devops invoke` returns non-JSON output
The output can contain trailing text after the JSON object. Always use `json.JSONDecoder().raw_decode()` for parsing rather than `json.loads()`.

### Timeline has many records
Filter by `result == 'failed'` first. If you need to understand job hierarchy, use the `parentId` field to trace task → job → stage relationships.
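
A minimal sketch of that trace, assuming the `id`/`parentId`/`name` fields the timeline records carry (the records below are synthetic, for illustration only):

```python
def record_path(records, record_id):
    """Walk parentId links from a task up to its stage, returning names root-first."""
    by_id = {r["id"]: r for r in records}
    path = []
    current = by_id.get(record_id)
    while current is not None:
        path.append(current["name"])
        current = by_id.get(current.get("parentId"))
    return list(reversed(path))

# Tiny synthetic timeline:
records = [
    {"id": "s1", "parentId": None, "name": "Test stage"},
    {"id": "j1", "parentId": "s1", "name": "T: monotouch_ios"},
    {"id": "t1", "parentId": "j1", "name": "Run tests"},
]
print(" -> ".join(record_path(records, "t1")))  # → Test stage -> T: monotouch_ios -> Run tests
```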

### Artifact download fails
Some artifacts may only be available for a limited time or may require specific permissions. If download fails, fall back to log-based analysis.

### Multiple build URLs provided
Investigate each build independently but cross-reference failures — if the same test fails across multiple builds, it's likely a real regression rather than flakiness.
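
The cross-referencing step can be sketched as a set intersection over per-build failure lists (the test names here are hypothetical):

```python
def recurring_failures(*per_build_failures):
    """Tests that failed in every build are likely regressions, not flakes."""
    return set.intersection(*(set(f) for f in per_build_failures))

build_a = ["MyTests.TestFoo", "MyTests.TestBar"]    # failures from build A
build_b = ["MyTests.TestFoo", "OtherTests.TestBaz"]  # failures from build B
print(sorted(recurring_failures(build_a, build_b)))  # → ['MyTests.TestFoo']
```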

---

# Azure DevOps CLI Reference for macios CI

## Authentication

The `az devops` CLI must be authenticated. Typically this is done via:
```bash
az devops configure --defaults organization=https://devdiv.visualstudio.com project=DevDiv
```

Or by passing `--org` and `--project` on each command.

## Key Commands

### Build metadata
```bash
az pipelines build show --id <buildId> -o json
```
Returns: result, status, sourceBranch, definition, requestedFor, startTime, finishTime.

### Build timeline (jobs and tasks)
```bash
az devops invoke --area build --resource timeline \
  --route-parameters project=DevDiv buildId=<buildId> \
  --org https://devdiv.visualstudio.com -o json
```
Returns: records array with type (Stage/Job/Task), name, result, state, log.id, parentId.

**Important:** `az pipelines build log list` is NOT a valid command. Use the `az devops invoke` approach above.

### Task logs
```bash
az devops invoke --area build --resource logs \
  --route-parameters project=DevDiv buildId=<buildId> logId=<logId> \
  --org https://devdiv.visualstudio.com -o json
```
Returns: value array of log line strings.

### Artifact listing
```bash
az pipelines runs artifact list --run-id <buildId> -o json
```

### Artifact download
```bash
az pipelines runs artifact download \
  --artifact-name "<name>" \
  --path /tmp/ci-artifacts/ \
  --run-id <buildId>
```

## Common Pipeline Names

- `xamarin-macios-sim-pr-tests` — PR validation with simulator tests
- Other pipeline names may vary; check `definition.name` from build show.

## Common Job Names in Timeline

- `T: monotouch_ios` — iOS monotouch tests
- `T: monotouch_tvos` — tvOS monotouch tests
- `macOS tests` — macOS and Mac Catalyst tests
- `Reserve macOS bot for tests` — bot provisioning
- Various build/packaging jobs

## JSON Parsing Caveat

`az devops invoke` output may include trailing non-JSON text. Always parse with:
```python
import json
with open('file.json', 'r') as f:
    content = f.read()
data = json.JSONDecoder().raw_decode(content)[0]
```

Do NOT use `json.loads(content)` directly — it will fail on the trailing text.

## Test Artifact Names

TestSummary and HtmlReport artifacts follow a naming convention:
- `TestSummary-simulator_tests<jobname>-1` — Markdown summary with pass/fail counts and failure details
- `HtmlReport-simulator_tests<jobname>-1` — ZIP containing HTML report and NUnit XML files

Common job names:
- `monotouch_ios`, `monotouch_tvos`, `monotouch_macos`, `monotouch_maccatalyst`
- `dotnettests_ios`, `dotnettests_tvos`, `dotnettests_macos`, `dotnettests_maccatalyst`
- `cecil`, `framework`, `xtro`, `msbuild`, `generator`, `sharpie`, `fsharp`, `linker`
- `introspection`, `xcframework`, `interdependent_binding_projects`
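
A quick sketch of expanding job names into both artifact names (the job list is illustrative):

```shell
for job in monotouch_ios dotnettests_macos cecil; do
  echo "TestSummary-simulator_tests${job}-1"
  echo "HtmlReport-simulator_tests${job}-1"
done
```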

**Important:** Each artifact download overwrites `TestSummary.md` in the target directory. Always download to separate subdirectories named after the artifact.

## Key Investigation Strategy

1. **Start with TestSummary artifacts** — they are the fastest way to identify what failed and why. Raw task logs are 40K+ lines and don't contain standard NUnit patterns inline.
2. **For test failures (not build failures)**, download HtmlReport artifacts and parse the NUnit XML files inside for exact test names, assertions, and stack traces.
3. **Only use raw task logs** when you need build error details (MSB/CS/NU errors) or infrastructure error context.
4. **Map timeline logIds to jobs** using the `parentId` field to trace task → job relationships.