---
name: macios-ci-failure-inspector
description: Investigate and triage CI failures for dotnet/macios from Azure DevOps build URLs. Use this skill whenever the user shares a DevOps build link, asks about CI failures, wants to understand why a build failed, or asks to investigate test failures on any platform (iOS, tvOS, macOS, Mac Catalyst). Also use when the user says things like "CI is red", "tests are failing", "build broke", or "what happened in CI".
---

# macios CI Failure Inspector

Investigate Azure DevOps CI failures for the dotnet/macios repository, extract root causes, and report findings.

## References

Read these as needed during investigation:

- `references/azure-devops-cli.md` — az CLI commands, artifact naming conventions, and JSON parsing caveats. Read this when you need to construct `az` commands or download artifacts.

## Inputs

Collect from the user:

- **Build URL** — an Azure DevOps build results link, e.g. `https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=<ID>&view=results`
- **Scope** — whether to investigate only, or also attempt fixes (default: investigate only)

Extract the `buildId` from the URL query parameter.
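
The extraction can be sketched with the standard library (the helper name is illustrative):

```python
from urllib.parse import urlparse, parse_qs

def extract_build_id(url: str) -> str:
    """Pull the buildId query parameter out of a DevOps build results URL."""
    qs = parse_qs(urlparse(url).query)
    return qs["buildId"][0]

# Example with a placeholder build id:
print(extract_build_id(
    "https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=1234567&view=results"
))  # → 1234567
```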

## Investigation workflow

### Phase 1: Build overview

Fetch the build metadata to understand the big picture:

```bash
az pipelines build show --id <buildId> --org https://devdiv.visualstudio.com --project DevDiv -o json > /tmp/build_show.json
```

Extract from the output:
- `result` (succeeded, failed, partiallySucceeded, canceled)
- `sourceBranch` — what branch triggered the build
- `definition.name` — which pipeline ran
- `triggerInfo` or `reason` — what triggered it (PR, push, schedule)

If the build succeeded, tell the user and stop.

### Phase 2: Timeline — identify failing jobs and tasks

The timeline gives you every job and task in the build with its result:

```bash
az devops invoke --area build --resource timeline --route-parameters project=DevDiv buildId=<buildId> --org https://devdiv.visualstudio.com -o json > /tmp/build_timeline.json
```

Parse the timeline to find failed records. Use Python for robust JSON parsing because `az devops invoke` output can include trailing non-JSON text:

```python
import json

with open('/tmp/build_timeline.json', 'r') as f:
    content = f.read()
data = json.JSONDecoder().raw_decode(content)[0]

failed = [r for r in data.get('records', []) if r.get('result') == 'failed']
for r in failed:
    print(f"  [{r['type']}] {r['name']} (id={r['id']}, logId={r.get('log', {}).get('id', 'N/A')})")
```

Group failures into categories:
- **Test failures** — tasks named "Run tests" or jobs like `T: monotouch_ios`, `T: monotouch_tvos`, `macOS tests`
- **Infrastructure failures** — tasks like "Provision Xcode", "Reserve bot", setup tasks
- **Build failures** — compilation or packaging tasks

### Phase 3: Download TestSummary artifacts (primary failure source)

The xharness test runner logs are 40K+ lines and don't contain standard NUnit failure patterns inline. **TestSummary artifacts are the fastest and most reliable way to identify failures.** Always start with these before digging into raw logs.

List all artifacts:

```bash
az pipelines runs artifact list --run-id <buildId> --org https://devdiv.visualstudio.com --project DevDiv -o json
```

Download TestSummary artifacts for each failing job. **Each artifact must go to a separate directory** to avoid overwriting (they all contain a file named `TestSummary.md`):

```bash
artifact="TestSummary-simulator_testsmonotouch_macos-1"
mkdir -p "/tmp/ci-artifacts/${artifact}"
az pipelines runs artifact download \
  --artifact-name "$artifact" \
  --path "/tmp/ci-artifacts/${artifact}" \
  --run-id <buildId> \
  --org https://devdiv.visualstudio.com --project DevDiv
cat "/tmp/ci-artifacts/${artifact}/TestSummary.md"
```

The TestSummary.md file contains a structured markdown report with:
- Count of passed/failed tests
- For each failure: test configuration name, failure type (BuildFailure, Failed, Crashed, TimedOut), and brief error message
- Build failures show the configuration variant (e.g. "monotouch-test/macOS/Debug (ARM64): BuildFailure")
- Test failures may include the failing test class and assertion message

Common artifact names map to timeline jobs:
- `TestSummary-simulator_testsmonotouch_ios-1` → monotouch_ios
- `TestSummary-simulator_testsmonotouch_tvos-1` → monotouch_tvos
- `TestSummary-simulator_testsmonotouch_macos-1` → monotouch_macos
- `TestSummary-simulator_testsmonotouch_maccatalyst-1` → monotouch_maccatalyst
- `TestSummary-simulator_testsdotnettests_ios-1` → dotnettests_ios
- `TestSummary-simulator_testsdotnettests_tvos-1` → dotnettests_tvos
- `TestSummary-simulator_testsdotnettests_macos-1` → dotnettests_macos
- `TestSummary-simulator_testsdotnettests_maccatalyst-1` → dotnettests_maccatalyst

Download these in parallel for all failing jobs to save time.
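
The background-job pattern can be sketched like this; the `echo` stands in for the real `az pipelines runs artifact download` invocation shown above, and the artifact names are examples:

```shell
buildId=12345678   # placeholder build id

for artifact in TestSummary-simulator_testsmonotouch_ios-1 \
                TestSummary-simulator_testsmonotouch_tvos-1; do
  mkdir -p "/tmp/ci-artifacts/${artifact}"
  # In a real run, replace 'echo' with the az download command;
  # '&' sends each download to the background.
  echo "downloading ${artifact}" &
done
wait   # block until all background jobs finish
echo "all downloads finished"
```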

### Phase 4: Get detailed test failure info from HtmlReport artifacts

For test failures (not build failures), download the corresponding HtmlReport artifact to get NUnit XML with exact test names, assertion messages, and stack traces:

```bash
artifact="HtmlReport-simulator_testsmonotouch_tvos-1"
mkdir -p "/tmp/ci-artifacts/${artifact}"
az pipelines runs artifact download \
  --artifact-name "$artifact" \
  --path "/tmp/ci-artifacts/${artifact}" \
  --run-id <buildId> \
  --org https://devdiv.visualstudio.com --project DevDiv
cd "/tmp/ci-artifacts/${artifact}" && unzip -o HtmlReport.zip -d htmlreport
```

Parse the NUnit XML files inside for specific test failures:

```python
import xml.etree.ElementTree as ET
import glob

xml_files = glob.glob('htmlreport/tests/monotouch-test/*/test-ios-*.xml')
for xf in sorted(xml_files):
    tree = ET.parse(xf)
    for tc in tree.getroot().iter('test-case'):
        if tc.get('result') == 'Failed':
            fullname = tc.get('fullname', 'unknown')
            msg_el = tc.find('.//message')
            msg = msg_el.text[:200] if msg_el is not None and msg_el.text else ''
            trace_el = tc.find('.//stack-trace')
            trace = trace_el.text[:300] if trace_el is not None and trace_el.text else ''
            print(f"FAILED: {fullname}")
            print(f"  Message: {msg}")
            if trace:
                print(f"  Stack: {trace}")
```

### Phase 5: Extract build error details from raw logs (for BuildFailure cases)

Only use raw task logs when TestSummary shows BuildFailure and you need the specific compiler/build error. The logs are typically 40K+ lines — search narrowly:

```bash
az devops invoke --area build --resource logs \
  --route-parameters project=DevDiv buildId=<buildId> logId=<logId> \
  --org https://devdiv.visualstudio.com -o json > /tmp/build_log.json
```

Parse and search for build errors only:

```python
import json

with open('/tmp/build_log.json', 'r') as f:
    data = json.JSONDecoder().raw_decode(f.read())[0]
lines = data.get('value', [])

for i, line in enumerate(lines):
    s = line.strip()
    if ': error MSB' in s or ': error CS' in s or ': error NU' in s:
        print(f"L{i}: {s[:300]}")
    elif 'cannot execute tool' in s or 'MetalToolchain' in s:
        print(f"L{i}: {s[:300]}")
    elif 'Build FAILED' in s:
        print(f"L{i}: {s[:300]}")
```

The xharness summary section near the end of the log also provides a task-level overview. Search backwards from the end for `Summary:`:

```python
for i in range(len(lines) - 1, -1, -1):
    if 'Summary:' in lines[i]:
        for j in range(i, min(len(lines), i + 10)):
            print(lines[j].strip())
        break
```

This shows `Executed N tasks`, `Succeeded: N`, `Failed: N`, `Crashed: N`, etc.

### Phase 6: Categorize and report

Group all findings by category and severity. Use this report structure:

```
## CI Failure Report — Build <buildId>

**Pipeline:** <pipeline-name>
**Branch:** <branch>
**Result:** <result>

### Failing Jobs

#### <Job Name> (logId: <id>)
- **Category:** Test failure | Infrastructure | Build error
- **Failing tests:**
- `<TestClass.TestMethod>` — <assertion message>
- ...
- **Root cause:** <concise explanation>

### Infrastructure Issues
- <any provisioning/bot/setup failures>

### Summary
- Total failing jobs: N
- Test failures: N (list unique test names)
- Infrastructure failures: N
- Recommended actions: ...
```

## Troubleshooting the investigation

### `az devops invoke` returns non-JSON output
The output can contain trailing text after the JSON object. Always use `json.JSONDecoder().raw_decode()` for parsing rather than `json.loads()`.

### Timeline has many records
Filter by `result == 'failed'` first. If you need to understand job hierarchy, use the `parentId` field to trace task → job → stage relationships.
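
A minimal sketch of that trace, assuming the `id`/`parentId`/`name` fields the timeline records carry (the records below are synthetic, for illustration only):

```python
def record_path(records, record_id):
    """Walk parentId links from a task up to its stage, returning names root-first."""
    by_id = {r["id"]: r for r in records}
    path = []
    current = by_id.get(record_id)
    while current is not None:
        path.append(current["name"])
        current = by_id.get(current.get("parentId"))
    return list(reversed(path))

# Tiny synthetic timeline:
records = [
    {"id": "s1", "parentId": None, "name": "Test stage"},
    {"id": "j1", "parentId": "s1", "name": "T: monotouch_ios"},
    {"id": "t1", "parentId": "j1", "name": "Run tests"},
]
print(" -> ".join(record_path(records, "t1")))  # → Test stage -> T: monotouch_ios -> Run tests
```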

### Artifact download fails
Some artifacts may only be available for a limited time or may require specific permissions. If download fails, fall back to log-based analysis.

### Multiple build URLs provided
Investigate each build independently but cross-reference failures — if the same test fails across multiple builds, it's likely a real regression rather than flakiness.
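
The cross-referencing step can be sketched as a set intersection over per-build failure lists (the test names here are hypothetical):

```python
def recurring_failures(*per_build_failures):
    """Tests that failed in every build are likely regressions, not flakes."""
    return set.intersection(*(set(f) for f in per_build_failures))

build_a = ["MyTests.TestFoo", "MyTests.TestBar"]    # failures from build A
build_b = ["MyTests.TestFoo", "OtherTests.TestBaz"]  # failures from build B
print(sorted(recurring_failures(build_a, build_b)))  # → ['MyTests.TestFoo']
```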

---

# Azure DevOps CLI Reference for macios CI

## Authentication

The `az devops` CLI must be authenticated. Typically this is done via:
```bash
az devops configure --defaults organization=https://devdiv.visualstudio.com project=DevDiv
```

Or by passing `--org` and `--project` on each command.

## Key Commands

### Build metadata
```bash
az pipelines build show --id <buildId> -o json
```
Returns: result, status, sourceBranch, definition, requestedFor, startTime, finishTime.

### Build timeline (jobs and tasks)
```bash
az devops invoke --area build --resource timeline \
  --route-parameters project=DevDiv buildId=<buildId> \
  --org https://devdiv.visualstudio.com -o json
```
Returns: records array with type (Stage/Job/Task), name, result, state, log.id, parentId.

**Important:** `az pipelines build log list` is NOT a valid command. Use the `az devops invoke` approach above.

### Task logs
```bash
az devops invoke --area build --resource logs \
  --route-parameters project=DevDiv buildId=<buildId> logId=<logId> \
  --org https://devdiv.visualstudio.com -o json
```
Returns: value array of log line strings.

### Artifact listing
```bash
az pipelines runs artifact list --run-id <buildId> -o json
```

### Artifact download
```bash
az pipelines runs artifact download \
  --artifact-name "<name>" \
  --path /tmp/ci-artifacts/ \
  --run-id <buildId>
```

## Common Pipeline Names

- `xamarin-macios-sim-pr-tests` — PR validation with simulator tests
- Other pipeline names may vary; check `definition.name` from build show.

## Common Job Names in Timeline

- `T: monotouch_ios` — iOS monotouch tests
- `T: monotouch_tvos` — tvOS monotouch tests
- `macOS tests` — macOS and Mac Catalyst tests
- `Reserve macOS bot for tests` — bot provisioning
- Various build/packaging jobs

## JSON Parsing Caveat

`az devops invoke` output may include trailing non-JSON text. Always parse with:
```python
import json
with open('file.json', 'r') as f:
    content = f.read()
data = json.JSONDecoder().raw_decode(content)[0]
```

Do NOT use `json.loads(content)` directly — it will fail on the trailing text.

## Test Artifact Names

TestSummary and HtmlReport artifacts follow a naming convention:
- `TestSummary-simulator_tests<jobname>-1` — Markdown summary with pass/fail counts and failure details
- `HtmlReport-simulator_tests<jobname>-1` — ZIP containing HTML report and NUnit XML files

Common job names:
- `monotouch_ios`, `monotouch_tvos`, `monotouch_macos`, `monotouch_maccatalyst`
- `dotnettests_ios`, `dotnettests_tvos`, `dotnettests_macos`, `dotnettests_maccatalyst`
- `cecil`, `framework`, `xtro`, `msbuild`, `generator`, `sharpie`, `fsharp`, `linker`
- `introspection`, `xcframework`, `interdependent_binding_projects`
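
A quick sketch of expanding job names into both artifact names (the job list is illustrative):

```shell
for job in monotouch_ios dotnettests_macos cecil; do
  echo "TestSummary-simulator_tests${job}-1"
  echo "HtmlReport-simulator_tests${job}-1"
done
```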

**Important:** Each artifact download overwrites `TestSummary.md` in the target directory. Always download to separate subdirectories named after the artifact.

## Key Investigation Strategy

1. **Start with TestSummary artifacts** — they are the fastest way to identify what failed and why. Raw task logs are 40K+ lines and don't contain standard NUnit patterns inline.
2. **For test failures (not build failures)**, download HtmlReport artifacts and parse the NUnit XML files inside for exact test names, assertions, and stack traces.
3. **Only use raw task logs** when you need build error details (MSB/CS/NU errors) or infrastructure error context.
4. **Map timeline logIds to jobs** using the `parentId` field to trace task → job relationships.