
Commit e0d9c3b

Merge branch 'main' into c2c/skills
2 parents: c184e68 + cc82eeb


170 files changed: +9875 / -2523 lines changed

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
---
name: analyze-test-run
description: "Analyze a GitHub Actions integration test run and produce a skill invocation report with failure root-cause issues. TRIGGERS: analyze test run, skill invocation rate, test run report, compare test runs, skill invocation summary, test failure analysis, run report, test results, action run report"
license: MIT
metadata:
  author: Microsoft
  version: "1.0.0"
---

# Analyze Test Run

Downloads artifacts from a GitHub Actions integration test run, generates a summarized skill invocation report, and files GitHub issues for each test failure with root-cause analysis.

## When to Use

- Summarize results of a GitHub Actions integration test run
- Calculate skill invocation rates for the skill under test
- For azure-deploy tests: track the full deployment chain (azure-prepare → azure-validate → azure-deploy)
- Compare skill invocation across two runs
- File issues for test failures with root-cause context

## Input

| Parameter | Required | Description |
|-----------|----------|-------------|
| **Run ID or URL** | Yes | GitHub Actions run ID (e.g. `22373768875`) or full URL |
| **Comparison Run** | No | Second run ID/URL for side-by-side comparison |

## Workflow

### Phase 1 — Download & Parse

1. Extract the numeric run ID from the input (strip the URL prefix if a URL was given).
2. Fetch run metadata:
   ```bash
   gh run view <run-id> --repo microsoft/GitHub-Copilot-for-Azure --json jobs,status,conclusion,name
   ```
3. Download artifacts to a temp directory (`TMPDIR` is often unset on Linux runners, so fall back to `/tmp`):
   ```bash
   gh run download <run-id> --repo microsoft/GitHub-Copilot-for-Azure --dir "${TMPDIR:-/tmp}/gh-run-<run-id>"
   ```
4. Locate these files in the downloaded artifacts:
   - `junit.xml` — test pass/fail/skip/error results
   - `*-SKILL-REPORT.md` — generated skill report with per-test details
   - `agent-metadata-*.md` files — raw agent session logs per test
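The Phase 1 steps can be scripted roughly as follows. This is a minimal bash sketch, not part of the skill: the helper names `extract_run_id`, `download_run`, and `locate_artifacts` are hypothetical, and the URL parsing assumes the standard `https://github.com/<owner>/<repo>/actions/runs/<id>` form (optionally followed by `/job/<job-id>`). It needs bash rather than plain `sh` for `[[ =~ ]]` and `BASH_REMATCH`.

```bash
#!/usr/bin/env bash
# Minimal sketch of Phase 1. All helper names are hypothetical.
REPO="microsoft/GitHub-Copilot-for-Azure"

# Accepts a bare run ID or an Actions URL (assumed form:
# https://github.com/<owner>/<repo>/actions/runs/<id>[/job/<job-id>]).
extract_run_id() {
  local input="$1"
  local re='/actions/runs/([0-9]+)'
  if [[ "$input" =~ ^[0-9]+$ ]]; then
    printf '%s\n' "$input"
  elif [[ "$input" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    echo "cannot parse a run ID from: $input" >&2
    return 1
  fi
}

# Saves run metadata next to the artifact directory, then downloads
# all artifacts. Prints the artifact directory on success.
download_run() {
  local run_id="$1"
  local dest="${TMPDIR:-/tmp}/gh-run-${run_id}"
  gh run view "$run_id" --repo "$REPO" \
    --json jobs,status,conclusion,name > "${dest}.json" || return 1
  gh run download "$run_id" --repo "$REPO" --dir "$dest" || return 1
  printf '%s\n' "$dest"
}

# Lists the three kinds of files that Phases 2 and 3 read.
locate_artifacts() {
  find "$1" -type f \( -name 'junit.xml' \
    -o -name '*-SKILL-REPORT.md' \
    -o -name 'agent-metadata-*.md' \) | sort
}
```

A typical call chain would be `run_id=$(extract_run_id "$1") && dir=$(download_run "$run_id") && locate_artifacts "$dir"`. Writing the metadata to `<dir>.json` rather than inside the artifact directory keeps the download target untouched.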

### Phase 2 — Build Summary Report

Produce a markdown report with up to four sections; Section 4 applies only when a comparison run is provided. See [report-format.md](references/report-format.md) for the exact template.

**Section 1 — Test Results Overview**

Parse `junit.xml` to build:

| Metric | Value |
|--------|-------|
| Total tests | count from `<testsuites tests=…>` |
| Executed | total − skipped |
| Skipped | count of `<skipped/>` elements |
| Passed | executed − failures − errors |
| Failed | count of `<failure>` elements |
| Test Pass Rate | passed / executed, as a percentage |

Include a per-test table with the test name, the duration (from the `<testcase>` `time` attribute, converted from seconds to `Xm Ys`), and the Pass/Fail result.
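As a rough sketch, the Section 1 metrics and the `Xm Ys` durations can be computed with standard shell tools. It assumes the layout that common JUnit reporters write, with the whole `<testsuites ... tests="N">` opening tag on one line; `grep` and `sed` are not XML parsers, so treat this as a quick cross-check rather than a robust implementation. The function names are hypothetical.

```bash
# Hypothetical helper: prints the Section 1 metrics as key=value lines.
# Assumes the <testsuites ... tests="N"> opening tag sits on a single line.
junit_summary() {
  local f="$1" total skipped failures errors executed passed
  total=$(sed -n 's/.*<testsuites[^>]* tests="\([0-9]*\)".*/\1/p' "$f" | head -n 1)
  # grep -o prints one match per line, so this counts elements, not lines.
  skipped=$(( $(grep -o '<skipped' "$f" | wc -l) ))
  failures=$(( $(grep -o '<failure' "$f" | wc -l) ))
  errors=$(( $(grep -o '<error' "$f" | wc -l) ))
  executed=$(( ${total:-0} - skipped ))
  passed=$(( executed - failures - errors ))
  printf 'total=%d\nexecuted=%d\nskipped=%d\npassed=%d\nfailed=%d\nerrors=%d\n' \
    "${total:-0}" "$executed" "$skipped" "$passed" "$failures" "$errors"
  awk -v p="$passed" -v e="$executed" 'BEGIN {
    if (e > 0) printf "pass_rate=%.1f%%\n", 100 * p / e
    else print "pass_rate=n/a"
  }'
}

# Hypothetical helper: converts a JUnit time attribute in seconds
# (often fractional, e.g. "754.321") to "Xm Ys", rounding to the nearest second.
format_duration() {
  awk -v s="$1" 'BEGIN { t = int(s + 0.5); printf "%dm %ds\n", int(t / 60), t % 60 }'
}
```

For example, `format_duration 754.321` prints `12m 34s`. Errors are reported separately from failures so that `passed` matches the table's `executed − failures − errors` definition.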

**Section 2 — Skill Invocation Rate**

Read the "Per-Test Case Results" sections of `*-SKILL-REPORT.md`. For each executed test, determine whether the skill under test was invoked.

The skills to track depend on which integration test suite the run belongs to:

**azure-deploy integration tests** — track the full deployment chain:

| Skill | How to detect |
|-------|---------------|
| `azure-prepare` | Reported as invoked in the SKILL-REPORT narrative or in `agent-metadata-*.md` |
| `azure-validate` | Reported as invoked in the SKILL-REPORT narrative or in `agent-metadata-*.md` |
| `azure-deploy` | Reported as invoked in the SKILL-REPORT narrative or in `agent-metadata-*.md` |

Build a per-test invocation matrix (Yes/No for each skill) and compute rates:

| Skill | Invocation Rate |
|-------|----------------|
| azure-deploy | X% (n/total) |
| azure-prepare | X% (n/total) |
| azure-validate | X% (n/total) |
| Full skill chain (P→V→D) | X% (n/total) |

> The azure-deploy integration tests exercise the full deployment workflow where the agent is expected to invoke azure-prepare, azure-validate, and azure-deploy in sequence. This three-skill chain tracking is **specific to azure-deploy tests only**.
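The rate table can be filled from the per-test matrix with a short `awk` pass. This sketch assumes a hypothetical tab-separated matrix with one line per executed test and `Yes`/`No` columns in the order azure-prepare, azure-validate, azure-deploy. A test counts toward the full chain when all three skills were invoked; the sketch does not check that they ran in P→V→D order, which still has to be confirmed from the report or the agent metadata.

```bash
# Hypothetical helper. Input (file arguments or stdin), tab-separated:
#   <test name> <TAB> <prepare Yes/No> <TAB> <validate Yes/No> <TAB> <deploy Yes/No>
# Output: rows for the Section 2 invocation-rate table.
# "Full chain" here means all three invoked; ordering is NOT checked.
chain_rates() {
  awk -F '\t' '
    NF >= 4 {
      n++
      if ($2 == "Yes") p++
      if ($3 == "Yes") v++
      if ($4 == "Yes") d++
      if ($2 == "Yes" && $3 == "Yes" && $4 == "Yes") c++
    }
    END {
      if (n == 0) { print "no executed tests" > "/dev/stderr"; exit 1 }
      printf "| azure-deploy | %.0f%% (%d/%d) |\n", 100 * d / n, d, n
      printf "| azure-prepare | %.0f%% (%d/%d) |\n", 100 * p / n, p, n
      printf "| azure-validate | %.0f%% (%d/%d) |\n", 100 * v / n, v, n
      printf "| Full skill chain (P→V→D) | %.0f%% (%d/%d) |\n", 100 * c / n, c, n
    }' "$@"
}
```

For the single-skill suites described next, the same idea reduces to one `Yes`/`No` column and one ratio.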

**All other integration tests** — track only the skill under test:

| Skill | Invocation Rate |
|-------|----------------|
| {skill-under-test} | X% (n/total) |

For non-deploy tests (e.g. azure-prepare, azure-ai, azure-kusto), only track whether the primary skill under test was invoked. Do not include azure-prepare/azure-validate/azure-deploy chain columns.

**Section 3 — Report Confidence & Pass Rate**

Extract from `*-SKILL-REPORT.md`:
- Overall Test Pass Rate (from the report's statistics section)
- Average Confidence (from the report's statistics section)

**Section 4 — Comparison** (only when a second run is provided)

Repeat Phase 1 and Sections 1–3 of Phase 2 for the second run, then produce a side-by-side delta table. See [report-format.md](references/report-format.md) § Comparison.
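Because the two runs may execute different numbers of tests, comparing percentages is more meaningful than comparing raw counts. A minimal sketch of one delta row follows; `rate_delta` is hypothetical, and the `Run A | Run B | Δ` column order is an assumption, since `report-format.md` defines the actual comparison layout. The delta is expressed in percentage points.

```bash
# Hypothetical helper: one comparison row, delta in percentage points (pp).
# Usage: rate_delta <label> <invoked_a> <total_a> <invoked_b> <total_b>
# Column order (Run A | Run B | delta) is an assumed layout.
rate_delta() {
  awk -v label="$1" -v na="$2" -v ta="$3" -v nb="$4" -v tb="$5" 'BEGIN {
    ra = (ta > 0) ? 100 * na / ta : 0
    rb = (tb > 0) ? 100 * nb / tb : 0
    printf "| %s | %.0f%% | %.0f%% | %+.0f pp |\n", label, ra, rb, rb - ra
  }'
}
```

For example, `rate_delta azure-deploy 3 4 4 4` prints `| azure-deploy | 75% | 100% | +25 pp |`.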

### Phase 3 — File Issues for Failures

For every test with a `<failure>` element in `junit.xml`:

1. Read the failure message and the file:line location from the XML
2. Read the actual line of code from the test file at that location
3. Read the `agent-metadata-*.md` file for that test from the artifacts
4. Read the corresponding section in the SKILL-REPORT.md for context on what the agent did
5. Determine the root cause category:
   - **Skill not invoked** — agent bypassed skills and used manual commands
   - **Deployment failure** — infrastructure or RBAC error during deployment
   - **Timeout** — test exceeded time limit
   - **Assertion mismatch** — expected files/links not found
   - **Quota exhaustion** — Azure region quota prevented deployment
6. Create a GitHub issue:

   ```bash
   gh issue create --repo microsoft/GitHub-Copilot-for-Azure \
     --title "Integration test failure: <skill> – <keywords> [<root-cause-category>]" \
     --label "bug,integration-test" \
     --body "<body>"
   ```
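Step 5 is a judgment call that needs the agent metadata and the skill report, but a keyword pass over the failure message can suggest a starting category. The sketch below is a hypothetical heuristic: the patterns are assumptions about typical Jest and Azure error text, not strings the test harness is known to emit, and the precedence order is an assumed choice. It returns a non-zero status when nothing matches, so the failure is triaged by hand.

```bash
# Hypothetical heuristic, a starting point only.
# Usage: classify_failure "<failure message>" <Yes|No>
# Second argument: whether the skill under test was invoked (from Phase 2).
# Patterns and precedence below are assumptions, not harness guarantees.
classify_failure() {
  local msg="$1" invoked="$2"
  if grep -qiE 'quota' <<<"$msg"; then
    echo "Quota exhaustion"
  elif grep -qiE 'timeout|timed out' <<<"$msg"; then
    echo "Timeout"
  elif grep -qiE 'AuthorizationFailed|RBAC|deployment failed' <<<"$msg"; then
    echo "Deployment failure"
  elif [ "$invoked" = "No" ]; then
    echo "Skill not invoked"
  elif grep -qiE 'expect' <<<"$msg"; then
    echo "Assertion mismatch"
  else
    return 1   # no confident match: classify manually
  fi
}
```

Whatever the helper suggests, the diagnosis written into the issue should still come from reading the agent metadata and the skill report.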

**Title format:** `Integration test failure: {skill} – {keywords} [{root-cause-category}]`
- `{skill}`: the skill under test (e.g. `azure-deploy`)
- `{keywords}`: 2-4 words from the test name — app type (function app, static web app) + IaC type (Terraform, Bicep) + trigger if relevant
- `{root-cause-category}`: one of the categories from step 5, in brackets
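A small guard can keep titles consistent with this format. This is a sketch rather than part of the skill: `build_issue_title` is a hypothetical name, and it only rejects categories outside the five listed in step 5. It does not enforce the 2-4 keyword count, because one of the issue-template examples uses a longer phrase.

```bash
# Hypothetical helper. Usage: build_issue_title <skill> "<keywords>" "<category>"
# Rejects any category that is not one of the five from step 5.
build_issue_title() {
  local skill="$1" keywords="$2" category="$3"
  case "$category" in
    "Skill not invoked"|"Deployment failure"|"Timeout"|"Assertion mismatch"|"Quota exhaustion") ;;
    *) echo "unknown root-cause category: $category" >&2; return 1 ;;
  esac
  printf 'Integration test failure: %s – %s [%s]\n' "$skill" "$keywords" "$category"
}
```

The result can then be passed to `gh issue create --title "$title"`; writing the body to a file and using `--body-file` avoids quoting a long markdown body on the command line.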

Issue body template — see [issue-template.md](references/issue-template.md).

> ⚠️ **Note:** Do NOT include the Error Details (JUnit XML) or Agent Metadata sections in the issue body. Keep issues concise with the diagnosis, prompt context, skill report context, and environment sections only.

> For azure-deploy integration tests, include an "azure-deploy Skill Invocation" section showing whether azure-deploy was invoked (Yes/No), with a note that the full chain is azure-prepare → azure-validate → azure-deploy. For all other integration tests, include a "{skill} Skill Invocation" section showing only whether the primary skill under test was invoked.

## Error Handling

| Error | Cause | Fix |
|-------|-------|-----|
| `gh: command not found` | GitHub CLI not installed | Install with `winget install GitHub.cli` or `brew install gh` |
| `no artifacts found` | Run has no uploadable reports | Verify the run completed the "Export report" step |
| `HTTP 404` on run view | Invalid run ID or no access | Check the run ID and ensure `gh auth status` is authenticated |
| `rate limit exceeded` | Too many GitHub API calls | Wait and retry, or use `--limit` on searches |
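Three of these errors can be caught before any download with a preflight check. A minimal sketch; `preflight` is a hypothetical helper that only wraps `command -v`, `gh auth status`, and the `gh api rate_limit` endpoint.

```bash
# Hypothetical preflight helper covering rows 1, 3, and 4 of the table above.
preflight() {
  if ! command -v gh >/dev/null 2>&1; then
    echo "gh: command not found; install with 'winget install GitHub.cli' or 'brew install gh'" >&2
    return 1
  fi
  if ! gh auth status >/dev/null 2>&1; then
    echo "gh is not authenticated; run 'gh auth login'" >&2
    return 2
  fi
  # Prints the remaining core REST API calls for the current token.
  gh api rate_limit --jq '.resources.core.remaining'
}
```

Distinct return codes let a calling script tell a missing CLI apart from a missing login.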

## References

- [report-format.md](references/report-format.md) — Output report template
- [issue-template.md](references/issue-template.md) — GitHub issue body template
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# Issue Template

Use this template for every GitHub issue created for a test failure.

## Issue Title

```
Integration test failure: {skill} – {keywords} [{root-cause-category}]
```

### Title Construction

| Component | Description | Example |
|-----------|-------------|---------|
| `{skill}` | The skill under test | `azure-deploy` |
| `{keywords}` | 2-4 keywords extracted from the test name — app type + IaC type | `function app Terraform`, `function app Bicep`, `static web app` |
| `{root-cause-category}` | Root cause in brackets | `Skill not invoked`, `Timeout`, `Deployment failure` |

**Keyword extraction rules:**
- Include the app/service type: `function app`, `static web app`, `container app`, `URL shortener`
- Include IaC type if specified: `Terraform`, `Bicep`
- Include trigger type if relevant: `Service Bus trigger`, `Event Grid trigger`
- Omit filler words like "creates", "deploys", "with", "using", "infrastructure"
- Keep to 2-4 meaningful keywords
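The filler-word rule is mechanical enough to script; deciding which remaining words are meaningful is not. This hypothetical `extract_keywords` helper only drops the filler words listed above (case-insensitively) and leaves the final 2-4 word trim to whoever files the issue, since multi-word names such as `Service Bus` behave as one concept in the examples.

```bash
# Hypothetical helper: strips the listed filler words from a test name.
# It does not pick "meaningful" words or enforce the 2-4 word limit.
extract_keywords() {
  local -a words=()
  local -a kept=()
  local word lower
  read -r -a words <<<"$1"
  for word in "${words[@]}"; do
    lower=$(printf '%s' "$word" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
      creates|deploys|with|using|infrastructure) continue ;;
    esac
    kept+=("$word")
  done
  printf '%s\n' "${kept[*]}"
}
```

For example, `extract_keywords 'creates function app with Terraform infrastructure'` prints `function app Terraform`.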

**Examples:**
- `Integration test failure: azure-deploy – function app Terraform [Timeout]`
- `Integration test failure: azure-deploy – event-driven function app [Skill not invoked]`
- `Integration test failure: azure-deploy – function app Service Bus Terraform [Deployment failure]`
- `Integration test failure: azure-prepare – static web app Bicep [Assertion mismatch]`

## Labels

```
bug, integration-test
```

## Issue Body

````markdown
## Prompt

```
{the prompt string from the test}
```

## Summary

**Run:** [{run-name} #{run-number}]({run-url})
**Test:** `{full-test-name}`
**Result:** {Fail / Timeout}
**Duration:** {Xm Ys}

## Root Cause Category

<!-- One of: Skill not invoked, Deployment failure, Timeout, Assertion mismatch, Quota exhaustion -->
**{category}**

## Diagnosis

### What was expected
{Describe what the test asserts — e.g., "expects deploy links in assistant output", "expects .bicep files in workspace"}

### What actually happened
{Describe the agent's actual behavior based on agent-metadata — e.g., "agent manually ran azd up instead of invoking azure-deploy skill", "deployment timed out after 30 minutes"}

### Why it failed
{1-2 sentences root cause — e.g., "The agent did not recognize the prompt as a skill-triggering intent and fell back to manual commands. No deploy links were produced because the deployment hit RBAC errors."}

### Suggested fix
{One of:}
- Update skill triggers/description to better match this prompt pattern
- Update test assertions or timeout
- Fix RBAC/quota pre-flight checks in skill
- Investigate model behavior (agent bypassing skill)

## {skill} Skill Invocation

<!-- For azure-deploy integration tests, show the deploy skill status with a note about the full chain -->
<!-- For all other integration tests, show only the primary skill under test -->

### azure-deploy tests

| Skill | Invoked |
|-------|---------|
| **azure-deploy** | **Yes** / No |

> In azure-deploy integration tests, the full skill chain is azure-prepare → azure-validate → azure-deploy. {Describe whether the full chain was invoked or which skills were bypassed.}

### All other integration tests

| Skill | Invoked |
|-------|---------|
| **{skill-under-test}** | **Yes** / No |

## Skill Report Context

<details>
<summary>Per-test section from SKILL-REPORT.md (click to expand)</summary>

{PASTE THE RELEVANT PER-TEST SECTION FROM THE SKILL-REPORT HERE}

</details>

## Environment

- **Runner OS:** ubuntu-latest
- **Node.js:** (from workflow)
- **Model:** claude-sonnet-4.5 (or as overridden)
- **Run URL:** {run-url}
- **Commit:** {commit-sha}
````

## Rules

- One issue per failed test. Do not batch multiple failures into one issue.
- If multiple failures share the same root cause, note this in each issue body with a cross-reference.
- Do NOT include Error Details (JUnit XML) or Agent Metadata sections — keep issues concise.
- For azure-deploy integration tests, include the skill invocation section showing azure-deploy status and note the full chain (azure-prepare → azure-validate → azure-deploy).
- For all other integration tests, include a skill invocation section showing only the primary skill under test.
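To add the cross-references the second rule asks for, it helps to group failures by category before filing. A sketch, assuming a hypothetical input of one `test name|category` line per failed test (for example, collected while working through Phase 3); `shared_root_causes` is not part of the skill.

```bash
# Hypothetical helper. Input lines: "<test name>|<root-cause category>".
# Prints each category shared by more than one failed test, with the tests involved.
shared_root_causes() {
  awk -F '|' '
    {
      count[$2]++
      tests[$2] = (tests[$2] == "" ? $1 : tests[$2] ", " $1)
    }
    END {
      for (c in count)
        if (count[c] > 1) printf "%s: %s\n", c, tests[c]
    }' "$@" | sort
}
```

Each printed line lists the tests whose issues should reference one another; single-failure categories are omitted because they need no cross-reference.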
