microsoft
diff --git a/‎.github/skills/analyze-test-run/SKILL.md‎
Lines changed: 154 additions & 0 deletions b/‎.github/skills/analyze-test-run/SKILL.md‎
Lines changed: 154 additions & 0 deletions
diff --git a/‎.github/skills/analyze-test-run/references/issue-template.md‎
Lines changed: 120 additions & 0 deletions b/‎.github/skills/analyze-test-run/references/issue-template.md‎
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,154 @@
+---
+name: analyze-test-run
+description: "Analyze a GitHub Actions integration test run and produce a skill invocation report with failure root-cause issues. TRIGGERS: analyze test run, skill invocation rate, test run report, compare test runs, skill invocation summary, test failure analysis, run report, test results, action run report"
+license: MIT
+metadata:
+  author: Microsoft
+  version: "1.0.0"
+---
+
+# Analyze Test Run
+
+Downloads artifacts from a GitHub Actions integration test run, generates a summarized skill invocation report, and files GitHub issues for each test failure with root-cause analysis.
+
+## When to Use
+
+- Summarize results of a GitHub Actions integration test run
+- Calculate skill invocation rates for the skill under test
+- For azure-deploy tests: track the full deployment chain (azure-prepare → azure-validate → azure-deploy)
+- Compare skill invocation across two runs
+- File issues for test failures with root-cause context
+
+## Input
+
+| Parameter | Required | Description |
+|-----------|----------|-------------|
+| **Run ID or URL** | Yes | GitHub Actions run ID (e.g. `22373768875`) or full URL |
+| **Comparison Run** | No | Second run ID/URL for side-by-side comparison |
+
+## Workflow
+
+### Phase 1 — Download & Parse
+
+1. Extract the numeric run ID from the input (strip URL prefix if needed)
+2. Fetch run metadata:
+   ```bash
+   gh run view <run-id> --repo microsoft/GitHub-Copilot-for-Azure --json jobs,status,conclusion,name
+   ```
+3. Download artifacts to a temp directory:
+   ```bash
+   gh run download <run-id> --repo microsoft/GitHub-Copilot-for-Azure --dir "$TMPDIR/gh-run-<run-id>"
+   ```
+4. Locate these files in the downloaded artifacts:
+   - `junit.xml` — test pass/fail/skip/error results
+   - `*-SKILL-REPORT.md` — generated skill report with per-test details
+   - `agent-metadata-*.md` files — raw agent session logs per test
+
+### Phase 2 — Build Summary Report
+
+Produce a markdown report with four sections. See [report-format.md](references/report-format.md) for the exact template.
+
+**Section 1 — Test Results Overview**
+
+Parse `junit.xml` to build:
+
+| Metric | Value |
+|--------|-------|
+| Total tests | count from `<testsuites tests=…>` |
+| Executed | total − skipped |
+| Skipped | count of `<skipped/>` elements |
+| Passed | executed − failures − errors |
+| Failed | count of `<failure>` elements |
+| Test Pass Rate | passed / executed as % |
+
+Include a per-test table with name, duration (from `time` attribute, convert seconds to `Xm Ys`), and Pass/Fail result.
+
+**Section 2 — Skill Invocation Rate**
+
+Read the SKILL-REPORT.md "Per-Test Case Results" sections. For each executed test determine whether the skill under test was invoked.
+
+The skills to track depend on which integration test suite the run belongs to:
+
+**azure-deploy integration tests** — track the full deployment chain:
+
+| Skill | How to detect |
+|-------|---------------|
+| `azure-prepare` | Mentioned as invoked in the narrative or agent-metadata |
+| `azure-validate` | Mentioned as invoked in the narrative or agent-metadata |
+| `azure-deploy` | Mentioned as invoked in the narrative or agent-metadata |
+
+Build a per-test invocation matrix (Yes/No for each skill) and compute rates:
+
+| Skill | Invocation Rate |
+|-------|----------------|
+| azure-deploy | X% (n/total) |
+| azure-prepare | X% (n/total) |
+| azure-validate | X% (n/total) |
+| Full skill chain (P→V→D) | X% (n/total) |
+
+> The azure-deploy integration tests exercise the full deployment workflow where the agent is expected to invoke azure-prepare, azure-validate, and azure-deploy in sequence. This three-skill chain tracking is **specific to azure-deploy tests only**.
+
+**All other integration tests** — track only the skill under test:
+
+| Skill | Invocation Rate |
+|-------|----------------|
+| {skill-under-test} | X% (n/total) |
+
+For non-deploy tests (e.g. azure-prepare, azure-ai, azure-kusto), only track whether the primary skill under test was invoked. Do not include azure-prepare/azure-validate/azure-deploy chain columns.
+
+**Section 3 — Report Confidence & Pass Rate**
+
+Extract from SKILL-REPORT.md:
+- Overall Test Pass Rate (from the report's statistics section)
+- Average Confidence (from the report's statistics section)
+
+**Section 4 — Comparison** (only when a second run is provided)
+
+Repeat Phase 1–3 for the second run, then produce a side-by-side delta table. See [report-format.md](references/report-format.md) § Comparison.
+
+### Phase 3 — File Issues for Failures
+
+For every test with a `<failure>` element in `junit.xml`:
+
+1. Read the failure message and file:line from the XML
+2. Read the actual line of code from the test file at that location
+3. Read the `agent-metadata-*.md` for that test from the artifacts
+4. Read the corresponding section in the SKILL-REPORT.md for context on what the agent did
+5. Determine root cause category:
+   - **Skill not invoked** — agent bypassed skills and used manual commands
+   - **Deployment failure** — infrastructure or RBAC error during deployment
+   - **Timeout** — test exceeded time limit
+   - **Assertion mismatch** — expected files/links not found
+   - **Quota exhaustion** — Azure region quota prevented deployment
+6. Create a GitHub issue:
+
+```
+gh issue create --repo microsoft/GitHub-Copilot-for-Azure \
+  --title "Integration test failure: <skill> – <keywords> [<root-cause-category>]" \
+  --label "bug,integration-test" \
+  --body "<body>"
+```
+
+   **Title format:** `Integration test failure: {skill} – {keywords} [{root-cause-category}]`
+   - `{keywords}`: 2-4 words from the test name — app type (function app, static web app) + IaC type (Terraform, Bicep) + trigger if relevant
+   - `{root-cause-category}`: one of the categories from step 5 in brackets
+
+Issue body template — see [issue-template.md](references/issue-template.md).
+
+> ⚠️ **Note:** Do NOT include the Error Details (JUnit XML) or Agent Metadata sections in the issue body. Keep issues concise with the diagnosis, prompt context, skill report context, and environment sections only.
+
+> For azure-deploy integration tests, include an "azure-deploy Skill Invocation" section showing whether azure-deploy was invoked (Yes/No), with a note that the full chain is azure-prepare → azure-validate → azure-deploy. For all other integration tests, include a "{skill} Skill Invocation" section showing only whether the primary skill under test was invoked.
+
+## Error Handling
+
+| Error | Cause | Fix |
+|-------|-------|-----|
+| `gh: command not found` | GitHub CLI not installed | Install with `winget install GitHub.cli` or `brew install gh` |
+| `no artifacts found` | Run has no uploadable reports | Verify the run completed the "Export report" step |
+| `HTTP 404` on run view | Invalid run ID or no access | Check the run ID and ensure `gh auth status` is authenticated |
+| `rate limit exceeded` | Too many GitHub API calls | Wait and retry, or use `--limit` on searches |
+
+## References
+
+- [report-format.md](references/report-format.md) — Output report template
+- [issue-template.md](references/issue-template.md) — GitHub issue body template
@@ -0,0 +1,120 @@
+# Issue Template
+
+Use this template for every GitHub issue created for a test failure.
+
+## Issue Title
+
+```
+Integration test failure: {skill} – {keywords} [{root-cause-category}]
+```
+
+### Title Construction
+
+| Component | Description | Example |
+|-----------|-------------|--------|
+| `{skill}` | The skill under test | `azure-deploy` |
+| `{keywords}` | 2-4 keywords extracted from the test name — app type + IaC type | `function app Terraform`, `function app Bicep`, `static web app` |
+| `{root-cause-category}` | Root cause in brackets | `Skill not invoked`, `Timeout`, `Deployment failure` |
+
+**Keyword extraction rules:**
+- Include the app/service type: `function app`, `static web app`, `container app`, `URL shortener`
+- Include IaC type if specified: `Terraform`, `Bicep`
+- Include trigger type if relevant: `Service Bus trigger`, `Event Grid trigger`
+- Omit filler words like "creates", "deploys", "with", "using", "infrastructure"
+- Keep to 2-4 meaningful keywords
+
+**Examples:**
+- `Integration test failure: azure-deploy – function app Terraform [Timeout]`
+- `Integration test failure: azure-deploy – event-driven function app [Skill not invoked]`
+- `Integration test failure: azure-deploy – function app Service Bus Terraform [Deployment failure]`
+- `Integration test failure: azure-prepare – static web app Bicep [Assertion mismatch]`
+
+## Labels
+
+```
+bug, integration-test
+```
+
+## Issue Body
+
+````markdown
+## Prompt
+
+```
+{the prompt string from the test}
+```
+
+## Summary
+
+**Run:** [{run-name} #{run-number}]({run-url})
+**Test:** `{full-test-name}`
+**Result:** {Fail / Timeout}
+**Duration:** {Xm Ys}
+
+## Root Cause Category
+
+<!-- One of: Skill not invoked, Deployment failure, Timeout, Assertion mismatch, Quota exhaustion -->
+**{category}**
+
+## Diagnosis
+
+### What was expected
+{Describe what the test asserts — e.g., "expects deploy links in assistant output", "expects .bicep files in workspace"}
+
+### What actually happened
+{Describe the agent's actual behavior based on agent-metadata — e.g., "agent manually ran azd up instead of invoking azure-deploy skill", "deployment timed out after 30 minutes"}
+
+### Why it failed
+{1-2 sentences root cause — e.g., "The agent did not recognize the prompt as a skill-triggering intent and fell back to manual commands. No deploy links were produced because the deployment hit RBAC errors."}
+
+### Suggested fix
+{One of:}
+- Update skill triggers/description to better match this prompt pattern
+- Update test assertions or timeout
+- Fix RBAC/quota pre-flight checks in skill
+- Investigate model behavior (agent bypassing skill)
+
+## {skill} Skill Invocation
+
+<!-- For azure-deploy integration tests, show the deploy skill status with a note about the full chain -->
+<!-- For all other integration tests, show only the primary skill under test -->
+
+### azure-deploy tests
+
+| Skill | Invoked |
+|-------|---------|
+| **azure-deploy** | **Yes** / No |
+
+> In azure-deploy integration tests, the full skill chain is azure-prepare → azure-validate → azure-deploy. {Describe whether the full chain was invoked or which skills were bypassed.}
+
+### All other integration tests
+
+| Skill | Invoked |
+|-------|---------|
+| **{skill-under-test}** | **Yes** / No |
+
+## Skill Report Context
+
+<details>
+<summary>Per-test section from SKILL-REPORT.md (click to expand)</summary>
+
+{PASTE THE RELEVANT PER-TEST SECTION FROM THE SKILL-REPORT HERE}
+
+</details>
+
+## Environment
+
+- **Runner OS:** ubuntu-latest
+- **Node.js:** (from workflow)
+- **Model:** claude-sonnet-4.5 (or as overridden)
+- **Run URL:** {run-url}
+- **Commit:** {commit-sha}
+````
+
+## Rules
+
+- One issue per failed test. Do not batch multiple failures into one issue.
+- If multiple failures share the same root cause, note this in each issue body with a cross-reference.
+- Do NOT include Error Details (JUnit XML) or Agent Metadata sections — keep issues concise.
+- For azure-deploy integration tests, include the skill invocation section showing azure-deploy status and note the full chain (azure-prepare → azure-validate → azure-deploy).
+- For all other integration tests, include a skill invocation section showing only the primary skill under test.