diff --git a/docs/README.skills.md b/docs/README.skills.md index 0fd5a74ba..18786860d 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -102,6 +102,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to | [convert-plaintext-to-md](../skills/convert-plaintext-to-md/SKILL.md)
`gh skills install github/awesome-copilot convert-plaintext-to-md` | Convert a text-based document to markdown following instructions from prompt, or if a documented option is passed, follow the instructions for that option. | None | | [copilot-cli-quickstart](../skills/copilot-cli-quickstart/SKILL.md)
`gh skills install github/awesome-copilot copilot-cli-quickstart` | Use this skill when someone wants to learn GitHub Copilot CLI from scratch. Offers interactive step-by-step tutorials with separate Developer and Non-Developer tracks, plus on-demand Q&A. Just say "start tutorial" or ask a question! Note: This skill targets GitHub Copilot CLI specifically and uses CLI-specific tools (ask_user, sql, fetch_copilot_cli_documentation). | None | | [copilot-instructions-blueprint-generator](../skills/copilot-instructions-blueprint-generator/SKILL.md)
`gh skills install github/awesome-copilot copilot-instructions-blueprint-generator` | Technology-agnostic blueprint generator for creating comprehensive copilot-instructions.md files that guide GitHub Copilot to produce code consistent with project standards, architecture patterns, and exact technology versions by analyzing existing codebase patterns and avoiding assumptions. | None | +| [copilot-pr-autopilot](../skills/copilot-pr-autopilot/SKILL.md)
`gh skills install github/awesome-copilot copilot-pr-autopilot` | Copilot left 14 review comments on your PR — half are nits. Hours of fix → reply → resolve → re-request, and each round lands MORE comments. This skill runs loop engineering: auto-triggers Copilot Code Review via GraphQL (no @copilot mention), triages every open thread (Copilot, humans, advanced-security) with a fix / decline / escalate rubric, dispatches parallel fix sub-agents that obey the repo build/test/lint conventions, commits per iteration, replies+resolves citing the pushed SHA, then re-triggers until HEAD is reviewed with zero threads awaiting the agent's reply (remaining open threads are explicit hand-offs to the human — escalated declines, design tradeoffs). You merge a clean PR; the bot runs it. Trigger phrases: "address copilot comments", "run a copilot review loop", "fix this PR", "iterate on copilot feedback". Repo-agnostic, gh CLI + PowerShell. Full autopilot needs repo Triage/Write; external PR authors get single-iteration mode plus manual re-trigger (UI 🔄 or substantive-commit push). | `assets/reply-decline.md`
`assets/reply-drift.md`
`assets/reply-fix.md`
`assets/reply-partial.md`
`references/02-wait.md`
`references/03-list-threads.md`
`references/04-triage.md`
`references/05-fix.md`
`references/06-build-test.md`
`references/08-reply-resolve.md`
`references/09-convergence.md`
`references/api-quirks.md`
`references/orchestration.md`
`scripts/01-request-review.ps1`
`scripts/02-check-review-status.ps1`
`scripts/03-list-open-threads.ps1`
`scripts/08-reply-and-resolve.ps1`
`scripts/10-cleanup-outdated.ps1`
`scripts/_lib.ps1` | | [copilot-sdk](../skills/copilot-sdk/SKILL.md)
`gh skills install github/awesome-copilot copilot-sdk` | Build agentic applications with GitHub Copilot SDK. Use when embedding AI agents in apps, creating custom tools, implementing streaming responses, managing sessions, connecting to MCP servers, or creating custom agents. Triggers on Copilot SDK, GitHub SDK, agentic app, embed Copilot, programmable agent, MCP server, custom agent. | None | | [copilot-spaces](../skills/copilot-spaces/SKILL.md)
`gh skills install github/awesome-copilot copilot-spaces` | Use Copilot Spaces to provide project-specific context to conversations. Use this skill when users mention a "Copilot space", want to load context from a shared knowledge base, discover available spaces, or ask questions grounded in curated project documentation, code, and instructions. | None | | [copilot-usage-metrics](../skills/copilot-usage-metrics/SKILL.md)
`gh skills install github/awesome-copilot copilot-usage-metrics` | Retrieve and display GitHub Copilot usage metrics for organizations and enterprises using the GitHub CLI and REST API. | `get-enterprise-metrics.sh`
`get-enterprise-user-metrics.sh`
`get-org-metrics.sh`
`get-org-user-metrics.sh` | diff --git a/skills/copilot-pr-autopilot/SKILL.md b/skills/copilot-pr-autopilot/SKILL.md new file mode 100644 index 000000000..c4f89524d --- /dev/null +++ b/skills/copilot-pr-autopilot/SKILL.md @@ -0,0 +1,157 @@ +--- +name: copilot-pr-autopilot +description: 'Copilot left 14 review comments on your PR — half are nits. Hours of fix → reply → resolve → re-request, and each round lands MORE comments. This skill runs loop engineering: auto-triggers Copilot Code Review via GraphQL (no @copilot mention), triages every open thread (Copilot, humans, advanced-security) with a fix / decline / escalate rubric, dispatches parallel fix sub-agents that obey the repo build/test/lint conventions, commits per iteration, replies+resolves citing the pushed SHA, then re-triggers until HEAD is reviewed with zero threads awaiting the agent''s reply (remaining open threads are explicit hand-offs to the human — escalated declines, design tradeoffs). You merge a clean PR; the bot runs it. Trigger phrases: "address copilot comments", "run a copilot review loop", "fix this PR", "iterate on copilot feedback". Repo-agnostic, gh CLI + PowerShell. Full autopilot needs repo Triage/Write; external PR authors get single-iteration mode plus manual re-trigger (UI 🔄 or substantive-commit push).' +--- + +# Copilot PR Autopilot + +Drive any GitHub pull request through repeated rounds of Copilot code +review until the agent has done its job — every Copilot finding has +a reply from the agent (fix-acknowledgement, decline-with-rationale, +or explicit escalate-to-user hand-off). Remaining open threads, if +any, are deliberate hand-offs to the human merge owner — they're +not loop failures. Repository-agnostic — works on any repo that has +Copilot Code Review enabled, run from a machine with `gh` CLI +installed and authenticated (see Prerequisites). + +## When to Use This Skill + +- The user asks to "request Copilot review" or "run a Copilot review loop" + on a PR. +- A PR is functionally complete and the user wants a final correctness pass + via repeated automated review rounds. +- A previous Copilot review on the PR has left open threads that need + triage, fixing, replying, and resolving. + +## When NOT to Use This Skill + +- The PR is still under active design — wait until the structure is stable; + otherwise findings churn round-over-round. +- The user wants human reviewer feedback, not Copilot's. + +## Prerequisites + +- `gh` CLI installed and authenticated against the target repository. +- PowerShell on PATH — Windows PowerShell 5.1+ (`powershell.exe`) or + PowerShell 7+ (`pwsh`). Both are tested. Examples throughout this + skill (including reference docs) invoke scripts as `pwsh ...`; on + Windows-only machines that ship 5.1 and do not have PowerShell 7 + installed, substitute `powershell` (or `powershell.exe`) for `pwsh` + — the scripts themselves are runtime-agnostic. +- Copilot Code Review is the primary use case (`01-request-review.ps1` + uses GraphQL `requestReviewsByLogin` to trigger Copilot). It is + **NOT a hard requirement** — if `01-request-review.ps1` fails + because Copilot isn't enabled on the repo / account, the agent can + still drive existing review threads (human, advanced-security, etc.) + to completion by running steps 3–8 once as a single iteration; just + skip the trigger + wait. There is no auto-detect for "Copilot + unavailable" — the agent makes that decision after the trigger + fails (the script can't reliably tell "Copilot disabled" from + "Copilot enabled but not yet triggered" from API state alone). + +### Permissions: who can run the full loop + +The full multi-round autopilot (steps 1 → 9 → 1) needs **Triage or Write** permission on the target repo, because GitHub's only public API for adding the Copilot bot as a reviewer (`requestReviewsByLogin`) is gated on that permission. The exact public-API surface that was enumerated to reach this conclusion (REST `requested_reviewers`, GraphQL `requestReviews`, GraphQL `requestReviewsByLogin`, etc.) is documented in [`references/api-quirks.md`](references/api-quirks.md) — there is no public-API path for bot reviewers without write permission. + +| You are… | What works | +|---|---| +| **Repo collaborator with Triage / Write** | Full loop: `01` triggers Copilot, `02` waits, `04`–`08` triage / fix / reply, loop back to `01`. Hands-off. | +| **External PR author (no write permission)** | `01` will throw a clear actionable error. Use `-SingleIteration` mode: address all current findings in one pass, then either click the UI 🔄 next to Copilot, **or** push a substantive commit (the `synchronize` event auto-triggers Copilot on most repos). Then re-run `02` to verify. | + +In single-iteration mode the loop's convergence boolean is `Converged: true` iff `OpenThreadsAwaitingReply == 0` (the agent's side is done). The maintainer-side re-trigger then drives any additional rounds. + +Every script dot-sources [scripts/_lib.ps1](scripts/_lib.ps1) which +runs `Assert-GhReady` on load: if `gh` is missing OR `gh auth status` +fails, the script halts **before any work** with a single actionable +error message naming the install command and `gh auth login`. The +agent should surface that message to the user verbatim and stop the +loop — do not retry or work around it. + +## Step-by-Step Workflow + +> **The loop:** steps 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8 → 9, then **back to step 1** if `Converged: false`. Repeat the 1→9 round until step 9 returns `Converged: true`; only then run step 10 once and call `task_complete`. + +Each round runs steps 1–9; step 10 is a one-time cleanup after convergence. The parent agent coordinates; every sub-agent step runs in a fresh context with a bounded budget. Cross-cutting protocol (time-boxing, extension, single-iteration fallback): [orchestration.md](references/orchestration.md). + +1. **Request review** _(parent)_ — see [orchestration.md#step-1-request-review](references/orchestration.md#step-1-request-review) +2. **Wait for review** _(sub-agent, 20-min cap)_ — see [02-wait.md](references/02-wait.md) +3. **List + categorize open threads** _(sub-agent, 5 min)_ — see [03-list-threads.md](references/03-list-threads.md) +4. **Triage** _(sub-agent, 5 min per ≤5 threads)_ — see [04-triage.md](references/04-triage.md) +5. **Fix** _(sub-agents, parallel max 5, 5 min each)_ — see [05-fix.md](references/05-fix.md) +6. **Build + test per repo conventions** _(sub-agent, 10 min)_ — see [06-build-test.md](references/06-build-test.md) +7. **Commit + push** _(parent)_ — see [orchestration.md#step-7-commit-and-push](references/orchestration.md#step-7-commit-and-push) +8. **Reply (always) + resolve (conditional)** _(sub-agent drafts, parent posts)_ — see [08-reply-resolve.md](references/08-reply-resolve.md) +9. **Convergence verify** _(sub-agent, 3 min)_ — see [09-convergence.md](references/09-convergence.md) + - **`Converged: false` → loop back to step 1** for another round (re-trigger, wait, list, triage, fix, push, reply, re-check). Each round addresses Copilot's findings on the previous round's HEAD; the loop terminates as soon as Copilot has nothing new to say AND every open thread has a reply from the agent. + - **`Converged: true` → exit the loop**, run step 10 once, call `task_complete` with the proof. +10. **Cleanup outdated** _(parent, post-convergence, once)_ — see [orchestration.md#step-10-cleanup-outdated](references/orchestration.md#step-10-cleanup-outdated) + +Convergence is computed by [scripts/02-check-review-status.ps1](scripts/02-check-review-status.ps1) as a single `Converged: true` boolean. Do **not** call `task_complete` until it returns true; print the proof (`HeadOid`, `LatestCopilotReview.commitOid`, `submittedAt`) in the completion message. + +## Gotchas + +The bundled scripts enforce the hard correctness invariants (trigger landing via `copilot_work_started` event id, `Converged` requiring HEAD-match + zero-awaiting + at-HEAD review, single-iteration fallback semantics, PR-state guard). Trust them — don't re-derive. The notes below cover decisions the scripts can't make for you: + +- **Reply to every open thread; resolve only when the loop owns the disposition.** For `fix` and `decline` threads, reply + resolve. For `escalate-to-user` threads, reply with the analysis but leave the thread OPEN (`08-reply-and-resolve.ps1 -NoResolve`) so the human merge owner can act on it. See [08-reply-resolve.md](references/08-reply-resolve.md). +- **Copilot threads are loop-owned; human / advanced-security / other-bot threads default to `escalate-to-user`.** Auto-resolving a human review thread can hide unaddressed concerns. See [04-triage.md](references/04-triage.md) for the rubric. +- **One focused commit per round, not one per PR.** Bundling rounds destroys the audit trail of which finding drove which change and breaks `git bisect`. See [orchestration.md#step-7-commit-and-push](references/orchestration.md#step-7-commit-and-push). +- **Build/test/lint with the repo's own commands** (per its `CONTRIBUTING` / `AGENTS` / `README` / `package.json` / `Makefile`) before pushing a fix. Discovery procedure: [06-build-test.md](references/06-build-test.md). +- **Push back with written rationale** when a Copilot finding would over-engineer the design for a hypothetical edge case. Auto-accepting every suggestion erodes the design — see the `decline` path in [04-triage.md](references/04-triage.md). +- **Scripting traps** (`gh api graphql -F` type-coercion, `git stash push -m` positional parsing, the three GraphQL traps for the reviewer mutation) are documented in [references/api-quirks.md](references/api-quirks.md). Read before modifying any script. + +## Troubleshooting + +| Issue | Solution | +|-------|----------| +| Script throws `prerequisite missing — gh CLI is not on PATH` | Install `gh` (`winget install GitHub.cli` on Windows; `brew install gh` on macOS; package manager on Linux; or download from https://cli.github.com). Then `gh auth login`. Surface the message to the user and STOP the loop — do not retry. | +| Script throws `prerequisite missing — gh CLI is not authenticated` | Run `gh auth login`. STOP the loop until the user completes auth. | +| Trigger fails or no `copilot_work_started` event lands | Push a substantive (non-whitespace) commit — auto-assign on `synchronize` is the most reliable trigger. Persistent failure indicates Copilot Code Review may not be enabled on the repo / account (check repo Settings → Code & automation → Copilot, or account-level Copilot Pro/Pro+). | +| No new review after waiting ~10 min | Quiet-period after recent dismissal or trivial-diff suppression. Push a substantive commit and retry. Do not blindly re-run `01-request-review.ps1` — it reports `InFlight` while Copilot is still a requested reviewer. | +| Outdated-but-unresolved threads in the open list | Expected: unresolved state is the source of truth. Reply + resolve them like any other open thread. `10-cleanup-outdated.ps1` is only a final safety net. | +| Unsure whether to fix or decline a finding | See [references/04-triage.md](references/04-triage.md). | +| Need a reply phrasing for "fixed", "declined", or "drift" | See the templates under [assets/](assets/) — [reply-fix.md](assets/reply-fix.md), [reply-decline.md](assets/reply-decline.md), [reply-drift.md](assets/reply-drift.md), [reply-partial.md](assets/reply-partial.md). | + +## References + +- [references/orchestration.md](references/orchestration.md) — + parent-owned loop control: time-boxing & extension protocol, + sub-agent delegation map, steps 1 / 7 / 10 contracts, + single-iteration fallback, and loop-wide notes. +- Per-step sub-agent contracts: + [references/02-wait.md](references/02-wait.md), + [references/03-list-threads.md](references/03-list-threads.md), + [references/04-triage.md](references/04-triage.md) (includes the + fix-vs-decline rubric), + [references/05-fix.md](references/05-fix.md), + [references/06-build-test.md](references/06-build-test.md), + [references/08-reply-resolve.md](references/08-reply-resolve.md), + [references/09-convergence.md](references/09-convergence.md). +- [references/api-quirks.md](references/api-quirks.md) — verified + GitHub API behavior, dead-ends, and the GraphQL traps for the + reviewer mutation. +- Reply templates (one per reply type): + [assets/reply-fix.md](assets/reply-fix.md) — accepted-fix + pattern; [assets/reply-decline.md](assets/reply-decline.md) — + declined-with-rationale pattern; + [assets/reply-drift.md](assets/reply-drift.md) — + PR-description / comment / test-plan drift acknowledgement; + [assets/reply-partial.md](assets/reply-partial.md) — + partial fix with deferred follow-up. Cross-cutting reply guidance + and anti-patterns live in + [references/08-reply-resolve.md](references/08-reply-resolve.md#reply-guidance). +- [scripts/_lib.ps1](scripts/_lib.ps1) — shared helpers (`Invoke-Gh`, + `Invoke-GhGraphQL`, `Resolve-RepoCoords`); dot-sourced by every + script. +- [scripts/01-request-review.ps1](scripts/01-request-review.ps1) — + trigger Copilot review and verify pickup via the + `copilot_work_started` event. +- [scripts/02-check-review-status.ps1](scripts/02-check-review-status.ps1) — + single-shot snapshot of the PR's Copilot review state; emits + `Converged: true` only when all three conditions hold. +- [scripts/03-list-open-threads.ps1](scripts/03-list-open-threads.ps1) — + every unresolved PR review thread from **all reviewers** (Copilot, + humans, github-advanced-security, etc.). +- [scripts/08-reply-and-resolve.ps1](scripts/08-reply-and-resolve.ps1) — + post a reply and resolve in one call. +- [scripts/10-cleanup-outdated.ps1](scripts/10-cleanup-outdated.ps1) — + safety net for outdated Copilot threads. diff --git a/skills/copilot-pr-autopilot/assets/reply-decline.md b/skills/copilot-pr-autopilot/assets/reply-decline.md new file mode 100644 index 000000000..2aa85a31f --- /dev/null +++ b/skills/copilot-pr-autopilot/assets/reply-decline.md @@ -0,0 +1,21 @@ +# Reply: declined with rationale + +Use when triage decided `decline`. The reply must explain WHY +declining is the right call — not just that you considered it. Always +resolve the thread after replying; an open thread with no reply +signals avoidance. + +``` +Considered this, but declining: . . Happy to revisit if . +``` + +Example (domain-neutral): + +> Considered extending the lock into the initialization path, but +> declining: initialization runs to completion before any concurrent +> caller can reach this code, so the race window only opens after +> the init callback has returned. Sharing the lock across modules +> costs more in coupling than the actual exposure justifies. Happy +> to revisit if telemetry shows a real interleaving. diff --git a/skills/copilot-pr-autopilot/assets/reply-drift.md b/skills/copilot-pr-autopilot/assets/reply-drift.md new file mode 100644 index 000000000..6b1aa817a --- /dev/null +++ b/skills/copilot-pr-autopilot/assets/reply-drift.md @@ -0,0 +1,19 @@ +# Reply: documentation / test-plan drift + +Use when Copilot points out the PR description, a comment, or the +test plan no longer matches the code. + +Pick ONE of the three forms below (delete the other two) and fill in +the bracketed placeholders: + +``` +Good catch — updated the PR description to match the implemented behavior: . +``` + +``` +Good catch — updated the comment in path/to/file to match the implemented behavior: . +``` + +``` +Good catch — updated the test plan to match the implemented behavior: . +``` diff --git a/skills/copilot-pr-autopilot/assets/reply-fix.md b/skills/copilot-pr-autopilot/assets/reply-fix.md new file mode 100644 index 000000000..8f52a43b6 --- /dev/null +++ b/skills/copilot-pr-autopilot/assets/reply-fix.md @@ -0,0 +1,25 @@ +# Reply: accepted fix + +Use after the loop has committed and pushed a fix for the finding. Cite +the pushed commit SHA from step 7. + +``` +. +. +Fixed in . +``` + +Example (language-neutral): + +> The lock did not cover the install side of the path, so two +> parallel writers could read the same baseline and clobber each +> other. Promoted the per-instance lock to a process-wide +> function-local static so all read-modify-write paths share it. +> Fixed in abc1234. + +When the fix is in a tested area, add a one-line test confirmation: + +> Replaced the platform UUID dependency with a PID + monotonic-clock +> + atomic counter so the test target no longer pulls in the +> platform UUID library. All 42 tests in the affected suite still +> pass. Fixed in abc1234. diff --git a/skills/copilot-pr-autopilot/assets/reply-partial.md b/skills/copilot-pr-autopilot/assets/reply-partial.md new file mode 100644 index 000000000..51b6c0eee --- /dev/null +++ b/skills/copilot-pr-autopilot/assets/reply-partial.md @@ -0,0 +1,9 @@ +# Reply: partial fix with deferred follow-up + +Use when the finding has both an immediate fix and a deeper +structural concern — address the immediate part now and acknowledge +the rest. + +``` +Fixed the immediate in . The broader would benefit from , which I'd prefer to land separately because . Tracking as . +``` diff --git a/skills/copilot-pr-autopilot/references/02-wait.md b/skills/copilot-pr-autopilot/references/02-wait.md new file mode 100644 index 000000000..7ef2c18fe --- /dev/null +++ b/skills/copilot-pr-autopilot/references/02-wait.md @@ -0,0 +1,49 @@ +# Step 2: Wait for review + +Sub-agent type: `general-purpose`; budget: **20-minute hard cap** (one +bounded sub-agent, NOT extension-driven). + +**Skipped** when the loop is in [single-iteration +mode](orchestration.md#single-iteration-fallback) — there's no Copilot +review to wait for. + +## Inputs + +From step 1: +- `PrNumber`. +- `baseline` — the `LatestCopilotReview.submittedAt` string captured + before the trigger fired (empty string if no prior Copilot review). + +## Return contract + +- `02-check-review-status.ps1` JSON snapshot. +- `recommendation` ∈ {`ready`, `give-up-push-commit`}. +- `ready` iff **both** `LatestCopilotReview.submittedAt > baseline` + AND `ReviewAtHead: true`. + +## Procedure + +Poll `02-check-review-status.ps1` approximately every **3 minutes** +until `ready` or the 20-minute cap is hit: + +```pwsh +pwsh ./scripts/02-check-review-status.ps1 -PrNumber +``` + +- Extract `submittedAt` and `ReviewAtHead` from the JSON each tick. +- Stop and return `ready` on the first tick that satisfies both + conditions vs. the captured `baseline`. +- On cap reached without `ready`, return `give-up-push-commit`. + +## Gotchas + +- **Don't poll faster than ~3 minutes.** There is no progress signal + from the API; faster polling only burns budget. +- **`give-up-push-commit` fallback is parent-driven.** When the + sub-agent returns this recommendation, the **parent** pushes a + substantive (non-whitespace) commit — auto-assign on `synchronize` is + the most reliable trigger. Then the parent re-enters the loop at + step 1 with a fresh `baseline`. +- **Single bounded run, not extension-driven.** Do not request + extensions on this step — if 20 min isn't enough, the right move is + the `give-up-push-commit` fallback, not more polling. diff --git a/skills/copilot-pr-autopilot/references/03-list-threads.md b/skills/copilot-pr-autopilot/references/03-list-threads.md new file mode 100644 index 000000000..b19ec03af --- /dev/null +++ b/skills/copilot-pr-autopilot/references/03-list-threads.md @@ -0,0 +1,63 @@ +# Step 3: List + categorize open threads + +Sub-agent type: `explore`; budget: 5 min. + +## Inputs + +- `PrNumber`. + +## Return contract + +Table of rows, one per open thread: + +``` +{ thread_id, file, line, author, author_class, severity, summary } +``` + +Where `author_class` ∈ `copilot` | `human-or-bot`, derived from the +raw `author.login` (see Gotchas). + +## Procedure + +Run the listing script: + +```pwsh +pwsh ./scripts/03-list-open-threads.ps1 -PrNumber +``` + +This returns every unresolved review thread from **all reviewers** +(Copilot, humans, `github-advanced-security`, other bots). The script +emits `Path` as one of: + +- `:` when the comment is anchored to a specific line + (e.g. `src/foo.js:42`) +- `` when the comment is file-level (no line anchor) +- `(pull request)` (literal sentinel string) when the comment has no + file path at all (PR-level / general review comments) +- `(pull request):` (rare) when the comment has no file path but + does carry a line number — GitHub occasionally emits this shape for + diff-anchored PR-level comments; the line is kept so the anchor + context isn't dropped + +Callers should split on the last `:` **only when the suffix parses as +an integer**, treat `Path` as the file alone otherwise, and treat +strings starting with `(pull request)` as the PR-level sentinel. For each row, +classify the `author`: + +- `copilot-pull-request-reviewer` or + `copilot-pull-request-reviewer[bot]` → `author_class: copilot` +- everything else → `author_class: human-or-bot` + +Pass the classified table to step 4 — the triage rubric depends on it. + +## Gotchas + +- **The `[bot]` suffix appears on some surfaces.** Match BOTH + `copilot-pull-request-reviewer` AND + `copilot-pull-request-reviewer[bot]` — they're the same actor. +- **Default human / advanced-security threads to `escalate-to-user` in + step 4.** Classification here just flags them; triage applies the + policy. See [04-triage.md](04-triage.md). +- **Unresolved is the source of truth.** Outdated-but-unresolved + threads still show up — that's correct. Don't filter them out; + they're handled like any other open thread in step 8. diff --git a/skills/copilot-pr-autopilot/references/04-triage.md b/skills/copilot-pr-autopilot/references/04-triage.md new file mode 100644 index 000000000..62d3aa2bc --- /dev/null +++ b/skills/copilot-pr-autopilot/references/04-triage.md @@ -0,0 +1,190 @@ +# Step 4: Triage + +Sub-agent type: `general-purpose`; budget: **5 min per ≤5 threads** +(parent batches in waves of ≤5 if step 3 returned more). + +## Inputs + +From step 3: +- Classified thread table — `{ thread_id, file, line, author, + author_class, severity, summary }` per open thread. +- Original PR context (description, recent commits to the affected + files) for re-deriving from principles in oscillation cases. + +## Return contract + +Table of rows, one per thread: + +``` +{ thread_id, action, rationale } +``` + +Where `action` ∈ `fix` | `decline` | `escalate-to-user` and +`rationale` is a single line citing the rule from the rubric below +that fired. + +## Procedure + +For each thread, apply the rubric in this file in order: + +1. **Reviewer-type policy** — `human-or-bot` threads default to + `escalate-to-user` unless the user explicitly scoped them in. +2. **ROI vs Risk** — score and decide for `copilot` threads. +3. **Fix / Decline rule lists** — match the finding pattern. +4. **Project-specific policy hooks** — research repo conventions + before applying a generic "fix". +5. **Conflicting-comments resolution** — if the finding flips a + prior round's edit, apply the oscillation rules; if the agent is + about to re-do an edit it already reverted, hard-stop and escalate. +6. **Escalation rules** — escalate on design-level tradeoffs, + large/cross-cutting changes, or high-risk/irreversible actions. + +Return `{ thread_id, action, rationale }` per thread to the parent; +step 5 consumes only the `fix` rows, step 8 consumes the full table. + +## Gotchas + +- **Batch in waves of ≤5 per sub-agent invocation.** Larger batches + blur the rationale and overrun the 5-min budget. +- **Human / advanced-security default is `escalate-to-user`.** + Auto-replying or auto-resolving a human review thread hides + unaddressed concerns and is socially wrong. +- **Cite the rule that fired in `rationale`** — one line, concrete. + "Style nit per rubric §Decline" is fine; "looks small" is not. +- **The oscillation hard-stop is non-negotiable.** If the agent is + about to make an edit it already reverted in a prior round, that's + the unambiguous signature of an oscillation loop — escalate. + +--- + +# Triage Rubric + +Decision rubric for whether to fix or decline each Copilot finding. +The goal is correctness, not appeasement — decline confidently when +warranted. + +## ROI vs Risk + +Score every proposed change on two axes, then decide: + +- **ROI** = value of the fix MINUS the cost to implement MINUS the + user's cost to review. +- **Risk** = blast radius, reversibility, effect on unrelated code. + +| ROI | Risk | Action | +|---|---|---| +| Clear positive | Low | Fix unilaterally. | +| Marginal | Low | Fix if cheap; otherwise decline with rationale. | +| Clear positive | High / Irreversible | Propose to the user first. | +| Marginal | High | Decline. | +| Negative (over-engineering for hypothetical) | Any | Decline. | + +## Reviewer-type policy (foundation, not a nit) + +Each thread's `author` (returned by `03-list-open-threads.ps1`) drives +who can decide it: + +| Reviewer | Default action | +|----------|----------------| +| `copilot-pull-request-reviewer` / `copilot-pull-request-reviewer[bot]` | Loop-owned — triage with the rubric below. (`03-list-open-threads.ps1` reports the raw `author.login`, which may carry the `[bot]` suffix on some surfaces; match both forms.) | +| Human reviewer | **Default `escalate-to-user`** unless the user explicitly scoped them into the loop. Auto-replying or auto-resolving a human thread can hide unaddressed concerns and is socially wrong. | +| `github-advanced-security` / other automated bots | **Default `escalate-to-user`** unless the project has a documented suppression / fix convention you can follow. | + +## Fix when the finding is... + +- A **real correctness bug**: use-after-free / lifetime violation, + race that drops user intent, gating logic that skips legitimate + transitions, missing link dependency, off-by-one, null deref, + unhandled error path. +- A **cross-cutting concern with a clean local fix**: moving a mutex + one scope up, pulling a duplicated check into a helper. +- **Documentation / test-plan drift**: PR description claims behavior + the code no longer matches; comment block describes the wrong + thing; test-plan checkbox is inverted vs. the code. + +## Decline when the finding is... + +- A **purely hypothetical race** requiring cross-class plumbing, + where actual exposure is negligible. Cite the interleaving you + ruled out. +- **Style, naming, or formatting**. Out of scope for the review loop. +- **Suggestions to add abstractions** ("introduce a strategy pattern", + "extract an interface") that don't pay for themselves at the + current scale of the codebase. +- **Suggestions that contradict an established project convention** — + consistency with surrounding code is usually more valuable than the + suggestion in isolation. +- **Micro-optimizations** in code that is not on a hot path. + +## Project-specific policy hooks + +Some findings are decided by **project policy**, not by general +correctness reasoning. Before applying a generic "fix", **research +the repo's own conventions first** — `.github/instructions/*.md` +files (often have an `applyTo` glob that pins them to the changed +file's path), `.github/skills/`, `AGENTS.md`, `CONTRIBUTING.md`, CI +config, and recent commits to similar files. Fan out multiple +`explore` sub-agents when several axes need checking (lint, format, +spell-check, license header, etc.) — don't invent answers. What +looks like an obvious fix may violate a project rule: + +- **Spell-check / dictionary findings.** If the project uses a + spell-checker (`check-spelling`, `cspell`, `typos`, or similar), + inspect its config and recent commits to learn the local + convention (reword the document, add a pattern/regex, extend a + dictionary/allowlist, use an inline ignore). Follow that + convention; don't invent a new mechanism. +- **Lint suppressions / inline ignore directives.** Most projects + require an inline rationale comment. Bare suppressions get pushed + back. +- **License headers / file boilerplate.** Project-specific format, + often CI-enforced — copy the existing header from a neighbor file. +- **Test framework, mock library, formatter choices.** Follow the + convention of the surrounding test files; never introduce a new + framework because Copilot suggested one. + +Cite the project's config file or convention in your reply. + +## Conflicting comments — break oscillation early + +Failure mode: round N "fixes" what comment A asked for; round N+1 +comment B objects and asks to revert it. Blindly flip-flopping +ships oscillation and burns rounds. + +Resolution rules — apply in order, stop at the first that fires: + +1. **Re-derive from principles.** Pick the side that wins on the + function's contract + surrounding patterns + user's stated + preferences — not the latest comment. +2. **Prefer the position with the explicit rationale.** Concrete + failure modes (security, correctness, data loss, perf) win over + stylistic positions. +3. **Prefer human comments over bot comments** when they directly + conflict. +4. **If still ambiguous, escalate to the user.** Summarize both + positions side-by-side with a recommendation; do not silently + pick. + +**Hard stop**: if you are about to make the same edit you +*reverted* in an earlier round, stop and escalate. That is the +unambiguous signature of an oscillation loop. + +## Reply hygiene + +Every reply — fix or decline — states the reasoning. This makes the +PR self-documenting and gives the next review visible context. +Declined findings cite the prior round and explain why the existing +form is correct, so the next reviewer doesn't raise it again. + +## Escalation rules + +Stay in autopilot by default; escalate to the user when: + +- The finding identifies a **design-level tradeoff** with multiple + reasonable resolutions and no clear winner. +- The fix would be **large or cross-cutting** (hundreds of lines, + new architecture, refactor across modules). +- The action is **high-risk or irreversible** (force-push, deletion, + credentials, production). + +Otherwise, decide and proceed. diff --git a/skills/copilot-pr-autopilot/references/05-fix.md b/skills/copilot-pr-autopilot/references/05-fix.md new file mode 100644 index 000000000..fb2db5e47 --- /dev/null +++ b/skills/copilot-pr-autopilot/references/05-fix.md @@ -0,0 +1,59 @@ +# Step 5: Apply fixes + +Sub-agent type: `general-purpose`, **one sub-agent per finding**, +parallel **max 5 concurrent**; budget: 5 min each. If step 4 returned +more than 5 `fix` rows, the parent runs step 5 in waves of ≤5. + +## Inputs + +Per sub-agent (one finding per invocation): +- `thread_id`, `file`, `line` from step 3. +- The finding `summary` and Copilot's suggested fix (if any). +- The triage `rationale` from step 4 — the agent already decided this + is a `fix`; this sub-agent only implements it. + +## Return contract + +``` +{ thread_id, files_touched, summary, status } +``` + +Where `status` ∈ `complete` | `partial` | `blocked` and `summary` is +a one-line description of the change. + +## Procedure + +1. **Discover repo conventions for the area being edited — first.** + Before writing any code, read: + - `.github/instructions/*.md` whose `applyTo` glob matches the + file's path, + - `.github/skills/`, + - `AGENTS.md`, + - `CONTRIBUTING.md`, + - neighbor-file patterns in the same directory and recent commits + touching similar files. +2. Apply the fix in line with those conventions. +3. Return `files_touched` + a one-line `summary` + `status`. + +## Gotchas + +- **Max 5 concurrent fix sub-agents.** The cap prevents fix-fanout + chaos; the parent merges results and reconciles file conflicts + between waves before step 6. +- **Never invent a generic answer that contradicts repo practice.** + That's the "elephant in school" anti-pattern — a Copilot suggestion + in isolation looks right but breaks the project's lint, format, + spell-check, license-header, or framework conventions. The + discovery step is mandatory, not optional. +- **One finding per sub-agent.** Fixes that need to touch the same + file get serialized by the parent between waves — don't merge + multiple findings into one sub-agent invocation. +- **Project policy beats Copilot suggestion.** If discovery surfaces + a documented convention (spell-check allowlist mechanism, lint + suppression style, etc.) that contradicts the suggested fix, follow + the convention and reflect that in the reply body drafted in step 8 + (see [04-triage.md](04-triage.md#project-specific-policy-hooks)). +- **Push back with written rationale** if implementing the fix would + over-engineer the design for a hypothetical edge case — flip the + triage to `decline` and return `status: blocked` with the rationale + so step 8 drafts the right reply. diff --git a/skills/copilot-pr-autopilot/references/06-build-test.md b/skills/copilot-pr-autopilot/references/06-build-test.md new file mode 100644 index 000000000..308e5b0ff --- /dev/null +++ b/skills/copilot-pr-autopilot/references/06-build-test.md @@ -0,0 +1,59 @@ +# Step 6: Build + test per repo conventions + +Sub-agent type: `task` (may fan out to several `explore` sub-agents in +parallel for discovery); budget: 10 min (extension cap up to 2× for +slow suites). + +## Inputs + +- The set of files touched in step 5 (from each fix sub-agent's + `files_touched`). +- Whatever the parent has cached from prior rounds about the repo's + build / test / lint command set. + +## Return contract + +``` +{ status, failures } +``` + +Where `status` ∈ `pass` | `fail` and `failures` is the relevant +excerpt from the failing tool's output (build errors, test failures, +lint diagnostics) — enough for the parent to decide whether to loop +back to step 5 for a follow-up fix or push as-is. + +## Procedure + +**Discovery first** — read and combine: + +- `.github/instructions/*.md`, +- `AGENTS.md`, +- `CONTRIBUTING.md`, +- `README.md`, +- `package.json` scripts, +- `Makefile`, +- language-specific tooling configs, +- AND recent CI workflow runs (`gh run list`, `gh run view`) to learn + the *actual* command set in use. + +THEN run those exact commands on the changed code. Independent +discovery axes (build tool / test runner / lint / spelling / format) +can be dispatched as separate `explore` sub-agents in parallel; cache +the discovered commands per round so re-runs don't re-discover. + +## Gotchas + +- **Never invent generic build commands.** A broken build wastes the + next full review cycle (3–10 min). If discovery turns up nothing, + surface the gap — don't guess. +- **Respect repo-specific spell-check / lint / format policies.** + Some repos prefer rewording over allowlist entries; some have a + patterns/regex file; some accept inline-ignore directives. Inspect + the repo's existing config and recent commits before applying a + generic fix. +- **Cache discovered commands per round, not per loop.** Repo configs + can change between rounds (a fix may add a new lint), so re-discover + at the start of each round, but reuse within the round. +- **Failures route back to step 5.** When `status: fail`, the parent + re-enters step 5 with the failure excerpts as a new finding — don't + push a broken build to satisfy step 7. diff --git a/skills/copilot-pr-autopilot/references/08-reply-resolve.md b/skills/copilot-pr-autopilot/references/08-reply-resolve.md new file mode 100644 index 000000000..9fe334eaf --- /dev/null +++ b/skills/copilot-pr-autopilot/references/08-reply-resolve.md @@ -0,0 +1,103 @@ +# Step 8: Reply (always) + resolve (conditional) + +Sub-agent type: `general-purpose` **drafts** the reply bodies; the +**parent** posts them (mutations stay parent-owned). Budget for the +drafting sub-agent: 5 min. + +Runs AFTER step 7 (commit + push) so every reply can cite the +**pushed commit SHA**. + +## Inputs + +- The full triage table from step 4 — `{ thread_id, action, + rationale }` per open thread (including `escalate-to-user`). +- The pushed `HeadOid` from step 7. +- The per-thread fix `summary` and `files_touched` from step 5 (for + `fix` rows). + +## Return contract + +One row per open thread: + +``` +{ thread_id, action, reply_body } +``` + +Where `action` ∈ `fix` | `decline` | `escalate-to-user`. The parent +consumes this to drive the resolve/no-resolve decision (see +Procedure). + +## Procedure + +1. **Drafting sub-agent** produces a `reply_body` per thread by + selecting the appropriate template (see [#templates](#templates)) + based on the triage `action`. Cite the pushed SHA from step 7 in + `fix` replies. For `escalate-to-user`, explain the disposition and + the open question for the human merge owner; do not promise a + resolve. +2. **Parent posts each reply**, choosing whether to resolve: + + ```pwsh + pwsh ./scripts/08-reply-and-resolve.ps1 -ThreadId -Body + ``` + + - `action ∈ { fix, decline }` → run as above (resolve happens). + - `action == escalate-to-user` → **add `-NoResolve`** so the thread + stays open for the human: + + ```pwsh + pwsh ./scripts/08-reply-and-resolve.ps1 -ThreadId -Body -NoResolve + ``` + +## Gotchas + +- **Reply to every open thread; resolve only when the loop owns the + disposition** (`fix` or `decline`). Resolving without a reply + leaves no record of why the issue was considered addressed. +- **Escalated threads stay open *with our reply* explaining the + disposition.** They're explicit hand-offs to the human merge + owner, not loop failures — that's why convergence in step 9 can + succeed with `OpenThreadCount > 0`. +- **Mutations are parent-owned.** The sub-agent only drafts; it never + posts. This keeps the audit trail of mutations on the parent and + avoids double-post races between concurrent sub-agents. +- **Cite the pushed SHA, not a local commit.** Step 7's recorded + `HeadOid` is the only SHA reviewers can browse to. +- **Reply hygiene matters for the next round.** Declines that don't + cite reasoning get re-raised by the next Copilot review. See + [04-triage.md](04-triage.md#reply-hygiene). + +## Templates + +Pick by triage action: + +| Triage action | Template | +|---------------|----------| +| `fix` | [reply-fix.md](../assets/reply-fix.md) | +| `decline` | [reply-decline.md](../assets/reply-decline.md) | +| PR-description / comment drift acknowledgement | [reply-drift.md](../assets/reply-drift.md) | +| Partial fix with deferred follow-up | [reply-partial.md](../assets/reply-partial.md) | + +For `escalate-to-user`, there is no template — write a bespoke reply +explaining the disposition and the open question, then post with +`-NoResolve` so the thread stays open. + +## Reply guidance + +The reply has to do real work — it documents the decision for future +maintainers and shapes what the next Copilot review will surface. + +Be **concrete** (cite file paths, commit SHAs, function names), +**direct** (no hedging when you have a position), and **brief** (2–4 +sentences is typical). Long replies usually mean the round should +have been broken up. + +### Anti-patterns — DO NOT use + +- ❌ `"Thanks!"` / `"Good point."` with no substance. +- ❌ `"Will fix later."` Either fix it now or decline with rationale; + deferred fixes that aren't tracked anywhere get lost. +- ❌ Resolve-without-reply. The next reviewer cannot reconstruct why + the thread was closed. +- ❌ `"I disagree."` with no reasoning. State the actual technical + disagreement. \ No newline at end of file diff --git a/skills/copilot-pr-autopilot/references/09-convergence.md b/skills/copilot-pr-autopilot/references/09-convergence.md new file mode 100644 index 000000000..27b028437 --- /dev/null +++ b/skills/copilot-pr-autopilot/references/09-convergence.md @@ -0,0 +1,101 @@ +# Step 9: Convergence verify + +Sub-agent type: `explore`; budget: 3 min. + +## Inputs + +- `PrNumber`. +- The pushed `HeadOid` from step 7 (for the independent sanity check). +- Whether the loop is in normal mode or [single-iteration + mode](orchestration.md#single-iteration-fallback) (decided at step 1). + +## Return contract + +``` +{ converged, head_oid, latest_review_commit_oid, submitted_at, + open_thread_count, open_threads_awaiting_reply, escalated_threads } +``` + +`converged` is the single source-of-truth boolean — `Converged: true` +returned by `02-check-review-status.ps1`. + +## Procedure + +Run the status check, passing `-SingleIteration` iff the loop took the +fallback at step 1: + +```pwsh +pwsh ./scripts/02-check-review-status.ps1 -PrNumber +# single-iteration variant: +pwsh ./scripts/02-check-review-status.ps1 -PrNumber -SingleIteration +``` + +Then run an **independent HEAD-vs-`LatestCopilotReview.commitOid` +sanity check** — the parent's recorded `HeadOid` from step 7 should +match `HEAD` and (in normal mode) match the latest review's +`commitOid`. + +## Decision: loop back or exit + +After the status check, the parent agent **must** branch on `converged`: + +``` +if converged == true: + run step 10 once (cleanup outdated) + call task_complete with proof (HeadOid, LatestCopilotReview.commitOid, submittedAt) + DONE — exit the loop +else: + GO BACK TO STEP 1 — start another round + (re-trigger via 01-request-review.ps1, wait via 02-wait, + list via 03-list-threads, triage, fix, push, reply+resolve, + re-check via this step) +``` + +A non-converged result is **never** terminal. Each round addresses +Copilot's findings on the previous round's HEAD; the loop terminates +only when Copilot has nothing new to say AND every open thread has a +reply from the agent. There is no "max rounds" cap built into the +scripts — the parent agent SHOULD apply a sane cap (e.g., 10 rounds) +and escalate to the user if the loop is oscillating (same finding +re-raised across rounds — see [04-triage.md](04-triage.md#conflicting-comments--break-oscillation-early)). + +`-SingleIteration` mode is the **one** exception: by definition, it +runs one round only (the trigger path is unavailable), and the +`converged` result is taken as terminal whichever way it goes. + +### Convergence semantics + +`02-check-review-status.ps1` implements a PR-state guard plus three Converged branches (see the `Converged = if (...)` block near the end of that script for the canonical source): + +- **PR State guard (overrides everything)** — if `State != 'OPEN'` (CLOSED / MERGED), `Converged: false` regardless of all other flags. The agent cannot push to a non-OPEN PR; surface the state change to the user and abort the loop rather than calling `task_complete`. +- **Normal (Copilot-driven) mode** — a Copilot review exists OR `CopilotPending: true`: + `Converged: true` iff + `ReviewAtHead && NoNewComments && OpenThreadsAwaitingReply == 0`. +- **Single-iteration mode** (`-SingleIteration` passed because the loop took the [fallback at step 1](orchestration.md#single-iteration-fallback)): + `Converged: true` iff `OpenThreadsAwaitingReply == 0`. The stale-review checks can never advance without a new Copilot review, so they're omitted. +- **No Copilot review ever observed AND not pending** (brand-new PRs with zero findings, or PRs where the trigger silently failed and the script wasn't called with `-SingleIteration`): + `Converged: true` iff `OpenThreadsAwaitingReply == 0`. **Do NOT trust this as "loop done" before step 1 has fired** — it just means there's no human-thread work pending. The parent agent MUST run `01-request-review.ps1` first (per [orchestration.md step 1](orchestration.md#step-1-request-review)) and re-check; treating brand-new-PR convergence as terminal will short-circuit the entire loop. + +`OpenThreadCount` MAY be `> 0` when escalated-to-user threads stay +open — that's an explicit human hand-off, not a loop failure. Return +the list of escalated `thread_id`s so the parent can include them in +the convergence proof. + +## Gotchas + +- **Trust `02-check-review-status.ps1`'s `Converged` flag, not your + own re-derivation.** The script enforces all three conditions + (normal mode) or the simplified condition (single-iteration) and + is the canonical source. +- **Don't call `task_complete` until `converged == true`.** Print + the proof (`HeadOid`, `LatestCopilotReview.commitOid`, + `submittedAt`, `OpenThreadsAwaitingReply: 0`, list of escalated + threads if `OpenThreadCount > 0`) in the completion message. +- **`-SingleIteration` is sticky to the fallback decision.** If + step 1 took the fallback, every step 9 in this loop uses + `-SingleIteration`; don't flip it mid-loop. +- **PR State != OPEN aborts the loop.** If `State` is `CLOSED` or + `MERGED`, `Converged` is forced `false` by the script's state + guard. The parent agent cannot push to a non-OPEN PR — surface + the state change to the user and stop the loop rather than + retrying or calling `task_complete`. diff --git a/skills/copilot-pr-autopilot/references/api-quirks.md b/skills/copilot-pr-autopilot/references/api-quirks.md new file mode 100644 index 000000000..3fc182f71 --- /dev/null +++ b/skills/copilot-pr-autopilot/references/api-quirks.md @@ -0,0 +1,221 @@ +# GitHub API Quirks (Verified) + +API behaviors that matter for the Copilot review loop. All verified +against the current API surface — read this before reaching for an +alternative API or modifying the bundled scripts. + +## GraphQL trigger — `requestReviewsByLogin` is the supported path + +```graphql +mutation($p: ID!) { + requestReviewsByLogin(input: { + pullRequestId: $p, + botLogins: ["copilot-pull-request-reviewer"] + }) { + pullRequest { number } + } +} +``` + +Verified empirically against personal repos without Copilot Pro AND +org repos with Copilot Enterprise. Works for both initial-add and +re-request (no special re-request mutation). + +Three GraphQL traps: + +1. Mutation is **`requestReviewsByLogin`**, NOT `requestReviews`. + `RequestReviewsInput` (used by `requestReviews`) does not expose a + `botLogins` field, so it can't request a bot reviewer at all — + `botLogins` is the central field on `requestReviewsByLogin`. +2. Field is **`botLogins`**, NOT `userLogins`. The latter returns + `Could not resolve user with login 'Copilot'`. +3. Slug is **`copilot-pull-request-reviewer`** (the App slug). The + display login `Copilot` returns `Could not resolve bot with slug + 'Copilot'`. + +Verify success via a new `copilot_work_started` event on the issue's +events feed — `GET /repos/{o}/{r}/issues/{n}/events` (see SKILL.md +Gotchas "HTTP 200 / exit 0 is NOT proof"). Empirically this event +type IS exposed on the `/events` endpoint (verified across 20+ +trigger rounds on PR 236); it is not timeline-only. +`01-request-review.ps1` enforces this by comparing the event `id` +(monotonic) before and after the trigger. + +### Other trigger paths — DO NOT USE + +- **`requestReviews` with `botLogins`** → input type rejects the + field. Don't try variants. +- **REST `POST /pulls//requested_reviewers` with + `reviewers[]=Copilot`** → REST cannot route bot-reviewer requests. + In live testing this surfaced two ways depending on context: + (a) **HTTP 404** when GitHub treats `Copilot` as a User lookup + miss in the repo context (this is what `01-request-review.ps1` + inline-comment documents); and (b) **HTTP 201** with the bot + silently dropped from the response's `requested_reviewers[]`. + The REST `reviewers` field is type=User only and rejects bots + even when the login resolves to a Bot record. Not used by the + script. +- **`gh pr edit --add-reviewer Copilot`** → returns `'Copilot' not + found` on current `gh`. Not used by the script. + +## GraphQL `latestReviews` — stale cache, do NOT use + +```graphql +# DO NOT — stale projection: +pullRequest(number:$pr){ latestReviews(first:50){ nodes{...} } } + +# USE INSTEAD — always current: +pullRequest(number:$pr){ reviews(last:100){ nodes{...} } } +``` + +`latestReviews` is a "latest per user" projection with stale-cache +behavior: a fresh Copilot review can be absent for several minutes +after submission, while `reviews(last:100)` reflects it immediately. +Using `latestReviews` for in-flight or convergence checks causes the +script to operate on an obsolete commit OID — either falsely +declaring convergence or timing out for a review that already +exists. + +`02-check-review-status.ps1` uses `reviews(last:100)` filtered +client-side to the Copilot reviewer login. It also emits a stderr +warning when the result is exactly 100 reviews, so the caller knows +the boundary was hit and the latest Copilot review COULD be older +than the window — practically only possible if 100+ non-Copilot +reviews landed after the last Copilot review, which doesn't happen +in normal use. If you ever see the warning and the loop misbehaves, +fetch the full review list manually: + +```bash +gh pr view --json reviews --jq '.reviews[] | select(.author.login | test("copilot-pull-request-reviewer"))' +``` + +### Tie-break for multiple Copilot reviews + +When more than one Copilot review shares the same `submittedAt` +(rare server-side clock collision under burst re-triggers), the +script first prefers the review whose `commit.oid == HEAD`, then +falls back to a stable sort. The intent is "the review that +matches the current code is the one the agent should reply to" — +preventing a stale-OID review from winning the tie and falsely +flipping `ReviewAtHead` to false. + +## Reply + resolve mutations — both work + +```graphql +mutation($tid: ID!, $body: String!) { + addPullRequestReviewThreadReply(input: { + pullRequestReviewThreadId: $tid, + body: $body + }) { comment { id } } +} + +mutation($tid: ID!) { + resolveReviewThread(input: { threadId: $tid }) { + thread { isResolved } + } +} +``` + +## `isOutdated` ≠ `isResolved` — current unresolved state is truth + +A thread can be `isOutdated: true` (Copilot's comment points at lines +that have since changed) while still `isResolved: false`. These +threads: + +- Still need reply + resolve in the per-round loop. A thread can + become outdated mid-round when your own fix shifts the cited + lines. Filtering on `!isOutdated` would silently drop those + threads, leaving the PR's open-conversations list non-empty even + after the underlying code is fixed. +- `03-list-open-threads.ps1` therefore lists every unresolved + thread with no `isOutdated` filter. +- `10-cleanup-outdated.ps1` is a safety net only — for the rare + case where a thread becomes outdated AFTER your last per-round + fetch. + +## Review latency — don't poll faster than ~3 min + +Copilot reviews typically post 3–6 minutes after the request, +occasionally up to ~10 minutes. There is no progress signal; +polling more often than every ~3 min wastes API budget without +making the review arrive sooner. + +## `gh api graphql -F` coerces strings — use `-f` for `String!` + +The `gh` CLI distinguishes its two flag forms: + +- `-F key=value` — type inference. Values parsing as int, bool, or + null are sent as that JSON literal. +- `-f key=value` — always sends as raw string. + +For any GraphQL variable declared `String!` (e.g. `owner`, `repo`, +`body`, `tid`, `after`), use **`-f`** at call sites. A reply body that +happens to be `"true"`, `"null"`, or all digits would otherwise be +coerced and the call fails with a type error. Keep `-F` only for +genuinely numeric or boolean variables (e.g. `pr: Int!`). + +> Note: the shared `Invoke-Gh` wrapper may internally rewrite +> `-f field=` into `-F field=@` when the body contains +> embedded `"` (Windows PowerShell 5.1 native-arg quoting bug — see +> below). Even via `@file`, `-F` still applies type inference to the +> file content (gh's documented behaviour) — this rewrite is safe +> only because the rewrite trigger ("body contains `"`") guarantees +> the content is a string that no JSON literal (`123`, `true`, +> `null`, etc.) would match. Treat this `-F ...=@file` usage as an +> internal transport detail of the wrapper, not as permission to +> use `-F ...=@file` for arbitrary strings at call sites. + +```powershell +# Wrong — body could be coerced AND, under Windows PowerShell 5.1, +# any embedded `"` in $Body will be mis-split by the native-arg +# passer (gh sees a truncated body or a "received N args" error). +gh api graphql -f query=$q -F body=$Body + +# Right — go through Invoke-Gh / Invoke-GhGraphQL. The shared helper +# auto-rewrites `-f field=` and `-F field=` pairs whose +# body contains `"` to `-F field=@` so the value is read +# from disk and never appears on the command line. This works +# identically on Windows PowerShell 5.1 and PowerShell 7+. +Invoke-GhGraphQL -GhArgs @('-f',"query=$q",'-f',"body=$Body") -Context 'reply body' +``` + +Calling `gh` directly (e.g. via `& gh ...` or raw `gh api graphql`) +bypasses the cross-version tempfile rewrite — if your value contains +`"` you'll re-introduce the PowerShell-5.1-only splitting bug. Always +funnel `gh` calls through `Invoke-Gh` / `Invoke-GhGraphQL`. + +## Native `gh` exit codes bypass `$ErrorActionPreference` + +`gh` is a native executable, not a PowerShell cmdlet, so a non-zero +exit does **not** throw even when `$ErrorActionPreference = 'Stop'`. +Without an explicit check the script will print misleading success +messages after a failed API call, and the loop will falsely declare +convergence on auth issues, rate limits, or transient 5xx. + +Additional trap: `gh api graphql` can exit 0 for an HTTP 200 whose +JSON body carries a top-level `errors` array. Treat that as a failed +call too. + +The shared helpers in [scripts/_lib.ps1](../scripts/_lib.ps1) +(`Invoke-Gh` and `Invoke-GhGraphQL`) run `gh` via +`[System.Diagnostics.Process]` with `ProcessStartInfo.EnvironmentVariables` +set per-process (the legacy StringDictionary; the newer .NET Core-only +`.Environment` property is avoided so the helper still works on +Windows PowerShell 5.1 / .NET Framework). The parent's `$env:GH_HOST` +is never mutated — safe for concurrent runspaces. Stdout and stderr +are drained concurrently +via `StandardOutput.ReadToEndAsync()` / `StandardError.ReadToEndAsync()` +so neither pipe deadlocks the child, and the helper returns +`{ExitCode, Stdout, Stderr}`. `Invoke-GhGraphQL` additionally parses +the GraphQL `errors` array on the response body and throws on either +failure mode. All bundled scripts dot-source `_lib.ps1` and use these +wrappers — do the same in any new script. + +## `git stash push` argument order + +```bash +git stash push -m "local-build" -- src/path/a src/path/b # correct +git stash push -- src/path/a src/path/b -m "local-build" # SILENTLY drops -m +``` + +The `-m` MUST come before the `--` path separator. diff --git a/skills/copilot-pr-autopilot/references/orchestration.md b/skills/copilot-pr-autopilot/references/orchestration.md new file mode 100644 index 000000000..7d44e3a3a --- /dev/null +++ b/skills/copilot-pr-autopilot/references/orchestration.md @@ -0,0 +1,184 @@ +# Orchestration — Parent-Owned Loop Control + +Cross-cutting protocol for the Copilot PR review loop: time-boxing, +the sub-agent delegation map, the parent-owned steps (1, 7, 10), the +single-iteration fallback, and the loop-wide notes. Each per-step +sub-agent contract lives in its own `NN-*.md` file alongside this one. + +Build, test, and lint commands are NOT prescribed here. Every step +that needs them defers to the target repo's own conventions +(`CONTRIBUTING.md`, `AGENTS.md`, `README`, `package.json` / +`Makefile` / language tooling, etc.). Discover and follow the repo's +existing practice — never invent build commands. + +## Time-boxing & extension protocol + +| Concept | Rule | +|---------|------| +| Default budget | 5 minutes per sub-agent invocation | +| Sub-agent must return | `status` ∈ {`complete`, `partial`, `blocked`} + `next_action` + `needs_extension_minutes` (0 if none). Always summarize progress before the budget expires — never silently overrun. | +| Extension | parent only extends when `status: partial` AND `next_action` is concrete; sends `write_agent "continue for N min"` with `N = min(needs_extension_minutes, 10)` | +| Extension cap (default) | 2 extensions per step; step 6 (build/test) up to 2× for slow suites. Step 2 (wait) is a single bounded sub-agent — see [02-wait.md](02-wait.md) — not extension-driven. | +| Parent never blocks | step 1 (request), step 7 (commit + push), step 8 reply/resolve mutations, and the `task_complete` decision stay in the parent | + +When the cap is reached and the work is still `partial`, the parent +narrows the input (batch smaller in step 4 / split fix scope in step 5) +or takes the step over itself. + +## Sub-agent delegation map + +> **The loop:** one **round** = steps 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8 → 9. After step 9, if `Converged: false`, **go back to step 1** for another round. Repeat until step 9 returns `Converged: true`; then run step 10 once and exit. See [09-convergence.md](09-convergence.md) for the convergence definition and the loop-exit / loop-back decision. + +Canonical order per round: **request → wait → list → triage → fix → +build → commit + push → reply + resolve (citing pushed SHA) → +convergence check**. Reply/resolve runs AFTER push so replies can cite +the pushed commit SHA. + +| Step | Owner | Contract | +|------|-------|----------| +| 1 — Request review | parent | this file, [#step-1-request-review](#step-1-request-review) | +| 2 — Wait for review | sub-agent (`general-purpose`, 20 min) | [02-wait.md](02-wait.md) | +| 3 — List + categorize open threads | sub-agent (`explore`, 5 min) | [03-list-threads.md](03-list-threads.md) | +| 4 — Triage | sub-agent (`general-purpose`, 5 min/≤5 threads) | [04-triage.md](04-triage.md) | +| 5 — Apply fixes | sub-agents (`general-purpose`, parallel max 5, 5 min each) | [05-fix.md](05-fix.md) | +| 6 — Build + test per repo conventions | sub-agent (`task` + `explore`, 10 min) | [06-build-test.md](06-build-test.md) | +| 7 — Commit + push | parent | this file, [#step-7-commit-and-push](#step-7-commit-and-push) | +| 8 — Reply (always) + resolve (conditional) | sub-agent drafts → parent posts | [08-reply-resolve.md](08-reply-resolve.md) | +| 9 — Convergence verify | sub-agent (`explore`, 3 min) | [09-convergence.md](09-convergence.md) | +| 10 — Cleanup outdated (post-convergence, once) | parent | this file, [#step-10-cleanup-outdated](#step-10-cleanup-outdated) | + +## Step 1: Request review + +Sub-agent type: _(parent — no sub-agent)_; budget: n/a. + +**Inputs** +- `PrNumber` for the target PR. + +**Return contract** +- Captured `baseline` = `LatestCopilotReview.submittedAt` string (or empty) + to be passed to step 2. +- Boolean `single_iteration_mode` — `true` if the trigger failed because + Copilot isn't a valid reviewer; `false` otherwise. + +**Procedure** + +1. Snapshot first to learn whether Copilot is already pending: + + ```pwsh + $snap = pwsh ./scripts/02-check-review-status.ps1 -PrNumber + $baseline = if ($snap -match '"submittedAt":"([^"]+)"') { $Matches[1] } else { '' } + $pending = ($snap -match '"CopilotPending":true') + ``` + + Regex on raw JSON keeps `submittedAt` a string across the + parent → sub-agent boundary on any PS version (5.1 / 7.x), avoiding + `[datetime]` rebinding. + +2. **If `$pending`** — skip the trigger; jump to step 2 with `baseline`. + +3. **Else** — fire the trigger: + + ```pwsh + pwsh ./scripts/01-request-review.ps1 -PrNumber + ``` + + The script keeps its own `InFlight` short-circuit as a safety net, + but the canonical "is Copilot pending?" signal lives in + `02-check-review-status.ps1` (above). + +4. If `01-request-review.ps1` throws because Copilot isn't a valid + reviewer (Copilot Code Review not enabled on the repo / account), + take the [single-iteration fallback](#single-iteration-fallback). + +## Step 7: Commit and push + +Sub-agent type: _(parent — no sub-agent)_; budget: n/a. + +**Inputs** +- The fix results from step 5 and the green build from step 6. + +**Return contract** +- Pushed `HeadOid` (the new commit SHA), recorded for step 8 reply + bodies and step 9 convergence proof. + +**Procedure** +- Parent runs `git commit` + `git push` directly. One focused commit + per round — bundling rounds destroys the audit trail of which finding + drove which change and breaks `git bisect`. +- Include the trailer: + `Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>`. +- Record the pushed SHA so step 8 can cite it in every reply body and + step 9 can compare it against `LatestCopilotReview.commitOid`. + +## Step 10: Cleanup outdated + +Sub-agent type: _(parent — no sub-agent)_; budget: n/a. Runs **once, +after convergence** (step 9 returned `Converged: true`). + +**Procedure** + +```pwsh +pwsh ./scripts/10-cleanup-outdated.ps1 -PrNumber +``` + +Safety net only. Most loops converge with nothing to clean — outdated +threads should already have been replied + resolved in step 8 like any +other open thread. Unresolved state is the source of truth in the PR +UI; `10-cleanup-outdated.ps1` only catches strays. + +## Single-iteration fallback + +When `01-request-review.ps1` throws because Copilot Code Review isn't +enabled on the repo / account (the GraphQL mutation reports the bot +isn't a valid reviewer), the agent falls back to **single-iteration +mode**: + +- Skip step 2 (no Copilot review to wait for). +- Run steps 3 – 8 once against whatever review threads already exist + (human, advanced-security, other bots). +- At step 9, pass `-SingleIteration` to `02-check-review-status.ps1` so + the convergence check ignores the stale-review checks that can never + advance without a new Copilot review. `Converged: true` collapses to + `OpenThreadsAwaitingReply == 0`. +- Re-iteration happens only when a human posts new comments later — + re-run the skill at that point. + +Single-iteration mode is **the agent's decision after the trigger +fails**, not an auto-detected state — the script can't reliably tell +"Copilot disabled" from "Copilot enabled but not yet triggered" from +API state alone. + +## Convergence proof + +Print the proof of convergence in the `task_complete` message — proof, +not assertion: + +- `HeadOid` +- `LatestCopilotReview.commitOid` +- `submittedAt` +- `OpenThreadsAwaitingReply: 0` +- The list of any open `escalate-to-user` threads if + `OpenThreadCount > 0`. + +## Notes + +- **Re-request is first-class.** `01-request-review.ps1` does not + silently skip when Copilot has already reviewed; it issues the + same mutation and verifies via a new `copilot_work_started` event + (the script enforces this — see [api-quirks.md](api-quirks.md) for + the GraphQL surface and the silent-drop trap). +- **Outdated threads still need reply + resolve.** They show up in + the PR UI as unresolved until you explicitly close them; step 10 + is a safety net, not the primary mechanism. +- **Reopened / revisit requests reset the thread to step 4.** If a + declined finding is reopened by the user (or by a follow-up + Copilot review), pull it back into triage with the prior rationale + as input rather than re-running the whole loop. +- **Resumability after interruption.** On restart, snapshot HEAD, + the latest Copilot review's `commit.oid` + `submittedAt`, the + open-threads list, and any uncommitted local changes. Discard + cached triage / drafts if HEAD or the open-threads set changed. +- **Local-build patches.** For projects with uncommitted local-build + patches held out of the PR: `git stash push -m "local-build" -- + ` before committing, `git stash pop` after. Note `-m` must + come BEFORE `--` (see [api-quirks.md](api-quirks.md)). diff --git a/skills/copilot-pr-autopilot/scripts/01-request-review.ps1 b/skills/copilot-pr-autopilot/scripts/01-request-review.ps1 new file mode 100644 index 000000000..ba063fd09 --- /dev/null +++ b/skills/copilot-pr-autopilot/scripts/01-request-review.ps1 @@ -0,0 +1,511 @@ +<# +.SYNOPSIS + Request a Copilot review on a PR and verify the trigger landed. + +.DESCRIPTION + Single mechanism: GraphQL `requestReviewsByLogin` with + `botLogins:["copilot-pull-request-reviewer"]`. See + references/api-quirks.md for the GraphQL surface details. + + Success contract (exit 0, single-line JSON): + - Status="InFlight" — Copilot already a requested reviewer. + - Status="TriggerLanded" — mutation submitted and verified via a + new `copilot_work_started` event id. + + Failure (throw, exit 1): mutation failed, or no new event landed + within -VerifySeconds. Caller should push a substantive commit and + retry (auto-assign on `synchronize` is the most reliable fallback). + +.PARAMETER PrNumber PR number (required). +.PARAMETER Owner + Optional; auto-resolved from `gh repo view`. + +.PARAMETER Repo + Optional; auto-resolved from `gh repo view`. +.PARAMETER VerifySeconds Verification poll window (1..600, default 45). + +.EXAMPLE + pwsh 01-request-review.ps1 -PrNumber 236 +#> +[CmdletBinding()] +param( + [Parameter(Mandatory = $true)] + [int]$PrNumber, + + [string]$Owner, + [string]$Repo, + + [ValidateRange(1, 600)] + [int]$VerifySeconds = 45 +) + +$ErrorActionPreference = 'Stop' +. "$PSScriptRoot/_lib.ps1" + +function Get-LatestCopilotWorkStartedEvent { + $eventsPath = "repos/$Owner/$Repo/issues/$PrNumber/events?per_page=100" + $r = Invoke-Gh -GhArgs @('api','-i',$eventsPath) + if ($r.ExitCode -ne 0) { throw "events query failed (credentials redacted): $(Format-CopilotPrAutopilotErrorPreview -Text $r.Stderr)" } + + $m = [regex]::Match($r.Stdout, '(?s)\A(?.*?)\r?\n\r?\n(?.*)\z') + if (-not $m.Success) { throw 'events query returned an unexpected header/body shape.' } + $headers = $m.Groups['headers'].Value + $body = $m.Groups['body'].Value + + $lastPage = 1 + # Link header looks like: `; rel="last"` + # Param order is not guaranteed — `page=4` may appear before or after + # other query params. Match `page=` inside the URL (allowing `?` + # or `&` separator) up to the closing angle bracket, then the + # `rel="last"` marker. + $lastMatch = [regex]::Match($headers, '(?i)<[^>]*[?&]page=(\d+)[^>]*>;\s*rel="last"') + if ($lastMatch.Success) { $lastPage = [int]$lastMatch.Groups[1].Value } + # Initialized outside the race-walk branch so the post-block + # `if ($raceWalkIncomplete)` check below is well-defined even when + # we take the single-page path ($lastPage -le 1). + $raceWalkIncomplete = $false + if ($lastPage -gt 1) { + # Race defense: a new event landing between the header-fetch above + # and the lastPage fetch below could push the actual last event + # onto a NEW page (e.g., total=200, pages=2; +1 event → total=201, + # pages=3). We'd then miss the new event entirely and report a + # spurious "no copilot_work_started landed" timeout. Defend by + # also fetching with `-i` and walking any rel="next" pages until + # the Link header reports no more pages. + $allBodies = [System.Collections.Generic.List[pscustomobject]]::new() + $page = $lastPage + # Default reason "exhausted" — overwritten if we break early via + # cursor non-advance or overshoot cap. + $raceBreakReason = 'exhausted (no rel="next" link returned)' + # $raceWalkIncomplete is initialized once above before the + # if-branch so the post-block check works on both paths. + while ($true) { + $r = Invoke-Gh -GhArgs @('api','-i',"repos/$Owner/$Repo/issues/$PrNumber/events?per_page=100&page=$page") + if ($r.ExitCode -ne 0) { throw "events last-page query failed (page=$page) (credentials redacted): $(Format-CopilotPrAutopilotErrorPreview -Text $r.Stderr)" } + $pm = [regex]::Match($r.Stdout, '(?s)\A(?.*?)\r?\n\r?\n(?.*)\z') + if (-not $pm.Success) { throw "events query returned unexpected header/body shape on page $page." } + $allBodies.Add([pscustomobject]@{ Page = $page; Body = $pm.Groups['body'].Value; Stderr = $r.Stderr }) + $nextMatch = [regex]::Match($pm.Groups['headers'].Value, '(?i)<[^>]*[?&]page=(\d+)[^>]*>;\s*rel="next"') + if (-not $nextMatch.Success) { break } + $nextPage = [int]$nextMatch.Groups[1].Value + if ($nextPage -le $page) { + # rel="next" pointed at the same or earlier page — server + # invariant violation. Surface so operators can tell + # "incomplete walk" from a real "no event landed". + $raceBreakReason = "rel=`"next`" pointed at page $nextPage (current page $page) — cursor non-advance" + Write-Warning "events race-walk: $raceBreakReason on PR #$PrNumber. Pagination stopped; downstream 'no copilot_work_started landed' may be a false negative." + $raceWalkIncomplete = $true + break + } + $page = $nextPage + if ($page -gt $lastPage + 5) { + $raceBreakReason = "overshoot cap reached (page=$page > lastPage($lastPage)+5)" + Write-Warning "events race-walk: $raceBreakReason on PR #$PrNumber. Pagination stopped; downstream 'no copilot_work_started landed' may be a false negative." + $raceWalkIncomplete = $true + break + } + } + # Per-page JSON arrays can't be safely concat'd as raw text (the + # outer brackets break valid JSON). Deserialize each page body + # separately and aggregate the events list. + $events = [System.Collections.Generic.List[object]]::new() + foreach ($b in $allBodies) { + $parsed = @(ConvertFrom-GhJson -Stdout $b.Body -Stderr $b.Stderr -Context "events page $($b.Page) (race-walk on $Owner/$Repo PR #$PrNumber)") + foreach ($e in $parsed) { $events.Add($e) } + } + } else { + $events = @(ConvertFrom-GhJson -Stdout $body -Stderr $r.Stderr -Context "events page $lastPage ($Owner/$Repo PR #$PrNumber)") + } + # Sort defensively by [long] — GitHub event `id` is a JSON integer that + # ConvertFrom-Json normally deserialises as Int64, but a stringified + # variant would lexicographically mis-order (e.g., 100 before 99). Cast + # in the sort expression to guarantee numeric ordering. + $latest = $events | Where-Object { $_.event -eq 'copilot_work_started' } | Sort-Object { [long]$_.id } | Select-Object -Last 1 + # Default: baseline is certain. We reset to false only when the back-walk + # exhausts its cap WITHOUT reaching page 1 (i.e., older history exists + # but we chose not to scan it). Reaching page 1 with no match means there + # truly is no prior copilot_work_started event — baseline=0 is correct. + $baselineCertain = $true + # If the forward race-walk terminated via cursor non-advance or + # overshoot cap (vs exhausting rel="next"), the events list is + # incomplete and a $latest derived from it (or the lack thereof) + # is not trustworthy. Force baselineCertain=$false so the + # trigger-verify loop downstream uses the wall-clock acceptance + # path (less strict but correct) rather than the 'id > beforeId' + # path with a stale / wrong beforeId (including a spurious 0). + if ($raceWalkIncomplete) { $baselineCertain = $false } + if (-not $latest -and $lastPage -gt 1) { + # The forward race-walk above only materialized pages from + # $lastPage onward. On high-activity PRs the last `copilot_work_started` + # may live further back in history — without a baseline event id we'd + # report a spurious "no copilot_work_started landed" timeout. Bounded + # backward walk: scan up to 5 prior pages (oldest end of history is + # page 1) until we find one, then stop. Keeps API usage capped while + # preserving diagnostic accuracy on busy PRs. + $backCap = 5 + $startBack = $lastPage - 1 + $minBack = [Math]::Max(1, $lastPage - $backCap) + for ($bp = $startBack; $bp -ge $minBack; $bp--) { + $rb = Invoke-Gh -GhArgs @('api',"repos/$Owner/$Repo/issues/$PrNumber/events?per_page=100&page=$bp") + if ($rb.ExitCode -ne 0) { + Write-Warning "events back-walk: page=$bp query failed on PR #$PrNumber (credentials redacted) — $(Format-CopilotPrAutopilotErrorPreview -Text $rb.Stderr). Diagnostic baseline may be inaccurate." + $baselineCertain = $false + break + } + $backEvents = @(ConvertFrom-GhJson -Stdout $rb.Stdout -Stderr $rb.Stderr -Context "events back-walk page $bp") + $bm = $backEvents | Where-Object { $_.event -eq 'copilot_work_started' } | Sort-Object { [long]$_.id } | Select-Object -Last 1 + if ($bm) { $latest = $bm; break } + } + # If we exited the loop without a hit AND minBack > 1, older + # history exists that we didn't scan. baseline=0 is then a guess, + # and the caller must fall back to timestamp-based verification. + if (-not $latest -and $minBack -gt 1) { $baselineCertain = $false } + } + if (-not $latest) { return [pscustomobject]@{ Id = 0L; CreatedAt = ''; BaselineCertain = $baselineCertain } } + $createdAt = Format-IsoUtcString $latest.created_at + [pscustomobject]@{ Id = [long]$latest.id; CreatedAt = $createdAt; BaselineCertain = $baselineCertain } +} + +# ---------- repo resolve ---------- + +$coords = Resolve-RepoCoords -Owner $Owner -Repo $Repo +$Owner = $coords.Owner +$Repo = $coords.Repo + +# ---------- state: is Copilot currently requested? ---------- +# Single GraphQL query: requested reviewers + head SHA, followed by +# pagination for the full requested-reviewer set. + +$stateQuery = @' +query($o:String!,$r:String!,$n:Int!){ + viewer{login} + repository(owner:$o,name:$r){ + pullRequest(number:$n){ + id + headRefOid + state + author{login} + reviews(last:100){nodes{author{login}}} + reviewRequests(first:100){nodes{requestedReviewer{__typename ... on Bot{login} ... on User{login} ... on Mannequin{login}}} pageInfo{hasNextPage endCursor}} + } + } +} +'@ +$stateData = Invoke-GhGraphQL -GhArgs @('-f',"query=$stateQuery",'-f',"o=$Owner",'-f',"r=$Repo",'-F',"n=$PrNumber") -Context "state query for $Owner/$Repo PR #$PrNumber" +$pr = $stateData.data.repository.pullRequest +if (-not $pr) { throw "PR #$PrNumber not found in $Owner/$Repo." } +if ($pr.state -ne 'OPEN') { + throw "PR #$PrNumber is not OPEN (state=$($pr.state))." +} + +$viewerLogin = [string]$stateData.data.viewer.login +$prAuthorLogin = if ($pr.author) { [string]$pr.author.login } else { '' } +$viewerIsAuthor = ($viewerLogin -and $prAuthorLogin -and ($viewerLogin -eq $prAuthorLogin)) +$copilotHasReviewed = $false +if ($pr.reviews -and $pr.reviews.nodes) { + foreach ($rev in $pr.reviews.nodes) { + if ($rev.author -and $rev.author.login -and ($rev.author.login -match $CopilotPrAutopilot_CopilotReviewerLoginRegex)) { + $copilotHasReviewed = $true; break + } + } +} + +$headOid = $pr.headRefOid +$prNodeId = [string]$pr.id +if ([string]::IsNullOrWhiteSpace($prNodeId)) { + throw "Failed to resolve PR node id for $Owner/$Repo PR #$PrNumber from state query." +} +$reviewRequestsList = [System.Collections.Generic.List[object]]::new() +# Guard against `$pr.reviewRequests.nodes` being $null (PR with no +# requested reviewers / abbreviated GraphQL response). `@($null)` is +# Count=1 with a $null element in PowerShell — NOT empty — so iterating +# it unguarded would inject a $null into the list and contaminate +# downstream shape/logging assumptions. +if ($pr.reviewRequests -and $pr.reviewRequests.nodes) { + foreach ($n in $pr.reviewRequests.nodes) { $reviewRequestsList.Add($n) } +} +# Track the real pageInfo object across pages so we can pass it +# (not a synthetic [pscustomobject]) to Assert-CopilotPrAutopilotPageInfo +# — that keeps the helper validating the actual GraphQL connection +# shape (presence of hasNextPage / endCursor properties), which is +# the whole point of the missing-property check the helper just +# gained. +$currentPageInfo = $pr.reviewRequests.pageInfo +# Validate the first-page pageInfo BEFORE the while-condition +# reads hasNextPage. Otherwise a malformed or null pageInfo +# (or a missing hasNextPage property) silently coerces to $false +# and the script exits the loop without fetching subsequent +# pages — yielding an incomplete reviewRequests set and a wrong +# CopilotPending=false on PRs with >100 requested reviewers or +# on partial / malformed GraphQL responses. +Assert-CopilotPrAutopilotPageInfo -Context "reviewRequests pagination (first page) for $Owner/$Repo PR #$PrNumber" -PageInfo $currentPageInfo +$hasNext = [bool]$currentPageInfo.hasNextPage +$prevAfter = $null +while ($hasNext) { + # Cache cursor from the just-asserted page BEFORE fetching the next + # page, so the cursor we use to advance came from a validated + # pageInfo (not from a potentially-malformed subsequent response). + $after = $currentPageInfo.endCursor + if ($null -ne $prevAfter -and $after -eq $prevAfter) { + throw "reviewRequests pagination invariant violated for $Owner/$Repo PR #${PrNumber}: cursor did not advance (stuck at '$after')." + } + $prevAfter = $after + $pageQuery = @' +query($o:String!,$r:String!,$n:Int!,$after:String!){ + repository(owner:$o,name:$r){ + pullRequest(number:$n){ + reviewRequests(first:100,after:$after){nodes{requestedReviewer{__typename ... on Bot{login} ... on User{login} ... on Mannequin{login}}} pageInfo{hasNextPage endCursor}} + } + } +} +'@ + $pageData = Invoke-GhGraphQL -GhArgs @('-f',"query=$pageQuery",'-f',"o=$Owner",'-f',"r=$Repo",'-F',"n=$PrNumber",'-f',"after=$after") -Context "reviewRequests page query for $Owner/$Repo PR #$PrNumber" + $page = $pageData.data.repository.pullRequest.reviewRequests + foreach ($n in $page.nodes) { $reviewRequestsList.Add($n) } + $currentPageInfo = $page.pageInfo + # Assert-then-dereference for every subsequent page. Each + # $page.pageInfo is a fresh GraphQL object that could individually + # be malformed; without this, the next loop iteration's + # `[bool]$currentPageInfo.hasNextPage` could silently coerce a + # missing/null hasNextPage to $false and truncate pagination + # (potentially producing an incorrect CopilotPending=false). + Assert-CopilotPrAutopilotPageInfo -Context "reviewRequests pagination for $Owner/$Repo PR #$PrNumber" -PageInfo $currentPageInfo + $hasNext = [bool]$currentPageInfo.hasNextPage +} +$reviewRequests = $reviewRequestsList.ToArray() +$copilotPendingRequests = @($reviewRequests | Where-Object { + $_.requestedReviewer -and $_.requestedReviewer.login -and $_.requestedReviewer.login -match $CopilotPrAutopilot_CopilotReviewerLoginRegex +}) +$copilotPending = $copilotPendingRequests.Count -gt 0 + +# If Copilot is currently in requested_reviewers, it's in-flight by definition. +if ($copilotPending) { + @{ + Status = 'InFlight' + PrNumber = $PrNumber + HeadOid = $headOid + Detail = "Copilot is currently in requested_reviewers; review is in flight." + } | ConvertTo-Json -Compress + exit 0 +} + +# We do NOT short-circuit on AlreadyReviewed — the user wants re-request +# as a first-class flow. Re-trigger; the GraphQL mutation handles both +# initial-add and re-request identically. + +# ---------- snapshot copilot_work_started before triggering ---------- + +# Snapshot the latest copilot_work_started BEFORE triggering. Use the +# event's numeric `id` (monotonic) — `created_at` is second-resolution +# and would collide if a new event lands in the same second. +$beforeEvent = Get-LatestCopilotWorkStartedEvent +$beforeId = $beforeEvent.Id +$baselineCertain = $beforeEvent.BaselineCertain +# Wall-clock timestamp captured BEFORE the mutation. Used as a fallback +# verification anchor when the back-walk could not establish a certain +# baseline (i.e., $baselineCertain == $false). With a tiny clock-skew +# margin (5s) we can then accept a new event by timestamp >= triggerSentAt +# without the risk of mis-attributing a pre-existing event as 'landed'. +$triggerSentAt = (Get-Date).ToUniversalTime().AddSeconds(-5) + +# ---------- trigger via GraphQL requestReviewsByLogin ---------- + +$mut = 'mutation($p:ID!,$slug:String!){requestReviewsByLogin(input:{pullRequestId:$p,botLogins:[$slug]}){pullRequest{number}}}' +# Why this path and not REST or `requestReviews`? Verified end-to-end: +# - REST POST /pulls/{n}/requested_reviewers `reviewers:["Copilot"]` +# (the bot's REST login per `GET user/175728472`) → 404. The REST +# `reviewers` field accepts type=User only; bots are rejected even +# when the login resolves to a Bot record. +# - GraphQL `requestReviews` rejects bot node IDs ("Could not resolve +# to User node with the global id of 'BOT_…'") at schema level. +# - `requestReviewsByLogin.botLogins` is the ONLY public path for bot +# reviewers; trade-off is that it requires repo Triage/Write. +# - The UI 🔄 button uses a github.com Rails endpoint with a session +# cookie + CSRF that gh's OAuth token cannot satisfy. +# Catch the auth-gated case below and surface the two real workarounds. +# The reviewer slug is centralized in _lib.ps1 (`$CopilotReviewerSlug`) +# so the literal isn't duplicated across scripts. +$r = Invoke-Gh -GhArgs @('api','graphql','-f',"query=$mut",'-f',"p=$prNodeId",'-f',"slug=$CopilotReviewerSlug") + +# Belt-and-suspenders permission-error detection. Empirically `gh api graphql` +# exits non-zero AND puts the message in stderr for FORBIDDEN on +# requestReviewsByLogin (verified: exit=1, stderr contains "does not have the +# correct permissions"). But some GraphQL paths return exit=0 with a top-level +# `errors[]` carrying type=FORBIDDEN, so check both surfaces and route both to +# the same actionable-error formatter. +$permErrInStderr = ($r.ExitCode -ne 0) -and ($r.Stderr -match '(?i)does not have (the )?correct permissions|forbidden|HTTP 403') +$permErrInBody = $false +$bodyErrors = $null +if ($r.Stdout) { + try { + # Route through the shared ConvertFrom-GhJson helper so the + # preview format / context conventions stay consistent. The + # helper throws on parse failure; we catch and Write-Warning + # (fall through to the authoritative stderr/exit-code path) + # rather than abort — the warning makes the fall-through + # observable in logs. + $parsed = ConvertFrom-GhJson -Stdout $r.Stdout -Stderr $r.Stderr -Context 'requestReviewsByLogin' -PreviewChars 200 + if ($parsed.errors) { + $bodyErrors = $parsed.errors + # GraphQL FORBIDDEN can surface via three signals — `type`, + # `extensions.code`, or the message text. The API has been + # observed to return `extensions.code = "FORBIDDEN"` WITHOUT + # populating top-level `type`, so we must check both surfaces + # to avoid bypassing the actionable permission-error path. + $permErrInBody = [bool]($parsed.errors | Where-Object { + ($_.type -eq 'FORBIDDEN') -or + ($_.extensions -and $_.extensions.code -and ($_.extensions.code -match '(?i)^forbidden$')) -or + ($_.message -match '(?i)does not have (the )?correct permissions|forbidden') + }) + } + } catch { + # If gh exited 0 but returned non-JSON, that's an actionable hard + # failure for this step (a successful-looking response that's + # actually garbage would otherwise fall through to the event- + # verify loop and surface as a misleading "no event landed" + # timeout). Rethrow the original ErrorRecord so the helper's + # context-rich message keeps its type/InnerException/stack. We + # only fall through with a warning when stderr/exit-code already + # carry the authoritative signal (handled in the blocks below). + if ($r.ExitCode -eq 0 -and -not $permErrInStderr) { + throw + } + Write-Warning $_.Exception.Message + } +} + +if ($permErrInStderr -or $permErrInBody) { + # Redact credentials AND cap the embedded preview. Earlier rounds + # left $rawMsg uncapped for "operator-facing diagnostic" reasons, + # but in practice `gh` stderr under GH_DEBUG=api / verbose HTTP can + # produce multi-MB payloads that flood CI logs and destabilize + # downstream log collectors. 5000 chars preserves enough context + # to diagnose the underlying FORBIDDEN / RATE_LIMITED / NOT_FOUND + # message while staying parseable; full text is still recoverable + # via -Verbose / `gh` re-run. + $rawMsg = if ($permErrInStderr) { + Format-CopilotPrAutopilotErrorPreview -Text $r.Stderr -MaxChars 5000 + } elseif ($bodyErrors) { + Format-CopilotPrAutopilotErrorPreview -Text (($bodyErrors | ForEach-Object { $_.message }) -join '; ') -MaxChars 5000 + } else { '(no message)' } + if ($viewerIsAuthor) { + # External PR author scenario: GitHub's UI 🔄 button uses an internal + # endpoint not exposed in the public GraphQL/REST schema. Verified via + # schema enumeration: the only public bot-reviewer mutation is + # requestReviewsByLogin, which requires Triage/Write on the repo. + # PR authors without write permission cannot trigger via any public API. + $scenario = if ($copilotHasReviewed) { 're-request' } else { 'initial add' } + throw @" +Cannot trigger Copilot via public API in this scenario ($scenario): + - You are the PR author ($viewerLogin) on $Owner/$Repo PR #$PrNumber. + - You lack repo Triage/Write permission, so requestReviewsByLogin returns FORBIDDEN. + - GitHub's public GraphQL schema has no other bot-reviewer mutation + (verified: requestReviews rejects bot node IDs; no REST 'bot_reviewers' field). + - The UI's '🔄 Re-request review' button uses an internal endpoint not in the public API. + +Use one of these workarounds (both reliably drive Copilot to re-review): + 1. UI: open the PR in a browser → click 🔄 next to 'copilot-pull-request-reviewer'. + 2. CLI: push a substantive (non-whitespace) commit. The 'synchronize' event + auto-triggers Copilot with no API call and no permission required. + +After triggering by either means, resume the loop with 02-check-review-status.ps1. + +Raw error: $rawMsg +"@ + } + throw @" +GraphQL requestReviewsByLogin failed with a permission error: $rawMsg + +Most likely causes: + * Authenticated user lacks Triage / Write permission on the repo + (run ``gh api repos/$Owner/$Repo --jq .permissions`` to confirm; Read-only + collaborators cannot request reviewers). + * Copilot Code Review not enabled on the repo / account. +"@ +} + +if ($r.ExitCode -ne 0) { + throw @" +GraphQL requestReviewsByLogin failed (credentials redacted): $(Format-CopilotPrAutopilotErrorPreview -Text $r.Stderr -MaxChars 5000) + +Most likely causes: + * Quiet-period after a recent dismissal of Copilot — wait 5-10 min, or push a substantive commit. + * Copilot Code Review not enabled on the repo / account. + * PR in a state that blocks bot review (draft, conflict, branch protection). +"@ +} + +if ($bodyErrors) { + # Non-FORBIDDEN errors[] from a successful exit — surface them directly. + $msgs = Format-CopilotPrAutopilotErrorPreview -Text (($bodyErrors | ForEach-Object { $_.message }) -join '; ') -MaxChars 5000 + throw "GraphQL requestReviewsByLogin returned errors (credentials redacted): $msgs" +} + +# ---------- verify copilot_work_started event landed ---------- + +$deadline = (Get-Date).AddSeconds($VerifySeconds) +$afterTs = '' +$afterId = 0L +$lastErr = '' +# Backoff schedule: fast initial polls (events usually land within ~5s on the +# happy path) then coarsen to keep API spend bounded on slow paths. Without +# backoff, VerifySeconds=600 would burn ~120 events-API calls per attempt and +# raise the secondary rate-limit risk inside a tight loop. +$pollSchedule = @(2, 2, 3, 5, 10, 15, 30) +$pollIdx = 0 +do { + try { + $nowEvent = Get-LatestCopilotWorkStartedEvent + $lastErr = '' + $accept = $false + if ($baselineCertain) { + # Baseline is reliable: monotonic event-id comparison is exact. + if ($nowEvent.Id -gt $beforeId) { $accept = $true } + } else { + # Baseline uncertain (back-walk hit cap with older history + # still unscanned). An id > beforeId(=0) would be true for any + # pre-existing event and produce a false-positive "landed". + # Require event timestamp >= triggerSentAt (with 5s clock-skew + # margin already applied) so we only accept events that landed + # AFTER the mutation was sent. + if ($nowEvent.Id -gt 0 -and $nowEvent.CreatedAt) { + $parsedTs = $null + if ([DateTime]::TryParse($nowEvent.CreatedAt, [Globalization.CultureInfo]::InvariantCulture, [Globalization.DateTimeStyles]::AssumeUniversal -bor [Globalization.DateTimeStyles]::AdjustToUniversal, [ref]$parsedTs)) { + if ($parsedTs -ge $triggerSentAt) { $accept = $true } + } + } + } + if ($accept) { + $afterId = $nowEvent.Id + $afterTs = $nowEvent.CreatedAt + break + } + } catch { + $lastErr = $_.Exception.Message + } + if ((Get-Date) -ge $deadline) { break } + $remaining = [int]($deadline - (Get-Date)).TotalSeconds + $step = $pollSchedule[[Math]::Min($pollIdx, $pollSchedule.Count - 1)] + Start-Sleep -Seconds ([Math]::Min($step, [Math]::Max(1, $remaining))) + $pollIdx++ +} while ((Get-Date) -lt $deadline) + +if (-not $afterId) { + $errTail = if ($lastErr) { "`n Last events-query error: $lastErr" } else { '' } + throw @" +GraphQL mutation returned success but no new copilot_work_started event landed within $VerifySeconds seconds. The server may have silently dropped the request, or the events query kept failing transiently. + Latest copilot_work_started event id before trigger: $beforeId + HEAD: $headOid$errTail + +Push a substantive commit (auto-assign on synchronize is the most reliable trigger) and retry. +"@ +} + +@{ + Status = 'TriggerLanded' + PrNumber = $PrNumber + HeadOid = $headOid + WorkStartedAt = $afterTs + Detail = "Triggered via GraphQL requestReviewsByLogin; copilot_work_started at $afterTs." +} | ConvertTo-Json -Compress +exit 0 diff --git a/skills/copilot-pr-autopilot/scripts/02-check-review-status.ps1 b/skills/copilot-pr-autopilot/scripts/02-check-review-status.ps1 new file mode 100644 index 000000000..3db8b44fe --- /dev/null +++ b/skills/copilot-pr-autopilot/scripts/02-check-review-status.ps1 @@ -0,0 +1,382 @@ +<# +.SYNOPSIS + Snapshot the current Copilot review state of a PR. Single-shot, no waiting. + +.DESCRIPTION + ONE job: return a JSON snapshot of the PR's current Copilot + review state. The agent (caller) decides what to do with it — + including how long to wait between snapshots when polling for a + new review to land. THIS SCRIPT DOES NOT WAIT. + + Output JSON fields: + - PrNumber, Owner, Repo + - HeadOid : current PR HEAD SHA + - State : PR state (OPEN/CLOSED/MERGED) + - LatestCopilotReview: {state, submittedAt, commitOid, bodyHead} + or null if no Copilot review is present + in the most recent 100 reviews (very long + PRs may have an older Copilot review outside + this window — treat null as "no recent + review", not "never reviewed") + - ReviewAtHead : true iff latest Copilot review's commit.oid == HeadOid + - NoNewComments : true iff the latest review body matches + "generated no new comments" / "generated 0 comments" + - OpenThreadCount : number of unresolved review threads (from all + reviewers); informational — convergence does + NOT require this to be zero + - OpenThreadsAwaitingReply: number of open threads where the + LAST comment is NOT from the authenticated + user (`gh api user`). "Ball-in-court" + model: a Copilot/human comment with no + reply from us OR a re-raise after our + earlier reply both count as awaiting. A + thread where WE are the latest commenter + counts as "done from our side" (the human + merge owner decides next). + - CopilotPending : true iff the Copilot reviewer bot is currently + listed in `requested_reviewers` on the PR (a + review is in flight; the caller should wait + rather than re-trigger) + - Converged : true iff the agent has done its job. + - When a Copilot review is at HEAD: + ReviewAtHead && NoNewComments && + OpenThreadsAwaitingReply == 0. + - When no Copilot review has been observed + on this PR (LatestCopilotReview is null + AND CopilotPending is false): just + OpenThreadsAwaitingReply == 0. Note this + ALSO fires for a brand-new PR with zero + threads — meaning "nothing to do yet"; + the agent should still trigger a Copilot + review via 01-request-review.ps1 if + Copilot is enabled on the repo. Single- + iteration mode (skip the trigger) is the + agent's decision after 01 fails with a + specific Copilot-disabled error, NOT an + auto-detected state from this script. + Open threads may remain in either case — + those are explicit hand-offs to the human + merge owner. + + Canonical agent loop (see ../references/orchestration.md and the per-step files): + 1. Call this script → capture LatestCopilotReview.submittedAt as + baseline AND read CopilotPending. + 2. If CopilotPending is true, skip the trigger step — Copilot is + already reviewing. Otherwise, call 01-request-review.ps1. + If 01 throws with a Copilot-disabled error (e.g. the bot + isn't a valid reviewer on this repo), the agent may fall + back to single-iteration mode: skip the wait, jump to + 03-list-open-threads.ps1, triage + reply to whatever exists, + done. + 3. Wait sub-agent polls this script until either submittedAt + advances past baseline AND ReviewAtHead is true, OR Converged. + 4. On convergence end the loop; otherwise fetch threads via + 03-list-open-threads.ps1, triage, fix, push, reply, repeat. + + Parsing the JSON: timestamps are emitted as plain ISO-8601 UTC + strings (e.g. `"2026-06-08T02:02:44Z"`). Extract via regex on the + raw JSON to avoid PowerShell's auto re-binding of ISO strings to + `[datetime]` (which renders local culture on string interpolation + and silently breaks lexicographic baseline compares): + + $snap = pwsh -NoProfile -File 02-check-review-status.ps1 -PrNumber + $baseline = if ($snap -match '"submittedAt":"([^"]+)"') { $Matches[1] } else { '' } + $copilotPending = ($snap -match '"CopilotPending":true') + $converged = ($snap -match '"Converged":true') + + Works on any PowerShell version (5.1 + 7.x). No `[datetime]` + rebinding, no version-specific parameters. + +.PARAMETER PrNumber + The pull request number. The only required parameter. + +.PARAMETER Owner + Repository owner. OPTIONAL — auto-resolved from `gh repo view`. + +.PARAMETER Repo + Repository name. OPTIONAL — auto-resolved from `gh repo view`. + +.EXAMPLE + pwsh 02-check-review-status.ps1 -PrNumber 236 + + # Output (converged): + # {"HeadOid":"abc...","State":"OPEN","LatestCopilotReview":{...},"ReviewAtHead":true,"NoNewComments":true,"OpenThreadCount":0} + + # Output (not converged — new findings): + # {"HeadOid":"abc...","ReviewAtHead":true,"NoNewComments":false,"OpenThreadCount":3,...} +#> +[CmdletBinding()] +param( + [Parameter(Mandatory = $true)] + [int]$PrNumber, + + [string]$Owner, + [string]$Repo, + + # When set, the agent has decided to drive this PR as a single + # iteration (typically because 01-request-review.ps1 failed with a + # Copilot-disabled error). In this mode, convergence ignores the + # stale-review checks (ReviewAtHead / NoNewComments) — those can + # never become true when the trigger is intentionally skipped — + # and depends solely on OpenThreadsAwaitingReply == 0. + [switch]$SingleIteration +) + +$ErrorActionPreference = 'Stop' +. "$PSScriptRoot/_lib.ps1" + +$coords = Resolve-RepoCoords -Owner $Owner -Repo $Repo +$Owner = $coords.Owner +$Repo = $coords.Repo + +# Identity of the currently-authenticated gh user. Used below to +# detect "the agent has already replied to this thread" and therefore +# count it as our work-completed (the thread may still be open +# deliberately as a human hand-off). +$meR = Invoke-Gh -GhArgs @('api','user','--jq','.login') +if ($meR.ExitCode -ne 0) { + throw "gh api user failed (exit $($meR.ExitCode)) (credentials redacted): $(Format-CopilotPrAutopilotErrorPreview -Text $meR.Stderr)" +} +$me = $meR.Stdout.Trim() + +# Query A (once): PR head/state/reviews. Reviews are not paginated +# here — `reviews(last:100)` is the most recent 100 reviews, sufficient +# for finding the latest Copilot review. +$qHead = @' +query($o:String!,$r:String!,$n:Int!){ + repository(owner:$o,name:$r){ + pullRequest(number:$n){ + headRefOid + state + reviews(last:100){nodes{id author{login} state submittedAt body commit{oid}}} + } + } +} +'@ + +$d = Invoke-GhGraphQL -GhArgs @('-f',"query=$qHead",'-f',"o=$Owner",'-f',"r=$Repo",'-F',"n=$PrNumber") -Context "head query for $Owner/$Repo PR #$PrNumber" +$pr = $d.data.repository.pullRequest +if (-not $pr) { throw "PR #$PrNumber not found in $Owner/$Repo." } + +# Query B (paginated): reviewThreads — fetch isResolved AND the last +# comment's author per thread so we can compute +# "is this open thread awaiting our reply, or have we already handed +# it off?" The loop converges when WE have nothing more to do, not +# when the open-thread count drops to zero (some threads stay open +# deliberately as human hand-offs / escalated declines). +$qThreads = @' +query($o:String!,$r:String!,$n:Int!,$after:String){ + repository(owner:$o,name:$r){ + pullRequest(number:$n){ + reviewThreads(first:100, after:$after){ + pageInfo{endCursor hasNextPage} + nodes{ + isResolved + comments(last:1){nodes{author{login}}} + } + } + } + } +} +'@ + +$after = $null +$allThreadsList = [System.Collections.Generic.List[object]]::new() +do { + $ghArgs = @('-f', "query=$qThreads", '-f', "o=$Owner", '-f', "r=$Repo", '-F', "n=$PrNumber") + if ($after) { $ghArgs = $ghArgs + @('-f', "after=$after") } + $threadResp = Invoke-GhGraphQL -GhArgs $ghArgs -Context "threads query for $Owner/$Repo PR #$PrNumber" + $pagePr = $threadResp.data.repository.pullRequest + if (-not $pagePr) { throw "PR #$PrNumber not found in $Owner/$Repo (threads page)." } + # Assert pageInfo BEFORE dereferencing endCursor/hasNextPage so a + # malformed connection surfaces the canonical invariant message + # (not a null-reference). hasNextPage cached locally to avoid + # re-reading the property in the do/while condition. + Assert-CopilotPrAutopilotPageInfo -Context "threads query for $Owner/$Repo PR #$PrNumber" -PageInfo $pagePr.reviewThreads.pageInfo + foreach ($n in $pagePr.reviewThreads.nodes) { $allThreadsList.Add($n) } + $after = $pagePr.reviewThreads.pageInfo.endCursor + $hasNext = [bool]$pagePr.reviewThreads.pageInfo.hasNextPage +} while ($hasNext) +$allThreads = $allThreadsList.ToArray() + +# Query C (paginated): reviewRequests — typical PRs have <100 requested +# reviewers, but pagination is required for correctness so we never +# falsely report CopilotPending=false on a PR with >100 requested +# reviewers (which would cause the wait sub-agent to re-trigger a +# review that's actually already in flight). +$qReviewRequests = @' +query($o:String!,$r:String!,$n:Int!,$after:String){ + repository(owner:$o,name:$r){ + pullRequest(number:$n){ + reviewRequests(first:100, after:$after){ + pageInfo{endCursor hasNextPage} + nodes{requestedReviewer{__typename ... on Bot{login} ... on User{login} ... on Mannequin{login}}} + } + } + } +} +'@ + +$after = $null +$allReviewRequestsList = [System.Collections.Generic.List[object]]::new() +do { + $ghArgs = @('-f', "query=$qReviewRequests", '-f', "o=$Owner", '-f', "r=$Repo", '-F', "n=$PrNumber") + if ($after) { $ghArgs = $ghArgs + @('-f', "after=$after") } + $rrResp = Invoke-GhGraphQL -GhArgs $ghArgs -Context "reviewRequests query for $Owner/$Repo PR #$PrNumber" + $rrPagePr = $rrResp.data.repository.pullRequest + if (-not $rrPagePr) { throw "PR #$PrNumber not found in $Owner/$Repo (reviewRequests page)." } + # Same ordering: assert pageInfo before reading endCursor/hasNextPage. + Assert-CopilotPrAutopilotPageInfo -Context "reviewRequests query for $Owner/$Repo PR #$PrNumber" -PageInfo $rrPagePr.reviewRequests.pageInfo + foreach ($n in $rrPagePr.reviewRequests.nodes) { $allReviewRequestsList.Add($n) } + $after = $rrPagePr.reviewRequests.pageInfo.endCursor + $hasNext = [bool]$rrPagePr.reviewRequests.pageInfo.hasNextPage +} while ($hasNext) +$allReviewRequests = $allReviewRequestsList.ToArray() + +# M1 tie-break: when multiple Copilot reviews share the same +# submittedAt (server-side clock collision is rare but possible +# under burst re-triggers), pick the one whose commit.oid matches +# HEAD if any. For the residual tie among same-submittedAt reviews, +# add a deterministic SECONDARY sort key (`id`, the GraphQL node +# ID) instead of relying on `Sort-Object` stability — stability is +# a current implementation detail, not a documented contract, so +# any future PowerShell release could silently flip the chosen +# review and with it `ReviewAtHead` / `NoNewComments`. Descending +# `id` is a reasonable proxy for "newest" when timestamps tie +# because GitHub-issued IDs are monotonic within a connection, +# and even if that proxy is occasionally wrong the choice stays +# stable across runs. +# M3 pagination: reviews(last:100) returns the MOST RECENT 100 +# reviews. If a PR has 100+ reviews more recent than the last +# Copilot review (essentially impossible in normal use, but +# theoretically possible on heavily-bot-reviewed PRs), the latest +# Copilot review would be cut off. Emit a soft warning to stderr +# when we hit the boundary so the caller knows to inspect manually. +if ($pr.reviews.nodes.Count -ge 100) { + [Console]::Error.WriteLine("WARNING: reviews(last:100) boundary hit on PR #$PrNumber — if there are 100+ non-Copilot reviews more recent than the latest Copilot review, LatestCopilotReview may be stale. Inspect via 'gh pr view $PrNumber --comments' if convergence behaves unexpectedly.") +} +$copilotReviews = @($pr.reviews.nodes | Where-Object { + $_.author -and $_.author.login -and $_.author.login -match $CopilotPrAutopilot_CopilotReviewerLoginRegex +}) +$latest = if ($copilotReviews.Count -gt 0) { + $atHead = $copilotReviews | Where-Object { $_.commit -and $_.commit.oid -eq $pr.headRefOid } | Sort-Object submittedAt,id -Descending | Select-Object -First 1 + if ($atHead) { $atHead } else { $copilotReviews | Sort-Object submittedAt,id -Descending | Select-Object -First 1 } +} else { $null } + +$reviewAtHead = $false +$noNewComments = $false +$bodyHead = $null +$latestCommitOid = $null +if ($latest) { + if ($latest.commit -and $latest.commit.oid) { + $latestCommitOid = $latest.commit.oid + $reviewAtHead = ($latestCommitOid -eq $pr.headRefOid) + } + $bodyText = if ($latest.body) { $latest.body } else { '' } + # NoNewComments covers both successful zero-finding reviews AND the + # "Copilot wasn't able to review any files in this pull request" + # body that Copilot returns for empty / pure-whitespace / line-ending- + # only diffs. Both are terminal for the loop: there is nothing for + # the agent to address and re-triggering will produce the same body. + # Matches Copilot's "no findings" terminal phrases. Anchored on + # \b (word boundary, blocks "regenerated") and on a following + # sentence-end (./!/EOL/EOS) so the regex does NOT false-positive + # on substrings like "generated no comments yet but..." or + # "with 0 comments outstanding". Tested against the 4 known + # negative inputs and 6 known positive Copilot body templates. + $noNewComments = ($bodyText -match '(?im)\b(?:generated|had|with)\s+(?:no|0|zero)\s+(?:new\s+)?comments\s*(?:[\.\!]|$)|wasn''t\s+able\s+to\s+review\s+any\s+files\s+in\s+this\s+pull\s+request|was\s+not\s+able\s+to\s+review\s+any\s+files\s+in\s+this\s+pull\s+request') + $bodyHead = if ($bodyText.Length -gt 300) { $bodyText.Substring(0, 300) } else { $bodyText } +} + +$openThreads = @($allThreads | Where-Object { -not $_.isResolved }) +$openCount = $openThreads.Count + +# OpenThreadsAwaitingReply: open threads where the LAST comment is +# NOT from the authenticated user. "Ball is in our court" model: +# - Copilot/human posts a finding → last=them → awaiting our reply. +# - We reply → last=us → ball passes back → not awaiting. +# - Copilot re-raises after our reply → last=them again → awaiting. +# Using "last comment" (not "any comment by us in window") is what +# correctly handles re-raised threads. Threads we've replied to but +# the reviewer hasn't yet acted on count as "done from our side" — +# the human merge owner decides what to do with them next. +$awaitingReply = @($openThreads | Where-Object { + $thread = $_ + $lastAuthor = $null + if ($thread.comments -and $thread.comments.nodes -and $thread.comments.nodes.Count -gt 0) { + $lastComment = $thread.comments.nodes[$thread.comments.nodes.Count - 1] + if ($lastComment -and $lastComment.author -and $lastComment.author.login) { + $lastAuthor = $lastComment.author.login + } + } + $lastAuthor -ne $me +}) +$awaitingCount = $awaitingReply.Count + +# CopilotPending: is the Copilot reviewer bot currently in +# `requested_reviewers`? Canonical signal for "review is in flight"; +# the wait sub-agent (workflow step 2) consults this so the trigger +# step (01-request-review.ps1) can be skipped when already pending. +$copilotPending = @($allReviewRequests | Where-Object { + $_.requestedReviewer -and $_.requestedReviewer.login -and $_.requestedReviewer.login -match $CopilotPrAutopilot_CopilotReviewerLoginRegex +}).Count -gt 0 + +# Force submittedAt to a stable ISO-8601 UTC string. On PowerShell 7+ +# `ConvertFrom-Json` auto-converts ISO timestamp strings to [datetime]; +# Windows PowerShell 5.1 leaves them as [string]. `Format-IsoUtcString` +# handles BOTH cases. The reason this normalisation matters: on PS 7+ +# `ConvertTo-Json` would otherwise emit a [datetime] via .NET's "o" +# format (`2026-06-07T18:06:59.0000000Z`) — but more importantly, +# downstream callers that pipe our JSON through `ConvertFrom-Json` again +# (also on PS 7+) would get another [datetime] which renders local +# culture on string interpolation, silently breaking lexicographic +# baseline comparisons. Emit a plain string so the round-trip is identity +# across both PS versions. +$submittedAtIso = if ($latest -and $latest.submittedAt) { Format-IsoUtcString $latest.submittedAt } else { $null } + +$result = [ordered]@{ + PrNumber = $PrNumber + Owner = $Owner + Repo = $Repo + HeadOid = $pr.headRefOid + State = $pr.state + LatestCopilotReview = if ($latest) { + [ordered]@{ + state = $latest.state + submittedAt = $submittedAtIso + commitOid = $latestCommitOid + bodyHead = $bodyHead + } + } else { $null } + ReviewAtHead = $reviewAtHead + NoNewComments = $noNewComments + OpenThreadCount = $openCount + OpenThreadsAwaitingReply = $awaitingCount + CopilotPending = $copilotPending + # Converged = "the agent has nothing more to do". + # PR State guard: a CLOSED / MERGED PR can never be the target of + # a productive review loop — the agent cannot push, the loop + # cannot iterate. Force Converged = false so the parent surfaces + # the PR-state change to the user instead of silently calling + # task_complete on a non-OPEN PR. + # - SingleIteration (agent decision; Copilot unavailable or + # trigger intentionally skipped): just OpenThreadsAwaitingReply + # == 0. Ignores ReviewAtHead / NoNewComments because those will + # never advance without a new Copilot review. + # - Copilot review exists or pending: ReviewAtHead && + # NoNewComments && OpenThreadsAwaitingReply == 0. + # - No Copilot review has ever been observed: just + # OpenThreadsAwaitingReply == 0 (also fires for brand-new PRs + # with zero findings; agent should still trigger via + # 01-request-review.ps1 if Copilot is enabled). + Converged = if ($pr.state -ne 'OPEN') { + $false + } elseif ($SingleIteration) { + $awaitingCount -eq 0 + } elseif ($latest -or $copilotPending) { + $reviewAtHead -and $noNewComments -and $awaitingCount -eq 0 + } else { + $awaitingCount -eq 0 + } +} +$result | ConvertTo-Json -Depth 5 -Compress diff --git a/skills/copilot-pr-autopilot/scripts/03-list-open-threads.ps1 b/skills/copilot-pr-autopilot/scripts/03-list-open-threads.ps1 new file mode 100644 index 000000000..5d583e8a9 --- /dev/null +++ b/skills/copilot-pr-autopilot/scripts/03-list-open-threads.ps1 @@ -0,0 +1,166 @@ +<# +.SYNOPSIS + List unresolved review threads on a pull request (all reviewers). + +.DESCRIPTION + Fetches review threads via GraphQL (paginated) and prints every + thread that is still `isResolved: false`. Threads from all reviewers + (Copilot, humans, other bots) are included; the triage step decides + what to do with each. + + Each thread's `comments(first:1)` is the originating review comment + — that's where `path`, `line`, and `body` come from. Reply chains + on the same thread are intentionally not surfaced here; this script + is the input to triage, not to reading conversation history. + +.PARAMETER Owner + Optional; auto-resolved from `gh repo view`. + +.PARAMETER Repo + Optional; auto-resolved from `gh repo view`. + +.PARAMETER PrNumber + The pull request number. + +.EXAMPLE + pwsh 03-list-open-threads.ps1 -PrNumber 122 + +.PARAMETER MaxBodyLength + Cap the `Body` column at this many characters (default 500; pass 0 to + disable). Long Copilot comments otherwise dominate stdout and slow + triage; truncated bodies are suffixed with `…`. +#> +[CmdletBinding()] +param( + [string]$Owner, + [string]$Repo, + + [Parameter(Mandatory = $true)] + [int]$PrNumber, + + [ValidateRange(0, 100000)] + [int]$MaxBodyLength = 500 +) + +$ErrorActionPreference = 'Stop' +. "$PSScriptRoot/_lib.ps1" + +$coords = Resolve-RepoCoords -Owner $Owner -Repo $Repo +$Owner = $coords.Owner +$Repo = $coords.Repo + +$query = @' +query($owner: String!, $repo: String!, $pr: Int!, $after: String) { + repository(owner: $owner, name: $repo) { + pullRequest(number: $pr) { + reviewThreads(first: 100, after: $after) { + pageInfo { + endCursor + hasNextPage + } + nodes { + id + isResolved + comments(first: 1) { + nodes { + author { login } + body + path + line + createdAt + } + } + } + } + } + } +} +'@ + +$all = [System.Collections.Generic.List[object]]::new() +$after = $null +do { + $ghArgs = @('-f', "query=$query", '-f', "owner=$Owner", '-f', "repo=$Repo", '-F', "pr=$PrNumber") + if ($after) { $ghArgs = $ghArgs + @('-f', "after=$after") } + + $data = Invoke-GhGraphQL -GhArgs $ghArgs -Context "list threads for $Owner/$Repo PR #$PrNumber" + # Guard against missing PR (invalid number, repo renamed, access denied) + # so the subsequent .reviewThreads dereference doesn't surface as the + # misleading "pageInfo invariant violated (pageInfo is null)" error. + if (-not $data.data.repository -or -not $data.data.repository.pullRequest) { + throw "list threads for $Owner/$Repo PR #${PrNumber}: pull request not found or inaccessible (repository or pullRequest is null in GraphQL response)." + } + $page = $data.data.repository.pullRequest.reviewThreads + # Assert pageInfo invariants BEFORE reading endCursor/hasNextPage so + # malformed connections surface as the canonical invariant message + # rather than a generic null-reference. Pull hasNextPage into a + # local so the do/while condition can't race against a re-read. + Assert-CopilotPrAutopilotPageInfo -Context "list threads for $Owner/$Repo PR #$PrNumber" -PageInfo $page.pageInfo + foreach ($n in $page.nodes) { $all.Add($n) } + $after = $page.pageInfo.endCursor + $hasNext = [bool]$page.pageInfo.hasNextPage +} while ($hasNext) + +$threads = $all.ToArray() + +$open = $threads | Where-Object { -not $_.isResolved } + +$resultList = [System.Collections.Generic.List[object]]::new() +foreach ($t in $open) { + $c = if ($t.comments -and $t.comments.nodes -and $t.comments.nodes.Count -gt 0) { $t.comments.nodes[0] } else { $null } + if (-not $c) { continue } # malformed thread (no originating comment) — skip rather than crash + # Normalise body to a string up-front. GraphQL nullable scalars come + # back as $null for hidden/deleted/malformed comment nodes, and a + # $null body would make `$body` $null and throw on `$body.Length` + # below. [string]$null is '' which is the safe rendering. + $rawBody = if ($null -eq $c.body) { '' } else { [string]$c.body } + # Use a single-quoted literal regex (`\r?\n`) instead of an interpolated + # double-quoted string (`"`r?`n"`) so the pattern is unambiguously + # readable and survives copy/paste without invisible CR/LF surprises. + $body = ($rawBody -replace '\r?\n', ' ') + if ($MaxBodyLength -gt 0 -and $body.Length -gt $MaxBodyLength) { + # MaxBodyLength is gated $-gt 0 above (so prefixLen >= 0 is + # guaranteed in this branch). At MaxBodyLength=1 you get a + # bare '…' of length 1. (The parameter itself accepts >= 0; + # `0` disables truncation by skipping this whole branch.) + $prefixLen = $MaxBodyLength - 1 + $body = $body.Substring(0, $prefixLen) + '…' + } + # Normalise Path so the documented contract (":" or "") + # holds even when the GraphQL response omits path (PR-level threads) or + # returns line without path. Sentinel '(pull request)' makes PR-level + # threads visible; when a line is provided without a path (rare — GitHub + # occasionally emits this for diff-anchored PR-level comments) we keep + # the line on the sentinel so the anchor context isn't dropped. + $path = if ([string]::IsNullOrWhiteSpace($c.path)) { + if ($null -ne $c.line) { "(pull request):$($c.line)" } else { '(pull request)' } + } elseif ($null -ne $c.line) { + "$($c.path):$($c.line)" + } else { + $c.path + } + $author = if ($c.author -and $c.author.login) { $c.author.login } else { '(deleted)' } + # Normalise createdAt via the shared helper so the format stays + # identical across 01/02/03 (Format-IsoUtcString in _lib.ps1). + $createdAt = Format-IsoUtcString $c.createdAt + $resultList.Add([pscustomobject]@{ + ThreadId = $t.id + Author = $author + Path = $path + CreatedAt = $createdAt + Body = $body + }) +} +$result = $resultList.ToArray() + +# Single-line JSON output (matches the contract used by 01/02 — 08 and 10 +# emit plain-text status lines for human eyes, not JSON). +# -Compress so callers can pipe to ConvertFrom-Json or jq directly without +# stripping PowerShell's default Format-List rendering / ANSI escapes. +[pscustomobject]@{ + PrNumber = $PrNumber + Owner = $Owner + Repo = $Repo + OpenThreadCount = $result.Count + Threads = $result +} | ConvertTo-Json -Depth 6 -Compress diff --git a/skills/copilot-pr-autopilot/scripts/08-reply-and-resolve.ps1 b/skills/copilot-pr-autopilot/scripts/08-reply-and-resolve.ps1 new file mode 100644 index 000000000..2cf65ae26 --- /dev/null +++ b/skills/copilot-pr-autopilot/scripts/08-reply-and-resolve.ps1 @@ -0,0 +1,150 @@ +<# +.SYNOPSIS + Post a reply on a Copilot review thread and resolve it. + +.DESCRIPTION + Performs the two GraphQL mutations needed to address a Copilot finding: + 1. addPullRequestReviewThreadReply — appends a reply comment. + 2. resolveReviewThread — marks the thread resolved. + + Use this for both accepted-and-fixed findings and for declined-with- + rationale findings. See ../assets/reply-fix.md, ../assets/reply-decline.md, + ../assets/reply-drift.md, and ../assets/reply-partial.md for body patterns. + +.PARAMETER ThreadId + The GraphQL node ID of the review thread (e.g. PRRT_kw...). + +.PARAMETER Body + The reply body. Markdown is supported. + +.PARAMETER NoResolve + If set, posts the reply only and leaves the thread open. Useful when + you want to start a back-and-forth discussion rather than close out the + thread. + +.PARAMETER ResolveOnly + If set, skips the reply mutation and ONLY runs resolveReviewThread. + Use this to retry just the resolve step after a previous run posted + the reply but failed on the resolve (transient 5xx / rate-limit). + Lives in a separate parameter set from -Body / -NoResolve, so the + PowerShell binder rejects passing -Body or -NoResolve together with + -ResolveOnly (rather than silently ignoring them at runtime). + +.EXAMPLE + pwsh 08-reply-and-resolve.ps1 -ThreadId PRRT_kw... -Body "Fixed in abc1234." + +.EXAMPLE + # Decline with rationale, do not resolve yet + pwsh 08-reply-and-resolve.ps1 -ThreadId PRRT_kw... -NoResolve ` + -Body "Declining: this would require cross-class plumbing for a hypothetical race." + +.NOTES + Reply + resolve are TWO separate GraphQL mutations and the script + is NOT atomic: if the reply succeeds and the resolve then fails + (transient 5xx, rate limit, thread disappeared mid-call), the reply + is already posted. The script is also NOT idempotent on reply — + re-running it with the same -ThreadId and -Body will post a + duplicate reply because -Body is mandatory. + + Recommended retry policy when a call fails: + - If "Replied to thread X" did NOT print, the reply step failed; + safe to re-run the whole command. + - If "Replied to thread X" DID print but "Resolved thread X" did + not, ONLY the resolve step failed. Re-run with -ResolveOnly to + retry just the resolve mutation (no duplicate reply): + + pwsh 08-reply-and-resolve.ps1 -ThreadId PRRT_kw... -ResolveOnly +#> +[CmdletBinding(DefaultParameterSetName = 'Reply')] +param( + [Parameter(Mandatory = $true, ParameterSetName = 'Reply')] + [Parameter(Mandatory = $true, ParameterSetName = 'ResolveOnly')] + # PS 5.1 lacks ValidateNotNullOrWhiteSpace; ValidateScript with an + # explicit message gives the same fail-at-bind-time guarantee on + # both runtimes. Mandatory alone only blocks $null / empty string, + # not all-whitespace, which would produce a confusing GraphQL + # "Could not resolve to a node with the global id of ''" error. + [ValidateScript({ + if ([string]::IsNullOrWhiteSpace($_)) { + throw "ThreadId must be a non-empty, non-whitespace GraphQL node id (e.g., PRRT_kw...)." + } + $true + })] + [string]$ThreadId, + + [Parameter(Mandatory = $true, ParameterSetName = 'Reply')] + [ValidateScript({ + if ([string]::IsNullOrWhiteSpace($_)) { + throw "Body must be a non-empty, non-whitespace reply string." + } + $true + })] + [string]$Body, + + [Parameter(ParameterSetName = 'Reply')] + [switch]$NoResolve, + + [Parameter(Mandatory = $true, ParameterSetName = 'ResolveOnly')] + [switch]$ResolveOnly +) + +$ErrorActionPreference = 'Stop' +. "$PSScriptRoot/_lib.ps1" + +if (-not $ResolveOnly) { + # Escalate-to-user safety: when invoked with -NoResolve the operator + # is handing the thread off to the human merge owner. Embed an + # HTML-comment marker (invisible in rendered Markdown) into the body + # so 10-cleanup-outdated.ps1 can recognise the hand-off and refuse + # to auto-resolve the thread later, even if the thread becomes + # outdated (e.g. the referenced file is deleted in a follow-up + # commit). The marker is appended only when not already present so + # re-runs with the same body don't accumulate markers. + $effectiveBody = $Body + if ($NoResolve -and -not $Body.Contains($CopilotPrAutopilot_EscalationMarker)) { + # TrimEnd("`r","`n") (not bare TrimEnd()) preserves trailing + # spaces and tabs that may be intentional Markdown — two + # trailing spaces are a Markdown hard line break, and the + # user's body should round-trip verbatim apart from the + # closing CR/LF stripped to make room for the marker prologue. + $effectiveBody = $Body.TrimEnd([char]13, [char]10) + "`n`n" + $CopilotPrAutopilot_EscalationMarker + } + $replyMutation = @' +mutation($tid: ID!, $body: String!) { + addPullRequestReviewThreadReply(input: { + pullRequestReviewThreadId: $tid, + body: $body + }) { + comment { id } + } +} +'@ + + $replyArgs = @('-f', "query=$replyMutation", '-f', "tid=$ThreadId", '-f', "body=$effectiveBody") + Invoke-GhGraphQL -GhArgs $replyArgs -Context "reply to thread $ThreadId" | Out-Null + Write-Output "Replied to thread $ThreadId" +} + +if (-not $NoResolve) { + $resolveMutation = @' +mutation($tid: ID!) { + resolveReviewThread(input: { threadId: $tid }) { + thread { isResolved } + } +} +'@ + $resolveArgs = @('-f', "query=$resolveMutation", '-f', "tid=$ThreadId") + $resolveResp = Invoke-GhGraphQL -GhArgs $resolveArgs -Context "resolve thread $ThreadId" + $isResolved = $resolveResp.data.resolveReviewThread.thread.isResolved + if ($isResolved -ne $true) { + # Format defensively so the error message is actionable even when + # the response is missing `thread` or `isResolved` entirely. + $observedStr = + if ($null -eq $resolveResp.data.resolveReviewThread) { '(missing resolveReviewThread node)' } + elseif ($null -eq $resolveResp.data.resolveReviewThread.thread) { '(missing thread node)' } + elseif ($null -eq $isResolved) { '(null)' } + else { "'$isResolved'" } + throw "resolveReviewThread for $ThreadId did not set isResolved=true (observed: $observedStr). Thread may still be open; re-run with -ResolveOnly after investigating." + } + Write-Output "Resolved thread $ThreadId" +} diff --git a/skills/copilot-pr-autopilot/scripts/10-cleanup-outdated.ps1 b/skills/copilot-pr-autopilot/scripts/10-cleanup-outdated.ps1 new file mode 100644 index 000000000..0fb6f7436 --- /dev/null +++ b/skills/copilot-pr-autopilot/scripts/10-cleanup-outdated.ps1 @@ -0,0 +1,897 @@ +<# +.SYNOPSIS + Batch-resolve outdated Copilot review threads on a PR. + +.DESCRIPTION + After a review loop converges, the PR may still show old `isOutdated` + Copilot threads listed as open. They were addressed by later commits + but never explicitly resolved. This script finds them and resolves them + in bulk. + + Only acts on threads where: + - isOutdated: true + - isResolved: false + - the first comment's author matches the canonical Copilot + reviewer login regex `$CopilotPrAutopilot_CopilotReviewerLoginRegex` + (matches both `copilot-pull-request-reviewer` and the + `[bot]`-suffixed `copilot-pull-request-reviewer[bot]` form + — GitHub's GraphQL `author.login` and REST APIs differ on + which form they return) + - the LAST comment's author is the authenticated user (i.e. we + have already replied to the thread). This safety guard prevents + the script from hiding actionable findings if invoked before + the review loop has converged. Override with -Force. + - the last comment does NOT carry the escalate-to-user marker + (`$CopilotPrAutopilot_EscalationMarker`, embedded automatically + by `08-reply-and-resolve.ps1 -NoResolve`). This marker signals + an intentional hand-off to the human merge owner; auto- + resolving it later (e.g. when the file is removed and the + thread becomes outdated) would break the hand-off contract. + -Force does NOT override this guard either; the marker is an + explicit operator decision. + - NO comment in the thread is from a non-Copilot, non-authenticated- + user author (the "human-in-thread" guard — if a human or another + bot has chimed in anywhere in the thread, even past Copilot's + opener, the thread is SKIPPED). This guard is NOT overridable + by -Force; auto-resolving a thread carrying human signal would + silently hide unaddressed concerns. + + The doc claim "threads from human reviewers are never touched" + therefore holds for both single-author human threads AND mixed- + authorship threads where Copilot opened and a human replied. + +.PARAMETER Owner + Repository owner (org or user). Defaults to the current repo's owner + (resolved via `gh repo view`). + +.PARAMETER Repo + Repository name. Defaults to the current repo's name. + +.PARAMETER PrNumber + The pull request number. + +.EXAMPLE + pwsh 10-cleanup-outdated.ps1 -PrNumber 122 + +.EXAMPLE + pwsh 10-cleanup-outdated.ps1 -PrNumber 122 -DryRun +#> +[CmdletBinding()] +param( + [string]$Owner, + [string]$Repo, + + [Parameter(Mandatory = $true)] + [int]$PrNumber, + + # Print what would be resolved without making any GraphQL + # mutation. Uses an explicit switch (not the PowerShell + # SupportsShouldProcess / -WhatIf machinery) so the helper's + # internal `2>` redirect doesn't inherit a WhatIfPreference and + # print cosmetic "Performing the operation Output to File" noise. + [switch]$DryRun, + + # Override the safety guard that requires the last commenter on a + # thread to be the authenticated user. Without -Force, threads + # where Copilot (or anyone other than us) had the last word are + # SKIPPED — resolving them would hide an actionable finding the + # agent hasn't replied to yet. Pass -Force only when you're + # intentionally clearing stale outdated Copilot threads out of + # band of the convergence loop. + # + # -Force does NOT override the human-in-thread guard: threads with + # any non-Copilot, non-authenticated-user comment anywhere in the + # thread are always skipped regardless of -Force. + [switch]$Force, + + # When set, the script exits with code 2 if any thread was skipped + # for an authorship/pagination reason that the operator likely needs + # to investigate (skippedHumanInThread, skippedUnknownAuthorInThread, + # skippedMissingAuthorship, skippedPaginationError). Useful when this + # script is wired into automation that needs to surface "I cleaned + # what I could, but human follow-up is required" as a non-success + # outcome. + # + # The -Strict gate ALWAYS evaluates, regardless of -DryRun and + # regardless of whether any targets remained after skipping (so + # automation sees the strict-skip signal even when 'everything was + # skipped' or during a dry run). exit 1 (mutation failure during + # a real run) still takes precedence over exit 2. + # + # Note: skippedAwaitingReply is NOT included — that one is the normal + # last-comment-author guard and is fully overridable via -Force, so + # treating it as failure would be noisy in routine use. + [switch]$Strict +) + +$ErrorActionPreference = 'Stop' +. "$PSScriptRoot/_lib.ps1" + +$coords = Resolve-RepoCoords -Owner $Owner -Repo $Repo +$Owner = $coords.Owner +$Repo = $coords.Repo + +# Authenticated-user login — needed (unless -Force) so we only resolve +# threads where we have actually replied. Mirrors the pattern in +# 02-check-review-status.ps1. +$meR = Invoke-Gh -GhArgs @('api','user','--jq','.login') +if ($meR.ExitCode -ne 0) { + throw "gh api user failed (exit $($meR.ExitCode)) (credentials redacted): $(Format-CopilotPrAutopilotErrorPreview -Text $meR.Stderr)" +} +$me = $meR.Stdout.Trim() + +$query = @' +query($owner: String!, $repo: String!, $pr: Int!, $after: String) { + repository(owner: $owner, name: $repo) { + pullRequest(number: $pr) { + reviewThreads(first: 50, after: $after) { + pageInfo { + endCursor + hasNextPage + } + nodes { + id + isResolved + isOutdated + firstComment: comments(first: 1) { + nodes { author { login } } + } + lastComment: comments(last: 1) { + nodes { author { login } body } + } + # First page of comment authors so we can reject threads where + # any non-Copilot, non-$me participant has commented (a human + # or different bot chimed in after Copilot's opener). The doc + # promise "threads from human reviewers are never touched" + # must hold even when the human comment is not the FIRST. + # totalCount lets us detect threads that exceed the initial + # window; for those we paginate the rest via the node(id:) + # query below so authorship visibility is always complete. + # Outer page is 50 threads × 30 author samples = 1500 nodes, + # well under GitHub's GraphQL "max node count" complexity + # ceiling for a single request and small enough to keep + # latency low; threads with more comments fall through to + # the per-thread node(id:) pagination below. + # + # We intentionally fetch ONLY `author.login` (not `body`) on + # this outer window to keep the payload small — comment + # bodies can be large and we only need the escalation-marker + # scan for the small subset of threads that survive all + # other safety gates. That marker scan is implemented in + # `Test-CopilotPrAutopilotThreadEscalated` (below) and + # lazily paginates per-thread bodies via `node(id:)` only + # when about to resolve. + allComments: comments(first: 30) { + totalCount + pageInfo { endCursor hasNextPage } + nodes { author { login } } + } + } + } + } + } +} +'@ + +$all = [System.Collections.Generic.List[object]]::new() +$after = $null +do { + $ghArgs = @('-f', "query=$query", '-f', "owner=$Owner", '-f', "repo=$Repo", '-F', "pr=$PrNumber") + if ($after) { $ghArgs = $ghArgs + @('-f', "after=$after") } + + $data = Invoke-GhGraphQL -GhArgs $ghArgs -Context "list outdated threads for $Owner/$Repo PR #$PrNumber" + # Guard against missing PR (invalid number, repo renamed, access denied) + # so the subsequent .reviewThreads dereference doesn't surface as the + # misleading "pageInfo invariant violated (pageInfo is null)" error. + if (-not $data.data.repository -or -not $data.data.repository.pullRequest) { + throw "list outdated threads for $Owner/$Repo PR #${PrNumber}: pull request not found or inaccessible (repository or pullRequest is null in GraphQL response)." + } + $page = $data.data.repository.pullRequest.reviewThreads + # Assert pageInfo invariants BEFORE we read endCursor/hasNextPage so + # a malformed connection surfaces as the canonical invariant message + # (with context) rather than a generic null-reference / "$null can't + # be cast to bool" error from the dereference itself. Pull + # hasNextPage into a local so the do/while condition can't race + # against a property re-read. + Assert-CopilotPrAutopilotPageInfo -Context "list outdated threads for $Owner/$Repo PR #$PrNumber" -PageInfo $page.pageInfo + foreach ($n in $page.nodes) { $all.Add($n) } + $after = $page.pageInfo.endCursor + $hasNext = [bool]$page.pageInfo.hasNextPage +} while ($hasNext) + +$threads = $all.ToArray() + +# Comment-pagination helper: when a thread's first 30 comments don't +# cover the full set (pageInfo.hasNextPage=true on the outer query), +# fetch the remaining pages via node(id:) so authorship visibility is +# always complete. +# Returns the full ordered list of author logins (may include $null +# entries for ghost/deleted users — callers must tolerate $null). +# +# Guarantees: +# * Uses a typed List[object] (preserves $null author entries for +# ghost / deleted users) to avoid O(n^2) array growth via `+=`. +# * Hard upper bound of `30 + MaxPages * 100` comments to prevent a +# runaway loop if the server ever returns an invalid pageInfo +# (cursor that never advances). The outer query window is 30 +# (the FirstPage handed in); the in-loop node(id:) query window +# is 100, capped by MaxPages. +# Default MaxPages=200 → 30 + 20,000 = 20,030 comments, three +# orders of magnitude beyond any plausible PR thread. +# * Throws with explicit context if the paginated `node(id:)` query +# returns $null (e.g., thread deleted mid-pagination) so the failure +# bubbles up rather than silently producing partial authorship. +function Get-AllThreadAuthors { + [CmdletBinding()] + param( + [Parameter(Mandatory)] [string]$ThreadId, + [Parameter(Mandatory)] $FirstPage, # the allComments object from the outer query + [int]$MaxPages = 200, + # Optional early-exit predicate. Called with the candidate + # author login (which may be $null for ghost / deleted users). + # When it returns $true the function stops paginating and + # returns the authors collected so far. Lets callers that + # only need an existence check (e.g. "does any non-Copilot, + # non-$me author exist?") avoid burning sequential GraphQL + # calls on the remaining pages of a 1000-comment thread. + [scriptblock]$StopPredicate + ) + + # List[object] (not List[string]) so $null author entries — which + # GitHub returns for deleted / ghost users — round-trip as $null + # instead of being coerced to ''. Callers rely on `-not $login` to + # skip both, but preserving the original shape keeps the contract + # honest for any future caller. + $authors = [System.Collections.Generic.List[object]]::new() + # Local flag set by StopPredicate; checked at every author and + # at the page boundary so we exit both the inner foreach and the + # outer while loop on the first matching author. + $stopRequested = $false + + $firstNodes = @() + if ($FirstPage -and $FirstPage.nodes) { $firstNodes = @($FirstPage.nodes) } + foreach ($n in $firstNodes) { + $login = $null + if ($n -and $n.author) { $login = $n.author.login } + $authors.Add($login) + if ($StopPredicate -and (& $StopPredicate $login)) { + $stopRequested = $true + break + } + } + + # Pagination invariant: a true hasNextPage with empty endCursor + # would either re-fetch the same page or loop without cursor + # advancement. A null pageInfo also fails closed. Assert BEFORE + # reading hasNextPage/endCursor so we never dereference pre-assert + # values (consistent with the "assert then dereference" pattern + # used in the other pagination loops in this skill). + if ($FirstPage) { + Assert-CopilotPrAutopilotPageInfo -Context "Get-AllThreadAuthors first page for thread $ThreadId" -PageInfo $FirstPage.pageInfo + } + + $hasNext = $false + $after = $null + if ($FirstPage -and $FirstPage.pageInfo) { + $hasNext = [bool]$FirstPage.pageInfo.hasNextPage + $after = $FirstPage.pageInfo.endCursor + } + + $pageQuery = @' +query($id: ID!, $after: String) { + node(id: $id) { + ... on PullRequestReviewThread { + comments(first: 100, after: $after) { + pageInfo { endCursor hasNextPage } + nodes { author { login } } + } + } + } +} +'@ + + # First fetched page is labeled 1: $pageIndex initialised to 0 here, + # incremented at loop entry. The outer (pre-loop) query is page 0; + # this loop fetches additional pages, so MaxPages caps the in-loop + # iterations, making the total bound (outer + paginated) = MaxPages + 1. + # We also derive a totalCount-based ceiling and use the MINIMUM of + # MaxPages and the derived bound — most threads have a handful of + # comments and shouldn't burn 200 API calls just because the safety + # ceiling allows it. The +2 fudge covers concurrent comments added + # during pagination (rare but possible). + $totalCount = if ($FirstPage -and $null -ne $FirstPage.totalCount) { [int]$FirstPage.totalCount } else { 0 } + $pagesNeededFromTotal = if ($totalCount -gt $firstNodes.Count) { + [int][Math]::Ceiling(($totalCount - $firstNodes.Count) / 100.0) + 2 + } else { 0 } + $effectiveMaxPages = if ($pagesNeededFromTotal -gt 0) { + [Math]::Min($MaxPages, $pagesNeededFromTotal) + } else { + $MaxPages + } + $pageIndex = 0 + while ($hasNext -and -not $stopRequested) { + $pageIndex++ + if ($pageIndex -gt $effectiveMaxPages) { + throw "Get-AllThreadAuthors: exceeded effectiveMaxPages=$effectiveMaxPages (MaxPages=$MaxPages, totalCount=$totalCount) for thread $ThreadId — likely a malformed server response (cursor not advancing) or unexpected comment growth during pagination." + } + + $pageArgs = @('-f', "query=$pageQuery", '-f', "id=$ThreadId") + if ($after) { $pageArgs = $pageArgs + @('-f', "after=$after") } + $pageData = Invoke-GhGraphQL -GhArgs $pageArgs -Context "paginate comments for thread $ThreadId (page $pageIndex)" + + $threadNode = $null + if ($pageData -and $pageData.data) { $threadNode = $pageData.data.node } + if (-not $threadNode) { + throw "Get-AllThreadAuthors: node(id: '$ThreadId') returned null on page $pageIndex (thread deleted or inaccessible)." + } + $pageBody = $threadNode.comments + if (-not $pageBody) { + throw "Get-AllThreadAuthors: thread $ThreadId has no comments connection on page $pageIndex." + } + + $pageNodes = @() + if ($pageBody.nodes) { $pageNodes = @($pageBody.nodes) } + foreach ($n in $pageNodes) { + $login = $null + if ($n -and $n.author) { $login = $n.author.login } + $authors.Add($login) + if ($StopPredicate -and (& $StopPredicate $login)) { + $stopRequested = $true + break + } + } + if ($stopRequested) { break } + + $prevCursor = $after + # Assert-then-dereference: validate $pageBody.pageInfo BEFORE + # reading hasNextPage/endCursor so a malformed connection + # surfaces as the canonical invariant message (with context) + # rather than a generic null-reference / "$null can't be cast + # to bool" error from the dereference itself. Consistent with + # the pattern used in 01/02/03 pagination loops. + Assert-CopilotPrAutopilotPageInfo -Context "Get-AllThreadAuthors page $pageIndex for thread $ThreadId" -PageInfo $pageBody.pageInfo + $hasNext = $false + $after = $null + if ($pageBody.pageInfo) { + $hasNext = [bool]$pageBody.pageInfo.hasNextPage + $after = $pageBody.pageInfo.endCursor + } + # Belt-and-suspenders: if the server claims more pages but the + # cursor didn't advance, MaxPages would still catch it — but + # bail explicitly with a clearer message. + if ($hasNext -and $after -eq $prevCursor) { + throw "Get-AllThreadAuthors: pagination cursor did not advance for thread $ThreadId on page $pageIndex (server returned same endCursor='$after')." + } + } + + # Return as an array (comma-prefix prevents PowerShell from + # unwrapping a single-element list at the call boundary). + return ,$authors.ToArray() +} + +# Test-CopilotPrAutopilotThreadEscalated — lazy escalation-marker +# scanner. Returns $true if ANY comment body in the thread contains +# `$CopilotPrAutopilot_EscalationMarker`, $false otherwise. +# +# Why lazy: the marker only matters for threads that have survived +# every other safety gate and would otherwise be auto-resolved. That +# set is small (typically 0–5 threads per PR), so paginating comment +# bodies on demand is dramatically cheaper than fetching `body` for +# every comment of every thread on every cleanup run (which on a +# busy PR with 50 threads × 30 comments would balloon the GraphQL +# response by orders of magnitude and raise the chance of API-side +# response-size / complexity failures). +# +# Algorithm: +# 1. Check `lastComment.body` (already on the outer query) — the +# common case (operator just escalated → marker IS the last +# comment) terminates in zero extra API calls. +# 2. If totalCount <= 1 OR `!hasNextPage` from the outer authors +# window, there's nothing more to look at — return $false. +# 3. Otherwise paginate `comments(first: 100, after: cursor)` +# via `node(id:)` fetching ONLY `body`, returning $true the +# moment the marker appears. +# +# MaxPages bound mirrors Get-AllThreadAuthors (default 200 → 20,000 +# comments, three orders of magnitude beyond any plausible thread). +function Test-CopilotPrAutopilotThreadEscalated { + [CmdletBinding()] + param( + [Parameter(Mandatory)] [string]$ThreadId, + [Parameter(Mandatory)] [AllowEmptyCollection()] $FirstPage, # outer query's allComments + [Parameter(Mandatory)] [AllowNull()] [AllowEmptyString()] [string]$LastCommentBody, + [Parameter(Mandatory)] [AllowEmptyString()] [string]$Marker, + [int]$MaxPages = 200 + ) + + if (-not $Marker) { return $false } + # Fast path: marker is the most recent comment (the operator just + # escalated). Costs zero extra GraphQL. + if ($LastCommentBody -and $LastCommentBody.Contains($Marker)) { return $true } + + if (-not $FirstPage) { return $false } + $firstNodes = @() + if ($FirstPage.nodes) { $firstNodes = @($FirstPage.nodes) } + $totalCount = if ($null -ne $FirstPage.totalCount) { [int]$FirstPage.totalCount } else { 0 } + + # Nothing more to inspect: the lastComment check above already + # covered the only candidate(s). Note: the outer window omits + # `body`, so we cannot scan the first-page nodes here — but the + # only body we genuinely care about beyond `lastComment` is one + # buried deeper in the history, and that's exactly what the + # paginated path below fetches. + if ($totalCount -le 1) { return $false } + + # If the first page covered all comments AND we already checked + # the last one above, there's no "middle" body we haven't seen. + # But the outer window doesn't have bodies, so we still need to + # paginate from scratch when totalCount > 1 — UNLESS the only + # other comment is the firstComment opener (Copilot's original + # finding) which by construction can't contain the operator's + # escalation marker. Detect that and short-circuit. + # + # Validate the pageInfo invariant BEFORE dereferencing + # `hasNextPage` so a malformed connection surfaces as the + # canonical invariant message (with context) rather than a + # generic null-reference / "$null can't be cast to bool" error. + # Consistent with the assert-then-dereference pattern used in + # the other pagination loops in this skill. + Assert-CopilotPrAutopilotPageInfo -Context "Test-CopilotPrAutopilotThreadEscalated first page for thread $ThreadId" -PageInfo $FirstPage.pageInfo + $firstPageHasNext = [bool]$FirstPage.pageInfo.hasNextPage + if ($totalCount -le 2 -and -not $firstPageHasNext) { + # Two comments total: opener (Copilot) + lastComment. We + # already checked the lastComment body. The opener cannot + # contain the operator's escalation marker. + return $false + } + + # Paginated path: fetch ONLY body, page-by-page, returning $true + # the moment the marker appears. We re-fetch from the start (no + # cursor) because the outer query didn't capture bodies; the + # node(id:) variant uses its own cursor. + $pageQuery = @' +query($id: ID!, $after: String) { + node(id: $id) { + ... on PullRequestReviewThread { + comments(first: 100, after: $after) { + pageInfo { endCursor hasNextPage } + nodes { body } + } + } + } +} +'@ + + $hasNext = $true + $after = $null + $pageIndex = 0 + while ($hasNext) { + $pageIndex++ + if ($pageIndex -gt $MaxPages) { + throw "Test-CopilotPrAutopilotThreadEscalated: exceeded MaxPages=$MaxPages for thread $ThreadId — likely a malformed server response (cursor not advancing)." + } + + $pageArgs = @('-f', "query=$pageQuery", '-f', "id=$ThreadId") + if ($after) { $pageArgs = $pageArgs + @('-f', "after=$after") } + $pageData = Invoke-GhGraphQL -GhArgs $pageArgs -Context "scan escalation marker for thread $ThreadId (page $pageIndex)" + + $threadNode = $null + if ($pageData -and $pageData.data) { $threadNode = $pageData.data.node } + if (-not $threadNode) { + throw "Test-CopilotPrAutopilotThreadEscalated: node(id: '$ThreadId') returned null on page $pageIndex (thread deleted or inaccessible)." + } + $pageBody = $threadNode.comments + if (-not $pageBody) { + throw "Test-CopilotPrAutopilotThreadEscalated: thread $ThreadId has no comments connection on page $pageIndex." + } + + if ($pageBody.nodes) { + foreach ($n in $pageBody.nodes) { + if ($n -and $n.body -and $n.body.Contains($Marker)) { return $true } + } + } + + $prevCursor = $after + Assert-CopilotPrAutopilotPageInfo -Context "Test-CopilotPrAutopilotThreadEscalated page $pageIndex for thread $ThreadId" -PageInfo $pageBody.pageInfo + $hasNext = $false + $after = $null + if ($pageBody.pageInfo) { + $hasNext = [bool]$pageBody.pageInfo.hasNextPage + $after = $pageBody.pageInfo.endCursor + } + if ($hasNext -and $after -eq $prevCursor) { + throw "Test-CopilotPrAutopilotThreadEscalated: pagination cursor did not advance for thread $ThreadId on page $pageIndex (server returned same endCursor='$after')." + } + } + return $false +} + +$copilotLoginRegex = $CopilotPrAutopilot_CopilotReviewerLoginRegex # namespaced canonical from _lib.ps1 + +# Build $targets via an explicit foreach instead of Where-Object {...}. +# The earlier Where-Object predicate did real work (warnings, try/catch, +# pagination calls, counter mutation) — Where-Object's script-block runs +# in a child scope, which is why those counters had to be $script:. A +# plain foreach keeps the predicate semantics readable, lets the +# counters be normal locals, and makes error handling easier to follow. +[int]$skippedAwaitingReply = 0 +[int]$skippedEscalatedToUser = 0 +[int]$skippedHumanInThread = 0 +[int]$skippedUnknownAuthorInThread = 0 +[int]$skippedMissingAuthorship = 0 +[int]$skippedPaginationError = 0 +# Collect skipped thread IDs per reason so operators running with +# -Strict (or just inspecting -Verbose output) can pinpoint exactly +# which threads need manual investigation, rather than only seeing +# a counter. +$skippedHumanInThreadIds = [System.Collections.Generic.List[string]]::new() +$skippedUnknownAuthorInThreadIds = [System.Collections.Generic.List[string]]::new() +$targets = [System.Collections.Generic.List[object]]::new() + +foreach ($thread in $threads) { + if (-not $thread.isOutdated) { continue } + if ($thread.isResolved) { continue } + + # Defensive null guard around the GraphQL shape. The reviewer + # login can appear as either `copilot-pull-request-reviewer` or + # `copilot-pull-request-reviewer[bot]` depending on the GraphQL + # surface; match both with the same regex used elsewhere. + $firstAuthor = $null + if ($thread.firstComment -and $thread.firstComment.nodes -and $thread.firstComment.nodes.Count -gt 0 -and $thread.firstComment.nodes[0].author) { + $firstAuthor = $thread.firstComment.nodes[0].author.login + } + if (-not ($firstAuthor -and ($firstAuthor -match $copilotLoginRegex))) { continue } + + # Human-in-thread guard: if ANY comment in the thread is from a + # non-Copilot, non-$me author (i.e., a human or different bot chimed + # in after Copilot's opener), refuse to auto-resolve even with -Force. + # The doc claim "threads from human reviewers are never touched" + # must hold regardless of *position* of the human comment — a thread + # with mixed authorship still carries human signal that must not + # silently disappear when the loop calls cleanup. + # + # The outer query fetches the first 30 comment authors; when the + # connection's pageInfo.hasNextPage is true, Get-AllThreadAuthors + # paginates the rest via node(id:) so authorship visibility is + # always complete. hasNextPage is the canonical connection signal + # for "more pages exist" — using it directly is more robust than + # comparing totalCount vs nodes.Count (totalCount is kept on the + # query for diagnostics but not used as the pagination trigger). + # Fail-safe: if the allComments connection itself is malformed + # (missing or null pageInfo), we cannot prove authorship is + # complete. Treat it as "skip and leak the outdated thread" + # rather than risk auto-resolving a thread whose later comments + # might include a human reviewer. Same posture as the + # totalCount-vs-nodes mismatch below. + if (-not $thread.allComments -or -not $thread.allComments.pageInfo) { + $skippedMissingAuthorship++ + Write-Warning "Thread $($thread.id): allComments or allComments.pageInfo is null — skipping (fail-safe: cannot trust authorship view when pagination metadata is missing)." + continue + } + # Route through the shared invariant guard before reading + # hasNextPage so a malformed allComments.pageInfo (missing + # hasNextPage property, or hasNextPage=true with missing/empty + # endCursor) is caught as a fail-safe skip rather than silently + # coercing to $hasMore=$false and treating the first page as + # complete — which would risk auto-resolving a thread whose + # unseen later comments include a human reviewer. + try { + Assert-CopilotPrAutopilotPageInfo -Context "Thread $($thread.id) allComments.pageInfo" -PageInfo $thread.allComments.pageInfo + } catch { + $skippedMissingAuthorship++ + Write-Warning "Thread $($thread.id): $($_.Exception.Message) — skipping (fail-safe: malformed pageInfo cannot guarantee complete authorship view)." + continue + } + $hasMore = [bool]$thread.allComments.pageInfo.hasNextPage + # Defensive cross-check: if the connection reports totalCount > what + # we received in nodes but hasNextPage is false, the response is + # internally inconsistent (API edge case / partial response). We + # cannot safely paginate (Get-AllThreadAuthors reads hasNextPage / + # endCursor from the SAME pageInfo, so it would just return the + # first page anyway), so fail-safe by skipping the thread under + # the missing-authorship counter — better to leak an outdated bot + # thread than auto-resolve one whose later comments we can't see. + if (-not $hasMore) { + $tc = $thread.allComments.totalCount + $nc = if ($null -eq $thread.allComments.nodes) { 0 } else { @($thread.allComments.nodes).Count } + if ($null -ne $tc -and $tc -gt $nc) { + $skippedMissingAuthorship++ + Write-Warning "Thread $($thread.id): allComments.totalCount=$tc but received $nc node(s) with hasNextPage=false — skipping (fail-safe: cannot trust incomplete authorship view)." + continue + } + } + $allAuthors = $null + # Pre-scan the first page authors for a disqualifying case (human or + # ghost/null author) before deciding whether to paginate. If the + # first page already proves the thread isn't bot-only, we can skip + # the extra GraphQL pagination calls entirely (rate-limit-friendly + # on long threads). + $firstPageEarlySkip = $null + if ($thread.allComments -and $thread.allComments.nodes -and @($thread.allComments.nodes).Count -gt 0) { + foreach ($n in $thread.allComments.nodes) { + $login = if ($n -and $n.author) { $n.author.login } else { $null } + if (-not $login) { $firstPageEarlySkip = 'unknown'; break } + if ($login -eq $me) { continue } + if ($login -match $copilotLoginRegex) { continue } + $firstPageEarlySkip = 'human'; break + } + } + if ($firstPageEarlySkip -eq 'human') { + $skippedHumanInThread++ + [void]$skippedHumanInThreadIds.Add($thread.id) + continue + } + if ($firstPageEarlySkip -eq 'unknown') { + $skippedUnknownAuthorInThread++ + [void]$skippedUnknownAuthorInThreadIds.Add($thread.id) + continue + } + if ($hasMore) { + # Per-thread try/catch so a transient pagination failure on ONE + # thread (cursor not advancing, node(id:) returning null mid-walk, + # rate-limit etc.) doesn't abort the rest of the cleanup pass — + # mirrors the per-thread isolation already used for the resolve + # mutation below. On failure: fail-safe by SKIPPING the thread + # (never resolve when authorship is unknown). + try { + # StopPredicate short-circuit: as soon as we observe a + # disqualifying author (null/ghost or a non-Copilot, + # non-$me login), Get-AllThreadAuthors stops paginating + # and returns. The downstream foreach loop then re-detects + # the same disqualifier in O(1) and classifies the skip + # reason — but we've avoided burning a sequential GraphQL + # call per remaining 100-comment page on long threads + # (rate-limit + latency friendly). + $stopOnDisqualifier = { + param($login) + if (-not $login) { return $true } # ghost / deleted + if ($login -eq $me) { return $false } + if ($login -match $copilotLoginRegex) { return $false } + return $true # human-in-thread + } + $allAuthors = Get-AllThreadAuthors -ThreadId $thread.id -FirstPage $thread.allComments -StopPredicate $stopOnDisqualifier + } catch { + $skippedPaginationError++ + Write-Warning "Pagination failed for thread $($thread.id) — skipping (fail-safe): $($_.Exception.Message)" + continue + } + } else { + # Treat a null/malformed/EMPTY allComments connection as unknown + # authorship (fail-safe). If the connection is missing or the + # nodes array is empty, we can't prove the thread is bot-only, + # so we must NOT auto-resolve — otherwise the human-in-thread + # guard would vacuously pass and silently surface threads for + # resolution without verifying authorship. Every valid thread + # has at least one comment (the one that opened it), so empty + # nodes after a successful query means malformed response. + if (-not $thread.allComments -or + $null -eq $thread.allComments.nodes -or + @($thread.allComments.nodes).Count -eq 0) { + $skippedMissingAuthorship++ + continue + } + $allAuthorsList = [System.Collections.Generic.List[object]]::new() + foreach ($n in $thread.allComments.nodes) { + # Null-guard `$n` too: GraphQL list elements are nullable by + # default unless explicitly NonNull in the schema, and a + # `null` node would otherwise NPE on the `.author` access. + # Treat it like a `$null` author (unknown → fail-safe in the + # downstream `if (-not $login)` branch). + $login = if ($n -and $n.author) { $n.author.login } else { $null } + $allAuthorsList.Add($login) + } + $allAuthors = $allAuthorsList.ToArray() + } + + $humanInThread = $false + $unknownAuthorInThread = $false + foreach ($login in $allAuthors) { + if (-not $login) { + # `$null` author = ghost / deleted user. Authorship is + # genuinely unknown, so treat as unsafe (fail-safe): we'd + # rather leak an outdated bot thread than auto-resolve a + # thread that *might* contain human signal hidden behind + # a deleted account. Surfaced separately from + # $humanInThread so summaries can distinguish "human + # touched this" from "we couldn't tell who touched this". + $unknownAuthorInThread = $true + continue + } + if ($login -eq $me) { continue } + if ($login -match $copilotLoginRegex) { continue } + $humanInThread = $true + break + } + if ($humanInThread) { + $skippedHumanInThread++ + [void]$skippedHumanInThreadIds.Add($thread.id) + continue + } + if ($unknownAuthorInThread) { + $skippedUnknownAuthorInThread++ + [void]$skippedUnknownAuthorInThreadIds.Add($thread.id) + continue + } + + # Safety guard: don't resolve threads where Copilot (or anyone + # other than us) had the last word — we haven't replied yet, so + # resolving would hide an actionable finding. Override with -Force. + $lastAuthor = $null + $lastBody = $null + if ($thread.lastComment -and $thread.lastComment.nodes -and $thread.lastComment.nodes.Count -gt 0) { + if ($thread.lastComment.nodes[0].author) { $lastAuthor = $thread.lastComment.nodes[0].author.login } + $lastBody = $thread.lastComment.nodes[0].body + } + if (-not $Force -and $lastAuthor -ne $me) { + $skippedAwaitingReply++ + continue + } + + # Escalate-to-user safety: 08-reply-and-resolve embeds the + # $CopilotPrAutopilot_EscalationMarker into its body when invoked + # with -NoResolve so the human merge owner can act on the thread. + # If that thread later becomes outdated (e.g. the file was deleted + # in a follow-up commit), auto-resolving it here would silently + # close the hand-off and break the workflow contract that + # escalated threads stay open. -Force does NOT override this: the + # marker is an explicit operator decision, not a default. + # + # Marker scan is LAZY: only invoked at this gate, after all other + # safety checks have passed. The implementation first checks the + # already-fetched lastComment body (fast path) and only paginates + # comment bodies via node(id:) when the marker could be buried + # earlier in the history (e.g. Copilot re-posted after escalation + # reply, pushing the marker out of the lastComment window). + if (Test-CopilotPrAutopilotThreadEscalated ` + -ThreadId $thread.id ` + -FirstPage $thread.allComments ` + -LastCommentBody ([string]$lastBody) ` + -Marker $CopilotPrAutopilot_EscalationMarker) { + $skippedEscalatedToUser++ + continue + } + + $targets.Add($thread) +} + +# Helper: format a "Thread IDs:" trailer, capped to keep summary +# lines bounded on PRs with many skipped threads. Default cap is 20; +# operator can pass -Verbose to log the full list one-per-line via +# Write-Verbose (kept off the default Write-Output stream so CI logs +# stay parseable). +function Format-SkippedIdsTrailer { + param( + [Parameter(Mandatory)][AllowEmptyCollection()][System.Collections.IList]$Ids, + [ValidateRange(1, [int]::MaxValue)] + [int]$MaxInline = 20 + ) + $count = $Ids.Count + if ($count -eq 0) { return '' } + if ($count -le $MaxInline) { + return " Thread IDs: $($Ids -join ', ')" + } + $shown = @($Ids)[0..($MaxInline - 1)] -join ', ' + $rest = $count - $MaxInline + return " Thread IDs (first $MaxInline of $count): $shown, ... and $rest more (re-run with -Verbose for the full list)." +} + +if ($skippedHumanInThread -gt 0) { + $trailer = Format-SkippedIdsTrailer -Ids $skippedHumanInThreadIds + Write-Output "Skipped $skippedHumanInThread outdated Copilot thread(s) where a non-Copilot, non-'$me' commenter participated (-Force does NOT override this — human signal must not silently disappear).$trailer" + if ($skippedHumanInThreadIds.Count -gt 20) { + foreach ($id in $skippedHumanInThreadIds) { Write-Verbose "skippedHumanInThread: $id" } + } +} +if ($skippedUnknownAuthorInThread -gt 0) { + $trailer = Format-SkippedIdsTrailer -Ids $skippedUnknownAuthorInThreadIds + Write-Output "Skipped $skippedUnknownAuthorInThread outdated Copilot thread(s) with at least one ghost / deleted-user (null author) comment (-Force does NOT override this — authorship is unknown, fail-safe is to skip).$trailer" + if ($skippedUnknownAuthorInThreadIds.Count -gt 20) { + foreach ($id in $skippedUnknownAuthorInThreadIds) { Write-Verbose "skippedUnknownAuthorInThread: $id" } + } +} +if ($skippedMissingAuthorship -gt 0) { + Write-Output "Skipped $skippedMissingAuthorship outdated Copilot thread(s) where the GraphQL response returned a null / missing allComments connection so authorship could not be verified (fail-safe: skip rather than vacuously auto-resolve)." +} +if ($skippedAwaitingReply -gt 0) { + Write-Output "Skipped $skippedAwaitingReply outdated Copilot thread(s) where the last comment is not from '$me' (pass -Force to override)." +} +if ($skippedEscalatedToUser -gt 0) { + Write-Output "Skipped $skippedEscalatedToUser outdated Copilot thread(s) that contain the escalate-to-user marker in any comment (intentionally left open via 08-reply-and-resolve -NoResolve for the human merge owner; -Force does NOT override this)." +} +if ($skippedPaginationError -gt 0) { + Write-Output "Skipped $skippedPaginationError outdated Copilot thread(s) due to authorship-pagination errors (fail-safe: never resolve when authorship is unknown). See warnings above for per-thread detail." +} + +if ($targets.Count -eq 0) { + Write-Output 'No outdated Copilot threads to clean up.' + # Don't `return` here — -Strict must still evaluate even when nothing + # remained to clean up (the 'everything was skipped' case). +} else { + Write-Output "Found $($targets.Count) outdated Copilot thread(s) to resolve." + + $resolveMutation = @' +mutation($tid: ID!) { + resolveReviewThread(input: { threadId: $tid }) { + thread { isResolved } + } +} +'@ + + # Per-thread try/catch so a single mutation failure (rate-limit, + # transient GraphQL error, thread-disappeared-mid-loop) does NOT abort + # the whole cleanup pass and leave the remaining outdated threads + # unresolved. Track successes and failures, then summarise at the end + # with a non-zero exit code if any thread failed. + $resolved = 0 + $failed = New-Object System.Collections.Generic.List[object] + foreach ($t in $targets) { + if ($DryRun) { + Write-Output "Would resolve $($t.id) (DryRun)" + continue + } + try { + $resolveArgs = @('-f', "query=$resolveMutation", '-f', "tid=$($t.id)") + $resolveResp = Invoke-GhGraphQL -GhArgs $resolveArgs -Context "resolve outdated thread $($t.id)" + # Verify the mutation actually resolved the thread. The mutation + # selects `thread { isResolved }`, so a structurally-successful + # response that doesn't flip `isResolved` to true (partial + # failure, already-resolved-by-another-actor edge case, schema + # drift) would otherwise be misreported as success. Treat any + # non-true value (false, $null, missing field) as a failure. + # Rename locally to $resolvedThread (vs the earlier outer-scope + # use of $thread for review-thread iteration) so the resolve + # verification reads unambiguously when scanning the file. + $resolvedThread = $resolveResp.data.resolveReviewThread.thread + if (-not ($resolvedThread -and $resolvedThread.isResolved -eq $true)) { + $observed = if ($resolvedThread) { ($resolvedThread.isResolved | Out-String).Trim() } else { '(missing thread node)' } + throw "resolveReviewThread returned success but thread.isResolved is not true (observed: $observed)" + } + Write-Output "Resolved $($t.id)" + $resolved++ + } catch { + $msg = $_.Exception.Message + Write-Warning "Failed to resolve $($t.id): $msg" + $failed.Add([pscustomobject]@{ ThreadId = $t.id; Error = $msg }) + } + } + + if (-not $DryRun) { + Write-Output "Cleanup summary: resolved=$resolved failed=$($failed.Count) skippedAwaitingReply=$skippedAwaitingReply skippedHumanInThread=$skippedHumanInThread skippedUnknownAuthorInThread=$skippedUnknownAuthorInThread skippedMissingAuthorship=$skippedMissingAuthorship skippedPaginationError=$skippedPaginationError" + if ($failed.Count -gt 0) { + # Normalize each error message to a single line and bound its + # length BEFORE concatenating into the summary. Without this, + # a multi-line GraphQL FORBIDDEN payload echoed by `gh` could + # flood CI logs across N failed threads (newlines break log + # collectors that expect line-delimited records, and large + # payloads can drown the summary). Each error is also passed + # through the redactor-aware Format helper so credentials are + # scrubbed and the per-error preview is capped — the joined + # summary stays parseable even when every thread failed with + # a verbose response. 200 chars/error keeps the joined line + # readable when N>5 (5 × ~250 = ~1250 visible chars). + $failedFmt = $failed | ForEach-Object { + $oneLine = (Format-CopilotPrAutopilotErrorPreview -Text $_.Error -MaxChars 200) -replace '\s*[\r\n]+\s*', ' ' + "$($_.ThreadId) ($oneLine)" + } + Write-Output ("Failed threads (credentials redacted): " + ($failedFmt -join '; ')) + # exit 1 takes precedence over the -Strict exit 2 below: a + # mutation failure is a stronger signal than a documented + # skip-for-investigation outcome. + exit 1 + } + } +} + +# -Strict: treat human/unknown-authorship/missing-authorship/pagination +# skips as a non-success outcome so automation can surface "manual +# follow-up required". Evaluated ALWAYS (including under -DryRun and +# including when $targets.Count was 0) so the strict-skip signal is +# never silently lost. skippedAwaitingReply is intentionally excluded +# (overridable via -Force; routine in normal use). +if ($Strict) { + $strictSkipped = $skippedHumanInThread + $skippedUnknownAuthorInThread + $skippedMissingAuthorship + $skippedPaginationError + if ($strictSkipped -gt 0) { + Write-Output "-Strict: $strictSkipped thread(s) skipped for authorship/pagination reasons that require manual investigation; exiting 2." + exit 2 + } +} diff --git a/skills/copilot-pr-autopilot/scripts/_lib.ps1 b/skills/copilot-pr-autopilot/scripts/_lib.ps1 new file mode 100644 index 000000000..4103f2d56 --- /dev/null +++ b/skills/copilot-pr-autopilot/scripts/_lib.ps1 @@ -0,0 +1,1304 @@ +# Shared helpers for copilot-pr-autopilot scripts. +# Dot-source with: `. "$PSScriptRoot/_lib.ps1"` +# +# Dot-sourcing runs the prerequisite check below; if `gh` is missing or +# unauthenticated the script halts BEFORE doing any work, with a single +# actionable error message the calling agent can pattern-match on. +# +# Compatibility: Windows PowerShell 5.1+ and PowerShell 7+. +# `Invoke-CopilotPrAutopilotGhProcess` launches `gh` via +# `[System.Diagnostics.ProcessStartInfo]` with per-process environment +# overrides (so concurrent runspaces in the same parent PS process +# don't race on `$env:GH_HOST`), and drains stdout/stderr concurrently +# via `ReadToEndAsync`. On PS 7+ it uses `ArgumentList` for arg +# passing; on PS 5.1 (.NET Framework, no `ArgumentList`) it falls back +# to building `.Arguments` with CommandLineToArgvW round-trip quoting. +# `EnvironmentVariables` (StringDictionary) is used in preference to +# the newer `Environment` (IDictionary) for the same +# .NET Framework reason. + +# Canonical Copilot Code Review reviewer login regex. +# GraphQL exposes the login as either `copilot-pull-request-reviewer` (when +# referenced via `requestedReviewer.login`) or `copilot-pull-request-reviewer[bot]` +# (when referenced via review `author.login`), so callers must accept both. +# Centralised here so all step scripts (01 / 02 / 10) stay in sync — if the +# canonical login ever changes, change it once. +# +# Namespaced (`CopilotPrAutopilot_` prefix) + read-only because `_lib.ps1` is +# dot-sourced into the caller's scope; a bare name like `$CopilotLoginRegex` +# would risk colliding with caller-side variables. `Set-Variable -Force` lets +# us re-dot-source in the same session without erroring on the read-only flag. +# A back-compat alias `$CopilotReviewerLoginRegex` is preserved so callers +# don't have to type the prefix on every read site (and so older snapshots of +# 01/02/10 keep working). +Set-Variable -Name 'CopilotPrAutopilot_CopilotReviewerLoginRegex' ` + -Value '(?i)^copilot-pull-request-reviewer(\[bot\])?$' ` + -Option ReadOnly -Force -Scope Script +Set-Variable -Name 'CopilotReviewerLoginRegex' ` + -Value $CopilotPrAutopilot_CopilotReviewerLoginRegex ` + -Option ReadOnly -Force -Scope Script + +# Canonical Copilot Code Review reviewer slug (sans `[bot]` suffix). This +# is what GraphQL `requestReviewsByLogin(userLogins: [...])` and the REST +# `requested_reviewers` payloads expect. Centralised so the literal +# isn't duplicated across scripts/docs/queries; if GitHub ever renames +# the canonical bot slug, change it once. The regex above continues to +# accept the `[bot]` suffix variant on the read path. +Set-Variable -Name 'CopilotPrAutopilot_CopilotReviewerSlug' ` + -Value 'copilot-pull-request-reviewer' ` + -Option ReadOnly -Force -Scope Script +Set-Variable -Name 'CopilotReviewerSlug' ` + -Value $CopilotPrAutopilot_CopilotReviewerSlug ` + -Option ReadOnly -Force -Scope Script + +# Marker that 08-reply-and-resolve embeds into a reply body when invoked +# with -NoResolve (escalate-to-user). 10-cleanup-outdated.ps1 refuses to +# auto-resolve outdated threads whose last comment contains this marker +# so the workflow contract "escalated threads stay open for the human +# merge owner" survives the thread becoming outdated. The HTML-comment +# form keeps the marker invisible in rendered Markdown. +Set-Variable -Name 'CopilotPrAutopilot_EscalationMarker' ` + -Value '' ` + -Option ReadOnly -Force -Scope Script + +# Credential-redaction rules used by Protect-CopilotPrAutopilotCredentials. +# Hoisted to script scope (built ONCE per dot-source) so hot paths +# (error previews, timeouts, JSON parse errors, GraphQL errors) don't +# rebuild the array per call. ReadOnly so a misbehaving caller can't +# mutate the table out from under in-flight redactions. Patterns target +# the formats `gh` and GitHub APIs actually emit; not exhaustive. +# +# Each entry's Pattern is a precompiled `[regex]` object (with +# `RegexOptions.Compiled` for hot-path speed and case-insensitivity +# where the original pattern used `(?i)`). Precompiling avoids +# depending on .NET's process-wide regex cache, which is size-limited +# (default 15) and shared across the whole process — repeated calls +# across many error paths/tools could evict our patterns and force a +# re-parse. The compile cost is paid once per dot-source. +Set-Variable -Name 'CopilotPrAutopilot_CredentialRedactors' ` + -Value @( + @{ Pattern = [regex]::new('(?i)(Authorization\s*:\s*Bearer\s+)\S+', [System.Text.RegularExpressions.RegexOptions]::Compiled); Replace = '${1}' } + @{ Pattern = [regex]::new('(?i)(Authorization\s*:\s*token\s+)\S+', [System.Text.RegularExpressions.RegexOptions]::Compiled); Replace = '${1}' } + @{ Pattern = [regex]::new('(?i)(Authorization\s*:\s*basic\s+)\S+', [System.Text.RegularExpressions.RegexOptions]::Compiled); Replace = '${1}' } + @{ Pattern = [regex]::new('\b(gh[a-z]_[A-Za-z0-9]{36,})\b', [System.Text.RegularExpressions.RegexOptions]::Compiled); Replace = '' } + @{ Pattern = [regex]::new('\b(github_pat_[A-Za-z0-9_]{82,})\b', [System.Text.RegularExpressions.RegexOptions]::Compiled); Replace = '' } + @{ Pattern = [regex]::new('(?i)("access_token"\s*:\s*")[^"]+', [System.Text.RegularExpressions.RegexOptions]::Compiled); Replace = '${1}' } + @{ Pattern = [regex]::new('(?i)("token"\s*:\s*")[^"]+', [System.Text.RegularExpressions.RegexOptions]::Compiled); Replace = '${1}' } + ) ` + -Option ReadOnly -Force -Scope Script + +# Best-effort credential redactor for any text destined for an error +# message or CI log. Applied to `gh` stdout/stderr before embedding in +# throws: `gh --verbose`, GH_DEBUG=api, and certain GitHub API error +# bodies can echo Authorization headers, gh[a-z]_* tokens, github_pat_* +# bodies, or JSON {"access_token":"..."}/{"token":"..."} fragments — +# all of which would otherwise land in CI logs that may be public or +# persistent. Patterns live in the script-scope read-only +# $CopilotPrAutopilot_CredentialRedactors table as precompiled +# [regex] objects (single-source-of-truth, zero per-call allocation, +# independent of .NET's size-limited process-wide regex cache). +# Callers still apply length truncation after redaction so throws +# stay bounded. +function Protect-CopilotPrAutopilotCredentials { + [CmdletBinding()] + param([Parameter(Mandatory)][AllowNull()][AllowEmptyString()][string]$Text) + if ([string]::IsNullOrEmpty($Text)) { return $Text } + $out = $Text + foreach ($r in $script:CopilotPrAutopilot_CredentialRedactors) { + $out = $r.Pattern.Replace($out, $r.Replace) + } + return $out +} + +# Small companion to Protect-CopilotPrAutopilotCredentials for the +# common "embed gh stderr/stdout in an error message" pattern used by +# step scripts. Redacts credentials, bounds the preview to $MaxChars +# (default 500 — matches Invoke-GhGraphQL / Resolve-RepoCoords), and +# returns "(empty)" for null/whitespace input. Hoisted into _lib.ps1 +# so step scripts (01/02/10/...) don't each reinvent the same +# redact → truncate → empty-sentinel formatting. +function Format-CopilotPrAutopilotErrorPreview { + [CmdletBinding()] + param( + [AllowNull()][AllowEmptyString()][string]$Text, + [ValidateRange(0, [int]::MaxValue)] + [int]$MaxChars = 500 + ) + if ([string]::IsNullOrWhiteSpace($Text)) { return '(empty)' } + $safe = (Protect-CopilotPrAutopilotCredentials -Text $Text).Trim() + if ($MaxChars -eq 0) { return '' } + if ($safe.Length -gt $MaxChars) { + return $safe.Substring(0, $MaxChars) + "... (truncated to $MaxChars chars)" + } + return $safe +} + +# Prerequisite check: gh CLI installed AND authenticated. +# Fails fast with install/login instructions. Idempotent within a single +# PowerShell process: subsequent calls (or dot-sources of _lib.ps1) inside +# the same process are no-ops because the flag lives in the dot-sourcing +# script's scope. Each step script (01/02/03/...) is a fresh process when +# invoked from a bash loop, so the check runs once per step invocation by +# construction. +function Assert-GhReady { + # 1. Installed? Resolve to the absolute Application path BEFORE any + # `& gh ...` invocation, so a session-defined function/alias named + # `gh` cannot intercept the host-derivation probe or the auth-status + # check below. Constrain to CommandType Application; the resolved + # $cmd.Path is then used for every gh invocation in this function + # (and stashed in $script:CopilotPrAutopilot_GhPath for Invoke-Gh). + $cmd = Get-Command gh -CommandType Application -ErrorAction SilentlyContinue + if (-not $cmd) { + throw @' +copilot-pr-autopilot: prerequisite missing — `gh` CLI is not on PATH. + +Install (one of): + - winget install --id GitHub.cli (Windows) + - brew install gh (macOS) + - sudo apt install gh (Debian/Ubuntu — see https://cli.github.com for other distros) + - https://cli.github.com/ (universal installer + download) + +Then `gh auth login` and re-run this command. +'@ + } + + # Normalize $env:GH_HOST: `gh` expects a bare hostname (e.g., + # `github.com` or `ghe.example.com`). Some users export it as a URL + # form (`https://ghe.example.com/`) — passing that through would + # break the `--hostname` argument and silently bypass the auth check. + # Strip scheme and any trailing path/slash; tolerate plain hostnames. + $rawHost = $env:GH_HOST + if ([string]::IsNullOrWhiteSpace($rawHost)) { + # Derive gh's actual default host instead of hardcoding github.com, + # which would otherwise break GitHub Enterprise users who have + # `gh` configured with a non-github.com default (e.g., via `gh + # auth login --hostname ghe.example.com`). Probe `gh auth status` + # to discover the active host. If no host can be derived (fresh + # install, no auth yet, or unrecognized `gh auth status` format) + # we fail closed with an actionable diagnostic rather than + # silently defaulting to github.com, which would mask Enterprise + # misconfiguration with a misleading "not authenticated to + # github.com" error pointing at the wrong host. + $ghDerivedHost = $null + $statusOutput = '' + try { + # Route through Invoke-CopilotPrAutopilotGhProcess so the + # host-derivation probe gets the same 5-min hard timeout, + # UTF-8 decoding, and non-interactive stdin behavior as + # every other gh call in this skill. The original + # `& $cmd.Path auth status 2>&1 | Out-String` could hang + # indefinitely on a credential-helper stall (e.g., + # macOS keychain prompt, manager-core deadlock) since + # PowerShell's native-command invocation has no built-in + # timeout. We use a shorter 30s timeout here because + # `gh auth status` is purely local (no network) so any + # call that exceeds that is almost certainly a hang. + # We DO NOT throw on a non-zero exit code from this + # probe — `gh auth status` returns non-zero when no + # account is logged in, and the parsing below correctly + # falls through to the "couldn't derive a host" branch + # that surfaces a friendly diagnostic. + $statusResult = Invoke-CopilotPrAutopilotGhProcess ` + -GhPath $cmd.Path ` + -GhArgs @('auth', 'status') ` + -TimeoutSeconds 30 + # Mirror the original `2>&1` merge so the parser sees + # whichever stream gh chose to write the header lines on + # (different gh versions vary on this). + $statusOutput = ($statusResult.Stdout + "`n" + $statusResult.Stderr) + # Parse `gh auth status` block-by-block. The output is + # organized as one "Logged in to " header followed + # by indented lines per host. Splitting on the header + # boundary lets us pick the host whose OWN block contains + # "Active account: true" — a naive `(?ms).*?Active...` + # regex can skip past a later header and incorrectly + # return the first host even when the active flag belongs + # to a different host further down the output. + $ghDerivedHost = $null + $hostHeaderRegex = [regex]'(?im)^.*?Logged in to\s+(\S+)' + $headerMatches = @($hostHeaderRegex.Matches($statusOutput)) + if ($headerMatches.Count -gt 0) { + for ($hi = 0; $hi -lt $headerMatches.Count; $hi++) { + $m = $headerMatches[$hi] + $blockStart = $m.Index + $blockEnd = if (($hi + 1) -lt $headerMatches.Count) { $headerMatches[$hi + 1].Index } else { $statusOutput.Length } + $block = $statusOutput.Substring($blockStart, $blockEnd - $blockStart) + if ($block -match '(?im)Active account:\s*true') { + $ghDerivedHost = $m.Groups[1].Value + break + } + } + if (-not $ghDerivedHost) { + # Single-host or pre-2.40 gh (no Active marker): fall + # back to the first header. + $ghDerivedHost = $headerMatches[0].Groups[1].Value + } + } + } catch { + # Capture the underlying failure (process-launch error, + # 30s timeout kill, malformed gh output, etc.) into the + # status buffer so the downstream "couldn't derive a + # host" diagnostic surfaces a concrete reason instead + # of the vacuous "(gh auth status produced no output)" + # sentinel. Use the exception's Message rather than the + # full ErrorRecord so PowerShell's stack trace noise + # stays out of the operator-facing message; the message + # itself goes through the same credential-redaction + # path as the normal output (applied at $statusExcerpt + # construction below). + $statusOutput = "(gh auth status probe failed: $($_.Exception.Message))" + } + if (-not $ghDerivedHost) { + # Don't silently default to github.com — GitHub Enterprise users + # without $env:GH_HOST set, or environments where `gh auth + # status` output format / localization breaks both regexes, + # would otherwise get a misleading "not authenticated to + # github.com" error pointing at the wrong host. Fail explicitly + # with an actionable message that surfaces the actual status + # output excerpt so the operator can see which hosts (if any) + # gh thinks it's authenticated against. + # Redact credentials BEFORE truncation so the 600-char cap + # reflects the safe content and we never silently embed an + # unredacted token because the visible region happened to + # be on the truncated side. + $statusExcerpt = (Protect-CopilotPrAutopilotCredentials -Text $statusOutput).Trim() + if ($statusExcerpt.Length -gt 600) { + $statusExcerpt = $statusExcerpt.Substring(0, 600) + '... (truncated)' + } + if ([string]::IsNullOrEmpty($statusExcerpt)) { + $statusExcerpt = '(gh auth status produced no output — likely not logged in to any host)' + } + $msg = @' +copilot-pr-autopilot: could not derive a GitHub host from `gh auth status` (and `$env:GH_HOST` is not set, credentials redacted). + +Fix one of: + - Run `gh auth login --hostname github.com` (or your enterprise host) + - Set `$env:GH_HOST` to your host (e.g., `$env:GH_HOST = 'ghe.example.com'`) + - Inspect `gh auth status` yourself and resolve the auth state + +`gh auth status` reported: +{STATUS} +'@ + throw $msg.Replace('{STATUS}', $statusExcerpt) + } + $targetHostLookup = $ghDerivedHost + # Note: the validated host is stored in + # $script:CopilotPrAutopilot_GhHost and applied by Invoke-Gh + # via a per-process ProcessStartInfo.EnvironmentVariables + # override around the native gh call. The parent process's + # $env:GH_HOST is NEVER mutated, so concurrent runspaces in + # the same parent PS process don't race, and importing this + # library does not leave the caller's environment mutated. + } else { + $rawHost = $rawHost.Trim() + $parsedHost = $null + # Try absolute URI first (handles https://ghe.example.com/path/). + $uri = $null + if ([Uri]::TryCreate($rawHost, [UriKind]::Absolute, [ref]$uri) -and $uri.Host) { + $parsedHost = $uri.Host + } else { + # Fall back to manual scheme/path stripping for inputs like + # `ghe.example.com/`, `//ghe.example.com`, or `host:port/x`. + $stripped = $rawHost -replace '^[a-zA-Z][a-zA-Z0-9+.\-]*://', '' + $stripped = $stripped -replace '^/+', '' + $slashIdx = $stripped.IndexOf('/') + if ($slashIdx -ge 0) { $stripped = $stripped.Substring(0, $slashIdx) } + # Trim only trailing slashes here. A trailing dot CAN be a + # valid absolute-FQDN representation (DNS root); stripping + # it would change the literal hostname and could break GHE + # setups that intentionally use it (or break TLS SNI on + # the few servers that distinguish the two forms). + $stripped = $stripped.TrimEnd('/') + # Reattempt URI parsing with an injected scheme so `host:port` + # and bracketed IPv6 (e.g., `[::1]:8080`) get their port + # stripped via `.Host` rather than left intact — `gh + # --hostname` expects a bare hostname. + $reparsed = $null + if ([Uri]::TryCreate("https://$stripped", [UriKind]::Absolute, [ref]$reparsed) -and $reparsed.Host) { + $parsedHost = $reparsed.Host + } else { + $parsedHost = $stripped + } + } + if ([string]::IsNullOrWhiteSpace($parsedHost)) { + # Escape embedded single quotes in $rawHost so a host containing + # `'` (legal in arbitrary $env: values) cannot break out of the + # quoting and produce an ambiguous diagnostic. + $rawHostEsc = $rawHost.Replace("'", "''") + throw "copilot-pr-autopilot: `$env:GH_HOST is set to '$rawHostEsc' which does not resolve to a usable hostname. Set it to a bare host (e.g., 'github.com' or 'ghe.example.com') or unset it." + } + $targetHostLookup = $parsedHost + # Note: the validated host is stored in + # $script:CopilotPrAutopilot_GhHost and applied by Invoke-Gh + # via a per-process ProcessStartInfo.EnvironmentVariables + # override around the native gh call (works uniformly across + # every gh subcommand, unlike per-subcommand --hostname, and + # without mutating the parent's $env:GH_HOST — concurrent- + # runspace-safe). Importing this library does not leave the + # caller's environment mutated. + } + if ($script:CopilotPrAutopilot_GhReady -and $script:CopilotPrAutopilot_GhHost -eq $targetHostLookup) { + # Return a snapshot object so callers can use these locals + # directly (eliminating the read-after-validate race window + # that arises when Invoke-Gh re-reads $script:* after a + # concurrent runspace has mutated them). + return [pscustomobject]@{ + GhHost = $script:CopilotPrAutopilot_GhHost + GhPath = $script:CopilotPrAutopilot_GhPath + } + } + # Re-validate when $env:GH_HOST has changed mid-process so we never + # short-circuit on a memoized pass that was for a different host. + + # 2. Authenticated against the *target* host? `gh auth status` without + # --hostname can exit 0 if the user is logged into ANY host (e.g. + # ghes.example.com but not github.com), which would let the prereq + # silently pass and surface as a less actionable failure later. We + # constrain to the resolved target host so the contract + # "halts BEFORE doing any work" actually holds for the target host. + $targetHost = $targetHostLookup + # Use the per-process ProcessStartInfo helper (no in-process env + # mutation) so this prereq check is also thread-safe across + # concurrent runspaces in the same parent PowerShell process. + # Single source of truth: pass the resolved host via gh's + # native `--hostname` flag and DO NOT set ExtraEnv GH_HOST here. + # The flag wins anyway, and dual-specifying just opens a future + # foot-gun if one side gets updated and the other doesn't. + $authResult = Invoke-CopilotPrAutopilotGhProcess -GhPath $cmd.Path -GhArgs @('auth','status','--hostname',$targetHost) + if ($authResult.ExitCode -ne 0) { + $errTrimmed = $authResult.Stderr.Trim() + # Substitute a sentinel when stderr is empty so the diagnostic + # tail is never blank. `gh auth status` sometimes exits non-zero + # with all output on stdout (or no output at all on rare + # config-corruption surfaces); a blank `{ERR}` would otherwise + # leave operators with nothing actionable. + if ([string]::IsNullOrEmpty($errTrimmed)) { + $stdoutTrimmed = $authResult.Stdout.Trim() + if (-not [string]::IsNullOrEmpty($stdoutTrimmed)) { + $errTrimmed = $stdoutTrimmed + } else { + $errTrimmed = '(gh emitted no output — re-run `gh auth status --hostname ' + $targetHost + '` interactively for full output)' + } + } + $msg = @' +copilot-pr-autopilot: prerequisite missing — `gh` CLI is not authenticated to {HOST}. + +Run: + gh auth login --hostname {HOST} + +Then re-run this command. (Override the target host with `$env:GH_HOST`.) + +`gh auth status --hostname {HOST}` reported: + {ERR} +'@ + # Redact credentials in {ERR} BEFORE substitution. `gh auth + # status` is the most likely place to leak token-like values + # (the whole command is about auth) so this is the single most + # important throw site for the redactor to cover. + $errSafe = Protect-CopilotPrAutopilotCredentials -Text $errTrimmed + $msg = $msg.Replace('{HOST}', $targetHost).Replace('{ERR}', $errSafe) + throw $msg + } + + $script:CopilotPrAutopilot_GhReady = $true + $script:CopilotPrAutopilot_GhPath = $cmd.Path + $script:CopilotPrAutopilot_GhHost = $targetHost + # See matching return in the memoized short-circuit above for why + # we return a snapshot object instead of $null. + return [pscustomobject]@{ + GhHost = $targetHost + GhPath = $cmd.Path + } +} + +function New-CopilotPrAutopilotPrivateTempDir { + # Internal helper: create a per-call temp dir for transient files + # (gh stderr, body tempfiles) with owner-only access on Unix. + # On Unix: `mkdir -m 700` atomically creates with mode 0700 (no umask + # window). On Windows: rely on the documented per-user %TEMP% ACL + # (interactive logon default) plus operator's + # $env:COPILOT_PR_AUTOPILOT_TEMP_ROOT override for non-interactive + # service accounts where shared TEMP is configured. + $tempRoot = $null + if (-not [string]::IsNullOrWhiteSpace($env:COPILOT_PR_AUTOPILOT_TEMP_ROOT)) { + $candidate = $env:COPILOT_PR_AUTOPILOT_TEMP_ROOT.Trim() + # [IO.Path]::IsPathRooted returns $true for drive-relative paths + # like 'C:temp' (no backslash after the colon), which Windows + # treats as "relative to the current directory on drive C:" — + # NOT fully qualified. That would silently leak PR content into + # whatever directory the caller happens to be in. PS 7+ ships + # IsPathFullyQualified which rejects 'C:temp'; on 5.1 we fall + # back to a regex check that accepts drive-absolute (C:\...), + # UNC (\\server\share), and Unix-absolute (/...) forms. + $isFullyQualified = $false + if ([IO.Path].GetMethod('IsPathFullyQualified', [Type[]]@([string]))) { + $isFullyQualified = [IO.Path]::IsPathFullyQualified($candidate) + } else { + $isFullyQualified = ($candidate -match '^[A-Za-z]:[\\/]') -or ($candidate -match '^[\\/][\\/]') -or ($candidate -match '^/') + } + if (-not $isFullyQualified) { + throw "copilot-pr-autopilot: `$env:COPILOT_PR_AUTOPILOT_TEMP_ROOT must be a fully-qualified absolute path (e.g., 'C:\\Temp\\copilot' or '/tmp/copilot'); got '$candidate'. Drive-relative paths like 'C:temp' are not accepted because they resolve against the caller's current directory." + } + # Operator explicitly set this to force tempfiles onto a chosen + # private volume. A silent fallback to GetTempPath() on missing + # / non-directory paths would defeat that intent and quietly + # leak PR content into the default TEMP. Fail closed so a + # typo'd override is immediately actionable. + if (-not (Test-Path -LiteralPath $candidate -PathType Container)) { + throw "copilot-pr-autopilot: `$env:COPILOT_PR_AUTOPILOT_TEMP_ROOT='$candidate' does not exist or is not a directory. Create it first (with the desired ACLs) or unset the variable to use the default temp path." + } + $tempRoot = $candidate + } + if (-not $tempRoot) { $tempRoot = [IO.Path]::GetTempPath() } + $tdir = Join-Path $tempRoot ("copilot-pr-autopilot-" + [Guid]::NewGuid().ToString('N')) + if ($PSVersionTable.PSVersion.Major -ge 6 -and ($IsLinux -or $IsMacOS)) { + $mkdirCmd = Get-Command mkdir -CommandType Application -ErrorAction SilentlyContinue + if (-not $mkdirCmd) { + throw "copilot-pr-autopilot: external 'mkdir' binary not found on PATH — cannot create owner-only temp dir atomically." + } + $mkdirOutput = $null + # Initialize $mkdirEc to a sentinel so a throw before the + # native-mkdir line never produces a "exited " diagnostic. + $mkdirEc = -1 + # Capture mkdir's combined stdout+stderr in-memory via 2>&1 + # instead of redirecting to a tempfile in the shared TEMP + # path. mkdir's failure messages (e.g., "Permission denied", + # filesystem-full diagnostics) are short and benign — but + # routing them through shared TEMP would contradict the + # whole point of this helper. In-memory capture preserves + # the diagnostic for the throw without ever touching disk. + # $ErrorActionPreference is not flipped here because it has + # no effect on native-executable failures (only $LASTEXITCODE + # matters for & mkdir ...), and flipping it would only add + # surprise behavior if cmdlets were added inside the block. + $mkdirOutput = & $mkdirCmd.Path -m 700 $tdir 2>&1 + $mkdirEc = $LASTEXITCODE + if ($mkdirEc -ne 0) { + $diag = if ($mkdirOutput) { ($mkdirOutput | Out-String).Trim() } else { '(no output)' } + throw "copilot-pr-autopilot: failed to create owner-only temp dir ($($mkdirCmd.Path) -m 700 $tdir exited $mkdirEc): $diag" + } + } else { + [void][System.IO.Directory]::CreateDirectory($tdir) + } + return $tdir +} + +# Internal helper: launch `gh` as a child process with a per-process +# environment override (specifically GH_HOST), so concurrent runspaces +# inside the same PowerShell host don't race on `$env:GH_HOST`. +# Returns [pscustomobject]@{ ExitCode = ; Stdout = ; Stderr = }. +# +# Concurrent stdout/stderr capture via StandardOutput.ReadToEndAsync / +# StandardError.ReadToEndAsync prevents child-process deadlock when +# either pipe's OS-side buffer fills (default 4-64 KiB depending on +# platform) — synchronously calling ReadToEnd on one stream while the +# other fills its pipe is the classic deadlock. We deliberately do NOT +# use Begin*ReadLine event handlers: those fire on threadpool workers +# without a PowerShell Runspace, which raises PSInvalidOperationException +# the first time a PS scriptblock handler runs. +function Invoke-CopilotPrAutopilotGhProcess { + param( + [Parameter(Mandatory)][string]$GhPath, + [Parameter(Mandatory)][string[]]$GhArgs, + [hashtable]$ExtraEnv, + # Hard cap on child-process wall time. Default 5 min covers any + # realistic gh api / graphql call (typical: <2s) while preventing + # an entire automation step from hanging indefinitely on network + # stalls, credential-helper deadlocks, or a hung interactive + # prompt (we should never trigger one — UseShellExecute=false + + # the auth-status precheck — but defense-in-depth). Override per + # call via $env:COPILOT_PR_AUTOPILOT_GH_TIMEOUT_SECONDS (positive + # int) if a particular workflow legitimately needs longer. + # ValidateRange enforces the same positive-int invariant as the + # env-var path so that a caller passing 0 or a negative value + # gets a clear parameter-binding error rather than a confusing + # Process.WaitForExit(int) ArgumentOutOfRangeException or an + # immediate timeout/kill on the first WaitForExit tick. + [ValidateRange(1, [int]::MaxValue)] + [int]$TimeoutSeconds = 300 + ) + $psi = [System.Diagnostics.ProcessStartInfo]::new() + $psi.FileName = $GhPath + # Cross-runtime: ProcessStartInfo.ArgumentList was added in .NET + # Core 2.1 (PS 7+). Windows PowerShell 5.1 runs on .NET Framework + # which only has the legacy Arguments string. Branch on the runtime + # so this skill keeps its documented PS 5.1+ compatibility (per + # SKILL.md and the file-header note above). + # Validate $GhArgs UP FRONT: a $null token in the array almost + # always indicates an upstream bug (an optional -F field + # silently resolved to $null instead of being omitted, a + # mis-spelled variable name, etc.). Silently coercing $null to + # '' would suppress that bug AND produce a subtly different gh + # invocation (an empty-string token IS a real positional arg + # that gh consumes), making the eventual misbehavior much + # harder to diagnose. Fail loudly with the offending index so + # the operator can fix the call site. + for ($i = 0; $i -lt $GhArgs.Count; $i++) { + if ($null -eq $GhArgs[$i]) { + # NOTE for code reviewers: the message below renders the + # index correctly. Backtick escapes turn `$GhArgs` and + # `$null` into literals; `[$i]` interpolates the loop + # index. Empirically: `$GhArgs[1] is $null. Callers must...`. + # Do NOT "fix" this to `-GhArgs[$i]` — same output, no bug. + throw "Invoke-CopilotPrAutopilotGhProcess: `$GhArgs[$i] is `$null. Callers must pass a non-null string for every argument (empty-string '' is allowed and considered intentional). A `$null token most likely indicates an upstream bug (uninitialised variable, optional field silently nulled, misspelled var name)." + } + } + if ($PSVersionTable.PSVersion.Major -ge 6) { + foreach ($arg in $GhArgs) { + [void]$psi.ArgumentList.Add($arg) + } + } else { + # CommandLineToArgvW round-trip quoting (Microsoft's documented + # rules): https://learn.microsoft.com/en-us/cpp/cpp/main-function-command-line-args#parsing-c-command-line-arguments + # - Backslashes followed by " must be doubled and a " preceded + # by them is escaped as \". Backslashes not followed by " are + # passed through literally. + # - Arguments with whitespace or " must be wrapped in "..."; an + # empty argument must be wrapped as "" so it survives splitting. + $quoted = foreach ($a in $GhArgs) { + if ($a.Length -gt 0 -and -not ($a -match '[\s"]')) { + $a + } else { + $sb = [System.Text.StringBuilder]::new() + [void]$sb.Append('"') + # Use $ci (character index) here rather than $i so we + # don't clobber the outer arg-index $i used by the + # up-front null-validation pass earlier in the function. + # PowerShell foreach in script does NOT introduce a new + # variable scope; renaming keeps future edits safe. + $ci = 0 + while ($ci -lt $a.Length) { + $bs = 0 + while ($ci -lt $a.Length -and $a[$ci] -eq '\') { $bs++; $ci++ } + if ($ci -eq $a.Length) { + [void]$sb.Append([string]::new('\', $bs * 2)) + break + } elseif ($a[$ci] -eq '"') { + [void]$sb.Append([string]::new('\', $bs * 2 + 1)).Append('"') + } else { + [void]$sb.Append([string]::new('\', $bs)).Append($a[$ci]) + } + $ci++ + } + [void]$sb.Append('"') + $sb.ToString() + } + } + $psi.Arguments = ($quoted -join ' ') + } + $psi.UseShellExecute = $false + $psi.RedirectStandardOutput = $true + $psi.RedirectStandardError = $true + # Redirect (and immediately close) stdin so gh can never block + # waiting on an interactive prompt — e.g. a missing/expired token + # on a TTY-less CI agent. Without this, the child inherits our + # stdin and a prompt-on-stdin path would hang until TimeoutSeconds + # fires (slow, and the kill path is observable in CI logs). With + # RedirectStandardInput=$true + StandardInput.Close() the child + # sees EOF immediately and fails fast with a clean exit code. + $psi.RedirectStandardInput = $true + # Force UTF-8 (no BOM) for the redirected streams. Without this, + # Windows PowerShell 5.1 on .NET Framework defaults to the OEM + # code page (typically CP437 / CP1252 / CP932 depending on locale) + # for redirected child-process output, which corrupts any non- + # ASCII bytes in `gh`'s JSON / comment bodies and then trips a + # downstream ConvertFrom-Json failure or silent mojibake. `gh` + # itself writes UTF-8 regardless of platform (Go runtime default), + # so explicitly pinning the reader to UTF-8 no-BOM matches the + # producer and removes the locale dependency. + $utf8NoBom = [System.Text.UTF8Encoding]::new($false) + $psi.StandardOutputEncoding = $utf8NoBom + $psi.StandardErrorEncoding = $utf8NoBom + $psi.CreateNoWindow = $true + # Inherit current env, then apply per-process overrides without + # mutating the parent's $env:* (thread-safe across concurrent + # PowerShell runspaces sharing the same parent process). + # Use EnvironmentVariables (StringDictionary) rather than the + # newer Environment property (IDictionary) because + # the latter is .NET Core / .NET 5+ only and would throw a + # property-not-found error on Windows PowerShell 5.1's .NET + # Framework runtime. EnvironmentVariables exists on both and is + # case-insensitive on Windows (which matches Win32 env semantics). + if ($ExtraEnv) { + foreach ($k in $ExtraEnv.Keys) { + # Cast to [string] once. ProcessStartInfo.EnvironmentVariables + # (StringDictionary) requires string keys; a hashtable can + # legally carry non-string keys (int, enum, etc.) so a + # caller mis-typing one of these would otherwise throw + # late inside ContainsKey/Remove/indexer. Casting up front + # keeps the helper robust against unusual caller input. + $key = [string]$k + $v = $ExtraEnv[$k] + if ($null -eq $v) { + if ($psi.EnvironmentVariables.ContainsKey($key)) { [void]$psi.EnvironmentVariables.Remove($key) } + } else { + $psi.EnvironmentVariables[$key] = [string]$v + } + } + } + # Operator override: positive integer in + # $env:COPILOT_PR_AUTOPILOT_GH_TIMEOUT_SECONDS overrides the param + # DEFAULT for the entire process (useful when a single workflow + # legitimately needs to allow longer-running gh calls without + # threading a parameter through every call site). + # + # Precedence: an explicit `-TimeoutSeconds` from the caller WINS + # over the env var. Callers that pass a deliberately tight timeout + # (e.g., Assert-GhReady's 30s for a purely-local `gh auth status` + # probe) must not be silently relaxed by a global env-var override + # — that would defeat the intended fast-fail behavior and turn a + # 30s hang ceiling into a 5+ minute one. Detect "caller did NOT + # pass -TimeoutSeconds" via $PSBoundParameters so the env var + # applies only to the default-binding path. + $envTimeoutStr = [Environment]::GetEnvironmentVariable('COPILOT_PR_AUTOPILOT_GH_TIMEOUT_SECONDS', [EnvironmentVariableTarget]::Process) + if ($envTimeoutStr) { + $parsedTimeout = 0 + if ([int]::TryParse($envTimeoutStr, [ref]$parsedTimeout) -and $parsedTimeout -gt 0) { + if (-not $PSBoundParameters.ContainsKey('TimeoutSeconds')) { + $TimeoutSeconds = $parsedTimeout + } + } else { + throw "copilot-pr-autopilot: `$env:COPILOT_PR_AUTOPILOT_GH_TIMEOUT_SECONDS must be a positive integer (got '$envTimeoutStr')." + } + } + $proc = [System.Diagnostics.Process]::new() + $proc.StartInfo = $psi + $startedAt = [DateTime]::UtcNow + try { + [void]$proc.Start() + # Close stdin immediately so any code path inside gh that + # would prompt on stdin (auth refresh, "use this account?", + # editor invocation) sees EOF and aborts instead of hanging + # until our TimeoutSeconds fires. Wrapped in try/catch + # because StandardInput is not guaranteed available on every + # platform/version combination and a failure here must not + # mask the real gh exit code we're about to capture. + try { $proc.StandardInput.Close() } catch { } + # Drain both pipes concurrently via .NET Tasks (no PowerShell + # script-block event handlers — those would run on threadpool + # workers without a PS Runspace and throw PSInvalidOperationException). + # Concurrent draining is required to avoid child-process deadlock + # when either pipe's OS-side buffer fills (default 4-64 KiB). + $outTask = $proc.StandardOutput.ReadToEndAsync() + $errTask = $proc.StandardError.ReadToEndAsync() + # Bounded wait so a hung child (network stall, credential helper + # deadlock) cannot freeze the calling step script indefinitely. + # Promote to [long] BEFORE multiplying so very large + # TimeoutSeconds values (e.g., operator-supplied 86400) don't + # silently overflow [int] arithmetic and produce a negative ms + # argument; then clamp to Int32.MaxValue for the WaitForExit + # signature. + $waitMs = [int]([Math]::Min([long][int]::MaxValue, [long]$TimeoutSeconds * 1000L)) + if (-not $proc.WaitForExit($waitMs)) { + # Process is still running. Try to kill the whole tree so any + # helper subprocess (e.g., credential helper) goes too. + # Process.Kill($true) is .NET 5+ / PS 7+; on Windows PS 5.1 + # / .NET Framework it doesn't exist. To still achieve a tree + # kill on that runtime, we deliberately try `taskkill /T /F + # /PID` BEFORE the parent-only Kill() fallback: once Kill() + # terminates the parent, `taskkill /PID` may race the + # parent's death and miss children (credential helpers etc.) + # leaving orphaned processes after a timeout. + $killedTree = $false + try { + $proc.Kill($true) + $killedTree = $true + } catch { + $taskkilled = $false + if ([Environment]::OSVersion.Platform -eq [PlatformID]::Win32NT) { + $tk = Get-Command taskkill.exe -CommandType Application -ErrorAction SilentlyContinue + if ($tk) { + try { + & $tk.Path /T /F /PID $proc.Id 2>&1 | Out-Null + $taskkilled = $true + $killedTree = $true + } catch { } + } + } + if (-not $taskkilled) { + try { $proc.Kill() } catch { } + } + } + # Use [Environment]::OSVersion.Platform (Win32NT) rather than + # $env:OS — the latter is a convention-only variable that can + # be unset in constrained or container Windows environments + # (nanoserver, restricted SYSTEM contexts), which would + # silently skip the taskkill fallback exactly where it's + # most needed (PS 5.1 without Process.Kill($true)). + if (-not $killedTree -and [Environment]::OSVersion.Platform -eq [PlatformID]::Win32NT) { + $taskkill = Get-Command taskkill.exe -CommandType Application -ErrorAction SilentlyContinue + if ($taskkill) { + try { & $taskkill.Path /T /F /PID $proc.Id 2>&1 | Out-Null } catch { } + } + } + # Give the pipes a brief moment to close after the kill so + # the drain Tasks settle. Without this, we'd either lose the + # partial stdout/stderr that was already flushed before the + # hang, or risk reading from a half-disposed stream below. + try { [void]$proc.WaitForExit(2000) } catch { } + # Verify termination. If the child is genuinely stuck (e.g., + # an unkillable kernel-state thread, taskkill failed + # silently, or a credential helper subprocess that ignored + # signals), give it a longer second wait, then surface that + # uncertainty in the throw so operators know a zombie may + # still be running. Process.Dispose() in finally{} will + # still detach our handles, but the OS process can outlive + # this PowerShell session if it truly refuses to die. + $stillRunning = $false + try { + if (-not $proc.HasExited) { + [void]$proc.WaitForExit(5000) + if (-not $proc.HasExited) { $stillRunning = $true } + } + } catch { + # HasExited / WaitForExit on a fully-detached or + # already-disposed process can throw — treat as + # "termination cannot be confirmed" rather than fatal. + $stillRunning = $true + } + $partialStdout = '' + $partialStderr = '' + try { if ($outTask.Wait(1000)) { $partialStdout = $outTask.Result } } catch { } + try { if ($errTask.Wait(1000)) { $partialStderr = $errTask.Result } } catch { } + # Redact `-f`/`-F` values AND `-H`/`--header` values so + # timeout logs don't leak GraphQL queries, reply bodies, + # or — more critically — Authorization / token / bearer + # credentials a caller may have passed via `gh api -H`. + # We keep the field name / header name so the preview + # still pinpoints which call timed out. + $redactedArgs = [System.Collections.Generic.List[string]]::new() + for ($ai = 0; $ai -lt $GhArgs.Count; $ai++) { + $cur = $GhArgs[$ai] + if (($cur -eq '-f' -or $cur -eq '-F') -and ($ai + 1) -lt $GhArgs.Count) { + $nxt = $GhArgs[$ai + 1] + $eq = $nxt.IndexOf('=') + $fname = if ($eq -gt 0) { $nxt.Substring(0, $eq) } else { $nxt } + [void]$redactedArgs.Add($cur) + [void]$redactedArgs.Add("$fname=") + $ai++ + } elseif (($cur -eq '-H' -or $cur -eq '--header') -and ($ai + 1) -lt $GhArgs.Count) { + $nxt = $GhArgs[$ai + 1] + $colon = $nxt.IndexOf(':') + $hname = if ($colon -gt 0) { $nxt.Substring(0, $colon) } else { $nxt } + [void]$redactedArgs.Add($cur) + [void]$redactedArgs.Add("${hname}: ") + $ai++ + } else { + [void]$redactedArgs.Add($cur) + } + } + $argPreview = ($redactedArgs -join ' ') + if ($argPreview.Length -gt 200) { $argPreview = $argPreview.Substring(0, 200) + '... (truncated)' } + $elapsed = ([DateTime]::UtcNow - $startedAt).TotalSeconds + $partialTail = '' + $combined = (($partialStderr, $partialStdout) | Where-Object { $_ } | ForEach-Object { $_.TrimEnd([char]13, [char]10) }) -join "`n" + if ($combined) { + # Best-effort credential redaction before embedding + # in the timeout throw. See Protect-CopilotPrAutopilotCredentials + # for the patterns and rationale. Truncation runs + # after redaction so the throw stays bounded. + $combined = Protect-CopilotPrAutopilotCredentials -Text $combined + if ($combined.Length -gt 500) { $combined = $combined.Substring(0, 500) + '... (truncated)' } + $partialTail = "`n Partial output before kill (credentials redacted): $combined" + } + $termStatus = if ($stillRunning) { ' (WARNING: process could not be confirmed terminated — possible zombie gh PID ' + $proc.Id + ' still running)' } else { '' } + throw "copilot-pr-autopilot: gh process exceeded ${TimeoutSeconds}s timeout (elapsed=$([Math]::Round($elapsed,1))s) and was killed$termStatus. Args (values redacted): $argPreview. Override with `$env:COPILOT_PR_AUTOPILOT_GH_TIMEOUT_SECONDS= if a longer wait is intentional.$partialTail" + } + $stdout = $outTask.GetAwaiter().GetResult() + $stderr = $errTask.GetAwaiter().GetResult() + # Strip all trailing CR/LF so the output matches the + # previous `& gh ... | -join "`n"` path (no trailing newline). + # TrimEnd peels off every trailing CR/LF character — that's + # equivalent for callers (no payload contains intentional + # trailing blank lines that gh produces with reliability), and + # it also normalizes the edge case where ReadToEndAsync delivers + # `text\r\n\r\n` (gh sometimes flushes a blank trailer). + $stdout = $stdout.TrimEnd([char]13, [char]10) + $stderr = $stderr.TrimEnd([char]13, [char]10) + [pscustomobject]@{ ExitCode = $proc.ExitCode; Stdout = $stdout; Stderr = $stderr } + } finally { + $proc.Dispose() + } +} + +# Single-invocation gh wrapper. Captures stdout + stderr separately +# in-memory via the ProcessStartInfo helper above. Returns +# ExitCode/Stdout/Stderr so callers never have to re-invoke `gh` just +# to recover stderr, and never feed stderr into `ConvertFrom-Json` on +# success. +# +# Note on -WhatIf: only the temp-body cleanup in the finally below +# could possibly interact with $WhatIfPreference (we pass +# -WhatIf:$false / -Confirm:$false explicitly so it can't). The +# bundled `10-cleanup-outdated.ps1` uses an explicit `-DryRun` +# switch instead of [CmdletBinding(SupportsShouldProcess)] so a +# leaked WhatIfPreference can't make Remove-Item noisy either. +function Invoke-Gh { + param([Parameter(Mandatory)][string[]]$GhArgs) + + # Mandatory only enforces the parameter is present, not non-empty; + # `-GhArgs @()` would silently invoke `gh` bare (which prints help + # and exits 0), masking call-site bugs as success. Guard explicitly. + if ($null -eq $GhArgs -or $GhArgs.Count -eq 0) { + $countDisplay = if ($null -eq $GhArgs) { '' } else { [string]$GhArgs.Count } + throw "Invoke-Gh: -GhArgs must be a non-empty array; received $countDisplay tokens." + } + if ([string]::IsNullOrWhiteSpace($GhArgs[0])) { + throw "Invoke-Gh: -GhArgs[0] must be a non-empty subcommand token (received '$($GhArgs[0])')." + } + + # Make the dependency on Assert-GhReady explicit and local. The call + # is idempotent and memoized (host-scoped + # `$script:CopilotPrAutopilot_GhReady`), so there is no per-invocation + # cost after the first call. This protects Invoke-Gh against future + # callers that import functions from _lib.ps1 without dot-sourcing + # the bottom-level Assert-GhReady invocation. + # + # Capture the (GhHost, GhPath) snapshot from the return value + # directly. The previous "read $script:* immediately after Assert" + # pattern still had a small race window: another runspace sharing + # the same module-scoped state could mutate $script:GhHost / + # $script:GhPath between the assertion and the local read. By + # consuming the return value of the same Assert call, the pair we + # use is guaranteed to come from a single function-local execution + # — no later script-scope mutation can affect it. + $ghReady = Assert-GhReady + $callGhHost = $ghReady.GhHost + $callGhPath = $ghReady.GhPath + + # Cross-version safety: Windows PowerShell 5.1's native-command + # argument passer mangles arguments that contain embedded double-quote + # characters (long-standing bug, only fully fixed in PS 7.3+ via + # $PSNativeCommandArgumentPassing). GraphQL queries/mutations routinely + # embed quoted strings (comments, default values, enum-like literals + # such as `["copilot-pull-request-reviewer"]`), so passing them as + # command-line values (`-f field=`) round-trips correctly in + # pwsh 7 but silently mis-splits under 5.1 (e.g., gh CLI reports + # "accepts 1 arg(s), received 7" or 'Expected type "number", but it + # was malformed: "-pull"'). To work identically in both runtimes, any + # `-f field=` or `-F field=` pair whose body contains `"` + # is rewritten to `-F field=@` (the body is written to disk + # first; `gh` reads it from the file and the value never appears on + # the command line). + # + # IMPORTANT typing note (verified live with gh api graphql): + # * `gh -F field=@` reads the file content and applies type + # inference (digit→Number, true/false→Boolean, null→null, else + # String). + # * `gh -f field=@` does NOT expand `@` — it sends the + # literal string `@` as the value (gh's `-f` skips the @ + # prefix entirely). So `-f` is NOT a viable tempfile carrier; + # the rewrite MUST use `-F`. + # + # Safety of the unconditional rewrite-to-`-F`: + # * Query bodies (large GraphQL strings) never look like Number / + # Boolean / null after inference, so they round-trip as String. + # * Reply bodies typed by humans (08-reply-and-resolve) almost + # never look like exactly `"true"`, `"false"`, `"null"`, or a + # bare digit run — and if they do AND they also contain `"` + # (the rewrite trigger), the resulting coercion would be a + # loud GraphQL `String!` type error, not silent data loss. + # Tempfiles are cleaned up in `finally` — the rewrite loop AND the + # native gh call are inside ONE outer try/finally so any throw + # during rewrite (e.g., mkdir -m 700 failure, WriteAllText failure) + # still triggers cleanup of any tempFiles/tempDirs accumulated so far. + $rewritten = [System.Collections.Generic.List[string]]::new() + $tempFiles = [System.Collections.Generic.List[string]]::new() + $tempDirs = [System.Collections.Generic.List[string]]::new() + # Lazy single per-call private dir for body tempfiles. Created on + # first rewrite hit, reused for every subsequent rewritten arg in + # the same call (calls with both `query=...` and `body=...` get one + # mkdir / one cleanup instead of two). Filenames are body-1.txt, + # body-2.txt, ... within that dir to keep them distinct. + $bodyDir = $null + $bodyIdx = 0 + # Cache one UTF8Encoding(no-BOM) instance reused for every body + # tempfile in this call, matching the same caching the + # ProcessStartInfo branch uses for StandardOutputEncoding / + # StandardErrorEncoding. Allocating a new instance per WriteAllText + # added a small but unnecessary per-rewrite allocation. + $utf8NoBom = [System.Text.UTF8Encoding]::new($false) + # -ErrorVariable +rmErrs below auto-creates an ArrayList on + # first append; leaving it unset here so the standard semantics + # apply (pre-initializing to a typed List[object] interferes with + # the +Variable accumulator on PS 5.1). + $rmErrs = $null + try { + for ($i = 0; $i -lt $GhArgs.Count; $i++) { + $a = $GhArgs[$i] + # Rewrite both `-f field=` and `-F field=` whose body + # contains `"` — same PS 5.1 native-arg splitting bug applies to + # both. The rewrite ALWAYS emits `-F` because `gh -f field=@file` + # does not expand `@file` (only `-F` does — verified live). The + # file content is then sent as a String GraphQL variable for any + # body that doesn't look like a Number/Boolean/null (i.e., every + # real-world query body and reply body in this skill). + if (($a -eq '-f' -or $a -eq '-F') -and ($i + 1) -lt $GhArgs.Count) { + $next = $GhArgs[$i + 1] + # Defensive null-check: the canonical $null-token guard + # lives in Invoke-CopilotPrAutopilotGhProcess (so the + # error message includes the offending index), but this + # rewrite loop runs FIRST and would otherwise crash with + # "You cannot call a method on a null-valued expression" + # on `$next.IndexOf('=')`, producing a non-actionable + # error that obscures the real call-site bug. Throw with + # the same index-bearing message format so the operator + # sees the actionable diagnostic regardless of which + # validator fires first. + if ($null -eq $next) { + throw "Invoke-Gh: -GhArgs[$($i + 1)] is `$null (following '$a' at index $i); every token must be a non-null string." + } + if ($next -isnot [string]) { $next = [string]$next } + $eqIdx = $next.IndexOf('=') + if ($eqIdx -gt 0 -and $next.Substring($eqIdx + 1).Contains('"')) { + $field = $next.Substring(0, $eqIdx) + $body = $next.Substring($eqIdx + 1) + if ($null -eq $bodyDir) { + # Lazy: only create the private 0700 dir the first + # time a rewrite is actually needed in this call. + # On Unix the helper uses `mkdir -m 700` so the + # dir exists with 0700 atomically (no umask + # window); on Windows it uses .NET CreateDirectory + # and relies on per-user %TEMP% (interactive + # logon default) plus the operator override + # $env:COPILOT_PR_AUTOPILOT_TEMP_ROOT. Containing + # the body files inside a 0700 directory is the + # only TOCTOU-safe pattern: even chmod 600 on + # the file itself leaves an open()-FD window. + $bodyDir = New-CopilotPrAutopilotPrivateTempDir + [void]$tempDirs.Add($bodyDir) + } + $bodyIdx++ + $tf = Join-Path $bodyDir "body-$bodyIdx.txt" + # UTF-8 without BOM so `gh` reads the body verbatim. The + # file lives inside a 0700 directory on Unix, so no other + # user can traverse to it regardless of the file's own + # mode. + [IO.File]::WriteAllText($tf, $body, $utf8NoBom) + [void]$tempFiles.Add($tf) + [void]$rewritten.Add('-F') + [void]$rewritten.Add("$field=@$tf") + $i++ + continue + } + } + [void]$rewritten.Add($a) + } + + $finalArgs = $rewritten.ToArray() + # Apply the resolved host via a per-process env override on + # ProcessStartInfo.EnvironmentVariables (StringDictionary; the + # newer .Environment IDictionary is .NET Core / + # .NET 5+ only and would throw on PS 5.1 / .NET Framework). + # Concurrent runspaces inside the same parent PowerShell process + # can call Invoke-Gh in parallel without racing on `$env:GH_HOST` + # (which would otherwise be mutated in-process and restored + # asymmetrically when two calls interleave). The in-process + # `$env:GH_HOST` is never touched. gh's stderr is captured + # in-memory by the helper via ReadToEndAsync (no temp file is + # needed for it here). + $envOverride = @{ GH_HOST = $callGhHost } + $result = Invoke-CopilotPrAutopilotGhProcess -GhPath $callGhPath -GhArgs $finalArgs -ExtraEnv $envOverride + [pscustomobject]@{ ExitCode = $result.ExitCode; Stdout = $result.Stdout; Stderr = $result.Stderr } + } finally { + # -WhatIf:$false / -Confirm:$false so cleanup runs regardless of + # the caller's $WhatIfPreference / $ConfirmPreference. The temp + # bodies contain PR content; leaking them on disk is much worse + # than emitting WhatIf noise. Documented in references/api-quirks. + # We capture (rather than swallow) per-file delete errors so the + # operator gets a Write-Warning surfacing exactly which path was + # left behind — a file locked by AV / a Windows service-account + # shared TEMP / a permission flip should never silently leave PR + # content (GraphQL queries, reply bodies) on disk. We still + # continue best-effort through the rest of the cleanup list. + foreach ($tf in $tempFiles) { + if ($tf -and (Test-Path -LiteralPath $tf)) { + Remove-Item -LiteralPath $tf -WhatIf:$false -Confirm:$false -ErrorAction SilentlyContinue -ErrorVariable +rmErrs + } + } + foreach ($td in $tempDirs) { + if ($td -and (Test-Path -LiteralPath $td)) { + Remove-Item -LiteralPath $td -Recurse -Force -WhatIf:$false -Confirm:$false -ErrorAction SilentlyContinue -ErrorVariable +rmErrs + } + } + if ($rmErrs -and $rmErrs.Count -gt 0) { + foreach ($e in $rmErrs) { + # Emit a sentinel when the failed-path can't be + # recovered from the ErrorRecord, plus the + # FullyQualifiedErrorId so operators have something + # concrete to grep for in CI logs even when the path + # itself is unavailable. + $leakedPath = if ($e.TargetObject -and -not [string]::IsNullOrWhiteSpace([string]$e.TargetObject)) { + [string]$e.TargetObject + } else { + '(unknown path)' + } + Write-Warning ("Invoke-Gh: failed to delete temp body file/dir '{0}' (errorId={1}; {2}). PR content (GraphQL query or reply body) may have been left on disk — please remove manually." -f $leakedPath, $e.FullyQualifiedErrorId, $e.Exception.Message) + } + } + } +} + +# Wrap ConvertFrom-Json so a non-JSON / empty stdout failure carries +# the calling $Context plus trimmed stdout/stderr — without this +# callers see a bare "Unexpected character encountered" exception +# that doesn't say which gh command produced the bad output. +# Centralised so the preview limits + format stay consistent across +# Invoke-GhGraphQL, Resolve-RepoCoords, and any future call sites. +function ConvertFrom-GhJson { + param( + [Parameter(Mandatory)][AllowEmptyString()][AllowNull()][string]$Stdout, + [AllowEmptyString()][AllowNull()][string]$Stderr, + [Parameter(Mandatory)][string]$Context, + [ValidateRange(0, [int]::MaxValue)] + [int]$PreviewChars = 500 + ) + $caught = $null + try { + if ([string]::IsNullOrWhiteSpace($Stdout)) { + # Both PS 5.1 and 7+ silently return $null when ConvertFrom-Json + # receives empty/whitespace input (via pipeline OR -InputObject), + # which would mask the original gh failure. Throw the wrapped + # contextual error explicitly so callers always see $Context. + # Redact credentials in the stderr preview before embedding — + # gh verbose/debug stderr can carry Authorization headers / + # token-like values that would otherwise land in CI logs. + $stderrSafe = Protect-CopilotPrAutopilotCredentials -Text $Stderr + $emptyMessage = "$Context returned empty stdout (no JSON to parse) (credentials redacted).`nstderr (<=${PreviewChars} chars): " + $(if ($stderrSafe) { $stderrSafe.Substring(0, [Math]::Min($PreviewChars, $stderrSafe.Length)) } else { '(empty)' }) + throw [System.Management.Automation.RuntimeException]::new($emptyMessage) + } + return $Stdout | ConvertFrom-Json -ErrorAction Stop + } catch [System.Management.Automation.RuntimeException] { + # Empty-stdout sentinel above is already a fully-formatted RuntimeException + # with $Context prefix; re-throw as-is so we don't double-wrap "returned non-JSON" + # onto an "returned empty stdout" message. + if ($_.Exception.Message.StartsWith("$Context returned empty stdout", [System.StringComparison]::Ordinal)) { throw } + # Capture for the shared message-build below ($_ does not persist + # outside the catch block). + $caught = $_.Exception + } catch { + $caught = $_.Exception + } + # Shared message-build for both non-empty-stdout failure branches. + # Hoisted out of the two catch blocks (they had identical preview + # formatting + throw) to keep future format tweaks single-sourced. + $previewOf = { + param([string]$s) + if ($s) { + $safe = Protect-CopilotPrAutopilotCredentials -Text $s + $safe.Substring(0, [Math]::Min($PreviewChars, $safe.Length)) + } else { '(empty)' } + } + $stdoutPreview = & $previewOf $Stdout + $stderrPreview = & $previewOf $Stderr + $message = "$Context returned non-JSON (credentials redacted): $($caught.Message)`nstdout (<=${PreviewChars} chars): $stdoutPreview`nstderr (<=${PreviewChars} chars): $stderrPreview" + # Re-throw as a real exception that preserves the original as + # InnerException so callers retain the original type/stack for + # automated handling, while still surfacing the contextual message. + throw [System.Management.Automation.RuntimeException]::new($message, $caught) +} + +# Wrapper around Invoke-Gh for `gh api graphql` that throws on either +# non-zero exit OR a GraphQL `errors` array in the response body. +# Cross-version safety for embedded quotes in queries is handled by +# Invoke-Gh's automatic `-f field=` → tempfile rewrite. +function Invoke-GhGraphQL { + param( + [Parameter(Mandatory)][string[]]$GhArgs, + [Parameter(Mandatory)][string]$Context + ) + $r = Invoke-Gh -GhArgs (@('api','graphql') + $GhArgs) + if ($r.ExitCode -ne 0) { + # Some `gh api graphql` failures (auth handshake errors, schema + # validation rejections) emit useful detail on stdout while + # stderr is empty or only carries the exit envelope. Include a + # trimmed stdout preview alongside stderr so operators don't + # have to re-run interactively just to recover context. + # Truncate both streams symmetrically. A pathological `gh` + # failure (verbose HTTP traces, large auth error payloads on + # stderr) could otherwise produce a multi-megabyte throw message + # that drowns CI logs without adding diagnostic value. + $previewChars = 500 + # Delegate redact → trim → truncate → empty-sentinel formatting + # to the shared Format-CopilotPrAutopilotErrorPreview helper so + # `gh api graphql` failures use the same preview shape as every + # other throw site (step scripts, Resolve-RepoCoords, ConvertFrom-GhJson). + $stderrPart = Format-CopilotPrAutopilotErrorPreview -Text $r.Stderr -MaxChars $previewChars + $stdoutPart = Format-CopilotPrAutopilotErrorPreview -Text $r.Stdout -MaxChars $previewChars + throw "gh api graphql failed (exit $($r.ExitCode)) [$Context] (credentials redacted):`nstderr (<=${previewChars} chars): $stderrPart`nstdout (<=${previewChars} chars): $stdoutPart" + } + $data = ConvertFrom-GhJson -Stdout $r.Stdout -Stderr $r.Stderr -Context "gh api graphql [$Context]" + # The GraphQL spec guarantees `errors` is a JSON array when present + # (https://spec.graphql.org/October2021/#sec-Errors), so once + # ConvertFrom-Json deserializes, `$data.errors` is either $null, an + # empty Object[], or a populated Object[]. `if (@()) { }` is $false + # in PowerShell, but we use explicit `@($data.errors).Count -gt 0` + # to make the intent obvious to readers and to avoid relying on + # PowerShell's empty-array truthiness rule. (NOTE: this guard is + # NOT robust to a hypothetical `errors: {}` hashtable response — + # `@(@{}).Count` is 1 — but the GraphQL spec forbids that shape.) + if ($data.errors -and @($data.errors).Count -gt 0) { + # Aggregate type + path + extensions.code alongside .message so + # callers see actionable failures without re-running with extra + # logging. GitHub commonly returns type=NOT_FOUND / FORBIDDEN / + # RATE_LIMITED and extensions.code=undefinedField etc.; dropping + # them turns a clear failure ("FORBIDDEN at /repository/pullRequest") + # into an opaque message-only string. + $msgs = ($data.errors | ForEach-Object { + $parts = New-Object System.Collections.Generic.List[string] + if ($_.type) { $parts.Add("type=$($_.type)") } + if ($_.path) { $parts.Add("path=$(($_.path) -join '/')") } + if ($_.extensions -and $_.extensions.code) { $parts.Add("code=$($_.extensions.code)") } + $parts.Add("message=$($_.message)") + ($parts -join ' ') + }) -join '; ' + # Defensive redaction: GraphQL error.message strings normally + # don't contain credentials, but the central library posture + # is "redact before throwing" so any future surface (extensions + # carrying request envelopes, FORBIDDEN responses echoing tokens) + # is covered without revisiting this site. + $msgs = Protect-CopilotPrAutopilotCredentials -Text $msgs + throw "GraphQL errors [$Context] (credentials redacted): $msgs" + } + $data +} + +# Centralized GraphQL `pageInfo` invariant guard. Six call sites across +# 02/03/10 previously open-coded the same check ("if hasNextPage is true +# but endCursor is null/empty, fail fast so we don't refetch the same +# page forever or duplicate data"). Routing them all through this helper +# guarantees the wording and the truth-table stay consistent if the +# invariant ever needs to grow (e.g., to also require a non-decreasing +# cursor). Throws with a single canonical message that embeds the +# caller-provided $Context so operators can see which query violated. +function Assert-CopilotPrAutopilotPageInfo { + [CmdletBinding()] + param( + [Parameter(Mandatory)][string]$Context, + [Parameter(Mandatory)][AllowNull()]$PageInfo + ) + # Fail-closed on a missing pageInfo. The six call sites all paginate + # off a GraphQL connection where `pageInfo` is a non-null required + # subselection per spec (https://relay.dev/graphql/connections.htm#sec-undefined.PageInfo); + # a $null here means the response shape is broken (truncated body, + # unexpected error envelope, schema mismatch). Silently returning + # would let the caller stop pagination early without any signal, + # producing incomplete results. Throw with the caller-supplied + # context so operators can diagnose the affected query. + if ($null -eq $PageInfo) { + throw "${Context}: GraphQL pageInfo invariant violated (pageInfo is null — connection response is malformed)." + } + # PowerShell silently returns $null for a missing property, which + # would let a malformed pageInfo lacking `hasNextPage` coerce to + # $false and silently truncate pagination. Detect the property's + # presence explicitly via PSObject.Properties so a missing field + # is a thrown invariant violation, not a silent stop. + $hasNextPageProp = $PageInfo.PSObject.Properties['hasNextPage'] + if (-not $hasNextPageProp) { + throw "${Context}: GraphQL pageInfo invariant violated (pageInfo present but 'hasNextPage' property is missing — malformed Relay connection)." + } + # Validate that hasNextPage is actually a Boolean (GraphQL spec + # requires Boolean!). [bool]$value would happily turn the string + # 'false' into $true (any non-empty string is truthy in + # PowerShell), so a bad GraphQL response could silently flip + # the pagination decision. ConvertFrom-Json materialises JSON + # booleans as System.Boolean, so anything else here means the + # response shape is corrupt and we must fail loudly. + $hasNextRaw = $hasNextPageProp.Value + if ($hasNextRaw -isnot [bool]) { + $rawType = if ($null -eq $hasNextRaw) { '' } else { $hasNextRaw.GetType().FullName } + throw "${Context}: GraphQL pageInfo invariant violated (hasNextPage is not a Boolean — got type $rawType, value '$hasNextRaw' — malformed Relay connection)." + } + $hasNext = $hasNextRaw + if ($hasNext) { + $endCursorProp = $PageInfo.PSObject.Properties['endCursor'] + if (-not $endCursorProp) { + throw "${Context}: GraphQL pageInfo invariant violated (hasNextPage=true but 'endCursor' property is missing)." + } + if ([string]::IsNullOrEmpty([string]$endCursorProp.Value)) { + throw "${Context}: GraphQL pageInfo invariant violated (hasNextPage=true but endCursor is null/empty)." + } + } +} + + +# Both-or-neither contract: passing exactly one of -Owner/-Repo is rejected, +# because mixing a caller-supplied owner with a locally-detected repo (or vice +# versa) silently constructs a non-existent or unintended `/` pair. +function Resolve-RepoCoords { + param([string]$Owner, [string]$Repo) + # Treat whitespace-only as "not provided" so `Owner=' '` can't satisfy + # the presence check or trigger the partial-override guard. Mirror the + # both-or-neither contract on the *meaningful* presence of each value. + $ownerSet = -not [string]::IsNullOrWhiteSpace($Owner) + $repoSet = -not [string]::IsNullOrWhiteSpace($Repo) + if ($ownerSet -ne $repoSet) { + # Escape embedded single quotes (PowerShell single-quoted-string + # convention: double them) so a value containing `'` produces an + # unambiguous diagnostic instead of one that looks truncated or + # mid-quote. Matches the same escape pattern in the GH_HOST + # invalid-host error earlier in this file. + $ownerEsc = if ($null -ne $Owner) { $Owner.Replace("'", "''") } else { '' } + $repoEsc = if ($null -ne $Repo) { $Repo.Replace("'", "''") } else { '' } + throw "Resolve-RepoCoords: pass both -Owner and -Repo, or neither (got Owner='$ownerEsc' Repo='$repoEsc'). Partial override would silently mix caller and local repo coordinates." + } + if ($ownerSet -and $repoSet) { return @{ Owner = $Owner.Trim(); Repo = $Repo.Trim() } } + $r = Invoke-Gh -GhArgs @('repo','view','--json','owner,name') + if ($r.ExitCode -ne 0) { + # Surface stdout alongside stderr — gh repo view failure modes + # (auth handshake errors, repo-not-found responses) frequently + # land actionable detail on stdout while stderr only carries + # the exit envelope. Mirror the Invoke-GhGraphQL stderr+stdout + # truncated-preview pattern so operators always see the most + # diagnostic stream from a single failure. Redact credentials + # AND cap each stream at $previewChars before embedding — gh + # auth-handshake failures can echo Authorization headers / PATs, + # and a verbose payload would otherwise produce a multi-MB throw. + # Delegate to Format-CopilotPrAutopilotErrorPreview for the same + # redact → trim → truncate → empty-sentinel formatting used by + # every other throw site (single source of truth). + $previewChars = 500 + $errPart = Format-CopilotPrAutopilotErrorPreview -Text $r.Stderr -MaxChars $previewChars + $outPart = Format-CopilotPrAutopilotErrorPreview -Text $r.Stdout -MaxChars $previewChars + throw "gh repo view failed (exit $($r.ExitCode)) (credentials redacted): stderr=$errPart stdout=$outPart. Pass -Owner and -Repo explicitly, or run from inside a gh-detected repo." + } + $info = ConvertFrom-GhJson -Stdout $r.Stdout -Stderr $r.Stderr -Context 'gh repo view' + if (-not ($info -and $info.owner -and $info.owner.login -and $info.name)) { + throw "gh repo view returned unexpected shape (missing owner.login or name); cannot auto-resolve repo coordinates. Pass -Owner and -Repo explicitly." + } + @{ Owner = $info.owner.login; Repo = $info.name } +} + +# Format-IsoUtcString — centralise the ISO-8601 UTC normalisation that +# 01-request-review.ps1 (events.created_at), 02-check-review-status.ps1 +# (reviews.submittedAt), and 03-list-open-threads.ps1 (comments.createdAt) +# all need to perform. PowerShell 7+ `ConvertFrom-Json` auto-deserialises +# ISO timestamp strings to `[datetime]` (Windows PowerShell 5.1 leaves +# them as `[string]`), and `[datetime].ToString()` is culture-dependent +# and NOT round-trippable as ISO-8601. We handle BOTH input shapes here: +# - [datetime] (PS 7+ auto-conversion path) → format via InvariantCulture +# - [string] (PS 5.1 path, OR pre-serialised values from caller) +# → pass through verbatim (already ISO). +# Calling `.ToUniversalTime().ToString('yyyy-MM-ddTHH:mm:ssZ')` keeps the +# on-wire JSON contract identical to the value GitHub originally sent +# on PS 7+, while the string passthrough preserves the original wire +# form on PS 5.1. Null, empty, OR whitespace-only inputs all normalise +# to '' so callers can rely on a strict empty-string sentinel for "no +# timestamp" regardless of host PS version. +function Format-IsoUtcString { + param($Value) + if ($null -eq $Value) { return '' } + # Use InvariantCulture so the formatted timestamp is locale-stable + # (e.g., Thai/Arabic cultures use non-Western numerals by default, + # and other cultures may apply different separators). The JSON + # consumers of this string require strict ISO-8601. + if ($Value -is [datetime]) { + return $Value.ToUniversalTime().ToString('yyyy-MM-ddTHH:mm:ssZ', [System.Globalization.CultureInfo]::InvariantCulture) + } + $s = [string]$Value + if ([string]::IsNullOrWhiteSpace($s)) { return '' } + return $s +} + +# Run the prerequisite check as a side-effect of dot-sourcing. +Assert-GhReady