diff --git a/.github/instructions/gh-aw-workflows.instructions.md b/.github/instructions/gh-aw-workflows.instructions.md index c8cd70ecc3be..7de8af04f07e 100644 --- a/.github/instructions/gh-aw-workflows.instructions.md +++ b/.github/instructions/gh-aw-workflows.instructions.md @@ -29,6 +29,8 @@ agent job: | Platform steps | ✅ Yes | ✅ Yes | ✅ Yes | Platform-controlled | | Agent container | ❌ Scrubbed | ❌ Scrubbed | ❌ Scrubbed | ✅ But sandboxed | +**⚠️ Agent container credential nuance:** `GITHUB_TOKEN` and `gh` CLI credentials are scrubbed inside the agent container. However, `COPILOT_TOKEN` (used for LLM inference) is present in the environment via `--env-all`. Any subprocess (e.g., `dotnet build`, `npm install`) inherits this variable. The AWF network firewall, `redact_secrets.cjs` (post-agent log scrubbing), and the threat detection agent limit the blast radius. See [Security Boundaries](#security-boundaries) below. + ### Step Ordering (Critical) User `steps:` **always run before** platform-generated steps. You cannot insert user steps after platform steps. @@ -48,6 +50,41 @@ By default, `gh aw compile` automatically injects a fork guard into the activati To **allow fork PRs**, add `forks: ["*"]` to the `pull_request` trigger in the `.md` frontmatter. The compiler removes the auto-injected guard from the compiled `if:` conditions. This is safe when the workflow uses the `Checkout-GhAwPr.ps1` pattern (checkout + trusted-infra restore) and the agent is sandboxed. +## Security Boundaries + +### Key Principles (from [GitHub Security Lab](https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/)) + +1. **Never execute untrusted PR code with elevated credentials.** The classic "pwn-request" attack is `pull_request_target` + checkout PR + run build scripts with `GITHUB_TOKEN`. The attack surface includes build scripts (`make`, `build.ps1`), package manager hooks (`npm postinstall`, MSBuild targets), and test runners. + +2. **Treating PR contents as passive data is safe.** Reading, analyzing, or diffing PR code is fine — the danger is *executing* it. Our gh-aw workflows read code for evaluation; they never build or run it. + +3. **`pull_request_target` grants write permissions and secrets access.** This is by design — the workflow YAML comes from the base branch (trusted). But any step that checks out and runs fork code in this context creates a vulnerability. + +4. **`pull_request` from forks has no secrets access.** GitHub withholds secrets because the workflow YAML comes from the fork (untrusted). This is the safe default for CI builds on fork PRs. + +5. **The `workflow_run` pattern separates privilege from code execution.** Build in an unprivileged `pull_request` job → pass artifacts → process in a privileged `workflow_run` job. This is architecturally what gh-aw does: agent runs read-only, `safe_outputs` job has write permissions. + +### gh-aw Defense Layers + +| Layer | What it does | What it doesn't do | +|-------|-------------|-------------------| +| **AWF network firewall** | Restricts outbound to allowlisted domains | Doesn't prevent reading env vars inside the container | +| **`redact_secrets.cjs`** | Scrubs known secret values from logs/artifacts post-agent | Doesn't catch encoded/obfuscated values | +| **Threat detection agent** | Reviews agent outputs before safe-outputs publishes them | Can miss novel exfiltration techniques | +| **Safe-outputs permission separation** | Write operations happen in separate job, not the agent | Agent can still request writes via safe-output tools | +| **`max: 1` on `add-comment`** | Limits agent to one comment | That one comment could contain sensitive data (mitigated by redaction) | +| **XPIA prompt** | Instructs LLM to resist prompt injection from untrusted content | LLM compliance is probabilistic, not guaranteed | +| **`pre_activation` role check** | Gates on write-access collaborators | Does not apply if `roles: all` is set | + +### Rules for gh-aw Workflow Authors + +- ✅ **DO** treat PR contents as passive data (read, analyze, diff) +- ✅ **DO** run data-gathering scripts in `steps:` (pre-agent, trusted context) not inside the agent +- ✅ **DO** use `Checkout-GhAwPr.ps1` for `workflow_dispatch` to restore trusted `.github/` from base +- ❌ **DO NOT** run `dotnet build`, `npm install`, or any build command on untrusted PR code inside the agent — build tool hooks (MSBuild targets, postinstall scripts) can read `COPILOT_TOKEN` from the environment +- ❌ **DO NOT** execute workspace scripts (`.ps1`, `.sh`, `.py`) after checking out a fork PR in `steps:` — those run with `GITHUB_TOKEN` +- ❌ **DO NOT** set `roles: all` on workflows that process PR content — this allows any user to trigger the workflow + ## Fork PR Handling ### The "pwn-request" Threat Model @@ -70,7 +107,7 @@ Reference: https://securitylab.github.com/resources/github-actions-preventing-pw For `/slash-command` triggers on fork PRs, `checkout_pr_branch.cjs` runs AFTER all user steps and re-checks out the fork branch. This overwrites any files restored by user steps (e.g., `.github/skills/`). A fork could include a crafted `SKILL.md` that alters the agent's evaluation behavior. -**Accepted residual risk:** The agent runs in a sandboxed container with all credentials scrubbed. The worst outcome is a manipulated evaluation comment (`safe-outputs: add-comment: max: 1`). The agent has no ability to push code, access secrets, or exfiltrate data. The pre-flight check in the agent prompt catches the case where `SKILL.md` is missing entirely (fork not rebased on `main`). +**Accepted residual risk:** The agent runs in a sandboxed container with `GITHUB_TOKEN` and `gh` CLI credentials scrubbed. `COPILOT_TOKEN` (for LLM inference) remains in the environment but the AWF network firewall restricts outbound connections to an allowlist of domains, `redact_secrets.cjs` scrubs known secret values from logs/outputs post-agent, and the threat detection agent reviews outputs before they are published. The worst practical outcome is a manipulated evaluation comment (`safe-outputs: add-comment: max: 1`). The pre-flight check in the agent prompt catches the case where `SKILL.md` is missing entirely (fork not rebased on `main`). **Upstream issue:** [github/gh-aw#18481](https://github.com/github/gh-aw/issues/18481) — "Using gh-aw in forks of repositories" diff --git a/.github/scripts/Checkout-GhAwPr.ps1 b/.github/scripts/Checkout-GhAwPr.ps1 index a2f9533bb7d2..d65c0cbe50f8 100644 --- a/.github/scripts/Checkout-GhAwPr.ps1 +++ b/.github/scripts/Checkout-GhAwPr.ps1 @@ -4,23 +4,23 @@ .DESCRIPTION Checks out a PR branch and restores trusted agent infrastructure (skills, - instructions) from the base branch. Works for both same-repo and fork PRs. + instructions) from the base branch. This gives the agent the PR's code + changes with the latest skills and instructions from main. - This script is only invoked for workflow_dispatch triggers. For pull_request - and issue_comment, the gh-aw platform's checkout_pr_branch.cjs handles PR - checkout automatically (it runs as a platform step after all user steps). - workflow_dispatch skips the platform checkout entirely, so this script is - the only thing that gets the PR code onto disk. + This script is only invoked for workflow_dispatch triggers. For + pull_request_target and issue_comment, the gh-aw platform's + checkout_pr_branch.cjs handles PR checkout automatically. + workflow_dispatch skips the platform checkout entirely, so this script + is the only thing that gets the PR code onto disk. - SECURITY NOTE: This script checks out PR code onto disk. This is safe - because NO subsequent user steps execute workspace code — the gh-aw - platform copies the workspace into a sandboxed container with scrubbed - credentials before starting the agent. The classic "pwn-request" attack - requires checkout + execution; we only do checkout. + SECURITY: Before checkout, the script verifies the PR is not from a + fork and that the author has write access (write, maintain, or admin). + Fork PRs are evaluated via pull_request_target instead (where the + platform handles checkout safely inside a sandboxed container). DO NOT add steps after this that run scripts from the workspace - (e.g., ./build.sh, pwsh ./script.ps1). That would create an actual - fork code execution vulnerability. See: + (e.g., ./build.sh, pwsh ./script.ps1). That would create a code + execution vulnerability. See: https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/ .NOTES @@ -42,9 +42,35 @@ if (-not $env:PR_NUMBER -or $env:PR_NUMBER -eq '0') { $PrNumber = $env:PR_NUMBER +# ── Verify PR is same-repo and author has write access ─────────────────────── + +$PrInfo = gh pr view $PrNumber --repo $env:GITHUB_REPOSITORY --json author,isCrossRepository --jq '{author: .author.login, isFork: .isCrossRepository}' | ConvertFrom-Json +if ($LASTEXITCODE -ne 0) { + Write-Host "❌ Failed to fetch PR #$PrNumber metadata" + exit 1 +} + +if (-not $PrInfo -or -not $PrInfo.author) { + Write-Host "❌ PR #$PrNumber returned empty or malformed metadata" + exit 1 +} + +$Permission = gh api "repos/$($env:GITHUB_REPOSITORY)/collaborators/$($PrInfo.author)/permission" --jq '.permission' +if ($LASTEXITCODE -ne 0) { + Write-Host "❌ Failed to check permissions for '$($PrInfo.author)'" + exit 1 +} + +$AllowedRoles = @('admin', 'write', 'maintain') +if ($Permission -notin $AllowedRoles) { + Write-Host "⏭️ PR author '$($PrInfo.author)' has '$Permission' access. workflow_dispatch only processes PRs from authors with write access." + exit 0 +} + +$IsFork = if ($PrInfo.isFork) { "fork" } else { "same-repo" } +Write-Host "✅ PR #$PrNumber by '$($PrInfo.author)' ($Permission access, $IsFork)" + # ── Save base branch SHA ───────────────────────────────────────────────────── -# Must be captured BEFORE checkout replaces HEAD. -# Exported for potential use by downstream platform steps (e.g., checkout_pr_branch.cjs) $BaseSha = git rev-parse HEAD if ($LASTEXITCODE -ne 0) { @@ -52,6 +78,7 @@ if ($LASTEXITCODE -ne 0) { exit 1 } Add-Content -Path $env:GITHUB_ENV -Value "BASE_SHA=$BaseSha" +Write-Host "Base branch SHA: $BaseSha" # ── Checkout PR branch ────────────────────────────────────────────────────── @@ -65,10 +92,9 @@ Write-Host "✅ Checked out PR #$PrNumber" git log --oneline -1 # ── Restore agent infrastructure from base branch ──────────────────────────── -# This script only runs for workflow_dispatch (other triggers use the platform's -# checkout_pr_branch.cjs instead). For workflow_dispatch the platform checkout is -# skipped, so this restore IS the final workspace state. -# rm -rf first to prevent fork-added files from surviving the restore. +# Replace skills and instructions with base branch versions to ensure the agent +# always uses trusted infrastructure from main. Uses git checkout to read files +# directly from the commit tree — works in shallow clones (no history traversal). if (Test-Path '.github/skills/') { Remove-Item -Recurse -Force '.github/skills/' } if (Test-Path '.github/instructions/') { Remove-Item -Recurse -Force '.github/instructions/' } diff --git a/.github/workflows/copilot-evaluate-tests.lock.yml b/.github/workflows/copilot-evaluate-tests.lock.yml index 5b5dc7f9b37d..70b131dae9ec 100644 --- a/.github/workflows/copilot-evaluate-tests.lock.yml +++ b/.github/workflows/copilot-evaluate-tests.lock.yml @@ -22,16 +22,16 @@ # # Evaluates test quality, coverage, and appropriateness on PRs that add or modify tests # -# gh-aw-metadata: {"schema_version":"v2","frontmatter_hash":"d671028235c1b911c7a816a257b07b02793a6b57747b4358f792af183e26ca07","compiler_version":"v0.62.2","strict":true} +# gh-aw-metadata: {"schema_version":"v2","frontmatter_hash":"1cead4a91eff14e2881035fceb96c10526d1265400b4b5d59c73951f208f9aa1","compiler_version":"v0.62.2","strict":true} name: "Evaluate PR Tests" "on": + # bots: # Bots processed as bot check in pre-activation job + # - copilot[bot] # Bots processed as bot check in pre-activation job issue_comment: types: - created - pull_request: - # forks: # Fork filtering applied via job conditions - # - "*" # Fork filtering applied via job conditions + pull_request_target: paths: - src/**/tests/** - src/**/test/** @@ -39,13 +39,17 @@ name: "Evaluate PR Tests" - opened - synchronize - reopened - - ready_for_review workflow_dispatch: inputs: pr_number: description: PR number to evaluate required: true type: number + suppress_comment: + default: false + description: Dry-run — evaluate but do not post a comment on the PR + required: false + type: boolean permissions: {} @@ -59,7 +63,7 @@ jobs: activation: needs: pre_activation if: > - (needs.pre_activation.outputs.activated == 'true') && ((github.event_name == 'pull_request' && github.event.pull_request.draft == false) || github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && + (needs.pre_activation.outputs.activated == 'true') && ((github.event_name == 'pull_request_target' && github.event.pull_request.draft == false) || github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && startsWith(github.event.comment.body, '/evaluate-tests'))) runs-on: ubuntu-slim @@ -132,6 +136,8 @@ jobs: - name: Compute current body text id: sanitized uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8 + env: + GH_AW_ALLOWED_BOTS: copilot[bot] with: script: | const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs'); @@ -151,6 +157,7 @@ jobs: GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} GH_AW_GITHUB_RUN_ID: ${{ github.run_id }} GH_AW_GITHUB_WORKSPACE: ${{ github.workspace }} + GH_AW_INPUTS_SUPPRESS_COMMENT: ${{ inputs.suppress_comment }} GH_AW_IS_PR_COMMENT: ${{ github.event.issue.pull_request && 'true' || '' }} run: | bash ${RUNNER_TEMP}/gh-aw/actions/create_prompt_first.sh @@ -212,6 +219,7 @@ jobs: GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt GH_AW_EXPR_93C755A4: ${{ github.event.pull_request.number || github.event.issue.number || inputs.pr_number }} GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} + GH_AW_INPUTS_SUPPRESS_COMMENT: ${{ inputs.suppress_comment }} with: script: | const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs'); @@ -231,6 +239,7 @@ jobs: GH_AW_GITHUB_REPOSITORY: ${{ github.repository }} GH_AW_GITHUB_RUN_ID: ${{ github.run_id }} GH_AW_GITHUB_WORKSPACE: ${{ github.workspace }} + GH_AW_INPUTS_SUPPRESS_COMMENT: ${{ inputs.suppress_comment }} GH_AW_IS_PR_COMMENT: ${{ github.event.issue.pull_request && 'true' || '' }} GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_ACTIVATED: ${{ needs.pre_activation.outputs.activated }} with: @@ -253,6 +262,7 @@ jobs: GH_AW_GITHUB_REPOSITORY: process.env.GH_AW_GITHUB_REPOSITORY, GH_AW_GITHUB_RUN_ID: process.env.GH_AW_GITHUB_RUN_ID, GH_AW_GITHUB_WORKSPACE: process.env.GH_AW_GITHUB_WORKSPACE, + GH_AW_INPUTS_SUPPRESS_COMMENT: process.env.GH_AW_INPUTS_SUPPRESS_COMMENT, GH_AW_IS_PR_COMMENT: process.env.GH_AW_IS_PR_COMMENT, GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_ACTIVATED: process.env.GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_ACTIVATED } @@ -321,7 +331,7 @@ jobs: - env: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number }} - if: github.event_name == 'pull_request' + if: github.event_name == 'pull_request_target' name: Gate — skip if no test source files in diff run: "TEST_FILES=$(gh pr diff \"$PR_NUMBER\" --repo \"$GITHUB_REPOSITORY\" --name-only \\\n | grep -E '\\.(cs|xaml)$' \\\n | grep -iE '(tests?/|TestCases|UnitTests|DeviceTests)' \\\n || true)\nif [ -z \"$TEST_FILES\" ]; then\n echo \"⏭️ No test source files (.cs/.xaml) found in PR diff. Skipping evaluation.\"\n exit 1\nfi\necho \"✅ Found test files to evaluate:\"\necho \"$TEST_FILES\" | head -20\n" - env: @@ -989,7 +999,7 @@ jobs: pre_activation: if: > - (github.event_name == 'pull_request' && github.event.pull_request.draft == false) || github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && + (github.event_name == 'pull_request_target' && github.event.pull_request.draft == false) || github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && startsWith(github.event.comment.body, '/evaluate-tests')) runs-on: ubuntu-slim @@ -1006,6 +1016,7 @@ jobs: uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8 env: GH_AW_REQUIRED_ROLES: admin,maintainer,write + GH_AW_ALLOWED_BOTS: copilot[bot] with: github-token: ${{ secrets.GITHUB_TOKEN }} script: | diff --git a/.github/workflows/copilot-evaluate-tests.md b/.github/workflows/copilot-evaluate-tests.md index 854c2ec407d8..cd85ac9a4461 100644 --- a/.github/workflows/copilot-evaluate-tests.md +++ b/.github/workflows/copilot-evaluate-tests.md @@ -1,9 +1,8 @@ --- description: Evaluates test quality, coverage, and appropriateness on PRs that add or modify tests on: - pull_request: - types: [opened, synchronize, reopened, ready_for_review] - forks: ["*"] + pull_request_target: + types: [opened, synchronize, reopened] paths: - 'src/**/tests/**' - 'src/**/test/**' @@ -15,9 +14,16 @@ on: description: 'PR number to evaluate' required: true type: number + suppress_comment: + description: 'Dry-run — evaluate but do not post a comment on the PR' + required: false + type: boolean + default: false + bots: + - "copilot[bot]" if: >- - (github.event_name == 'pull_request' && github.event.pull_request.draft == false) || + (github.event_name == 'pull_request_target' && github.event.pull_request.draft == false) || github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' && github.event.issue.pull_request && @@ -57,7 +63,7 @@ timeout-minutes: 15 steps: - name: Gate — skip if no test source files in diff - if: github.event_name == 'pull_request' + if: github.event_name == 'pull_request_target' env: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number }} @@ -73,9 +79,10 @@ steps: echo "✅ Found test files to evaluate:" echo "$TEST_FILES" | head -20 - # Only needed for workflow_dispatch — for pull_request and issue_comment, + # Only needed for workflow_dispatch — for pull_request_target and issue_comment, # the gh-aw platform's checkout_pr_branch.cjs handles PR checkout automatically. # workflow_dispatch skips the platform checkout entirely, so we must do it here. + # The script gates on PR author having write access before checkout. - name: Checkout PR and restore agent infrastructure if: github.event_name == 'workflow_dispatch' env: @@ -110,11 +117,27 @@ If the file is **missing**, the fork PR branch is likely not rebased on the late ❌ **Cannot evaluate**: this PR's branch does not include the evaluate-pr-tests skill (`.github/skills/evaluate-pr-tests/SKILL.md` is missing). -**Fix**: rebase your fork on the latest `main` branch, or use the **workflow_dispatch** trigger (Actions tab → "Evaluate PR Tests" → "Run workflow" → enter PR number) which handles this automatically. +**Fix**: rebase your fork on the latest `main` branch and push again. The evaluation will trigger automatically once the skill file is available. ``` Then stop — do not proceed with the evaluation. +## Dry-run mode + +When triggered via `workflow_dispatch` with `suppress_comment` = `${{ inputs.suppress_comment }}`: +- If **true**, perform the full evaluation but **do not** post a comment on the PR. Write the evaluation to the workflow log only. This is useful for testing the skill without spamming the PR. +- If **false** (default), post the comment as normal. + +## When no action is needed + +If there is nothing to evaluate (PR has no test files, PR is a docs-only change, etc.), you **must** call the `noop` tool with a message explaining why: + +```json +{"noop": {"message": "No action needed: [brief explanation, e.g. 'PR contains no test files']"}} +``` + +Do not post a comment and do not silently exit — always use `noop` so the workflow run shows a clear reason. + ## Running the skill 1. Use `gh pr view ` to fetch PR metadata (title, body, labels, base branch). If `gh` CLI is unavailable, use the GitHub MCP tools instead. @@ -124,7 +147,9 @@ Then stop — do not proceed with the evaluation. ## Posting Results -Call `add_comment` with `item_number` set to the PR number. Wrap the report in a collapsible `
` block: +If dry-run mode is active (`suppress_comment` is true), log the evaluation report to stdout and stop — do **not** call `add_comment`. + +Otherwise, call `add_comment` with `item_number` set to the PR number. Wrap the report in a collapsible `
` block: ```markdown ## 🧪 PR Test Evaluation