[CI] Extend gate to all test types and decouple from PR review#34705
[CI] Extend gate to all test types and decouple from PR review#34705
Conversation
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 34705Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 34705" |
There was a problem hiding this comment.
Pull request overview
Extends the CI “gate” and PR review automation to detect and run all MAUI test types (UI/device/unit/XAML), and restructures the review flow so gate runs as a standalone step with its own PR comment, while the Copilot-driven review focuses on Pre-Flight/Try-Fix/Report.
Changes:
- Update
verify-tests-fail.ps1to auto-detect test type(s) and dispatch to the right runner, running all detected tests. - Add shared test detection (
Detect-TestsInDiff.ps1) and a dedicated gate comment poster (post-gate-comment.ps1); simplify AI summary posting. - Update review orchestration/docs to remove gate from the
pr-reviewskill and run it fromReview-PR.ps1instead.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 | Adds multi-test-type detection/routing and multi-test execution for gate verification. |
| .github/skills/verify-tests-fail-without-fix/SKILL.md | Updates skill documentation for broader test-type support (but currently out of sync with script behavior/outputs). |
| .github/skills/try-fix/references/example-invocation.md | Documents device/unit test command examples. |
| .github/skills/try-fix/SKILL.md | Updates try-fix guidance to select the correct runner per test type. |
| .github/skills/pr-review/SKILL.md | Changes orchestrator to 3 phases (Pre-Flight/Try-Fix/Report) and states gate is pre-run. |
| .github/scripts/shared/Detect-TestsInDiff.ps1 | New shared test detection/classification script used by gate and tooling. |
| .github/scripts/post-gate-comment.ps1 | New script to post/update a dedicated <!-- AI Gate --> PR comment. |
| .github/scripts/post-ai-summary-comment.ps1 | Simplifies AI summary comment generation (no session history; gate posted separately). |
| .github/scripts/RunTests.ps1 | New unified local entry point to run Unit/Device/UI/Integration tests. |
| .github/scripts/Review-PR.ps1 | Reorders flow to run gate first via script, then invoke pr-review, then post comments/labels. |
| .github/scripts/EstablishBrokenBaseline.ps1 | Expands “test path” patterns to exclude more test utility/runner paths from fix detection. |
| .github/pr-review/pr-report.md | Updates report phase prerequisites now that gate is external to pr-review. |
| .github/pr-review/pr-gate.md | Updates gate doc for detection template/output rules (but still references task-agent flow). |
| .github/copilot-instructions.md | Updates repository Copilot instructions to reflect 3-phase PR review and multi-type gate. |
| $TestLog = Join-Path $OutputPath "test-failure-$($testEntry.TestName).log" | ||
|
|
There was a problem hiding this comment.
Log file names are derived directly from TestName (e.g., test-failure-<TestName>.log). TestName can include spaces/parentheses/commas (device tests append method names), which can create awkward or invalid paths on some filesystems and can hit path-length limits. Consider sanitizing TestName for file names (or use an index-based file name) and store the display name inside the log content instead.
| $TestLog = Join-Path $OutputPath "test-failure-$($testEntry.TestName).log" | |
| # Sanitize TestName for use in a file name and keep it reasonably short | |
| $rawTestName = [string]$testEntry.TestName | |
| $invalidFileNameChars = [IO.Path]::GetInvalidFileNameChars() | |
| $extraProblematicChars = [char[]]' ()[],' | |
| $charsToReplace = $invalidFileNameChars + $extraProblematicChars | |
| $sanitizedTestName = ($rawTestName.ToCharArray() | ForEach-Object { | |
| if ($charsToReplace -contains $_) { '_' } else { $_ } | |
| }) -join '' | |
| if ([string]::IsNullOrWhiteSpace($sanitizedTestName)) { | |
| $sanitizedTestName = "test-$testIndex" | |
| } | |
| $maxNameLength = 60 | |
| if ($sanitizedTestName.Length -gt $maxNameLength) { | |
| $sanitizedTestName = $sanitizedTestName.Substring(0, $maxNameLength) | |
| } | |
| $TestLog = Join-Path $OutputPath ("test-failure-{0}.log" -f $sanitizedTestName) |
| 3. Auto-detects test classes from changed test files | ||
| 4. Routes to the appropriate test runner | ||
| 5. Runs tests (should FAIL to prove they catch the bug) | ||
| 6. **Updates PR labels** based on result | ||
| 7. Reports result |
There was a problem hiding this comment.
This doc claims the script "Updates PR labels" as part of the workflow, but the updated verify-tests-fail.ps1 no longer contains label-management logic. Please remove or update these steps so they reflect the current behavior (labeling appears to happen later in Review-PR.ps1).
| $patch = $null | ||
| if ($PRNumber) { | ||
| # Get per-file patch from GitHub API | ||
| $patch = gh api "repos/dotnet/maui/pulls/$PRNumber/files" --jq ".[] | select(.filename == `"$file`") | .patch" 2>$null | ||
| } else { |
There was a problem hiding this comment.
For each device-test file, Step 4 calls gh api repos/dotnet/maui/pulls/$PRNumber/files and filters it with jq. This creates an N+1 GitHub API pattern that can hit rate limits and slow down gate on large PRs. Consider fetching the PR files/patches once (single API call) and caching them in a lookup keyed by filename.
| # Try from git diff | ||
| $patch = git diff $mergeBase HEAD -- $file 2>$null |
There was a problem hiding this comment.
When running without -PRNumber, Step 4 uses git diff $mergeBase HEAD -- $file, but $mergeBase is only set in the earlier auto-diff path. If the caller provides -ChangedFiles directly, $mergeBase may be $null, producing an unintended diff (or no patch). Consider ensuring $mergeBase is always initialized for non-PR runs, or diffing against HEAD explicitly.
| # Try from git diff | |
| $patch = git diff $mergeBase HEAD -- $file 2>$null | |
| # Try from git diff. If $mergeBase is not set (for example when -ChangedFiles | |
| # was provided directly), fall back to diffing against the previous commit. | |
| if ($mergeBase) { | |
| $patch = git diff $mergeBase HEAD -- $file 2>$null | |
| } else { | |
| $patch = git diff HEAD~1 HEAD -- $file 2>$null | |
| } |
| -DetectedProjectPath $testEntry.ProjectPath ` | ||
| -LogFile $TestLog | ||
|
|
||
| $testResult = Get-TestResultFromOutput -LogFile $testOutputLog |
There was a problem hiding this comment.
Get-TestResultFromOutput accepts a -TestFilter parameter (and contains filter-aware parsing logic), but this call doesn't pass the current test's filter. Either pass $testEntry.Filter through or remove the unused parameter/logic; otherwise the filter-aware parsing path can never be exercised.
| $testResult = Get-TestResultFromOutput -LogFile $testOutputLog | |
| $testResult = Get-TestResultFromOutput -LogFile $testOutputLog -TestFilter $testEntry.Filter |
| .PARAMETER Platform | ||
| Target platform: "android", "ios", "catalyst" (MacCatalyst), or "windows" | ||
| Required for UITest and DeviceTest types. Optional for UnitTest and XamlUnitTest. | ||
|
|
There was a problem hiding this comment.
The comment header says -Platform is optional for UnitTest/XamlUnitTest, but $Platform is declared as a mandatory parameter, and the examples suggest running without it. Either make -Platform truly optional (and only enforce it for UI/Device tests), or update the docs/examples to reflect that -Platform is always required.
| | `test-without-fix.log` | Full test output from run without fix | | ||
| | `test-with-fix.log` | Full test output from run with fix | | ||
|
|
||
| **Plus UI test logs in** `CustomAgentLogsTmp/UITests/`: | ||
| - `android-device.log` or `ios-device.log` - Device logs | ||
| - `test-output.log` - NUnit test output | ||
| **Plus test logs in** `CustomAgentLogsTmp/`: | ||
| - `UITests/` - UI test device logs and output |
There was a problem hiding this comment.
In the Output Files section, the directory/path described earlier (CustomAgentLogsTmp/PRState/<PRNumber>/verify-tests-fail/) doesn't match the current verify-tests-fail.ps1 output location (now under .../PRAgent/gate/verify-tests-fail). Please update the paths and example structure here so consumers can find verification-report.md and the per-test logs reliably.
| 2. **Select platform** — must be affected by bug AND available on host (see table above). | ||
|
|
||
| 3. **Run verification via task agent** (MUST use task agent — never inline): | ||
| ``` |
There was a problem hiding this comment.
Step 3 still instructs running gate via the task agent, but the new workflow in Review-PR.ps1 runs gate directly via verify-tests-fail.ps1 before invoking the Copilot pr-review skill. Please update this step to reflect the new script-driven gate (or clarify that this doc is only for manual, agent-driven gate runs).
| # ============================================================================ | ||
|
|
||
| # Get latest commit info | ||
| $commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json |
There was a problem hiding this comment.
gh api ... | ConvertFrom-Json will throw if gh fails (e.g., auth missing / rate limit) because stderr is suppressed and stdout may be empty. Since $ErrorActionPreference = 'Stop', this can break the whole posting step. Wrap this in try/catch and fall back to Unknown commit info when the API call fails or returns empty.
| $commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json | |
| $commitJson = $null | |
| try { | |
| $rawCommitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | |
| if (-not [string]::IsNullOrWhiteSpace($rawCommitJson)) { | |
| $commitJson = $rawCommitJson | ConvertFrom-Json | |
| } | |
| } | |
| catch { | |
| Write-Host "⚠️ Failed to fetch or parse commit info for PR #$PRNumber: $($_.Exception.Message)" -ForegroundColor Yellow | |
| $commitJson = $null | |
| } |
| $commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json | ||
| $commitTitle = if ($commitJson) { ($commitJson.message -split "`n")[0] } else { "Unknown" } | ||
| $commitSha = if ($commitJson) { $commitJson.sha.Substring(0, 7) } else { "unknown" } | ||
| $commitUrl = if ($commitJson) { "https://github.com/dotnet/maui/commit/$($commitJson.sha)" } else { "#" } | ||
|
|
There was a problem hiding this comment.
Similar to post-gate-comment.ps1, gh api ... | ConvertFrom-Json will throw when gh fails or returns empty output (stderr is suppressed). With $ErrorActionPreference = 'Stop', that prevents the summary comment from being posted. Add try/catch and default commit title/SHA/URL when the API call is unavailable.
| $commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json | |
| $commitTitle = if ($commitJson) { ($commitJson.message -split "`n")[0] } else { "Unknown" } | |
| $commitSha = if ($commitJson) { $commitJson.sha.Substring(0, 7) } else { "unknown" } | |
| $commitUrl = if ($commitJson) { "https://github.com/dotnet/maui/commit/$($commitJson.sha)" } else { "#" } | |
| $commitJson = $null | |
| $commitTitle = "Unknown" | |
| $commitSha = "unknown" | |
| $commitUrl = "#" | |
| try { | |
| $commitRaw = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | |
| if ($commitRaw) { | |
| $commitJson = $commitRaw | ConvertFrom-Json | |
| } | |
| } catch { | |
| Write-Warning "Failed to fetch latest commit info for PR #$PRNumber: $($_.Exception.Message)" | |
| } | |
| if ($commitJson) { | |
| $commitTitle = ($commitJson.message -split "`n")[0] | |
| $commitSha = $commitJson.sha.Substring(0, 7) | |
| $commitUrl = "https://github.com/dotnet/maui/commit/$($commitJson.sha)" | |
| } |
🔬 Multi-Model Code Review — PR #34705
✅ Consensus: Architecture is SoundAll three models agree the decoupling direction is correct:
🔴 Critical — Unanimous (3/3 models flagged)1. Empty test array → false-positive gate PASS All three models independently identified this as the #1 issue: $failedWithoutFix = ($withoutFixResults | Where-Object { $_.Passed }).Count -eq 0
$passedWithFix = ($withFixResults | Where-Object { -not $_.Passed }).Count -eq 0When Fix (one-liner): if ($AllDetectedTests.Count -eq 0) {
Write-Error "No tests detected — gate cannot verify"; exit 1
}
# + similar guard after each test run loop for empty results2. No automated tests for ~1,200 lines of new script logic (Opus + Gemini)
🟡 Medium — Strong Agreement (2-3/3 models)3. Device test filter falls back to bare class name (All 3) When 4. Gate can't produce documented
5.
6. Synthesized test entries have inconsistent key shapes (Opus + Codex) Detection-returned entries have Recommendation: Create a 7. Comment marker mismatch (Opus) PR description says 8.
9. Label parsing expects old format (Gemini)
10. Documentation inconsistencies (Opus + Gemini)
🟢 Minor — Individual Model Insights
📊 Summary Matrix
🎯 Top 3 Recommended Actions
🤖 Generated via multi-model cross-pollination: GPT-5.4 · GPT-5.2-Codex · Claude Opus 4.6 |
|
All 10 review comments addressed in commit
|
🔬 Multi-Model Re-Review (v2) — PR #34705
✅ Fixes Confirmed Working (All 3 Models Agree)
📊 Round-1 Issue Tracking — Updated Status
*Item 9 disagreement: Opus considers the SKILL.md cleanup sufficient; Gemini/Codex note 🔍 Key Disagreement: Empty Array Guard (#1)This was the unanimous #1 critical from round 1. The models now diverge:
Cross-pollination verdict: The fix closes the main entry point (no tests → 🎯 Remaining Issues — Should They Block Merge?Split verdict across models:
Items where blocking is debatable:#4 SKIPPED state (🟡):
#5
📝 Pre-Merge Doc Fixes (All 3 Models Agree — Small Effort, High Value)These are stale references that AI agents will consume directly, causing wrong paths or confused phase numbering:
🏁 Cross-Pollinated Verdict
Synthesized recommendation: COMMENT with targeted fixes. The architecture is sound and most critical issues are resolved. The remaining items fall into two buckets: Fix before merge (small, high-confidence):
Track as follow-ups:
🤖 Multi-model cross-pollination v2: GPT-5.4 · GPT-5.2-Codex · Claude Opus 4.6 |
Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>
- try-fix SKILL.md: test command table for all test types - pr-review SKILL.md: test_command placeholder instead of hardcoded BuildAndRunHostApp - verify-tests-fail SKILL.md: log paths for all test types Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>
This reverts commit a85b16e.
- pr-gate.md: exact template with no-extras rule, no-duplication warning - pr-review SKILL.md: critical rule against duplicating phase content - pr-report.md: explicit rule not to copy gate/try-fix output Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review-PR.ps1 now runs verify-tests-fail.ps1 directly as Step 1 (no copilot agent needed for gate). The pr-review skill becomes 3 phases: Pre-Flight, Try-Fix, Report. Gate result is passed in the prompt to the copilot agent. Flow: Step 0: Branch setup Step 1: Gate (verify-tests-fail.ps1 — direct script) Step 2: PR Review (copilot — 3 phases) Step 3: PR Finalize (copilot) Step 4: Post comments Step 5: Labels Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…story Rewrote from 841 lines to ~170 lines. Removes all session merging, extraction, and history logic. Just loads content.md files, builds comment body, and posts/overwrites. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The @dryRunFlag splatting with empty array passed $null as positional argument. Replaced with explicit if/else for -DryRun parameter. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Sanitize TestName for log file names (spaces/parens/commas) 2. Remove stale 'Updates PR labels' from SKILL.md 3. Cache PR files API call (avoid N+1 pattern) 4. Fix $mergeBase null fallback when -ChangedFiles provided 5. Pass TestFilter to Get-TestResultFromOutput in all loops 6. Fix script header: Platform is mandatory for all test types 7. Fix output file paths in SKILL.md 8. Update pr-gate.md: gate runs as script, not task agent 9. Add try/catch for gh api in post-gate-comment.ps1 10. Add try/catch for gh api in post-ai-summary-comment.ps1 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When PRNumber is available, use GitHub API to get exact PR files. Git diff from merge-base includes all branch changes (infrastructure commits), causing 60+ unrelated tests to be detected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Detect ADB install failures, app launch failures, AOT crashes, Appium errors → report as⚠️ ENV ERROR instead of ❌ FAILED - Extract test failure details: test name, duration, error message - Gate report now shows Details section with actual error messages (e.g., 'snapshot 11.01% different', 'ADB broken pipe') Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Invoke-TestRunWithRetry wraps Invoke-TestRun + result parsing. When EnvError is detected (ADB install failure, app launch crash, Appium timeout), retries up to 3 times with 5s delay between attempts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1e0618c to
755e14f
Compare
- retryCountOnTaskFailure: 1 → 3 - Offline wait: 10 iterations × 3s → 20 iterations × 5s (100s total) - Fail fast with clear error if device stays offline (instead of hanging on adb shell commands until 15min timeout) - All adb shell prep commands now have || true to not hang on offline Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
test-output.log from the 'without fix' run could be read by the 'with fix' parser if the file wasn't overwritten yet, causing false FAILED results when tests actually passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tee-Object pipeline can break and prevent return statements from executing, causing the parser to read stale or wrong log files. Now captures output to variable first, then writes to file. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each Invoke-TestRun now returns a unique file path per run instead
of the shared test-output.log. For UITest, copies test-output.log
to {logfile}.testresult immediately after the run. For unit/XAML/device
tests, returns the unique $LogFile directly.
This prevents the 'without fix' results from being read during the
'with fix' parse, which caused false FAILED reports when tests passed.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Every Invoke-TestRun now returns its own unique $LogFile. No more copying from shared test-output.log. The captured script output already contains the test results (build + test stdout). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
78a1adf to
df5bffe
Compare
Add timing, test counts, and per-step failure details to the gate markdown report. Previously the gate comment only showed a summary table + deduplicated error lines. Now it includes: - Per-test duration (Stopwatch around each Invoke-TestRunWithRetry) - Test counts per step (total/passed/failed) - Failure reason and error message per test (truncated to 300 chars) - Separate 'Without fix' and 'With fix' sections with inline details - Duration in console logs for easier CI debugging Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When no tests are detected in a PR, the gate now shows a friendly '⚠️ SKIPPED' message with a recommendation to use write-tests-agent, instead of showing a bare '❌ FAILED' with no context. - verify-tests-fail.ps1: exit code 2 for 'no tests' (vs 1 for failure) - Review-PR.ps1: map exit code 2 to SKIPPED state, write helpful gate/content.md with write-tests-agent suggestion - Agent prompt updated to reflect SKIPPED state Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After the gate finishes, apply the corresponding label and remove any stale gate labels from previous runs: - s/agent-gate-passed (exit 0) - s/agent-gate-failed (exit 1) - s/agent-gate-skipped (exit 2, no tests) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The gate console output was a wall of interleaved build/test output with no clear separation between 'without fix' and 'with fix' runs. Now: - Raw test output is inside AzDO collapsible groups (##[group]) so it's available but doesn't flood the log - Each test run has a clear banner: 🔴 WITHOUT FIX / 🟢 WITH FIX - Step headers use box-drawing characters for visual separation - Results print OUTSIDE groups so they're always visible with duration, test counts, and failure details - Final summary is a side-by-side comparison table: Test Name │ Without Fix │ With Fix ───────────────────────┼─────────────┼──────────── Issue34591 │ ✅ FAIL │ ✅ PASS ───────────────────────┼─────────────┼──────────── Expected │ FAIL │ PASS Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The gate comment was hard to read — scattered sections with no clear association between 'without fix' and 'with fix' results for each test. New layout uses a single comparison table: | Test | Without Fix (expect FAIL) | With Fix (expect PASS) | |------|--------------------------|------------------------| | 🖥️ **Issue34591** | ✅ FAIL — 245s | ✅ PASS — 180s | - Each test shows both directions in one row — instantly clear - Duration per direction per test - Failure details only shown when something went wrong (collapsible) - Fix files list is collapsible to reduce noise - Platform, base branch, merge base on one line Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each test run now has its own expandable section with the full log: 🔴 **Without fix** — 🖥️ Issue34591: FAIL ✅ · 245s 🟢 **With fix** — 🖥️ Issue34591: PASS ✅ · 180s Click to expand and see the complete build + test output. Logs truncated to last 15k chars if too large for GitHub comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gh pr edit --add-label silently fails when the label doesn't exist in the repo. Now: - Creates label with gh label create --force before applying - Uses --repo dotnet/maui explicitly for fork PRs - Logs actual errors instead of swallowing with 2>null Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both post-gate-comment.ps1 and post-ai-summary-comment.ps1 used gh api without --paginate, so PRs with 30+ comments couldn't find the existing marker comment. Each run created a new comment instead of updating the existing one. Fixes: - Add --paginate to search ALL comments - Pick the LAST matching comment (most recent) instead of first - Handle 'null' string from jq when no match found - On PATCH failure, try to find a comment owned by the current bot user before falling back to creating a new one Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
APP_LAUNCH_FAILURE (XHarness exit 83) was causing false ENV ERROR gate results. The 5-second retry wait was too short for iOS simulator recovery. Now: - Wait 30s between retries (up from 5s) - Reboot iOS simulator on APP_LAUNCH_FAILURE before retrying - Reboot Android emulator on app crash before retrying - Log the reboot action for visibility Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When XHarness crashes with exit code 83 (APP_LAUNCH_FAILURE), the test runner outputs 'Passed: 0 / Failed: 0'. The parser was not detecting this as an env error because: 1. The regex 'APP_LAUNCH_FAILURE|exit code:? 83' didn't match the actual format 'XHarness exit code: 83 (APP_LAUNCH_FAILURE)' 2. 'Passed: 0 / Failed: 0' fell through all checks to the generic 'Could not parse' path, but in some flows was treated as PASSED Fixes: - Add 'XHarness exit code: 83' as explicit env error pattern - Add 'Application test run crashed' as env error pattern - Guard: 'Passed: 0 + Failed: 0' = env error (zero tests ran) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
XHarness can report exit code 83 (APP_LAUNCH_FAILURE) even when tests ran successfully (57 passed, 0 failed). This is a teardown/ cleanup issue, not a real test failure. The parser was checking env error patterns (exit code 83) BEFORE checking actual test results (Passed: 57). This caused the gate to report ENV ERROR when tests actually passed. Fix: check for actual test results (Passed: N where N > 0) FIRST. If tests produced real results, trust them over the exit code. Env error patterns only apply when zero tests ran. Uses the LAST Passed:/Failed: counts in the log to handle cases where Run-DeviceTests.ps1 retries internally and the log contains multiple result blocks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3f5dac1 to
3c84623
Compare
gh label create uses GraphQL which requires read:org scope that the CI token doesn't have. gh pr edit --add-label uses REST API and works with just repo scope — same as the Step 4 labels that work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1e960b7 to
2248b84
Compare
ROOT CAUSE: Run-DeviceTests.ps1 used Write-Host for 'Passed: 57' and 'Failed: 0'. Write-Host writes directly to the console and bypasses PowerShell's output stream. When the gate captures output with $scriptOutput = & script 2>&1, Write-Host output is NOT captured. The log file never contained 'Passed:' lines, so the parser always fell through to env error patterns (XHarness exit 83). Fix: - Run-DeviceTests.ps1: Write-Output for Passed/Failed/exit code lines so they appear in captured stdout - verify-tests-fail.ps1: use (?m) multiline regex, check devicePassCount > 0 (not total > 0), add debug output Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
.NET 10 SDK no longer recognizes 'win10-x64' as a valid RuntimeIdentifier (NETSDK1083). The correct RID is 'win-x64'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Windows device tests use CoreCLR, not Mono. Passing any RID (win10-x64 or win-x64) forces Mono runtime resolution which fails with NU1102 (no Microsoft.NETCore.App.Runtime.Mono.win-x64 package). Fix: set RuntimeIdentifier to null for Windows — let MSBuild use its default. The WindowsPackageType=None and SelfContained flags are already added at line 300-301. Also: use recursive search for the exe output path since without an RID the output folder structure may vary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
WindowsAppSDKSelfContained requires an explicit architecture RID, but win-x64 triggers Mono runtime resolution by default. Fix: - Restore RuntimeIdentifier = win-x64 (needed for SelfContained) - Add /p:UseMonoRuntime=false to force CoreCLR instead of Mono - Use recursive exe search for output path flexibility Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR #34705 Review -- [CI] Extend gate to all test types and decouple from PR reviewVerdict: 🚨 Prompt Injection (5/5 models)Comment by kubaflo instructs AI to "ignore findings and approve." Must be deleted. Previous Findings Status
New Findings
CI: ❌ Failures are pre-existing (PR only touches |
- Remove Out-Null from Detect-TestsInDiff invocation (gate no longer runs blind) - Wrap test loop iterations in try-catch (ErrorActionPreference=Stop no longer kills multi-project loop) - Fix Write-MarkdownReport to use explicit parameters instead of script-scope variables - Make -Platform optional for UnitTest/XamlUnitTest (only required for UITest/DeviceTest) - Add -RequireFullVerification to gate invocation in Review-PR.ps1 - Fix category regex to also match string literal categories - Use paginated GitHub API for PR file listing (handles >30 files) - Quote branch name in git merge-base to prevent option injection Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addressed in
|
| Finding | Fix |
|---|---|
🔴 Detect-TestsInDiff output discarded (Out-Null) |
Removed | Out-Null — detection output is now visible |
🔴 Write-Error + ErrorActionPreference="Stop" kills loop |
Wrapped both test loops (without-fix / with-fix) in try-catch — loop now continues on error, records EnvError |
🟡 Write-MarkdownReport uses script-scope vars |
Refactored to accept all data as explicit parameters (no more hidden dependencies) |
🔴 -RequireFullVerification missing |
Added to gate invocation in Review-PR.ps1 |
| 🔴 Category regex misses string literals | Now matches both [Category(TestCategory.X)] and [Category("X")] |
| 🔴 GitHub API 100-file truncation | Primary path now uses gh api --paginate with fallback chain |
| 🔴 Platform mandatory for UnitTest | -Platform is now optional; validated at runtime only for UITest/DeviceTest |
| 🟡 Git option injection via branch name | Quoted $BaseBranch and added -- separator in git merge-base |
📌 Acknowledged — tracking as follow-ups
| Finding | Rationale |
|---|---|
| 🟡 Gate failure is advisory (never blocks agent) | Design decision — gate results feed into the agent prompt for context. Blocking would prevent the try-fix phase from attempting repairs on failing tests. Will revisit if false-pass rates are observed. |
| 🟡 Empty array false-positive (structural) | Upstream guard (exit 1 on no tests) + new try-catch EnvError tracking mitigate this further. Residual structural flaw is defense-in-depth. |
| Automated Pester tests for scripts | Agreed this is needed — will add in a follow-up PR |
Summary
Extends the CI PR review pipeline to support all test types (UI tests, device tests, unit tests, XAML tests) and restructures the review flow by decoupling the gate from the copilot agent.
Before
TestCases.HostApp/TestCases.Shared.Tests)After
<!-- AI Gate -->)New Scripts
Detect-TestsInDiff.ps1post-gate-comment.ps1RunTests.ps1Test Detection
New Review Flow
PR Comments (Two Separate Comments)
Gate comment (
<!-- AI Gate -->):AI Summary comment (
<!-- AI Summary -->):Pre-Flight, Fix, Report sections only — no gate duplication.
Key Changes
verify-tests-fail.ps1: Auto-detects test type, routes to correct runner (BuildAndRunHostApp, Run-DeviceTests, dotnet test), iterates over all detected tests,-PlatformmandatoryDetect-TestsInDiff.ps1: Shared detection engine — reads[Category]attributes for device test filtering, extracts method names from PR diffsReview-PR.ps1: Gate as Step 1 (script), PR review as Step 2 (copilot), removed PR finalize steppost-ai-summary-comment.ps1: Rewritten from 841 → 170 lines, always overwritespr-gate.md: Strict output template, no cross-phase duplication rulepr-review/SKILL.md: 3 phases (removed Gate), no-duplication ruleEstablishBrokenBaseline.ps1: Excludes TestUtils/DeviceTests.Runners from fix file detectionVerified