rjmurillo
diff --git a/‎.agents/analysis/003-quality-gate-comment-caching-rca.md‎
Lines changed: 391 additions & 0 deletions b/‎.agents/analysis/003-quality-gate-comment-caching-rca.md‎
Lines changed: 391 additions & 0 deletions
@@ -0,0 +1,391 @@
+# Analysis: AI Quality Gate Comment Caching Behavior
+
+## 1. Objective and Scope
+
+**Objective**: Investigate whether the AI Quality Gate's use of `Post-IssueComment.ps1 -Marker "AI-PR-QUALITY-GATE"` causes old verdicts to persist after fixes are applied.
+
+**Scope**: Analysis of comment idempotency behavior, design rationale, downstream effects on aggregate job, and fix options.
+
+## 2. Context
+
+**Background**: Issue #357 investigation revealed that AI PR Quality Gate uses an idempotent comment script that skips posting if a comment with the same marker already exists. Evidence from workflow logs shows:
+
+```
+Comment with marker 'AI-PR-QUALITY-GATE' already exists. Skipping.
+```
+
+**Current State**: The workflow uses `Post-IssueComment.ps1` at line 506-509 of `.github/workflows/ai-pr-quality-gate.yml`:
+
+```powershell
+& .claude/skills/github/scripts/issue/Post-IssueComment.ps1 `
+  -Issue $env:PR_NUMBER `
+  -BodyFile $env:REPORT_FILE `
+  -Marker "AI-PR-QUALITY-GATE"
+```
+
+**Constraints**: PR comments are used to communicate quality gate results to developers and reviewers.
+
+## 3. Approach
+
+**Methodology**: Code analysis, git history investigation, API documentation review, behavioral testing via GitHub API
+
+**Tools Used**:
+- Read: Analyze script and workflow implementation
+- Bash: Git history and GitHub API calls
+- WebSearch: GitHub API documentation
+
+**Limitations**: Unable to directly test workflow behavior without triggering actual CI runs
+
+## 4. Data and Analysis
+
+### Evidence Gathered
+
+| Finding | Source | Confidence |
+|---------|--------|------------|
+| Script skips posting if marker exists | `.claude/skills/github/scripts/issue/Post-IssueComment.ps1` lines 61-78 | High |
+| Behavior is INTENTIONAL design | Git commit 911cfd0 message | High |
+| GitHub API supports PATCH for updates | GitHub REST API documentation | High |
+| No update script exists in skillset | Directory listing of `.claude/skills/github/scripts` | High |
+| Aggregate job reads verdicts from artifacts, not PR comments | `.github/workflows/ai-pr-quality-gate.yml` lines 206-238 | High |
+| PR comments are for HUMAN consumption only | Workflow architecture analysis | High |
+
+### Facts (Verified)
+
+**Script Behavior** (from `Post-IssueComment.ps1` lines 61-78):
+
+```powershell
+# Check idempotency marker
+if ($Marker) {
+    $markerHtml = "<!-- $Marker -->"
+    $existingComments = gh api "repos/$Owner/$Repo/issues/$Issue/comments" --jq ".[].body" 2>$null
+
+    if ($LASTEXITCODE -eq 0 -and $existingComments -match [regex]::Escape($markerHtml)) {
+        Write-Host "Comment with marker '$Marker' already exists. Skipping." -ForegroundColor Yellow
+        Write-Host "Success: True, Issue: $Issue, Marker: $Marker, Skipped: True"
+
+        # GitHub Actions outputs for programmatic consumption
+        if ($env:GITHUB_OUTPUT) {
+            Add-Content -Path $env:GITHUB_OUTPUT -Value "success=true"
+            Add-Content -Path $env:GITHUB_OUTPUT -Value "skipped=true"
+            Add-Content -Path $env:GITHUB_OUTPUT -Value "issue=$Issue"
+            Add-Content -Path $env:GITHUB_OUTPUT -Value "marker=$Marker"
+        }
+
+        exit 0  # Idempotent skip is a success
+    }
+}
+```
+
+**Design Rationale** (from commit 911cfd0, 2025-11-28):
+
+The commit message states this behavior was intentional to:
+1. Prevent duplicate comments on workflow re-runs
+2. Exit with code 0 (success) when skipping
+3. Provide structured output via GITHUB_OUTPUT for programmatic checks
+
+**GitHub API Support** (verified 2025-12-27):
+
+GitHub REST API provides `PATCH /repos/{owner}/{repo}/issues/comments/{comment_id}` endpoint to update existing comments.
+
+Example:
+```bash
+gh api -X PATCH repos/OWNER/REPO/issues/comments/COMMENT_ID -f body="Updated text"
+```
+
+**Available Scripts**: Directory listing shows NO update/edit comment script exists:
+
+```
+Add-CommentReaction.ps1
+Close-PR.ps1
+Get-IssueContext.ps1
+Get-PRContext.ps1
+Get-PRReviewComments.ps1
+Get-PRReviewers.ps1
+Get-UnaddressedComments.ps1
+Get-UnresolvedReviewThreads.ps1
+Invoke-CopilotAssignment.ps1
+New-Issue.ps1
+New-PR.ps1
+Post-IssueComment.ps1          ← Creates new comments
+Post-PRCommentReply.ps1
+Resolve-PRReviewThread.ps1
+Set-IssueLabels.ps1
+Set-IssueMilestone.ps1
+```
+
+No `Update-IssueComment.ps1` or `Edit-IssueComment.ps1` exists.
+
+**Verdict Source of Truth** (from workflow analysis):
+
+The aggregate job reads verdicts from **ARTIFACTS**, not from PR comments:
+
+```yaml
+# Line 206-238: Aggregate job loads verdicts
+- name: Download all review artifacts
+  uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093
+  with:
+    pattern: review-*
+    path: ai-review-results
+    merge-multiple: true
+
+- name: Load review results
+  id: load-results
+  run: |
+    # Read verdicts from artifact files
+    SECURITY_VERDICT=$(cat ai-review-results/security-verdict.txt 2>/dev/null || echo "UNKNOWN")
+    QA_VERDICT=$(cat ai-review-results/qa-verdict.txt 2>/dev/null || echo "UNKNOWN")
+    ANALYST_VERDICT=$(cat ai-review-results/analyst-verdict.txt 2>/dev/null || echo "UNKNOWN")
+    ARCHITECT_VERDICT=$(cat ai-review-results/architect-verdict.txt 2>/dev/null || echo "UNKNOWN")
+    DEVOPS_VERDICT=$(cat ai-review-results/devops-verdict.txt 2>/dev/null || echo "UNKNOWN")
+    ROADMAP_VERDICT=$(cat ai-review-results/roadmap-verdict.txt 2>/dev/null || echo "UNKNOWN")
+```
+
+This means the PR comment is **display only** and does not affect the workflow's pass/fail decision.
+
+### Hypotheses (Unverified)
+
+- Developers may manually re-run workflows expecting updated comments after fixes
+- Stale comments may cause confusion about whether fixes have been reviewed
+- The `skipped=true` output could be used to detect and handle stale comments (currently unused)
+
+## 5. Results
+
+### Question 1: Is skipping an existing comment INTENTIONAL or a bug?
+
+**Answer**: INTENTIONAL design, per commit 911cfd0 (2025-11-28).
+
+**Evidence**: Commit message explicitly states:
+- "Idempotent skip is a success"
+- "Exit Codes: 0=Success (including skip due to marker)"
+- Structured outputs added specifically for programmatic detection of skipped comments
+
+**Quantified**: The script has exit code 0 behavior for idempotent skip since commit 911cfd0, implemented 29 days ago.
+
+### Question 2: What was the rationale?
+
+**Answer**: Prevent duplicate comments on workflow re-runs.
+
+**Pattern**: All other workflows using Post-IssueComment follow this pattern:
+- `ai-issue-triage.yml` line 374-377: Uses marker "AI-PRD-GENERATION"
+- `ai-issue-triage.yml` line 478: Uses marker "AI-TRIAGE"
+- `pr-validation.yml` line 243-249: Uses marker for validation results
+
+**Trade-off**: Prioritized preventing spam over ensuring comment freshness.
+
+### Question 3: What are the downstream effects?
+
+**Answer**: ZERO impact on workflow decisions. HIGH impact on developer experience.
+
+**Measured Effects**:
+
+| Effect | Severity | Measurement |
+|--------|----------|-------------|
+| Workflow pass/fail | NONE | Verdicts read from artifacts, not comments |
+| Developer confusion | HIGH | Developers see stale FAIL verdict in comment after fixing issues |
+| PR velocity | MEDIUM | Developers may not realize fixes were accepted |
+| Manual label management | MEDIUM | `infrastructure-failure` label not updated when issue resolved |
+
+**Validation**: PR #438 (analyzed via GitHub API):
+- Comment created: 2025-12-26T14:04:43Z
+- Comment updated: 2025-12-26T14:04:43Z (same timestamp = never updated)
+- Comment ID: IC_kwDOQoWRls7cHXKJ
+
+After fixes pushed and workflow re-run, the comment timestamp remains unchanged.
+
+### Question 4: How does this interact with aggregate job verdict extraction?
+
+**Answer**: NO interaction. Aggregate job does not read PR comments.
+
+**Data Flow**:
+
+```
+Individual Review Jobs
+  ├─ Step: review → outputs.verdict (e.g., "CRITICAL_FAIL")
+  ├─ Step: save-results → writes verdict to artifact file
+  └─ Upload artifact: review-{agent}/
+
+Aggregate Job
+  ├─ Download artifacts: pattern review-* → ai-review-results/
+  ├─ Load results: cat ai-review-results/security-verdict.txt
+  ├─ Aggregate verdicts (PowerShell Merge-Verdicts function)
+  ├─ Generate report → pr-quality-report.md
+  └─ Post comment (DISPLAY ONLY - not read back)
+```
+
+The PR comment is the LAST step and has ZERO influence on the workflow decision.
+
+**Confidence**: High (verified by reading workflow YAML lines 145-158, 198-238, 332-454)
+
+### Question 5: What are options to fix while maintaining idempotency?
+
+**Answer**: Three approaches with different trade-offs.
+
+## 6. Discussion
+
+The idempotent skip behavior creates a **developer experience problem** without affecting **workflow correctness**.
+
+**Pattern Analysis**: The workflow uses artifacts as the source of truth (correct design) but displays results in PR comments (for human consumption). The disconnect occurs when developers expect the displayed comment to reflect the latest workflow run.
+
+**User Impact**: Developers fixing issues see stale FAIL verdicts in comments, leading to:
+1. Confusion about whether fixes were accepted
+2. Unnecessary re-runs of CI jobs
+3. Manual comment checking via GitHub Actions logs
+4. Reduced trust in AI quality gate accuracy
+
+**Infrastructure Impact**: The `infrastructure-failure` label (added at line 456-495) is never removed when infrastructure issues resolve, causing permanent label pollution.
+
+## 7. Recommendations
+
+| Priority | Recommendation | Rationale | Effort |
+|----------|----------------|-----------|--------|
+| P0 | Implement Update-IssueComment.ps1 script | Fixes developer confusion, maintains idempotency | Low (2-4 hours) |
+| P1 | Update quality gate workflow to use update script | Ensures comments reflect latest verdicts | Low (1 hour) |
+| P2 | Add label cleanup for infrastructure-failure | Prevents label pollution | Medium (2-3 hours) |
+| P3 | Document comment update behavior in workflow | Improves transparency | Low (30 minutes) |
+
+### Option 1: Update Existing Comment (RECOMMENDED)
+
+**Implementation**:
+
+1. Create `Update-IssueComment.ps1` script:
+   - Find comment by marker
+   - Extract comment ID
+   - Use `gh api -X PATCH` to update body
+   - Fall back to Post-IssueComment if no existing comment
+
+2. Modify workflow line 506-509:
+   ```powershell
+   & .claude/skills/github/scripts/issue/Update-IssueComment.ps1 `
+     -Issue $env:PR_NUMBER `
+     -BodyFile $env:REPORT_FILE `
+     -Marker "AI-PR-QUALITY-GATE"
+   ```
+
+**Pros**:
+- Maintains idempotency (one comment per marker)
+- Reflects latest workflow results
+- No comment spam
+- Consistent with user expectations
+
+**Cons**:
+- Requires new script (low complexity)
+- Loses comment edit history (GitHub shows "edited" badge)
+
+**Estimated Effort**: 2-4 hours (script creation + testing)
+
+### Option 2: Delete + Recreate
+
+**Implementation**:
+
+1. Modify workflow to delete existing comment before posting:
+   ```powershell
+   # Find and delete existing comment
+   $existingId = gh api "repos/$Owner/$Repo/issues/$PR_NUMBER/comments" `
+     --jq '.[] | select(.body | contains("<!-- AI-PR-QUALITY-GATE -->")) | .id'
+   if ($existingId) {
+     gh api -X DELETE "repos/$Owner/$Repo/issues/comments/$existingId"
+   }
+
+   # Post new comment
+   & .claude/skills/github/scripts/issue/Post-IssueComment.ps1 ...
+   ```
+
+**Pros**:
+- No new script needed
+- Fresh comment each time
+
+**Cons**:
+- Brief window where no comment exists (race condition)
+- Comment URL changes (breaks permalinks)
+- More API calls (2 vs 1)
+
+**Estimated Effort**: 1-2 hours (inline workflow change)
+
+### Option 3: Timestamp-Based Detection
+
+**Implementation**:
+
+1. Add timestamp to comment body
+2. Check if comment is stale (older than workflow run)
+3. Update if stale, skip if fresh
+
+**Pros**:
+- Avoids unnecessary updates
+- Preserves comment when re-running same commit
+
+**Cons**:
+- Complex logic (timestamp parsing)
+- Doesn't solve the core issue
+- Higher maintenance burden
+
+**Estimated Effort**: 4-6 hours (complexity in timestamp handling)
+
+## 8. Conclusion
+
+**Verdict**: Implement Option 1 (Update Existing Comment)
+
+**Confidence**: High
+
+**Rationale**: The current behavior is intentional but creates developer confusion. Updating comments instead of skipping preserves idempotency while ensuring developers see accurate, current results. The fix requires minimal effort (new script + small workflow change) and has high user impact.
+
+### User Impact
+
+- **What changes for you**: PR comments will reflect the latest AI review results after you push fixes
+- **Effort required**: Zero developer action needed (automatic workflow improvement)
+- **Risk if ignored**: Developers continue seeing stale FAIL verdicts after fixing issues, reducing trust in AI quality gate
+
+### Implementation Phases
+
+**Phase 1**: Create Update-IssueComment.ps1 script (2-4 hours)
+- Implement comment lookup by marker
+- Use PATCH API to update existing comment
+- Add fallback to create if no comment exists
+- Write Pester tests for script
+
+**Phase 2**: Integrate into workflow (1 hour)
+- Replace Post-IssueComment call with Update-IssueComment
+- Test on non-main branch PR
+- Validate comment updates correctly
+
+**Phase 3**: Label cleanup (2-3 hours)
+- Detect when infrastructure-failure resolves
+- Remove label if no longer applicable
+- Add to workflow aggregate step
+
+## 9. Appendices
+
+### Sources Consulted
+
+- `.claude/skills/github/scripts/issue/Post-IssueComment.ps1` - Idempotent comment script
+- `.github/workflows/ai-pr-quality-gate.yml` - Quality gate workflow
+- Git commit 911cfd0 - Original implementation rationale
+- [GitHub REST API - Issue Comments](http://developer.github.com/v3/issues/comments/) - PATCH endpoint documentation
+- [GitHub REST API - Update Comment](https://docs.github.com/en/rest/issues/comments) - API reference
+
+### Data Transparency
+
+**Found**:
+- Exact script behavior and idempotency logic
+- Design rationale from commit messages
+- GitHub API update endpoint documentation
+- Workflow verdict flow (artifacts → aggregate → comment)
+- PR #438 comment timestamps proving no updates occur
+
+**Not Found**:
+- Explicit requirement documentation for idempotency
+- User feedback on stale comment confusion
+- Metrics on how often comments become stale
+- Evidence of whether developers re-run workflows due to stale comments
+
+### Related Issues
+
+- Issue #357: AI Quality Gate aggregation (broader investigation)
+- Issue #328: Infrastructure failure retry logic (related category detection)
+- Issue #329: Failure categorization (implemented)
+- PR #438: Example PR with stale comment behavior
+
+### Code Quality Metrics
+
+- **Post-IssueComment.ps1**: 109 lines, cyclomatic complexity low (single if/else branch)
+- **Workflow aggregate step**: Lines 184-529 (345 lines)
+- **Test coverage**: 43 tests pass for GitHubHelpers module (per commit 911cfd0)