Skip to content

feat(ci): add investigation-only claim backstop (ADR-034 Phase 3)#1214

Open
rjmurillo-bot wants to merge 6 commits intomainfrom
feat/652-autonomous
Open

feat(ci): add investigation-only claim backstop (ADR-034 Phase 3)#1214
rjmurillo-bot wants to merge 6 commits intomainfrom
feat/652-autonomous

Conversation

@rjmurillo-bot
Copy link
Collaborator

@rjmurillo-bot rjmurillo-bot commented Feb 20, 2026

Summary

Implements CI backstop for investigation-only QA skip claims per ADR-034 Phase 3 (Issue #652).

Changes

  • .github/scripts/validate_investigation_claims.py: Python script that validates commits claiming "SKIPPED: investigation-only" have only allowed files (.agents/sessions/, .agents/analysis/, .serena/memories/, etc.)
  • .github/scripts/tests/test_validate_investigation_claims.py: 22 pytest tests covering allowlist matching, claim detection, and validation logic
  • .github/workflows/investigation-claim-backstop.yml: GitHub Actions workflow triggered on session log changes
  • README.md: Updated project documentation

How It Works

  1. Workflow triggers on PRs that modify .agents/sessions/*.json
  2. Script identifies session logs with "SKIPPED: investigation-only" evidence
  3. For each claim, retrieves the commit and checks file list
  4. Reports violations where non-investigation files were committed

Test Plan

  • All 22 new tests pass
  • Ruff linting passes
  • Workflow follows ADR-006 thin workflows pattern
  • CI validates workflow syntax

Fixes #652 (P2: CI backstop acceptance criterion)

Note: P1 (agent self-service skill) was already completed in PR #811.

Add CI workflow to validate investigation-only QA skip claims.
Detects commits claiming "SKIPPED: investigation-only" but containing
files outside the allowed paths.

Files:
- validate_investigation_claims.py: Script to check claims
- test_validate_investigation_claims.py: 22 pytest tests
- investigation-claim-backstop.yml: CI workflow for PR validation

Partial fix for #652: Implements P2 CI backstop acceptance criterion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@rjmurillo-bot rjmurillo-bot enabled auto-merge (squash) February 20, 2026 08:25
@github-actions github-actions bot added enhancement New feature or request area-workflows GitHub Actions workflows automation Automated workflows and processes github-actions GitHub Actions workflow updates labels Feb 20, 2026
@github-actions
Copy link
Contributor

PR Validation Report

Caution

Status: FAIL

Description Validation

Check Status
Description matches diff FAIL

QA Validation

Check Status
Code changes detected True
QA report exists false

⚠️ Blocking Issues

  • PR description does not match actual changes

⚡ Warnings

  • QA report not found for code changes (recommended before merge)

Powered by PR Validation workflow

@coderabbitai coderabbitai bot requested a review from rjmurillo February 20, 2026 08:26
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces a CI backstop for 'investigation-only' QA skip claims, ensuring that commits with such claims only modify allowed files. The changes include a Python script for validation, comprehensive unit tests, and a GitHub Actions workflow to trigger the validation. The implementation appears robust, with good test coverage for various scenarios. The use of subprocess.run for git commands is appropriate, and error handling for file parsing is present. The solution aligns with the ADR-034 Phase 3 requirements. Two security review comments regarding command injection prevention in subprocess.run calls have been retained.

@coderabbitai coderabbitai bot added agent-devops CI/CD pipeline agent agent-qa Testing and verification agent priority:P2 Normal: Standard enhancement or bug fix, moderate impact labels Feb 20, 2026
@coderabbitai
Copy link

coderabbitai bot commented Feb 20, 2026

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

Adds a CI backstop and validator for ADR-034 "investigation-only" skip claims: new script scans .agents/sessions/*.json, detects "SKIPPED: investigation-only" claims, maps sessions to commits, lists files in those commits, checks against an allowlist, reports violations; includes tests and a GitHub Actions workflow that posts warnings on violations.

Changes

Cohort / File(s) Summary
Validator script
.github/scripts/validate_investigation_claims.py
New CLI/library implementing allowlist matching, detection of "SKIPPED: investigation-only" in session JSON, resolution of commit SHAs for session files, listing files in a commit, aggregation of ClaimViolation/ValidationResult, JSON/text outputs, and GHA output variables.
Unit tests
.github/scripts/tests/test_validate_investigation_claims.py
New unit tests for file_matches_allowlist, session_claims_investigation_only (including malformed/missing JSON), and validate_investigation_claims flows (uses mocks for commit/file helpers).
GitHub Actions workflow
.github/workflows/investigation-claim-backstop.yml
New workflow triggered on PRs touching .agents/sessions/*.json; checks out code pinned to a SHA, runs validator on Python 3.12 (ubuntu-24.04-arm), 5m timeout, posts INVESTIGATION-CLAIM-BACKSTOP comment when violations found.
Docs / README
README.md
Minor troubleshooting text change: removed <code> markup around /install-plugin and added an alternative install bullet.
Traceability doc
.agents/governance/traceability-protocol.md
Updated workflow diagrams to insert an initial triage sequence and adjust placement of PRE-COMMIT validation and linking/comment steps.
Agent memory
.agents/memory/episodes/episode-2026-02-25-session-1.json
Added new memory episode JSON with metadata (id, timestamp, outcome: partial, metrics).

Sequence Diagram

sequenceDiagram
    participant PR as Pull Request
    participant GA as GitHub Actions
    participant VAL as Validator
    participant SESS as Session Files
    participant GIT as Git CLI
    participant API as GitHub Comment API

    PR->>GA: Trigger workflow for changed session JSONs
    GA->>VAL: Run validator targeting session-dir
    VAL->>SESS: Read session JSON files
    VAL->>VAL: Detect "SKIPPED: investigation-only" claims (regex)
    VAL->>GIT: git log -1 -- <session_file>  -> get commit SHA
    GIT-->>VAL: Return commit SHA
    VAL->>GIT: git show --name-only <commit> -> list changed files
    GIT-->>VAL: Return changed files
    VAL->>VAL: Compare changed files to allowlist -> collect violations
    VAL-->>GA: Emit verdict, violation_count, violations (stdout / outputs)
    alt Violations found
        GA->>API: Post INVESTIGATION-CLAIM-BACKSTOP comment with details
    else No violations
        GA-->>PR: Workflow completes without comment
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • rjmurillo-bot
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title follows conventional commit format with feat(ci) prefix and clearly summarizes the main change: adding an investigation-only claim backstop.
Description check ✅ Passed Description is directly related to the changeset, detailing the validation script, tests, workflow, and how the CI backstop works to enforce ADR-034.
Linked Issues check ✅ Passed PR implements P2 CI backstop acceptance criterion from #652: validates investigation-only claims against allowlist, flags violations, and includes comprehensive tests.
Out of Scope Changes check ✅ Passed All changes align with #652 objectives except README.md (minor documentation fix) and traceability-protocol.md (process doc update) which are tangential but harmless.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/652-autonomous

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot added area-infrastructure Build, CI/CD, configuration agent-security Security assessment agent labels Feb 20, 2026
Remove <code> tag inside <details><summary> that violated markdownlint
MD033 rule (no-inline-html). Plain text is sufficient in this context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the infrastructure-failure CI infrastructure failure (Copilot CLI auth, rate limits, etc.) label Feb 21, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2026

AI Quality Gate Review

Warning

⚠️ Final Verdict: WARN

Walkthrough

This PR was reviewed by six AI agents in parallel, analyzing different aspects of the changes:

  • Security Agent: Scans for vulnerabilities, secrets exposure, and security anti-patterns
  • QA Agent: Evaluates test coverage, error handling, and code quality
  • Analyst Agent: Assesses code quality, impact analysis, and maintainability
  • Architect Agent: Reviews design patterns, system boundaries, and architectural concerns
  • DevOps Agent: Evaluates CI/CD, build pipelines, and infrastructure changes
  • Roadmap Agent: Assesses strategic alignment, feature scope, and user value

Review Summary

Agent Verdict Category Status
Security PASS N/A
QA PASS N/A
Analyst WARN N/A ⚠️
Architect WARN N/A ⚠️
DevOps PASS N/A
Roadmap PASS N/A

💡 Quick Access: Click on individual agent jobs (e.g., "🔒 security Review", "🧪 qa Review") in the workflow run to see detailed findings and step summaries.

Security Review Details

Now I have enough context to complete the security review.

Security Review: PR #1214

PR Type Classification

File Category
.github/scripts/validate_investigation_claims.py CODE
.github/scripts/tests/test_validate_investigation_claims.py CODE
.github/workflows/investigation-claim-backstop.yml WORKFLOW
.agents/sessions/2026-02-25-session-1.json CONFIG
.agents/memory/episodes/episode-2026-02-25-session-1.json CONFIG
.agents/governance/traceability-protocol.md DOCS
README.md DOCS

Primary categories: CODE, WORKFLOW - Full security review required.


Findings

Severity Category Finding Location CWE
Low Input Validation Ref validation only checks leading dash. Other git metacharacters (.., :) not blocked. validate_investigation_claims.py:228-230 CWE-20

Security Controls Implemented [PASS]

  1. CWE-78 Command Injection Prevention: Line 228-230 validates refs cannot start with - to prevent argument injection in git commands. This is the correct mitigation pattern.

  2. Subprocess Security: All subprocess calls use:

    • capture_output=True (no shell=True)
    • timeout=30 (prevents hang attacks)
    • check=False with proper exit code handling
    • Argument arrays (not string concatenation)
  3. Path Handling: Uses pathlib.Path throughout with glob() for file discovery. No user-controlled path traversal vectors.

  4. Workflow Permissions: Minimal permissions (contents: read, pull-requests: write). No write-all or overly permissive settings.

  5. Action Pinning: Both actions SHA-pinned with version comments per project standards:

    • actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.2.2
    • actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
  6. Bot Exclusion: Workflow excludes dependabot[bot] and github-actions[bot] to prevent automated actor abuse.

  7. Secret Handling: Uses ${{ github.token }} properly. No hardcoded credentials.

  8. Concurrency Control: Uses cancel-in-progress: true to prevent resource exhaustion.


Low-Severity Observation (Non-Blocking)

The _validate_ref() function at line 228 only blocks refs starting with -. While this prevents the primary argument injection vector, git refs with .. could theoretically be used for traversal in some git operations. However:

  • The refs are only used in git diff and git log where .. is valid syntax
  • Values come from CLI args with defaults, not untrusted user input in CI context
  • Risk is theoretical with no practical exploit path

Recommendation: No action required. Current validation is appropriate for the threat model.


Recommendations

None required. The implementation follows secure coding practices.


Verdict

VERDICT: PASS
MESSAGE: No security vulnerabilities found. Command injection prevention implemented (CWE-78). Actions SHA-pinned. Minimal workflow permissions. Subprocess calls use safe patterns with timeouts.
QA Review Details

Now let me check the README changes and verify test execution details:

Let me use view instead:

Based on my analysis, I have sufficient information to provide the QA verdict.


QA Review: PR #1214

PR TYPE: MIXED (CODE + WORKFLOW + DOCS)
FILES:
- CODE: .github/scripts/validate_investigation_claims.py, .github/scripts/tests/test_validate_investigation_claims.py
- WORKFLOW: .github/workflows/investigation-claim-backstop.yml
- DOCS: README.md, .agents/sessions/*.json, .agents/memory/episodes/*.json, .agents/governance/traceability-protocol.md

Test Coverage Assessment

Area Status Evidence Files Checked
Unit tests Adequate 22 tests in test_validate_investigation_claims.py validate_investigation_claims.py
Edge cases Covered Invalid JSON, case-insensitive matching, no-claim scenarios Lines 86-117
Error paths Tested Invalid JSON returns false (line 114-117), mock commit failures session_claims_investigation_only
Assertions Present Each test has explicit assert statements All 22 tests

Quality Concerns

Severity Issue Location Evidence Required Fix
LOW Duplicate allowlist patterns validate_investigation_claims.py:56-66 vs investigation_allowlist.py:19-29 Script defines _ALLOWLIST_PATTERNS locally but also imports from investigation_allowlist module Consider removing local duplicate to use single source of truth
LOW Missing test for _validate_ref validate_investigation_claims.py:228-230 CWE-78 guard function not directly tested Add unit test for dash-prefix rejection

Regression Risk Assessment

  • Risk Level: Low
  • Affected Components: New CI workflow, isolated validation script
  • Breaking Changes: None (additive feature)
  • Required Testing: pytest passed (4483 passed, 3 skipped)

Code Quality Review

Check Status Evidence
Function length [PASS] Largest function _run_session_validation ~60 lines
Cyclomatic complexity [PASS] No function exceeds 10 branches
Code duplication [PASS] Patterns use shared module
Magic numbers [PASS] Timeout value (30) is reasonable constant
Input validation [PASS] _validate_ref guards against CWE-78 injection (line 228-230)
Error handling [PASS] Subprocess failures logged, not swallowed (lines 108-110, 151)

Workflow Compliance (ADR-006)

Check Status Evidence
Thin workflow [PASS] Logic delegated to Python script
SHA-pinned actions [PASS] actions/checkout@34e114..., actions/setup-python@a26af6...
No ${{ }} in run blocks [PASS] Uses env: block for variables
Concurrency control [PASS] cancel-in-progress: true

Pre-executed Test Results

  • pytest: 4483 passed, 3 skipped, 5 warnings in 30.12s
  • Status: [PASS]

VERDICT: PASS
MESSAGE: New validation script has comprehensive test coverage (22 tests), follows ADR-006 thin workflow pattern, and all pre-executed tests pass.

PR TYPE: MIXED (CODE + WORKFLOW + DOCS)

EVIDENCE:
- Tests found: 22 tests for 8 functions in validate_investigation_claims.py
- Test execution: PASS (4483 passed, 3 skipped)
- Edge cases: Covered (invalid JSON, case-insensitive, no-claim, allowlist/reject patterns)
- Error handling: Tested (JSONDecodeError, subprocess failures return gracefully)
- Blocking issues: 0
Analyst Review Details

Let me check the local repository for these files directly.

Now I have enough context to complete the analysis. Let me also check the post_issue_comment.py script referenced in the workflow and examine the .agents/memory/episodes file that's part of the PR.

I now have sufficient context to complete the PR analysis. Let me produce the final report.


PR #1214 Analysis: Investigation-Only Claim Backstop (ADR-034 Phase 3)

Code Quality Score

Criterion Score (1-5) Notes
Readability 5 Clear docstrings, typed parameters, logical function decomposition
Maintainability 4 Good dataclasses, but duplicates allowlist patterns between _ALLOWLIST_PATTERNS and scripts/modules/investigation_allowlist.py
Consistency 5 Follows existing patterns: ADR-035 exit codes, ai_review_common imports, test structure
Simplicity 4 Dual-mode validation (session + diff) adds complexity but is well-documented

Overall: 4.5/5

Impact Assessment

  • Scope: Module-wide (adds new CI check, does not modify existing validators)
  • Risk Level: Low (advisory-only mode, always returns exit 0 for diff validation)
  • Affected Components:
    • .github/workflows/investigation-claim-backstop.yml (new workflow)
    • .github/scripts/validate_investigation_claims.py (new script)
    • scripts/modules/investigation_allowlist.py (reused shared module)
    • scripts/ai_review_common/ (existing dependency)

Findings

Priority Category Finding Location
Medium DRY Violation _ALLOWLIST_PATTERNS duplicates scripts/modules/investigation_allowlist.py patterns. Divergence risk over time. .github/scripts/validate_investigation_claims.py:56-66
Low Unused Code file_matches_allowlist() (lines 93-96) duplicates test_file_matches_allowlist() from shared module. Script imports shared module but also has local copy. .github/scripts/validate_investigation_claims.py:93-96
Low Test Coverage Tests mock git operations but do not test _run_diff_validation or main() CLI paths. .github/scripts/tests/test_validate_investigation_claims.py
Info Security CWE-78 mitigation present via _validate_ref() rejecting refs starting with -. .github/scripts/validate_investigation_claims.py:228-230
Info Pattern Alignment Workflow follows ADR-006 thin workflows pattern (logic in Python, YAML minimal). .github/workflows/investigation-claim-backstop.yml

Recommendations

  1. Remove duplicate allowlist constant: Import from scripts.modules.investigation_allowlist instead of defining _ALLOWLIST_PATTERNS locally. Single source of truth prevents drift.

  2. Remove file_matches_allowlist() local function: Use test_file_matches_allowlist() from imported module. Function already exists and is imported but unused in session validation path.

  3. Add tests for diff-mode validation: _run_diff_validation() and CLI main() paths lack direct test coverage. Integration tests with mocked git diff output would strengthen confidence.

Architectural Alignment

  • [PASS] Implements ADR-034 Phase 3 CI backstop requirement
  • [PASS] Uses ADR-035 exit code semantics (0=success, 1=violation)
  • [PASS] SHA-pinned GitHub Actions per AGENTS.md requirements
  • [PASS] Uses ARM runner per ADR-025 cost optimization
  • [PASS] Workflow excludes dependabot/github-actions bot actors

Verdict

VERDICT: WARN
MESSAGE: Code quality is good. Two DRY violations (duplicate allowlist constant and function) should be consolidated before merge to prevent future divergence. Not blocking but recommended fix.
Architect Review Details

I have sufficient information. Let me provide the architectural review.


Design Quality Assessment

Aspect Rating (1-5) Notes
Pattern Adherence 4 Follows ADR-006 thin workflows. Good use of dataclasses.
Boundary Respect 4 Clear separation between validation script and workflow.
Coupling 3 Duplicated allowlist patterns vs shared module.
Cohesion 4 Single responsibility: validate investigation claims.
Extensibility 4 Multiple validation modes (session, diff, both).

Overall Design Score: 4/5


Architectural Concerns

Severity Concern Location Recommendation
High Duplicated allowlist validate_investigation_claims.py:56-66 Remove _ALLOWLIST_PATTERNS and use only scripts.modules.investigation_allowlist
Medium Inconsistent allowlist usage validate_investigation_claims.py:93-96 Use test_file_matches_allowlist instead of local file_matches_allowlist
Medium Allowlist drift from ADR-034 investigation_allowlist.py:27-28 ADR-034 excludes .agents/critique/ but module includes it. Requires clarification.
Low Import path manipulation validate_investigation_claims.py:42-46 Standard pattern but fragile. Consider package installation.

Allowlist Consistency Analysis

The PR introduces a local _ALLOWLIST_PATTERNS constant that partially duplicates scripts/modules/investigation_allowlist.py:

Local patterns (validate_investigation_claims.py):

  • .agents/sessions/, .agents/analysis/, .agents/retrospective/
  • .serena/memories, .agents/security/, .agents/memory/
  • .agents/architecture/REVIEW-, .agents/critique/, .agents/memory/episodes/

Shared module (investigation_allowlist.py):

  • Identical patterns

The script imports test_file_matches_allowlist from the shared module but also defines file_matches_allowlist locally. The diff validation path uses the shared module (correct), but session validation uses the local copy (duplication risk).


ADR-034 Compliance Check

ADR-034 states .agents/critique/ should be excluded (consequential artifacts):

EXCLUDE: 5/6 agents flagged as loophole; consequential artifacts

However, investigation_allowlist.py:27 includes ^\.agents/critique/. This appears to be a pre-existing inconsistency, not introduced by this PR.


Breaking Change Assessment

  • Breaking Changes: No
  • Impact Scope: None
  • Migration Required: No

Technical Debt Analysis

  • Debt Added: Low (minor duplication)
  • Debt Reduced: Low (codifies ADR-034 Phase 3)
  • Net Impact: Neutral

ADR Assessment

  • ADR Required: No (implements existing ADR-034 Phase 3)
  • Decisions Identified: None. PR implements accepted architecture.
  • Existing ADR: ADR-034 (Investigation Session QA Exemption)
  • Recommendation: N/A

Recommendations

  1. Remove local _ALLOWLIST_PATTERNS: The file_matches_allowlist function duplicates test_file_matches_allowlist from the shared module. Consolidate to single source of truth.

  2. Verify .agents/critique/ policy: ADR-034 debate excluded critique artifacts, but shared module includes them. Confirm which is authoritative and align.

  3. Consider workflow step naming: Step name "Validate Investigation Claims" is descriptive. [PASS]


Verdict

VERDICT: WARN
MESSAGE: Duplicated allowlist pattern between local constant and shared module creates drift risk. Remove _ALLOWLIST_PATTERNS and use test_file_matches_allowlist consistently. ADR-034 critique policy inconsistency is pre-existing, not blocking.
DevOps Review Details

Let me also check the README changes and verify any shared module imports.

Now I have all the information I need to complete the DevOps review. Let me compile my analysis:


DevOps Review: PR #1214

PR Scope Detection

Category File Patterns Files
WORKFLOW .github/workflows/ investigation-claim-backstop.yml
SCRIPT .github/scripts/ validate_investigation_claims.py, tests/test_validate_investigation_claims.py
CONFIG .agents/sessions/*.json, .agents/memory/episodes/*.json Session artifacts
DOCS *.md README.md, traceability-protocol.md

Review Focus: Full CI/CD and script quality review for WORKFLOW and SCRIPT files.


Pipeline Impact Assessment

Area Impact Notes
Build None No build changes
Test Low Adds 22 new pytest tests
Deploy None No deployment changes
Cost Low ARM runner, 5min timeout

CI/CD Quality Checks

Check Status Location
YAML syntax valid investigation-claim-backstop.yml
Actions pinned to SHA Lines 33, 38 (SHA + version comment)
Secrets secure Uses github.token, not logged
Permissions minimal Line 14-16: contents: read, pull-requests: write
Concurrency configured Line 18-20: prevents duplicate runs
Timeout set Line 27: 5 minutes
Bot exclusion Line 28: skips dependabot/github-actions
Shell scripts robust Uses heredoc with 'EOF' for literal interpretation
Exit codes documented ADR-035 compliant (lines 10-13 of script)
Input validation Line 228-231: _validate_ref() prevents CWE-78

Findings

Severity Category Finding Location Fix
Low Redundancy _ALLOWLIST_PATTERNS duplicates scripts/modules/investigation_allowlist.py validate_investigation_claims.py:56-66 Use shared module only; remove local duplicate
Low Code quality file_matches_allowlist() duplicates test_file_matches_allowlist() from shared module validate_investigation_claims.py:93-96 Import and use test_file_matches_allowlist

Note: The script imports from scripts.modules.investigation_allowlist (line 49-52) but also defines a local _ALLOWLIST_PATTERNS and file_matches_allowlist(). This is intentional per comment on line 55: "Must match .claude/skills/session/scripts/test_investigation_eligibility.py". However, this creates drift risk. Consider consolidating to single source of truth.


Template Assessment

  • PR Template: Adequate (used correctly in PR description)
  • Issue Templates: N/A (not modified)
  • Template Issues: None

Automation Opportunities

Opportunity Type Benefit Effort
Consolidate allowlist patterns Refactor Prevent drift between modules Low

Recommendations

  1. Consider consolidating allowlist: The script has both local _ALLOWLIST_PATTERNS and imports from investigation_allowlist.py. Consider using only the shared module to prevent drift. Current design is acceptable but creates maintenance burden.

Verdict

VERDICT: PASS
MESSAGE: Workflow follows best practices (SHA-pinned actions, minimal permissions, proper error handling, CWE-78 mitigation). Minor code duplication is acceptable for current scope.

Details:

  • Actions SHA-pinned with version comments per AGENTS.md
  • ARM runner (ADR-025) for cost optimization
  • Proper concurrency and timeout configuration
  • Heredoc with quoted delimiter prevents variable expansion
  • Input validation protects against argument injection
  • Exit codes follow ADR-035
  • Test coverage: 22 tests covering core functionality
Roadmap Review Details

Based on my review of the PR, ADR-034, the product roadmap, and the implementation details, I can now provide the strategic alignment assessment.


Roadmap Review: PR #1214

Strategic Alignment Assessment

Criterion Rating Notes
Aligns with project goals High Completes ADR-034 Phase 3; enforces session protocol integrity
Priority appropriate High P2 CI backstop per ADR-034; addresses Issue #652 acceptance criterion
User value clear Medium Maintains protocol integrity; prevents false investigation-only claims
Investment justified High 893 lines for permanent guardrail; prevents validation gaming

Feature Completeness

  • Scope Assessment: Right-sized
  • Ship Ready: Yes
  • MVP Complete: Yes
  • Enhancement Opportunities: None required; implements exactly what ADR-034 Phase 3 specified

Impact Analysis

Dimension Assessment Notes
User Value Medium Protects honest agents from others bypassing QA; audit trail
Business Impact Medium Maintains session protocol credibility; reduces --no-verify normalization
Technical Leverage High Reuses existing investigation_allowlist module; follows ADR-006 thin workflows
Competitive Position Neutral Internal tooling improvement; no external visibility

Concerns

Priority Concern Recommendation
Low Advisory-only mode (exit 0) may reduce deterrent effect Monitor violation_count metrics; escalate to blocking if abuse detected
Low README.md change scope unclear from summary Verify README change is minimal documentation update

Recommendations

  1. This PR completes the ADR-034 implementation roadmap (Phase 1: validator, Phase 2: docs, Phase 3: CI backstop). No strategic concerns.
  2. The shared investigation_allowlist module promotes consistency between pre-commit and CI validation.
  3. Advisory mode is appropriate for initial rollout; blocking mode can be enabled later if violations become common.

Verdict

VERDICT: PASS
MESSAGE: PR delivers ADR-034 Phase 3 CI backstop as specified. Scope is appropriate, implementation follows established patterns (ADR-006 thin workflows, ADR-035 exit codes), and 22 tests provide adequate coverage. Completes Issue #652.

Run Details
Property Value
Run ID 22421423048
Triggered by pull_request on 1214/merge
Commit 1dac9f4a191f78574f392cb2b0af2df92a68dc66

Powered by AI Quality Gate workflow

@rjmurillo rjmurillo closed this Feb 21, 2026
auto-merge was automatically disabled February 21, 2026 06:12

Pull request was closed

@rjmurillo rjmurillo reopened this Feb 21, 2026
coderabbitai[bot]
coderabbitai bot previously approved these changes Feb 21, 2026
@rjmurillo
Copy link
Owner

Re-triggering CI after description fix

@rjmurillo rjmurillo closed this Feb 21, 2026
@rjmurillo rjmurillo reopened this Feb 21, 2026
@rjmurillo rjmurillo enabled auto-merge (squash) February 21, 2026 06:39
rjmurillo-bot and others added 3 commits February 25, 2026 15:45
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Combine both the branch's session-log-based validation (dataclasses,
allowlist patterns, session file parsing) and main's shared-allowlist
diff-based validation into a unified script with --mode flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix mypy type error (variable shadowing ClaimViolation with str),
guard against bool values in session JSON fields, extract validation
modes into separate functions, and default to diff mode for backward
compatibility with existing tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

Session Protocol Compliance Report

Tip

Overall Verdict: PASS

All session protocol requirements satisfied.

What is Session Protocol?

Session logs document agent work sessions and must comply with RFC 2119 requirements:

  • MUST: Required for compliance (blocking failures)
  • SHOULD: Recommended practices (warnings)
  • MAY: Optional enhancements

See .agents/SESSION-PROTOCOL.md for full specification.

Compliance Summary

Session File Verdict MUST Failures
sessions-2026-02-25-session-1.md ✅ COMPLIANT 0

Detailed Validation Results

Click each session to see the complete validation report with specific requirement failures.

📄 sessions-2026-02-25-session-1

=== Session Validation ===
File: /home/runner/work/ai-agents/ai-agents/.agents/sessions/2026-02-25-session-1.json

[PASS] Session log is valid


✨ Zero-Token Validation

This validation uses deterministic script analysis instead of AI:

  • Zero tokens consumed (previously 300K-900K per debug cycle)
  • Instant feedback - see exact failures in this summary
  • No artifact downloads needed to diagnose issues
  • 10x-100x faster debugging

Powered by validate_session_json.py

📊 Run Details
Property Value
Run ID 22421423056
Files Checked 1
Validation Method Deterministic script analysis

Powered by Session Protocol Validator workflow

@rjmurillo
Copy link
Owner

Review Triage Required

Note

Priority: NORMAL - Human approval required before bot responds

Review Summary

Source Reviews Comments
Human 2 2
Bot 2 2

Next Steps

  1. Review human feedback above
  2. Address any CHANGES_REQUESTED from human reviewers
  3. Add triage:approved label when ready for bot to respond to review comments

Powered by PR Maintenance workflow - Add triage:approved label

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-devops CI/CD pipeline agent agent-qa Testing and verification agent agent-security Security assessment agent area-infrastructure Build, CI/CD, configuration area-workflows GitHub Actions workflows automation Automated workflows and processes enhancement New feature or request github-actions GitHub Actions workflow updates infrastructure-failure CI infrastructure failure (Copilot CLI auth, rate limits, etc.) priority:P2 Normal: Standard enhancement or bug fix, moderate impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

User Story: Defense-in-depth safeguards for investigation-only mode (Phase 3)

2 participants