ci: staging branch with batched CI and Claude Code review by henrypark133 · Pull Request #682 · nearai/ironclaw

henrypark133 · 2026-03-07T21:26:07Z

Summary

Redesigns CI to use a staging branch with batched testing, replacing per-PR test runs with a 30-minute scheduled batch. This speeds up the PR feedback loop while maintaining thorough quality gates before code reaches main.

New Flow

feature branches → PR → staging
                        checks: fmt + clippy + version-check + regression-test-check
                        (auto-merge on approval)
                           |
                           | (30-min scheduled batch CI)
                           v
                       staging
                           |
                           | tests + e2e + claude review run in parallel
                           |   → issues created for findings (based on confidence matrix)
                           |   → CRITICAL ≥80 confidence? BLOCKS promotion
                           |
                           | all jobs pass + no blocks?
                           |   YES → tag staging-tested + auto-create PR to main + auto-merge
                           |   NO  → blocked, fix needed (issues already created)
                           v
                         main → releases (release-plz + cargo-dist, unchanged)

Issues are created regardless of whether anything blocks. Blocking only affects whether the promotion PR gets merged.

Override: apply skip-claude-gate label to repo to bypass CRITICAL block.

What Changes for PRs

Before: Every PR runs fmt, clippy, full test suite (3 matrix configs), telegram tests, WASM compat, docker build, and E2E tests on web path changes. Slow and expensive.

After: PRs only run fast checks (~2 min): formatting, clippy (3 configs), version-check, and regression-test-check. Full tests are deferred to the staging batch.

Staging CI Batch (`staging-ci.yml`)

Runs every 30 minutes via cron (+ manual workflow_dispatch):

Change detection — compares staging-tested tag to staging HEAD; skips if no new commits
Parallel jobs — tests, E2E, and Claude Code review run simultaneously; all three must pass
Gate — CRITICAL findings with ≥80 confidence block promotion (overridable)
Tag update — moves staging-tested tag forward on success
Auto-promote — creates PR from staging→main and auto-merges via GitHub App token (reuses existing GH_RELEASES_MANAGER_APP_* secrets)
Report — summary of all job results in workflow summary

Concurrency group staging-ci prevents overlapping runs. cancel-in-progress: false lets running suites finish.

Claude Code Review (`claude-review.yml`)

Reusable workflow called by staging-ci. Reviews the batch diff using anthropics/claude-code-action@v1:

Reads CLAUDE.md for project conventions
Reads full source files for changed code (not just diff hunks)
Traces data flows and checks OWASP top 10 categories
Assigns severity (CRITICAL/HIGH/MEDIUM/LOW) and confidence (0-100) to each finding

Confidence-based issue creation matrix:

Severity	≥80 confidence	50-79 confidence	<50 confidence
CRITICAL	Blocks merge + creates issue	Creates issue	Creates issue + `low-confidence` label
HIGH	Creates issue	Creates issue	Summary only
MEDIUM	Creates issue	Summary only	Summary only
LOW	Creates issue	Summary only	Summary only
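
The matrix above can be read as a lookup from (severity, confidence) to an action. The following is a minimal sketch of that lookup, not the code in `claude-review.yml`; the function name `classify_finding` and the action strings it prints are assumptions made for illustration.

```shell
# Hypothetical helper: map (severity, confidence) to the action in the matrix.
# Prints one of: block+issue, issue, issue+low-confidence, summary.
classify_finding() {
  local severity="$1" confidence="$2"
  local tier

  # Bucket the 0-100 confidence score into the three columns of the matrix.
  if [ "$confidence" -ge 80 ]; then
    tier=high
  elif [ "$confidence" -ge 50 ]; then
    tier=mid
  else
    tier=low
  fi

  case "$severity:$tier" in
    CRITICAL:high)        echo "block+issue" ;;
    CRITICAL:mid)         echo "issue" ;;
    CRITICAL:low)         echo "issue+low-confidence" ;;
    HIGH:high|HIGH:mid)   echo "issue" ;;
    MEDIUM:high|LOW:high) echo "issue" ;;
    HIGH:low|MEDIUM:*|LOW:*) echo "summary" ;;
    *) echo "unknown severity: $severity" >&2; return 1 ;;
  esac
}
```

For example, `classify_finding CRITICAL 85` prints `block+issue`, while `classify_finding MEDIUM 79` prints `summary`.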

All findings appear in the workflow summary regardless of tier. Issues include permalinks to the exact file:line at the commit SHA.
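
A commit-pinned permalink of the kind described here follows GitHub's standard `blob/<sha>/<path>#L<line>` URL shape. A minimal sketch, assuming a hypothetical helper name `finding_permalink` and illustrative arguments:

```shell
# Hypothetical helper: build a commit-pinned permalink to a file and line.
# repo is "owner/name"; every argument here is illustrative.
finding_permalink() {
  local repo="$1" sha="$2" path="$3" line="$4"
  printf 'https://github.com/%s/blob/%s/%s#L%s\n' "$repo" "$sha" "$path" "$line"
}
```

Inside a workflow step this could be called with the default `GITHUB_REPOSITORY` and `GITHUB_SHA` environment variables. Pinning to the SHA rather than the branch name keeps the link pointing at the reviewed code after staging moves on.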

Workflow File Changes

File	Change
`test.yml`	Triggers → `workflow_call` + `workflow_dispatch`. Removed `version-check` (moved to code_style.yml).
`e2e.yml`	Removed `pull_request` paths trigger. Added `workflow_call`. Kept weekly cron + dispatch.
`coverage.yml`	Added `workflow_call` + `workflow_dispatch`. Keeps `push:main` for Codecov.
`code_style.yml`	Absorbed `version-check` job so version bump validation still runs on PRs.
`staging-ci.yml`	New. Orchestrator: cron, change detection, tests + e2e + claude-review → gate → tag → promote.
`claude-review.yml`	New. Reusable review with confidence scoring and tiered issue creation.
`create-labels.sh`	Added `staging-ci-review`, `skip-claude-gate`, `low-confidence` labels.

Unchanged Workflows

regression-test-check.yml, pr-label-classify.yml, pr-label-scope.yml, release-plz.yml, release.yml — all unchanged.

Pre-merge Setup

Add ANTHROPIC_API_KEY repo secret
Install Claude GitHub App on the repo
Run bash .github/scripts/create-labels.sh to bootstrap new labels

Post-merge Setup

Create staging branch: git checkout main && git checkout -b staging && git push origin staging
Create initial tag: git tag staging-tested staging && git push origin staging-tested
Set staging as default branch (GitHub Settings → General)
Enable auto-merge (GitHub Settings → Pull Requests)
Staging branch protection: require 1 review, status checks Code Style (fmt + clippy) + Regression test enforcement
Main branch protection: keep as-is (App token bypasses it)

Rollback: Revert workflow changes in one commit, change default branch back to main.

Test plan

🤖 Generated with Claude Code

…ssion-check]

Redesign CI to speed up PR feedback by deferring heavy tests to a 30-minute batch on the staging branch. PRs only run fmt + clippy + regression-test-check. The batch CI runs tests, E2E, and a Claude Code review with confidence-scored findings that gate promotion to main.

- test.yml: replace PR/push triggers with workflow_call + dispatch
- e2e.yml: replace PR paths trigger with workflow_call
- coverage.yml: add workflow_call + dispatch triggers
- code_style.yml: absorb version-check job from test.yml
- staging-ci.yml: new orchestrator (30-min cron, change detection, tests + e2e + claude-review → gate → tag + promote-to-main)
- claude-review.yml: new reusable review workflow with 0-100 confidence scoring (CRITICAL ≥80 blocks, ≥50 creates issues, <50 CRITICAL gets low-confidence label)
- create-labels.sh: add staging-ci-review, skip-claude-gate, low-confidence labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gemini-code-assist · 2026-03-07T21:26:20Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the continuous integration strategy by introducing a staging branch with batched testing, aiming to accelerate the PR feedback loop while maintaining robust quality gates. It integrates an AI-powered code review system, Claude Code Review, which performs in-depth analysis and can block code promotion based on critical findings. This change shifts comprehensive testing from individual pull requests to a scheduled staging environment, allowing PRs to merge faster after passing quick checks.

Highlights

CI Redesign with Staging Branch: The CI process has been redesigned to utilize a staging branch with batched testing, replacing individual per-PR test runs with a scheduled 30-minute batch.
Claude Code Review Integration: Integrated Claude Code Review, which analyzes batch diffs, assigns severity and confidence scores, and creates issues or blocks promotion based on a defined matrix.
Streamlined PR Workflow: Pull requests now only run fast checks (formatting, clippy, version-check, regression-test-check), deferring full test suites to the staging batch for faster feedback.
New Workflows and Labels: Introduced staging-ci.yml as the orchestrator for batched CI and claude-review.yml for reusable Claude review, along with new GitHub labels to support the new CI flow.

Changelog

.github/scripts/create-labels.sh
- Added new labels for staging CI and Claude Code Review.

Ignored Files

Ignored by pattern: .github/workflows/** (6)
- .github/workflows/claude-review.yml
- .github/workflows/code_style.yml
- .github/workflows/coverage.yml
- .github/workflows/e2e.yml
- .github/workflows/staging-ci.yml
- .github/workflows/test.yml

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copilot

Pull request overview

Redesigns GitHub Actions CI to batch full test/e2e/review runs on a staging branch every 30 minutes, while keeping PR CI limited to faster checks; adds an automated Claude-based review workflow that can file issues and gate promotion to main.

Changes:

Convert test.yml/e2e.yml into reusable workflows (workflow_call) and remove PR/push triggers for slow suites.
Add new staging-ci.yml orchestrator to detect changes, run tests/e2e/Claude review in parallel, gate, tag staging-tested, and auto-promote staging→main.
Add claude-review.yml reusable workflow plus label bootstrap updates for review automation.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

File	Description
.github/workflows/test.yml	Stops per-PR/per-push execution; becomes reusable + manual dispatch; removes embedded version bump check.
.github/workflows/e2e.yml	Removes PR path-triggered E2E runs; enables workflow_call for staging batch.
.github/workflows/coverage.yml	Adds workflow_call + manual dispatch while keeping push-to-main coverage behavior.
.github/workflows/code_style.yml	Adds `version-check` job into PR “fast checks” and rolls it into the code-style gate.
.github/workflows/staging-ci.yml	New batched staging orchestrator: change detection, parallel jobs, gating, tag update, promotion PR creation/merge, summary reporting.
.github/workflows/claude-review.yml	New reusable Claude review workflow: diff context generation, review invocation, result processing, and issue creation.
.github/scripts/create-labels.sh	Adds labels used by staging CI review flow and confidence tiering.

Copilot · 2026-03-07T21:30:21Z

.github/workflows/code_style.yml

+          if [[ "${{ needs.version-check.result }}" == "failure" ]]; then
+            echo "Version bump check failed"


The roll-up job only fails version-check when the result is exactly failure. If version-check is cancelled or skipped for any reason, the roll-up will still succeed and branch protection may pass unexpectedly. Prefer failing on anything other than success (or explicitly allow only the states you intend).

Suggested change

-          if [[ "${{ needs.version-check.result }}" == "failure" ]]; then
-            echo "Version bump check failed"
+          if [[ "${{ needs.version-check.result }}" != "success" ]]; then
+            echo "Version bump check did not succeed (status: ${{ needs.version-check.result }})"
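
The same "anything other than success fails" rule can be applied to every job a roll-up depends on. This is a minimal sketch under that assumption; `rollup_check` and its `name=result` argument format are hypothetical and not part of `code_style.yml`.

```shell
# Hypothetical roll-up helper: succeed only if every "name=result" pair is "success".
rollup_check() {
  local failed=0 pair name result
  for pair in "$@"; do
    name="${pair%%=*}"    # text before the first "="
    result="${pair#*=}"   # text after the first "="
    if [ "$result" != "success" ]; then
      echo "$name did not succeed (status: $result)"
      failed=1
    fi
  done
  return "$failed"
}
```

A step could call it as `rollup_check "version-check=${{ needs.version-check.result }}"`. Note that this also fails on `skipped`, so a job that is skipped on purpose would need an explicit exception.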

Copilot · 2026-03-07T21:30:21Z

.github/workflows/staging-ci.yml

+    if: needs.check-changes.outputs.has_changes == 'true'
+    uses: ./.github/workflows/test.yml
+
+  # ── Run E2E browser tests ────────────────────────────────────────
+  e2e:
+    name: E2E Browser Tests
+    needs: check-changes
+    if: needs.check-changes.outputs.has_changes == 'true'


tests and e2e call reusable workflows that do not pin actions/checkout to staging. For scheduled runs this is OK only if staging is the repo default branch, but for workflow_dispatch a user can run this workflow from another branch and then the called workflows will test the wrong ref while check-changes/tagging operate on staging. Consider guarding the workflow/jobs to github.ref == 'refs/heads/staging' and/or adding a ref input to the reusable test/e2e workflows and using it in their checkout steps.

Suggested change

-    if: needs.check-changes.outputs.has_changes == 'true'
+    if: needs.check-changes.outputs.has_changes == 'true' && github.ref == 'refs/heads/staging'
     uses: ./.github/workflows/test.yml

   # ── Run E2E browser tests ────────────────────────────────────────
   e2e:
     name: E2E Browser Tests
     needs: check-changes
-    if: needs.check-changes.outputs.has_changes == 'true'
+    if: needs.check-changes.outputs.has_changes == 'true' && github.ref == 'refs/heads/staging'
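
An alternative to repeating the `github.ref` condition on each job is a single early guard step that fails the run when it was dispatched from another branch. A minimal sketch, assuming a hypothetical function `require_staging_ref`; `GITHUB_REF` is the default variable set by GitHub Actions.

```shell
# Hypothetical guard: refuse to continue unless the run targets the staging branch.
# Takes the ref as an argument, falling back to the GITHUB_REF environment variable.
require_staging_ref() {
  local ref="${1:-${GITHUB_REF:-}}"
  if [ "$ref" != "refs/heads/staging" ]; then
    echo "::error::staging CI must run from refs/heads/staging (got: ${ref:-<unset>})"
    return 1
  fi
}
```

This only prevents testing the wrong ref. It does not make the reusable workflows check out staging when called from elsewhere, which is what the `ref` input proposed above would do.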

Copilot · 2026-03-07T21:30:22Z

.github/workflows/staging-ci.yml

+      - name: Check for skip-claude-gate label
+        id: label-check
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: |
+          SKIP=$(gh label list --json name -q '.[].name' | grep -c 'skip-claude-gate' || true)
+          echo "skip_label=${SKIP}" >> "$GITHUB_OUTPUT"
+
+      - name: Evaluate gate
+        id: gate
+        run: |
+          HAS_BLOCKING="${{ needs.claude-review.outputs.has_blocking }}"
+          SKIP_LABEL="${{ steps.label-check.outputs.skip_label }}"
+          SKIP_INPUT="${{ inputs.skip_claude_gate }}"
+
+          if [ "$HAS_BLOCKING" = "true" ]; then
+            echo "::warning::Claude review found blocking issues (CRITICAL ≥80 confidence)"
+            if [ "$SKIP_LABEL" -gt 0 ] || [ "$SKIP_INPUT" = "true" ]; then
+              echo "::warning::Gate overridden by skip-claude-gate label or input"


Check for skip-claude-gate label is checking whether the label exists in the repository, not whether any PR/commit/run is marked to bypass the gate. Since the repo will normally have this label created, skip_label will always be >0 and the Claude gate will be permanently overridden. Consider removing the label-based override entirely (rely on the workflow_dispatch input), or switch to a real toggle (e.g., a repo variable/secret), or check for the label on a specific PR/issue that represents the override.

Suggested change

-      - name: Check for skip-claude-gate label
-        id: label-check
-        env:
-          GH_TOKEN: ${{ github.token }}
-        run: |
-          SKIP=$(gh label list --json name -q '.[].name' | grep -c 'skip-claude-gate' || true)
-          echo "skip_label=${SKIP}" >> "$GITHUB_OUTPUT"
-
       - name: Evaluate gate
         id: gate
         run: |
           HAS_BLOCKING="${{ needs.claude-review.outputs.has_blocking }}"
-          SKIP_LABEL="${{ steps.label-check.outputs.skip_label }}"
           SKIP_INPUT="${{ inputs.skip_claude_gate }}"

           if [ "$HAS_BLOCKING" = "true" ]; then
             echo "::warning::Claude review found blocking issues (CRITICAL ≥80 confidence)"
-            if [ "$SKIP_LABEL" -gt 0 ] || [ "$SKIP_INPUT" = "true" ]; then
-              echo "::warning::Gate overridden by skip-claude-gate label or input"
+            if [ "$SKIP_INPUT" = "true" ]; then
+              echo "::warning::Gate overridden by skip_claude-gate workflow input"
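
If a label-based override is kept, the check would need to look at the labels attached to one specific PR or issue, and match the name exactly rather than as a substring. A minimal sketch of the matching half, assuming a hypothetical helper `has_label` that reads one label name per line on stdin:

```shell
# Hypothetical helper: succeed only if stdin contains a line exactly equal to the label.
# -F: fixed string, -x: whole-line match, -q: quiet (exit status only).
has_label() {
  grep -Fxq -- "$1"
}
```

It could be fed from the GitHub CLI, for example `gh pr view "$PR_NUMBER" --json labels -q '.labels[].name' | has_label skip-claude-gate`, where `PR_NUMBER` is an assumed variable naming whichever PR represents the override. Unlike `gh label list`, this reports the labels on that PR, not every label defined in the repository.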

Copilot · 2026-03-07T21:30:22Z

.github/workflows/staging-ci.yml

+      always() &&
+      needs.check-changes.outputs.has_changes == 'true' &&
+      needs.tests.result == 'success' &&
+      needs.e2e.result == 'success'


The claude-gate job condition requires tests and e2e success, but it does not require claude-review to succeed. If the Claude workflow fails, needs.claude-review.outputs.has_blocking may be empty and the gate will pass, allowing promotion even though the Claude review job failed. Update the if: to also require needs.claude-review.result == 'success' (or treat non-success as blocking).

Suggested change

-      needs.e2e.result == 'success'
+      needs.e2e.result == 'success' &&
+      needs.claude-review.result == 'success'
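
The failure mode described here is a fail-open gate: an empty `has_blocking` is read as "nothing blocking". A fail-closed version treats every non-success job result and every value other than an explicit `false` as blocking. This is a minimal sketch of that decision, with a hypothetical function name and argument order; it is not the logic in `staging-ci.yml`.

```shell
# Hypothetical gate: prints "promote" or "blocked" from job results and review output.
# Arguments: tests result, e2e result, review result, has_blocking, skip input.
gate_decision() {
  local tests="$1" e2e="$2" review="$3" has_blocking="$4" skip_input="${5:-false}"

  # Any job that did not succeed (failure, cancelled, skipped) blocks promotion.
  if [ "$tests" != "success" ] || [ "$e2e" != "success" ] || [ "$review" != "success" ]; then
    echo "blocked"
    return 0
  fi

  # Only an explicit "false" means no blocking findings; empty or unknown
  # values are treated as blocking unless the override input is set.
  if [ "$has_blocking" = "false" ] || [ "$skip_input" = "true" ]; then
    echo "promote"
  else
    echo "blocked"
  fi
}
```

With this shape, `gate_decision success success failure ""` prints `blocked`, which is the case the comment above is concerned about.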

Copilot · 2026-03-07T21:30:22Z

.github/workflows/staging-ci.yml

+          # Force override from workflow_dispatch
+          if [ "${{ inputs.force }}" = "true" ]; then
+            echo "Force run requested"
+            echo "has_changes=true" >> "$GITHUB_OUTPUT"


When force=true and there are no new commits, has_changes is flipped to true but diff_range remains empty (it was set to empty in the no-changes branch). That makes the called claude-review workflow skip the review entirely due to missing diff_range, which contradicts the intent of a forced run. Consider setting diff_range to a valid empty range (e.g., ${CURRENT_HEAD}..${CURRENT_HEAD}) when forcing so the review job still runs deterministically.

Suggested change

             echo "has_changes=true" >> "$GITHUB_OUTPUT"
+            # If there were no new commits (LAST_TESTED == CURRENT_HEAD),
+            # ensure diff_range is a valid, deterministic empty range
+            if [ -n "$LAST_TESTED" ] && [ "$LAST_TESTED" = "$CURRENT_HEAD" ]; then
+              echo "diff_range=${CURRENT_HEAD}..${CURRENT_HEAD}" >> "$GITHUB_OUTPUT"
+            fi
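
Taken as a whole, change detection with a force flag has three outcomes: new commits, no new commits, and no new commits but forced. A minimal sketch of that decision as a pure function of two SHAs, assuming the `staging-tested` tag already exists (it is created in the post-merge setup); the function name `detect_changes` is hypothetical.

```shell
# Hypothetical helper: decide whether the batch should run and which range to review.
# Prints two KEY=VALUE lines in the format accepted by $GITHUB_OUTPUT.
detect_changes() {
  local last_tested="$1" current_head="$2" force="${3:-false}"

  if [ "$last_tested" = "$current_head" ]; then
    if [ "$force" = "true" ]; then
      # Forced run with nothing new: emit a valid but empty range.
      echo "has_changes=true"
      echo "diff_range=${current_head}..${current_head}"
    else
      echo "has_changes=false"
      echo "diff_range="
    fi
    return 0
  fi

  # New commits since the last tested tag.
  echo "has_changes=true"
  echo "diff_range=${last_tested}..${current_head}"
}
```

In a step this might be invoked as `detect_changes "$(git rev-parse 'staging-tested^{commit}')" "$(git rev-parse origin/staging)" "$FORCE" >> "$GITHUB_OUTPUT"`, where `FORCE` is an assumed variable carrying the workflow input.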

Copilot · 2026-03-07T21:30:22Z

.github/workflows/claude-review.yml

+          prompt: |
+            $(cat /tmp/review-prompt.txt)


prompt: | $(cat /tmp/review-prompt.txt) will be passed to the action literally; YAML inputs don’t perform shell command substitution. As a result Claude likely won’t receive the intended prompt content. Use an input mechanism that actually reads the file (e.g., if the action supports prompt_file), or load the file contents into a step output/env var and reference it via ${{ }} in the with: block.

Suggested change

-          prompt: |
-            $(cat /tmp/review-prompt.txt)
+          prompt_file: /tmp/review-prompt.txt
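
If the action turns out not to accept a file path, the other route mentioned above is to load the file into a step output and reference it with `${{ }}`. Multiline values need the `name<<DELIMITER` form of `$GITHUB_OUTPUT`. A minimal sketch, assuming a hypothetical helper name `write_multiline_output`:

```shell
# Hypothetical helper: write a file's contents to $GITHUB_OUTPUT as a multiline value.
# Uses the "name<<DELIMITER" syntax; the delimiter includes the PID and a timestamp
# so that it is unlikely to appear inside the file itself.
write_multiline_output() {
  local name="$1" file="$2"
  local delim
  delim="EOF_$$_$(date +%s)"
  {
    printf '%s<<%s\n' "$name" "$delim"
    # Command substitution strips trailing newlines; printf adds exactly one back,
    # so the closing delimiter always starts on its own line.
    printf '%s\n' "$(cat "$file")"
    printf '%s\n' "$delim"
  } >> "$GITHUB_OUTPUT"
}
```

A step with an assumed id such as `load-prompt` could run `write_multiline_output prompt /tmp/review-prompt.txt`, after which a later step's `with:` block could use `${{ steps.load-prompt.outputs.prompt }}`.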

gemini-code-assist

Code Review

This pull request introduces a significant redesign of the CI process, moving to a staging branch with batched testing to accelerate the PR feedback loop. The provided code change in .github/scripts/create-labels.sh adds the necessary GitHub labels for the new workflow. My review of the change found a minor formatting inconsistency, for which I've left a suggestion to improve code readability and maintainability. The overall strategy described in the pull request is clear and well-articulated.

gemini-code-assist · 2026-03-07T21:30:54Z

.github/scripts/create-labels.sh

 create "skip-regression-check" "9E9E9E" "Acknowledged: fix without regression test"
+create "staging-ci-review"     "D93F0B" "Auto-created by staging CI Claude Code review"
+create "skip-claude-gate"      "FBCA04" "Override: bypass Claude CRITICAL gate on staging CI"
+create "low-confidence"         "C5DEF5" "Claude review finding with <50 confidence"


There's a minor alignment inconsistency here. The other new labels added in this pull request are formatted to have their second argument (the color hex code) start at the same column. To maintain consistency and improve readability, please adjust the spacing for this line.

Suggested change

-create "low-confidence"         "C5DEF5" "Claude review finding with <50 confidence"
+create "low-confidence"        "C5DEF5" "Claude review finding with <50 confidence"

Copilot AI review requested due to automatic review settings March 7, 2026 21:26

github-actions bot added the scope: ci (CI/CD workflows) and size: XL (500+ changed lines) labels Mar 7, 2026

henrypark133 requested a review from ilblackdragon March 7, 2026 21:26

github-actions bot added the risk: medium (Business logic, config, or moderate-risk modules) label Mar 7, 2026

github-actions bot added the contributor: core (20+ merged PRs) label Mar 7, 2026

henrypark133 requested a review from PierreLeGuen March 7, 2026 21:26

Copilot started reviewing on behalf of henrypark133 March 7, 2026 21:26

Copilot AI reviewed Mar 7, 2026

gemini-code-assist bot reviewed Mar 7, 2026

henrypark133 wants to merge 1 commit into main from worktree-staging-ci
