ci: staging branch with batched CI and Claude Code review #682

Open
henrypark133 wants to merge 1 commit into main from worktree-staging-ci

@henrypark133
Collaborator

Summary

Redesigns CI to use a staging branch with batched testing, replacing per-PR test runs with a 30-minute scheduled batch. This speeds up the PR feedback loop while maintaining thorough quality gates before code reaches main.

New Flow

feature branches → PR → staging
                        checks: fmt + clippy + version-check + regression-test-check
                        (auto-merge on approval)
                           |
                           | (30-min scheduled batch CI)
                           v
                       staging
                           |
                           | tests + e2e + claude review run in parallel
                           |   → issues created for findings (based on confidence matrix)
                           |   → CRITICAL ≥80 confidence? BLOCKS promotion
                           |
                           | all jobs pass + no blocks?
                           |   YES → tag staging-tested + auto-create PR to main + auto-merge
                           |   NO  → blocked, fix needed (issues already created)
                           v
                         main → releases (release-plz + cargo-dist, unchanged)

Issues are created regardless of whether anything blocks. Blocking only affects whether the promotion PR gets merged.

Override: apply the skip-claude-gate label to the repo to bypass the CRITICAL block.

What Changes for PRs

Before: Every PR runs fmt, clippy, full test suite (3 matrix configs), telegram tests, WASM compat, docker build, and E2E tests on web path changes. Slow and expensive.

After: PRs only run fast checks (~2 min): formatting, clippy (3 configs), version-check, and regression-test-check. Full tests are deferred to the staging batch.

Staging CI Batch (staging-ci.yml)

Runs every 30 minutes via cron (+ manual workflow_dispatch):

  1. Change detection — compares staging-tested tag to staging HEAD; skips if no new commits
  2. Parallel jobs — tests, E2E, and Claude Code review run simultaneously; all three must pass
  3. Gate — CRITICAL findings with ≥80 confidence block promotion (overridable)
  4. Tag update — moves staging-tested tag forward on success
  5. Auto-promote — creates PR from staging→main and auto-merges via GitHub App token (reuses existing GH_RELEASES_MANAGER_APP_* secrets)
  6. Report — summary of all job results in workflow summary

Concurrency group staging-ci prevents overlapping runs. cancel-in-progress: false lets running suites finish.
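
The change-detection step (item 1 above) can be sketched as a small shell function. This is a minimal illustration, not the contents of staging-ci.yml: the function name `detect_changes`, the output keys, and the handling of a missing tag are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of change detection between the staging-tested tag
# and the staging HEAD. Names here are illustrative, not from the PR.

# detect_changes LAST_TESTED CURRENT_HEAD
# Prints key=value lines in the form a workflow step could append
# to "$GITHUB_OUTPUT".
detect_changes() {
  local last_tested="$1" current_head="$2"

  if [ -z "$last_tested" ]; then
    # Assumption: with no tag yet there is no baseline to diff against,
    # so the batch runs and the caller decides how to scope the review.
    echo "has_changes=true"
    echo "diff_range="
  elif [ "$last_tested" = "$current_head" ]; then
    # Tag already points at HEAD: nothing new to test.
    echo "has_changes=false"
    echo "diff_range="
  else
    echo "has_changes=true"
    echo "diff_range=${last_tested}..${current_head}"
  fi
}

# In a real checkout the two SHAs could come from git, for example:
#   last_tested=$(git rev-parse -q --verify "refs/tags/staging-tested^{commit}" || true)
#   current_head=$(git rev-parse origin/staging)
```

Keeping the comparison in a pure function makes the skip logic easy to exercise without a repository; only the two `git rev-parse` calls touch the checkout.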

Claude Code Review (claude-review.yml)

Reusable workflow called by staging-ci. Reviews the batch diff using anthropics/claude-code-action@v1:

  • Reads CLAUDE.md for project conventions
  • Reads full source files for changed code (not just diff hunks)
  • Traces data flows and checks OWASP top 10 categories
  • Assigns severity (CRITICAL/HIGH/MEDIUM/LOW) and confidence (0-100) to each finding

Confidence-based issue creation matrix:

| Severity | ≥80 confidence | 50-79 confidence | <50 confidence |
| --- | --- | --- | --- |
| CRITICAL | Blocks merge + creates issue | Creates issue | Creates issue + low-confidence label |
| HIGH | Creates issue | Creates issue | Summary only |
| MEDIUM | Creates issue | Summary only | Summary only |
| LOW | Creates issue | Summary only | Summary only |

All findings appear in the workflow summary regardless of tier. Issues include permalinks to the exact file:line at the commit SHA.
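
One way to read the matrix is as a lookup from (severity, confidence) to an action. The sketch below encodes the table as a shell `case` statement; the function name `review_action` and the action strings it prints are hypothetical and are not taken from claude-review.yml.

```shell
#!/usr/bin/env bash
# Hypothetical encoding of the confidence matrix described above.

# review_action SEVERITY CONFIDENCE
# Prints one of: block+issue, issue, issue+low-confidence, summary
review_action() {
  local severity="$1" confidence="$2" tier

  # Bucket the 0-100 confidence score into the three table columns.
  if [ "$confidence" -ge 80 ]; then
    tier=high
  elif [ "$confidence" -ge 50 ]; then
    tier=mid
  else
    tier=low
  fi

  case "${severity}:${tier}" in
    CRITICAL:high)            echo "block+issue" ;;
    CRITICAL:mid)             echo "issue" ;;
    CRITICAL:low)             echo "issue+low-confidence" ;;
    HIGH:high|HIGH:mid)       echo "issue" ;;
    MEDIUM:high|LOW:high)     echo "issue" ;;
    HIGH:low|MEDIUM:*|LOW:*)  echo "summary" ;;
    *) echo "unknown severity: ${severity}" >&2; return 1 ;;
  esac
}

review_action CRITICAL 85   # block+issue
review_action MEDIUM 60     # summary
```

Only the CRITICAL row at ≥80 confidence yields a blocking action; every other cell affects issue creation alone, which matches the statement that issues are created regardless of whether anything blocks.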

Workflow File Changes

| File | Change |
| --- | --- |
| test.yml | Triggers → workflow_call + workflow_dispatch. Removed version-check (moved to code_style.yml). |
| e2e.yml | Removed pull_request paths trigger. Added workflow_call. Kept weekly cron + dispatch. |
| coverage.yml | Added workflow_call + workflow_dispatch. Keeps push:main for Codecov. |
| code_style.yml | Absorbed version-check job so version bump validation still runs on PRs. |
| staging-ci.yml | New. Orchestrator: cron, change detection, tests + e2e + claude-review → gate → tag → promote. |
| claude-review.yml | New. Reusable review with confidence scoring and tiered issue creation. |
| create-labels.sh | Added staging-ci-review, skip-claude-gate, low-confidence labels. |

Unchanged Workflows

regression-test-check.yml, pr-label-classify.yml, pr-label-scope.yml, release-plz.yml, release.yml — all unchanged.

Pre-merge Setup

  1. Add ANTHROPIC_API_KEY repo secret
  2. Install Claude GitHub App on the repo
  3. Run bash .github/scripts/create-labels.sh to bootstrap new labels

Post-merge Setup

  1. Create staging branch: git checkout main && git checkout -b staging && git push origin staging
  2. Create initial tag: git tag staging-tested staging && git push origin staging-tested
  3. Set staging as default branch (GitHub Settings → General)
  4. Enable auto-merge (GitHub Settings → Pull Requests)
  5. Staging branch protection: require 1 review, status checks Code Style (fmt + clippy) + Regression test enforcement
  6. Main branch protection: keep as-is (App token bypasses it)

Rollback: Revert workflow changes in one commit, change default branch back to main.

Test plan

  • code_style.yml still runs on PRs (fmt + clippy + version-check)
  • test.yml no longer triggers on PRs or pushes
  • e2e.yml no longer triggers on PR path changes
  • Manual workflow_dispatch of staging-ci.yml with force=true — all three jobs run
  • If any job fails, promotion is blocked
  • Claude review creates issues with correct severity labels and confidence scores
  • CRITICAL ≥80 blocks; skip-claude-gate label overrides
  • Issues are created even when promotion succeeds (non-blocking findings)
  • staging-tested tag updates on success
  • Promotion PR created and auto-merged to main
  • release-plz.yml fires after promote merge

🤖 Generated with Claude Code

…ssion-check]

Redesign CI to speed up PR feedback by deferring heavy tests to a
30-minute batch on the staging branch. PRs only run fmt + clippy +
regression-test-check. The batch CI runs tests, E2E, and a Claude
Code review with confidence-scored findings that gate promotion to
main.

- test.yml: replace PR/push triggers with workflow_call + dispatch
- e2e.yml: replace PR paths trigger with workflow_call
- coverage.yml: add workflow_call + dispatch triggers
- code_style.yml: absorb version-check job from test.yml
- staging-ci.yml: new orchestrator (30-min cron, change detection,
  tests + e2e + claude-review → gate → tag + promote-to-main)
- claude-review.yml: new reusable review workflow with 0-100
  confidence scoring (CRITICAL ≥80 blocks, ≥50 creates issues,
  <50 CRITICAL gets low-confidence label)
- create-labels.sh: add staging-ci-review, skip-claude-gate,
  low-confidence labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 7, 2026 21:26
@github-actions github-actions bot added the scope: ci (CI/CD workflows) and size: XL (500+ changed lines) labels Mar 7, 2026
@github-actions github-actions bot added the risk: medium (Business logic, config, or moderate-risk modules) label Mar 7, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the continuous integration strategy by introducing a staging branch with batched testing, aiming to accelerate the PR feedback loop while maintaining robust quality gates. It integrates an AI-powered code review system, Claude Code Review, which performs in-depth analysis and can block code promotion based on critical findings. This change shifts comprehensive testing from individual pull requests to a scheduled staging environment, allowing PRs to merge faster after passing quick checks.

Highlights

  • CI Redesign with Staging Branch: The CI process has been redesigned to utilize a staging branch with batched testing, replacing individual per-PR test runs with a scheduled 30-minute batch.
  • Claude Code Review Integration: Integrated Claude Code Review, which analyzes batch diffs, assigns severity and confidence scores, and creates issues or blocks promotion based on a defined matrix.
  • Streamlined PR Workflow: Pull requests now only run fast checks (formatting, clippy, version-check, regression-test-check), deferring full test suites to the staging batch for faster feedback.
  • New Workflows and Labels: Introduced staging-ci.yml as the orchestrator for batched CI and claude-review.yml for reusable Claude review, along with new GitHub labels to support the new CI flow.
Changelog
  • .github/scripts/create-labels.sh
    • Added new labels for staging CI and Claude Code Review.
Ignored Files
  • Ignored by pattern: .github/workflows/** (6)
    • .github/workflows/claude-review.yml
    • .github/workflows/code_style.yml
    • .github/workflows/coverage.yml
    • .github/workflows/e2e.yml
    • .github/workflows/staging-ci.yml
    • .github/workflows/test.yml

Contributor

Copilot AI left a comment

Pull request overview

Redesigns GitHub Actions CI to batch full test/e2e/review runs on a staging branch every 30 minutes, while keeping PR CI limited to faster checks; adds an automated Claude-based review workflow that can file issues and gate promotion to main.

Changes:

  • Convert test.yml/e2e.yml into reusable workflows (workflow_call) and remove PR/push triggers for slow suites.
  • Add new staging-ci.yml orchestrator to detect changes, run tests/e2e/Claude review in parallel, gate, tag staging-tested, and auto-promote staging → main.
  • Add claude-review.yml reusable workflow plus label bootstrap updates for review automation.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| .github/workflows/test.yml | Stops per-PR/per-push execution; becomes reusable + manual dispatch; removes embedded version bump check. |
| .github/workflows/e2e.yml | Removes PR path-triggered E2E runs; enables workflow_call for staging batch. |
| .github/workflows/coverage.yml | Adds workflow_call + manual dispatch while keeping push-to-main coverage behavior. |
| .github/workflows/code_style.yml | Adds version-check job into PR “fast checks” and rolls it into the code-style gate. |
| .github/workflows/staging-ci.yml | New batched staging orchestrator: change detection, parallel jobs, gating, tag update, promotion PR creation/merge, summary reporting. |
| .github/workflows/claude-review.yml | New reusable Claude review workflow: diff context generation, review invocation, result processing, and issue creation. |
| .github/scripts/create-labels.sh | Adds labels used by staging CI review flow and confidence tiering. |

Comment on lines +72 to +73
if [[ "${{ needs.version-check.result }}" == "failure" ]]; then
  echo "Version bump check failed"
Copilot AI Mar 7, 2026

The roll-up job only fails version-check when the result is exactly failure. If version-check is cancelled or skipped for any reason, the roll-up will still succeed and branch protection may pass unexpectedly. Prefer failing on anything other than success (or explicitly allow only the states you intend).

Suggested change
- if [[ "${{ needs.version-check.result }}" == "failure" ]]; then
-   echo "Version bump check failed"
+ if [[ "${{ needs.version-check.result }}" != "success" ]]; then
+   echo "Version bump check did not succeed (status: ${{ needs.version-check.result }})"

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +91
    if: needs.check-changes.outputs.has_changes == 'true'
    uses: ./.github/workflows/test.yml

  # ── Run E2E browser tests ────────────────────────────────────────
  e2e:
    name: E2E Browser Tests
    needs: check-changes
    if: needs.check-changes.outputs.has_changes == 'true'
Copilot AI Mar 7, 2026

tests and e2e call reusable workflows that do not pin actions/checkout to staging. For scheduled runs this is OK only if staging is the repo default branch, but for workflow_dispatch a user can run this workflow from another branch and then the called workflows will test the wrong ref while check-changes/tagging operate on staging. Consider guarding the workflow/jobs to github.ref == 'refs/heads/staging' and/or adding a ref input to the reusable test/e2e workflows and using it in their checkout steps.

Suggested change
-     if: needs.check-changes.outputs.has_changes == 'true'
-     uses: ./.github/workflows/test.yml
-   # ── Run E2E browser tests ────────────────────────────────────────
-   e2e:
-     name: E2E Browser Tests
-     needs: check-changes
-     if: needs.check-changes.outputs.has_changes == 'true'
+     if: needs.check-changes.outputs.has_changes == 'true' && github.ref == 'refs/heads/staging'
+     uses: ./.github/workflows/test.yml
+   # ── Run E2E browser tests ────────────────────────────────────────
+   e2e:
+     name: E2E Browser Tests
+     needs: check-changes
+     if: needs.check-changes.outputs.has_changes == 'true' && github.ref == 'refs/heads/staging'

Comment on lines +118 to +136
- name: Check for skip-claude-gate label
  id: label-check
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    SKIP=$(gh label list --json name -q '.[].name' | grep -c 'skip-claude-gate' || true)
    echo "skip_label=${SKIP}" >> "$GITHUB_OUTPUT"

- name: Evaluate gate
  id: gate
  run: |
    HAS_BLOCKING="${{ needs.claude-review.outputs.has_blocking }}"
    SKIP_LABEL="${{ steps.label-check.outputs.skip_label }}"
    SKIP_INPUT="${{ inputs.skip_claude_gate }}"

    if [ "$HAS_BLOCKING" = "true" ]; then
      echo "::warning::Claude review found blocking issues (CRITICAL ≥80 confidence)"
      if [ "$SKIP_LABEL" -gt 0 ] || [ "$SKIP_INPUT" = "true" ]; then
        echo "::warning::Gate overridden by skip-claude-gate label or input"
Copilot AI Mar 7, 2026

Check for skip-claude-gate label is checking whether the label exists in the repository, not whether any PR/commit/run is marked to bypass the gate. Since the repo will normally have this label created, skip_label will always be >0 and the Claude gate will be permanently overridden. Consider removing the label-based override entirely (rely on the workflow_dispatch input), or switch to a real toggle (e.g., a repo variable/secret), or check for the label on a specific PR/issue that represents the override.

Suggested change
- - name: Check for skip-claude-gate label
-   id: label-check
-   env:
-     GH_TOKEN: ${{ github.token }}
-   run: |
-     SKIP=$(gh label list --json name -q '.[].name' | grep -c 'skip-claude-gate' || true)
-     echo "skip_label=${SKIP}" >> "$GITHUB_OUTPUT"
- - name: Evaluate gate
-   id: gate
-   run: |
-     HAS_BLOCKING="${{ needs.claude-review.outputs.has_blocking }}"
-     SKIP_LABEL="${{ steps.label-check.outputs.skip_label }}"
-     SKIP_INPUT="${{ inputs.skip_claude_gate }}"
-     if [ "$HAS_BLOCKING" = "true" ]; then
-       echo "::warning::Claude review found blocking issues (CRITICAL ≥80 confidence)"
-       if [ "$SKIP_LABEL" -gt 0 ] || [ "$SKIP_INPUT" = "true" ]; then
-         echo "::warning::Gate overridden by skip-claude-gate label or input"
+ - name: Evaluate gate
+   id: gate
+   run: |
+     HAS_BLOCKING="${{ needs.claude-review.outputs.has_blocking }}"
+     SKIP_INPUT="${{ inputs.skip_claude_gate }}"
+     if [ "$HAS_BLOCKING" = "true" ]; then
+       echo "::warning::Claude review found blocking issues (CRITICAL ≥80 confidence)"
+       if [ "$SKIP_INPUT" = "true" ]; then
+         echo "::warning::Gate overridden by skip_claude_gate workflow input"

always() &&
needs.check-changes.outputs.has_changes == 'true' &&
needs.tests.result == 'success' &&
needs.e2e.result == 'success'
Copilot AI Mar 7, 2026

The claude-gate job condition requires tests and e2e success, but it does not require claude-review to succeed. If the Claude workflow fails, needs.claude-review.outputs.has_blocking may be empty and the gate will pass, allowing promotion even though the Claude review job failed. Update the if: to also require needs.claude-review.result == 'success' (or treat non-success as blocking).

Suggested change
- needs.e2e.result == 'success'
+ needs.e2e.result == 'success' &&
+ needs.claude-review.result == 'success'
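
The concern in this comment, together with the earlier one about the label check, can be illustrated with a gate written as a pure function. This is a hedged sketch rather than the workflow's actual step: `evaluate_gate`, its argument order, and its output strings are assumptions, and letting the override apply only to blocking findings (not to a failed review) is a design choice made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical gate logic; not copied from staging-ci.yml.

# evaluate_gate REVIEW_RESULT HAS_BLOCKING SKIP_INPUT
# Prints a one-line verdict and returns non-zero when promotion is blocked.
evaluate_gate() {
  local review_result="$1" has_blocking="$2" skip_input="$3"

  # A failed, cancelled, or skipped review may leave has_blocking empty,
  # so anything other than "success" is treated as blocking.
  if [ "$review_result" != "success" ]; then
    echo "blocked: review result was ${review_result:-unknown}"
    return 1
  fi

  if [ "$has_blocking" = "true" ]; then
    if [ "$skip_input" = "true" ]; then
      echo "pass: gate overridden by input"
      return 0
    fi
    echo "blocked: CRITICAL finding with >=80 confidence"
    return 1
  fi

  echo "pass"
}

evaluate_gate success false false   # pass
```

Checking the review job's result before reading its outputs closes the gap where an empty `has_blocking` would otherwise read as "no blocking findings".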

# Force override from workflow_dispatch
if [ "${{ inputs.force }}" = "true" ]; then
  echo "Force run requested"
  echo "has_changes=true" >> "$GITHUB_OUTPUT"
Copilot AI Mar 7, 2026

When force=true and there are no new commits, has_changes is flipped to true but diff_range remains empty (it was set to empty in the no-changes branch). That makes the called claude-review workflow skip the review entirely due to missing diff_range, which contradicts the intent of a forced run. Consider setting diff_range to a valid empty range (e.g., ${CURRENT_HEAD}..${CURRENT_HEAD}) when forcing so the review job still runs deterministically.

Suggested change
- echo "has_changes=true" >> "$GITHUB_OUTPUT"
+ echo "has_changes=true" >> "$GITHUB_OUTPUT"
+ # If there were no new commits (LAST_TESTED == CURRENT_HEAD),
+ # ensure diff_range is a valid, deterministic empty range
+ if [ -n "$LAST_TESTED" ] && [ "$LAST_TESTED" = "$CURRENT_HEAD" ]; then
+   echo "diff_range=${CURRENT_HEAD}..${CURRENT_HEAD}" >> "$GITHUB_OUTPUT"
+ fi

Comment on lines +154 to +155
prompt: |
  $(cat /tmp/review-prompt.txt)
Copilot AI Mar 7, 2026

prompt: | $(cat /tmp/review-prompt.txt) will be passed to the action literally; YAML inputs don’t perform shell command substitution. As a result Claude likely won’t receive the intended prompt content. Use an input mechanism that actually reads the file (e.g., if the action supports prompt_file), or load the file contents into a step output/env var and reference it via ${{ }} in the with: block.

Suggested change
- prompt: |
-   $(cat /tmp/review-prompt.txt)
+ prompt_file: /tmp/review-prompt.txt
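
The alternative mentioned in the comment, loading the file into a step output, can be sketched with the multi-line `name<<DELIMITER` form that GitHub Actions accepts in `$GITHUB_OUTPUT`. The function name `export_prompt`, the output name `prompt`, and the delimiter prefix are assumptions for illustration; whether the action accepts a `prompt_file` input is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expose a file's contents as a multi-line step output.

# export_prompt PROMPT_FILE
# Appends an output named "prompt" to the file named by $GITHUB_OUTPUT.
export_prompt() {
  local prompt_file="$1"
  # A randomized delimiter lowers the chance that the file contains it.
  local delimiter="PROMPT_EOF_${RANDOM}${RANDOM}"
  {
    echo "prompt<<${delimiter}"
    # Command substitution strips trailing newlines; printf adds exactly one,
    # so the closing delimiter always starts on its own line.
    printf '%s\n' "$(cat "$prompt_file")"
    echo "${delimiter}"
  } >> "$GITHUB_OUTPUT"
}
```

A later step could then pass the value through an expression such as `${{ steps.<step-id>.outputs.prompt }}` in its `with:` block, where `<step-id>` is whatever id the exporting step is given.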

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant redesign of the CI process, moving to a staging branch with batched testing to accelerate the PR feedback loop. The provided code change in .github/scripts/create-labels.sh adds the necessary GitHub labels for the new workflow. My review of the change found a minor formatting inconsistency, for which I've left a suggestion to improve code readability and maintainability. The overall strategy described in the pull request is clear and well-articulated.

create "skip-regression-check" "9E9E9E" "Acknowledged: fix without regression test"
create "staging-ci-review" "D93F0B" "Auto-created by staging CI Claude Code review"
create "skip-claude-gate" "FBCA04" "Override: bypass Claude CRITICAL gate on staging CI"
create "low-confidence" "C5DEF5" "Claude review finding with <50 confidence"
Contributor

medium

There's a minor alignment inconsistency here. The other new labels added in this pull request are formatted to have their second argument (the color hex code) start at the same column. To maintain consistency and improve readability, please adjust the spacing for this line.

Suggested change
- create "low-confidence" "C5DEF5" "Claude review finding with <50 confidence"
+ create "low-confidence" "C5DEF5" "Claude review finding with <50 confidence"

Labels

  • contributor: core (20+ merged PRs)
  • risk: medium (Business logic, config, or moderate-risk modules)
  • scope: ci (CI/CD workflows)
  • size: XL (500+ changed lines)
