ci: staging branch with batched CI and Claude Code review #682
henrypark133 wants to merge 1 commit into main
…ssion-check]
Redesign CI to speed up PR feedback by deferring heavy tests to a 30-minute batch on the staging branch. PRs only run fmt + clippy + regression-test-check. The batch CI runs tests, E2E, and a Claude Code review with confidence-scored findings that gate promotion to main.
- test.yml: replace PR/push triggers with workflow_call + dispatch
- e2e.yml: replace PR paths trigger with workflow_call
- coverage.yml: add workflow_call + dispatch triggers
- code_style.yml: absorb version-check job from test.yml
- staging-ci.yml: new orchestrator (30-min cron, change detection, tests + e2e + claude-review → gate → tag + promote-to-main)
- claude-review.yml: new reusable review workflow with 0-100 confidence scoring (CRITICAL ≥80 blocks, ≥50 creates issues, <50 CRITICAL gets low-confidence label)
- create-labels.sh: add staging-ci-review, skip-claude-gate, low-confidence labels
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly overhauls the continuous integration strategy by introducing a staging branch with batched testing, aiming to accelerate the PR feedback loop while maintaining robust quality gates. It integrates an AI-powered code review system, Claude Code Review, which performs in-depth analysis and can block code promotion based on critical findings. This change shifts comprehensive testing from individual pull requests to a scheduled staging environment, allowing PRs to merge faster after passing quick checks.
Pull request overview
Redesigns GitHub Actions CI to batch full test/e2e/review runs on a staging branch every 30 minutes, while keeping PR CI limited to faster checks; adds an automated Claude-based review workflow that can file issues and gate promotion to main.
Changes:
- Convert `test.yml`/`e2e.yml` into reusable workflows (`workflow_call`) and remove PR/push triggers for slow suites.
- Add new `staging-ci.yml` orchestrator to detect changes, run tests/e2e/Claude review in parallel, gate, tag `staging-tested`, and auto-promote `staging` → `main`.
- Add `claude-review.yml` reusable workflow plus label bootstrap updates for review automation.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| .github/workflows/test.yml | Stops per-PR/per-push execution; becomes reusable + manual dispatch; removes embedded version bump check. |
| .github/workflows/e2e.yml | Removes PR path-triggered E2E runs; enables workflow_call for staging batch. |
| .github/workflows/coverage.yml | Adds workflow_call + manual dispatch while keeping push-to-main coverage behavior. |
| .github/workflows/code_style.yml | Adds version-check job into PR “fast checks” and rolls it into the code-style gate. |
| .github/workflows/staging-ci.yml | New batched staging orchestrator: change detection, parallel jobs, gating, tag update, promotion PR creation/merge, summary reporting. |
| .github/workflows/claude-review.yml | New reusable Claude review workflow: diff context generation, review invocation, result processing, and issue creation. |
| .github/scripts/create-labels.sh | Adds labels used by staging CI review flow and confidence tiering. |
    if [[ "${{ needs.version-check.result }}" == "failure" ]]; then
      echo "Version bump check failed"

The roll-up job only fails version-check when the result is exactly failure. If version-check is cancelled or skipped for any reason, the roll-up will still succeed and branch protection may pass unexpectedly. Prefer failing on anything other than success (or explicitly allow only the states you intend).

Suggested change:

    -if [[ "${{ needs.version-check.result }}" == "failure" ]]; then
    -  echo "Version bump check failed"
    +if [[ "${{ needs.version-check.result }}" != "success" ]]; then
    +  echo "Version bump check did not succeed (status: ${{ needs.version-check.result }})"
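The suggestion above can be sketched as a standalone shell function. This is only an illustration: `rollup_status` is a hypothetical helper name, and in the workflow its argument would come from `${{ needs.version-check.result }}`, which GitHub Actions sets to `success`, `failure`, `cancelled`, or `skipped`.

```shell
#!/usr/bin/env bash
# Sketch: treat any job result other than "success" as a roll-up failure.
# rollup_status is a hypothetical helper, not part of the PR.
rollup_status() {
  local result="$1"   # one of: success, failure, cancelled, skipped
  if [[ "$result" != "success" ]]; then
    echo "Version bump check did not succeed (status: ${result})"
    return 1
  fi
  echo "Version bump check passed"
}

# Example: a cancelled job fails the roll-up instead of slipping through.
rollup_status "cancelled" || echo "roll-up would fail"
```

If a skipped version-check is ever considered acceptable, that state would need to be allowed explicitly rather than falling through by accident.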
      if: needs.check-changes.outputs.has_changes == 'true'
      uses: ./.github/workflows/test.yml

    # ── Run E2E browser tests ────────────────────────────────────────
    e2e:
      name: E2E Browser Tests
      needs: check-changes
      if: needs.check-changes.outputs.has_changes == 'true'

tests and e2e call reusable workflows that do not pin actions/checkout to staging. For scheduled runs this is OK only if staging is the repo default branch, but for workflow_dispatch a user can run this workflow from another branch and then the called workflows will test the wrong ref while check-changes/tagging operate on staging. Consider guarding the workflow/jobs to github.ref == 'refs/heads/staging' and/or adding a ref input to the reusable test/e2e workflows and using it in their checkout steps.

Suggested change:

    -  if: needs.check-changes.outputs.has_changes == 'true'
    +  if: needs.check-changes.outputs.has_changes == 'true' && github.ref == 'refs/heads/staging'
       uses: ./.github/workflows/test.yml
     # ── Run E2E browser tests ────────────────────────────────────────
     e2e:
       name: E2E Browser Tests
       needs: check-changes
    -  if: needs.check-changes.outputs.has_changes == 'true'
    +  if: needs.check-changes.outputs.has_changes == 'true' && github.ref == 'refs/heads/staging'
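The same guard could also be enforced inside a `run:` step. The sketch below assumes a hypothetical helper named `require_staging_ref`; `GITHUB_REF` is the default environment variable GitHub Actions sets to the triggering ref, and `::error::` is the standard workflow command for surfacing an error annotation.

```shell
#!/usr/bin/env bash
# Sketch: refuse to continue unless the run was started from the staging branch.
# require_staging_ref is a hypothetical helper, not part of the PR.
require_staging_ref() {
  local ref="${1:-${GITHUB_REF:-}}"   # falls back to the Actions-provided ref
  if [ "$ref" != "refs/heads/staging" ]; then
    echo "::error::staging CI must run from refs/heads/staging (got: ${ref:-unset})"
    return 1
  fi
}

# Example: a manual dispatch from a feature branch is rejected.
require_staging_ref "refs/heads/feature-x" || echo "guard rejected the run"
```

This only covers the guard option; passing an explicit `ref` input to the reusable workflows, as the comment also proposes, would still be needed if runs from other branches should test staging.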
    - name: Check for skip-claude-gate label
      id: label-check
      env:
        GH_TOKEN: ${{ github.token }}
      run: |
        SKIP=$(gh label list --json name -q '.[].name' | grep -c 'skip-claude-gate' || true)
        echo "skip_label=${SKIP}" >> "$GITHUB_OUTPUT"

    - name: Evaluate gate
      id: gate
      run: |
        HAS_BLOCKING="${{ needs.claude-review.outputs.has_blocking }}"
        SKIP_LABEL="${{ steps.label-check.outputs.skip_label }}"
        SKIP_INPUT="${{ inputs.skip_claude_gate }}"

        if [ "$HAS_BLOCKING" = "true" ]; then
          echo "::warning::Claude review found blocking issues (CRITICAL ≥80 confidence)"
          if [ "$SKIP_LABEL" -gt 0 ] || [ "$SKIP_INPUT" = "true" ]; then
            echo "::warning::Gate overridden by skip-claude-gate label or input"

The "Check for skip-claude-gate label" step checks whether the label exists in the repository, not whether any PR/commit/run is marked to bypass the gate. Since the repo will normally have this label created, skip_label will always be >0 and the Claude gate will be permanently overridden. Consider removing the label-based override entirely (rely on the workflow_dispatch input), or switch to a real toggle (e.g., a repo variable/secret), or check for the label on a specific PR/issue that represents the override.

Suggested change:

    -- name: Check for skip-claude-gate label
    -  id: label-check
    -  env:
    -    GH_TOKEN: ${{ github.token }}
    -  run: |
    -    SKIP=$(gh label list --json name -q '.[].name' | grep -c 'skip-claude-gate' || true)
    -    echo "skip_label=${SKIP}" >> "$GITHUB_OUTPUT"
     - name: Evaluate gate
       id: gate
       run: |
         HAS_BLOCKING="${{ needs.claude-review.outputs.has_blocking }}"
    -    SKIP_LABEL="${{ steps.label-check.outputs.skip_label }}"
         SKIP_INPUT="${{ inputs.skip_claude_gate }}"
         if [ "$HAS_BLOCKING" = "true" ]; then
           echo "::warning::Claude review found blocking issues (CRITICAL ≥80 confidence)"
    -      if [ "$SKIP_LABEL" -gt 0 ] || [ "$SKIP_INPUT" = "true" ]; then
    -        echo "::warning::Gate overridden by skip-claude-gate label or input"
    +      if [ "$SKIP_INPUT" = "true" ]; then
    +        echo "::warning::Gate overridden by skip_claude_gate workflow input"
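To make the problem concrete, the sketch below first simulates the label listing for a repository where `create-labels.sh` has already run, then shows an input-only gate. The label list is simulated rather than fetched with `gh`, and `evaluate_gate` is a hypothetical helper name; only the comparison logic mirrors the suggestion above.

```shell
#!/usr/bin/env bash
# Simulated output of `gh label list --json name -q '.[].name'` for a repo
# where the new labels have been bootstrapped (names taken from the PR).
repo_labels=$'staging-ci-review\nskip-claude-gate\nlow-confidence'
skip_count=$(printf '%s\n' "$repo_labels" | grep -c 'skip-claude-gate' || true)
echo "skip_label=${skip_count}"   # non-zero whenever the label merely exists

# Sketch of a gate that honours only the workflow_dispatch input.
# evaluate_gate is a hypothetical helper, not part of the PR.
evaluate_gate() {
  local has_blocking="$1" skip_input="$2"
  if [ "$has_blocking" = "true" ]; then
    if [ "$skip_input" = "true" ]; then
      echo "overridden"
      return 0
    fi
    echo "blocked"
    return 1
  fi
  echo "passed"
}

evaluate_gate true false || echo "promotion stops here"
```

With the label-count check removed, a blocking finding stops promotion unless someone explicitly dispatches the workflow with the override input set.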
    always() &&
    needs.check-changes.outputs.has_changes == 'true' &&
    needs.tests.result == 'success' &&
    needs.e2e.result == 'success'

The claude-gate job condition requires tests and e2e success, but it does not require claude-review to succeed. If the Claude workflow fails, needs.claude-review.outputs.has_blocking may be empty and the gate will pass, allowing promotion even though the Claude review job failed. Update the if: to also require needs.claude-review.result == 'success' (or treat non-success as blocking).

Suggested change:

    -needs.e2e.result == 'success'
    +needs.e2e.result == 'success' &&
    +needs.claude-review.result == 'success'
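The alternative mentioned in the comment, treating a non-successful review as blocking, might look like the following sketch. `claude_review_blocks` is a hypothetical helper; its two arguments stand in for `needs.claude-review.result` and `needs.claude-review.outputs.has_blocking`.

```shell
#!/usr/bin/env bash
# Sketch: a review that did not succeed is treated as blocking, so an empty
# has_blocking output from a failed job cannot silently pass the gate.
# claude_review_blocks is a hypothetical helper, not part of the PR.
claude_review_blocks() {
  local result="$1" has_blocking="$2"
  if [ "$result" != "success" ]; then
    return 0            # failed, cancelled, or skipped review: block
  fi
  [ "$has_blocking" = "true" ]
}

# Example: the review job failed and produced no outputs.
if claude_review_blocks "failure" ""; then
  echo "gate blocks promotion"
fi
```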
    # Force override from workflow_dispatch
    if [ "${{ inputs.force }}" = "true" ]; then
      echo "Force run requested"
      echo "has_changes=true" >> "$GITHUB_OUTPUT"

When force=true and there are no new commits, has_changes is flipped to true but diff_range remains empty (it was set to empty in the no-changes branch). That makes the called claude-review workflow skip the review entirely due to missing diff_range, which contradicts the intent of a forced run. Consider setting diff_range to a valid empty range (e.g., ${CURRENT_HEAD}..${CURRENT_HEAD}) when forcing so the review job still runs deterministically.

Suggested change:

       echo "has_changes=true" >> "$GITHUB_OUTPUT"
    +  # If there were no new commits (LAST_TESTED == CURRENT_HEAD),
    +  # ensure diff_range is a valid, deterministic empty range
    +  if [ -n "$LAST_TESTED" ] && [ "$LAST_TESTED" = "$CURRENT_HEAD" ]; then
    +    echo "diff_range=${CURRENT_HEAD}..${CURRENT_HEAD}" >> "$GITHUB_OUTPUT"
    +  fi
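The combined behaviour can be sketched as one function. The full change-detection step is not shown in this thread, so the sketch makes assumptions: `compute_diff_range` is a hypothetical name, the normal range is assumed to be `LAST_TESTED..CURRENT_HEAD`, and the result is printed as `key=value` lines instead of being appended to `$GITHUB_OUTPUT`.

```shell
#!/usr/bin/env bash
# Sketch of change detection with a force override.
# compute_diff_range is a hypothetical helper; the LAST..HEAD range format
# for the normal case is an assumption based on the suggested fix.
compute_diff_range() {
  local last_tested="$1" current_head="$2" force="${3:-false}"
  local has_changes="false" diff_range=""

  if [ -n "$last_tested" ] && [ "$last_tested" != "$current_head" ]; then
    # New commits since the last tested tag.
    has_changes="true"
    diff_range="${last_tested}..${current_head}"
  elif [ "$force" = "true" ]; then
    has_changes="true"
    if [ -n "$last_tested" ]; then
      # No new commits: emit a valid, empty range so the review still runs.
      diff_range="${current_head}..${current_head}"
    fi
  fi

  echo "has_changes=${has_changes}"
  echo "diff_range=${diff_range}"
}

# Example: forced run with nothing new on staging.
compute_diff_range "abc123" "abc123" "true"
```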
    prompt: |
      $(cat /tmp/review-prompt.txt)

prompt: | $(cat /tmp/review-prompt.txt) will be passed to the action literally; YAML inputs don’t perform shell command substitution. As a result Claude likely won’t receive the intended prompt content. Use an input mechanism that actually reads the file (e.g., if the action supports prompt_file), or load the file contents into a step output/env var and reference it via ${{ }} in the with: block.

Suggested change:

    -prompt: |
    -  $(cat /tmp/review-prompt.txt)
    +prompt_file: /tmp/review-prompt.txt
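For the second option in the comment, loading the file into a step output, GitHub Actions accepts multiline values in `$GITHUB_OUTPUT` using a `name<<DELIMITER` block. The sketch below assumes a hypothetical helper `write_prompt_output` and an output named `prompt`; a later step would then read it through the `steps.<id>.outputs.prompt` expression.

```shell
#!/usr/bin/env bash
# Sketch: publish a file's contents as a multiline step output.
# write_prompt_output and the output name "prompt" are assumptions.
write_prompt_output() {
  local prompt_path="$1"
  local output_file="${2:-${GITHUB_OUTPUT:-/dev/stdout}}"
  local delimiter="PROMPT_EOF_$$"   # must not appear inside the prompt text
  {
    echo "prompt<<${delimiter}"
    # Command substitution trims trailing newlines; printf adds exactly one,
    # so the closing delimiter always starts on its own line.
    printf '%s\n' "$(cat "$prompt_path")"
    echo "${delimiter}"
  } >> "$output_file"
}

# Example with throwaway files standing in for the real prompt and output.
demo_prompt=$(mktemp)
demo_output=$(mktemp)
printf 'Review the staging diff.\nScore each finding 0-100.' > "$demo_prompt"
write_prompt_output "$demo_prompt" "$demo_output"
cat "$demo_output"
```

Step outputs have size limits, so for a long prompt a file-based input on the action, if one is supported, is likely the more robust route.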
Code Review
This pull request introduces a significant redesign of the CI process, moving to a staging branch with batched testing to accelerate the PR feedback loop. The provided code change in .github/scripts/create-labels.sh adds the necessary GitHub labels for the new workflow. My review of the change found a minor formatting inconsistency, for which I've left a suggestion to improve code readability and maintainability. The overall strategy described in the pull request is clear and well-articulated.
    create "skip-regression-check" "9E9E9E" "Acknowledged: fix without regression test"
    create "staging-ci-review" "D93F0B" "Auto-created by staging CI Claude Code review"
    create "skip-claude-gate" "FBCA04" "Override: bypass Claude CRITICAL gate on staging CI"
    create "low-confidence" "C5DEF5" "Claude review finding with <50 confidence"

There's a minor alignment inconsistency here. The other new labels added in this pull request are formatted to have their second argument (the color hex code) start at the same column. To maintain consistency and improve readability, please adjust the spacing for this line.

Suggested change:

    -create "low-confidence" "C5DEF5" "Claude review finding with <50 confidence"
    +create "low-confidence" "C5DEF5" "Claude review finding with <50 confidence"
Summary
Redesigns CI to use a staging branch with batched testing, replacing per-PR test runs with a 30-minute scheduled batch. This speeds up the PR feedback loop while maintaining thorough quality gates before code reaches main.
New Flow
Issues are created regardless of whether anything blocks. Blocking only affects whether the promotion PR gets merged.
Override: apply `skip-claude-gate` label to repo to bypass CRITICAL block.
What Changes for PRs
Before: Every PR runs fmt, clippy, full test suite (3 matrix configs), telegram tests, WASM compat, docker build, and E2E tests on web path changes. Slow and expensive.
After: PRs only run fast checks (~2 min): formatting, clippy (3 configs), version-check, and regression-test-check. Full tests are deferred to the staging batch.
Staging CI Batch (`staging-ci.yml`)
Runs every 30 minutes via cron (+ manual `workflow_dispatch`):
- … `staging-tested` tag to staging HEAD; skips if no new commits
- … `staging-tested` tag forward on success
- … (`GH_RELEASES_MANAGER_APP_*` secrets)
Concurrency group `staging-ci` prevents overlapping runs. `cancel-in-progress: false` lets running suites finish.
Claude Code Review (`claude-review.yml`)
Reusable workflow called by staging-ci. Reviews the batch diff using `anthropics/claude-code-action@v1`:
Confidence-based issue creation matrix:
- … `low-confidence` label
All findings appear in the workflow summary regardless of tier. Issues include permalinks to the exact file:line at the commit SHA.
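The tiers named in the commit message (CRITICAL ≥80 blocks, ≥50 creates issues, <50 CRITICAL gets the `low-confidence` label) can be sketched as a small classifier. This is an illustration under assumptions, not the workflow's actual logic: `classify_finding` and its output strings are hypothetical, the ≥50 tier is assumed to apply to every severity, and non-CRITICAL findings below 50 are assumed to appear only in the workflow summary.

```shell
#!/usr/bin/env bash
# Sketch of the confidence tiers described in the PR.
# classify_finding and its return strings are hypothetical.
classify_finding() {
  local severity="$1" confidence="$2"   # confidence is an integer 0-100
  if [ "$severity" = "CRITICAL" ] && [ "$confidence" -ge 80 ]; then
    echo "block+issue"            # gates promotion; an issue is still filed
  elif [ "$confidence" -ge 50 ]; then
    echo "issue"
  elif [ "$severity" = "CRITICAL" ]; then
    echo "issue:low-confidence"   # filed with the low-confidence label
  else
    echo "summary-only"           # assumption: no issue below 50 otherwise
  fi
}

classify_finding CRITICAL 85
classify_finding CRITICAL 40
```

Keeping the thresholds in one place like this would make it easy to adjust the 80 and 50 cut-offs if the review proves too noisy or too lenient.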
Workflow File Changes
| File | Change |
|---|---|
| `test.yml` | Replaced PR/push triggers with `workflow_call` + `workflow_dispatch`. Removed `version-check` (moved to code_style.yml). |
| `e2e.yml` | Removed `pull_request` paths trigger. Added `workflow_call`. Kept weekly cron + dispatch. |
| `coverage.yml` | Added `workflow_call` + `workflow_dispatch`. Keeps `push:main` for Codecov. |
| `code_style.yml` | Added `version-check` job so version bump validation still runs on PRs. |
| `staging-ci.yml` | New orchestrator (30-min cron, change detection, tests + e2e + claude-review → gate → tag + promote-to-main). |
| `claude-review.yml` | New reusable review workflow with 0-100 confidence scoring. |
| `create-labels.sh` | Added `staging-ci-review`, `skip-claude-gate`, `low-confidence` labels. |
Unchanged Workflows
`regression-test-check.yml`, `pr-label-classify.yml`, `pr-label-scope.yml`, `release-plz.yml`, `release.yml`: all unchanged.
Pre-merge Setup
- `ANTHROPIC_API_KEY` repo secret
- `bash .github/scripts/create-labels.sh` to bootstrap new labels
Post-merge Setup
- `staging` branch: `git checkout main && git checkout -b staging && git push origin staging`
- `git tag staging-tested staging && git push origin staging-tested`
- `staging` as default branch (GitHub Settings → General)
- … `Code Style (fmt + clippy)` + `Regression test enforcement`
Rollback: Revert workflow changes in one commit, change default branch back to main.
Test plan
- `code_style.yml` still runs on PRs (fmt + clippy + version-check)
- `test.yml` no longer triggers on PRs or pushes
- `e2e.yml` no longer triggers on PR path changes
- `workflow_dispatch` of `staging-ci.yml` with `force=true`: all three jobs run
- `skip-claude-gate` label overrides
- `staging-tested` tag updates on success
- `release-plz.yml` fires after promote merge
🤖 Generated with Claude Code