| 1 | +# Case Study: CI Failure — `git push` Rejected Due to Concurrent Push Race Condition |
| 2 | + |
| 3 | +**Issue**: [#51 - Fix CI/CD](https://github.com/link-foundation/sandbox/issues/51) |
| 4 | +**CI Run**: [22267653514, Job 64416730310](https://github.com/link-foundation/sandbox/actions/runs/22267653514/job/64416730310) |
| 5 | +**Date**: 2026-02-22 |
| 6 | +**Status**: Investigation Complete — Fix Applied |
| 7 | + |
| 8 | +## Executive Summary |
| 9 | + |
| 10 | +The "Measure Disk Space and Update README" CI workflow failed at the "Commit and push changes" step with: |
| 11 | + |
| 12 | +``` |
| 13 | +To https://github.com/link-foundation/sandbox |
| 14 | + ! [rejected] main -> main (fetch first) |
| 15 | +error: failed to push some refs to 'https://github.com/link-foundation/sandbox' |
| 16 | +hint: Updates were rejected because the remote contains work that you do not |
| 17 | +hint: have locally. This is usually caused by another repository pushing to |
| 18 | +hint: the same ref. If you want to integrate the remote changes, use |
| 19 | +hint: 'git pull' before pushing again. |
| 20 | +``` |
| 21 | + |
| 22 | +All 18+ measurement steps succeeded. The failure occurred only during the final `git push` step — **after 18 minutes of valid computation** — because another workflow had pushed a version bump commit to `main` within 1 second of this job starting. |
| 23 | + |
| 24 | +## Timeline of Events |
| 25 | + |
| 26 | +| Time (UTC) | Event | |
| 27 | +|------------|-------| |
| 28 | +| 2026-02-22T00:57:44 | PR #50 merged into `main` — creates merge commit `411b384e` | |
| 29 | +| 2026-02-22T00:57:47 | Two workflows triggered simultaneously on push to `main`: <br> • "Build and Release Docker Image" (run `22267653513`) <br> • "Measure Disk Space and Update README" (run `22267653514`) | |
| 30 | +| 2026-02-22T00:57:51 | Measure Disk Space job checks out commit `411b384e` (start of long computation) | |
| 31 | +| 2026-02-22T00:57:52 | Build and Release "Apply Changesets" job (run time: 6s) pushes version bump commit `feba582c` to `main` — only **1 second** after Measure Disk Space started | |
| 32 | +| 2026-02-22T00:57:52–01:16:23 | Measure Disk Space job runs 18 minutes of disk measurement (all steps succeed) | |
| 33 | +| 2026-02-22T01:16:23 | Measure Disk Space: `git commit` succeeds locally (26 components, 7545MB total) | |
| 34 | +| 2026-02-22T01:16:23 | Measure Disk Space: `git push origin main` **fails** — remote now at `feba582c`, local at `411b384e` | |
| 35 | +| 2026-02-22T01:16:23 | Workflow exits with code 1 | |
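The width of the race window follows from the checkout and push timestamps in the table. A minimal sketch of that arithmetic, assuming both timestamps fall on the same UTC day; `to_seconds` is a hypothetical helper written for illustration and is not part of the workflow:

```bash
# Convert HH:MM:SS to seconds since midnight.
# `to_seconds` is a hypothetical helper for illustration only.
to_seconds() {
  IFS=: read -r h m s <<EOF
$1
EOF
  # Strip one leading zero so values such as "08" are not parsed as octal.
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

checkout=$(to_seconds 00:57:51)   # job checks out 411b384e
push=$(to_seconds 01:16:23)       # git push is rejected
window=$(( push - checkout ))
echo "window: ${window}s ($(( window / 60 ))m $(( window % 60 ))s)"
# prints: window: 1112s (18m 32s)
```

Any push to `main` landing inside those 1112 seconds would have produced the same rejection.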
| 36 | + |
| 37 | +## Root Cause Analysis |
| 38 | + |
| 39 | +### Primary Cause: No Pull-Before-Push in Long-Running CI Job |
| 40 | + |
| 41 | +The "Commit and push changes" step in `measure-disk-space.yml` runs `git push origin main` directly, without first integrating remote changes via `git pull`: |
| 42 | + |
| 43 | +```yaml |
| 44 | +- name: Commit and push changes |
| 45 | + if: steps.changes.outputs.has_changes == 'true' && steps.validate.outputs.valid == 'true' |
| 46 | + run: | |
| 47 | + git config user.name "github-actions[bot]" |
| 48 | + git config user.email "github-actions[bot]@users.noreply.github.com" |
| 49 | + git add README.md data/disk-space-measurements.json |
| 50 | + TOTAL_SIZE=$(python3 -c "..." 2>/dev/null || echo "unknown") |
| 51 | + git commit -m "chore: update component disk space measurements (${TOTAL_SIZE}MB total)" |
| 52 | + git push origin main # <-- FAILS if any other push happened during the 18-min run |
| 53 | + echo "Changes committed and pushed successfully" |
| 54 | +``` |
| 55 | + |
| 56 | +Because this workflow takes **~18 minutes** to run (package installations for disk measurement), the window for a conflicting push is very wide. Any push to `main` during those 18 minutes causes this step to fail. |
| 57 | + |
| 58 | +### Why the Concurrency Setting Didn't Help |
| 59 | + |
| 60 | +The workflow has a concurrency group configured: |
| 61 | + |
| 62 | +```yaml |
| 63 | +concurrency: |
| 64 | + group: measure-disk-space-${{ github.ref }} |
| 65 | + cancel-in-progress: true |
| 66 | +``` |
| 67 | + |
| 68 | +This only prevents **two instances of the same workflow** from running simultaneously. It does **not** prevent other workflows (like "Build and Release Docker Image") from pushing to `main` while the measurement is running. |
| 69 | + |
| 70 | +### Contributing Factor: Release Workflow Pushes to Main Within Seconds |
| 71 | + |
| 72 | +The "Build and Release Docker Image" workflow has an "Apply Changesets" job that runs in ~6 seconds and pushes a version bump commit to `main`. This runs on every push to `main` that includes changeset files. When PR #50 was merged: |
| 73 | + |
| 74 | +1. The merge commit `411b384e` triggered both workflows |
| 75 | +2. The release workflow applied the changeset and pushed `feba582c` within 1 second |
| 76 | +3. The measure workflow was already past its checkout step and couldn't see the new commit |
| 77 | +4. 18 minutes later, the measurement results were ready but the push failed |
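The sequence above can be reproduced locally with throwaway repositories. The sketch below is an illustration under assumptions, not the project's setup: the directory names (`seed`, `release`, `measure`), file contents, and commit messages are invented, and the "remote" is a bare repository in a temporary directory rather than GitHub.

```bash
# Throwaway identity so commits work without global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Seed commit on main, standing in for the merge commit.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
echo "1.0.0" > "$work/seed/VERSION"
git -C "$work/seed" add VERSION
git -C "$work/seed" commit -q -m "merge PR"

# One bare "remote" and two clones, one per workflow.
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/release"
git clone -q "$work/remote.git" "$work/measure"

# The release job bumps VERSION and pushes first.
echo "1.0.1" > "$work/release/VERSION"
git -C "$work/release" commit -q -am "chore: version bump"
git -C "$work/release" push -q origin main

# The measurement job, still based on the old commit, commits its results.
echo "total: 7545MB" > "$work/measure/README.md"
git -C "$work/measure" add README.md
git -C "$work/measure" commit -q -m "chore: update measurements"

# Its push is rejected because the remote has moved on.
if git -C "$work/measure" push -q origin main 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi

# Rebasing onto the updated main replays the measurement commit on top.
git -C "$work/measure" pull -q --rebase origin main
git -C "$work/measure" push -q origin main
second_push=$?
echo "first push: $first_push, push after rebase: exit $second_push"
```

Because the two commits touch different files, the rebase applies without conflict and the second push is accepted.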
| 78 | + |
| 79 | +### This Is a Recurring Failure Mode |
| 80 | + |
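| 81 | +This workflow has failed several times in recent CI runs, although each earlier failure had a different cause. Only the most recent run is the push rejection analyzed here, and it will recur whenever another push lands on `main` during the measurement window: |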
| 82 | +- Run `22261112919` — failed at "Run disk space measurement" (permission denied, Issue #46 era) |
| 83 | +- Run `22263724056` — failed at "Run disk space measurement" (permission denied, Issue #46 era) |
| 84 | +- Run `22265618808` — failed at "Fail on invalid measurements" (Issue #49 era, sed bug) |
| 85 | +- Run `22267653514` — failed at "Commit and push changes" **(this issue, git push rejection)** |
| 86 | + |
| 87 | +## Impact |
| 88 | + |
| 89 | +- **Wasted compute**: 18 minutes of CI time thrown away per occurrence |
| 90 | +- **Misleading failure**: All measurement steps passed; the failure is at the final push step |
| 91 | +- **Data loss**: Valid measurement data (26 components, 7545MB total) never committed to repository |
| 92 | +- **Frequency**: Any push to `main` during the ~18 minute measurement window triggers this failure |
| 93 | + |
| 94 | +## Possible Solutions |
| 95 | + |
| 96 | +### Solution 1: Pull-Then-Rebase Before Push (Recommended ✓) |
| 97 | + |
| 98 | +Add a `git pull --rebase origin main` before the `git push`: |
| 99 | + |
| 100 | +```bash |
| 101 | +git pull --rebase origin main |
| 102 | +git push origin main |
| 103 | +``` |
| 104 | + |
| 105 | +**Pros**: Simple and robust; handles the race without data loss. The measurement data (README.md, JSON) does not conflict with version bumps, which change only the `VERSION` file. |
| 106 | + |
| 107 | +**Cons**: Very small risk of conflict if another measurement was committed simultaneously (same files changed). |
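The conflict case named in the cons can be made explicit. The following is a sketch under assumptions, not the applied fix: `push_with_rebase` is a hypothetical helper, and the demo repositories, file contents, and commit messages are invented to force two commits onto the same line of `README.md`.

```bash
# `push_with_rebase` is a hypothetical helper, not part of the workflow.
# It rebases onto the remote branch and pushes; on a conflict it aborts
# the rebase so the work tree is left clean, then reports failure.
push_with_rebase() {
  remote=$1 branch=$2
  if ! git pull --rebase "$remote" "$branch"; then
    git rebase --abort 2>/dev/null || true
    echo "rebase onto $remote/$branch failed" >&2
    return 1
  fi
  git push "$remote" "$branch"
}

# Demo in throwaway repositories: two commits change the same README line.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
echo "total: 7000MB" > "$work/seed/README.md"
git -C "$work/seed" add README.md
git -C "$work/seed" commit -q -m "seed"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/other"
git clone -q "$work/remote.git" "$work/measure"

echo "total: 7100MB" > "$work/other/README.md"
git -C "$work/other" commit -q -am "earlier measurement"
git -C "$work/other" push -q origin main

echo "total: 7545MB" > "$work/measure/README.md"
git -C "$work/measure" commit -q -am "later measurement"

status=0
( cd "$work/measure" && push_with_rebase origin main ) >/dev/null 2>&1 || status=$?
echo "helper exit status: $status"
```

In this situation the step fails visibly instead of leaving a half-finished rebase behind, and the local measurement commit is still intact for a later retry.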
| 108 | + |
| 109 | +### Solution 2: Retry Loop with Pull-Rebase |
| 110 | + |
| 111 | +```bash |
| 112 | +MAX_RETRIES=3 |
| 113 | +for i in $(seq 1 $MAX_RETRIES); do |
| 114 | + git pull --rebase origin main && git push origin main && break |
| 115 | + [ "$i" -lt "$MAX_RETRIES" ] && sleep $((i * 5)) || exit 1 # back off, or fail the step once the last attempt is rejected |
| 116 | +done |
| 117 | +``` |
| 118 | + |
| 119 | +**Pros**: Handles the edge case where multiple retries are needed (e.g., multiple concurrent pushes). |
| 120 | + |
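| 121 | +**Cons**: Adds complexity; a single pull-rebase should be sufficient because it shrinks the race window from ~18 minutes to the few seconds between the pull and the push, and `concurrency` already ensures that only one instance of this workflow runs at a time. |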
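If a retry is adopted anyway, the loop can be factored into a small reusable function that returns a non-zero status once all attempts are exhausted, so the step cannot pass silently. A minimal sketch, assuming a linear backoff; `retry` and `flaky` are hypothetical names, and `flaky` merely stands in for the rebase-and-push pair:

```bash
# `retry` is a hypothetical helper: run a command up to MAX times,
# sleeping attempt * DELAY seconds between failed attempts.
# Usage: retry MAX DELAY command [args...]
retry() {
  max=$1 delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    # Give up with a failure status after the last attempt.
    [ "$attempt" -ge "$max" ] && return 1
    sleep $(( attempt * delay ))
    attempt=$(( attempt + 1 ))
  done
}

# In the workflow this could wrap the rebase-and-push pair, for example:
#   retry 3 5 sh -c 'git pull --rebase origin main && git push origin main'

# Stand-in command that fails twice, then succeeds.
calls=0
flaky() { calls=$(( calls + 1 )); [ "$calls" -ge 3 ]; }

retry 3 0 flaky && echo "succeeded after $calls attempts"
# prints: succeeded after 3 attempts
```

Returning a status from the helper keeps the decision to fail the job in the calling step, which matters because the default `bash -e` shell does not exit on failures that occur inside `&&` lists.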
| 122 | + |
| 123 | +### Solution 3: Use a Third-Party Action (e.g., `stefanzweifel/git-auto-commit-action`) |
| 124 | + |
| 125 | +Actions like [`stefanzweifel/git-auto-commit-action`](https://github.com/stefanzweifel/git-auto-commit-action) and [`ad-m/github-push-action`](https://github.com/ad-m/github-push-action) implement retry logic internally. |
| 126 | + |
| 127 | +**Pros**: Battle-tested, handles many edge cases. |
| 128 | + |
| 129 | +**Cons**: Adds a dependency; the problem is simple enough to solve without an additional action. |
| 130 | + |
| 131 | +### Solution 4: Separate the Measurement from the Commit |
| 132 | + |
| 133 | +Run measurement and commit as two separate workflows triggered sequentially. Measurement uploads artifacts; a separate short-lived commit job downloads artifacts and commits. |
| 134 | + |
| 135 | +**Pros**: The commit job would start fresh with the latest main. |
| 136 | + |
| 137 | +**Cons**: Much more complex architecture change; overkill for this problem. |
| 138 | + |
| 139 | +### Chosen Fix |
| 140 | + |
| 141 | +**Solution 1**: Add `git pull --rebase origin main` before `git push origin main` in the "Commit and push changes" step. This is the simplest, most direct fix that addresses the root cause without adding complexity. |
| 142 | + |
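| 143 | +The measurement data (README.md and data/disk-space-measurements.json) and the version bump (VERSION file) change different files, so the rebase is expected to apply cleanly for this race. A conflict remains possible only if another commit on `main` touches the same measurement files. |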
| 144 | + |
| 145 | +## Other CI Steps Review |
| 146 | + |
| 147 | +Per the issue request to "double check all other steps in the same CI/CD flow," the other steps were reviewed: |
| 148 | + |
| 149 | +| Step | Status | Notes | |
| 150 | +|------|--------|-------| |
| 151 | +| Set up job | ✓ | Standard GitHub Actions setup | |
| 152 | +| Checkout repository | ✓ | `fetch-depth: 0` correctly fetches full history | |
| 153 | +| Free up disk space | ✓ | Correctly avoids `apt-get remove` (per issue-29 learnings) | |
| 154 | +| Create data directory | ✓ | Simple `mkdir -p data` | |
| 155 | +| Run disk space measurement | ✓ | Uses `set -o pipefail` correctly; fixed by issues #35, #46, #49 | |
| 156 | +| Update README with component sizes | ✓ | No issues found | |
| 157 | +| Check for changes | ✓ | Correctly uses `git diff --quiet` | |
| 158 | +| Validate measurements | ✓ | Good validation thresholds | |
| 159 | +| **Commit and push changes** | **✗ FIXED** | **Missing `git pull --rebase` before `git push`** | |
| 160 | +| Fail on invalid measurements | ✓ | Correct safeguard | |
| 161 | +| Upload measurement artifacts | ✓ | No issues found | |
| 162 | +| Summary | ✓ | No issues found | |
| 163 | + |
| 164 | +## Related Resources |
| 165 | + |
| 166 | +- [GitHub Community Discussion: Error in git push github actions](https://github.com/orgs/community/discussions/25710) |
| 167 | +- [Solution to `error: failed to push some refs` on GitHub Actions](https://jonathansoma.com/everything/git/github-actions-refs-error/) |
| 168 | +- [peaceiris/actions-gh-pages Issue #1078: support: action failed with "fetch first" hint](https://github.com/peaceiris/actions-gh-pages/issues/1078) |
| 169 | +- [GitHub Docs: Control the concurrency of workflows and jobs](https://docs.github.com/en/actions/using-jobs/using-concurrency) |
| 170 | +- [Dealing with flaky GitHub Actions – epiforecasts](https://epiforecasts.io/posts/2022-04-11-robust-actions/) |
| 171 | + |
| 172 | +## Artifacts |
| 173 | + |
| 174 | +- [`ci-run-22267653514-failed.log`](./ci-run-22267653514-failed.log) — Failed run log (the git push rejection) |
| 175 | +- [`ci-run-22267653514-full.log`](../../ci-run-22267653514.log) — Full run log |
| 176 | +- [`ci-run-22265618808-failed.log`](./ci-run-22265618808-failed.log) — Previous failure (Issue #49 era) |
| 177 | +- [`ci-run-22263724056-failed.log`](./ci-run-22263724056-failed.log) — Earlier failure (Issue #46 era) |