Commit 03a9d8d

Merge pull request #52 from link-foundation/issue-51-09375a949530
1.3.8: Fix CI failure: git push rejected due to concurrent push race condition (Issue #51)
2 parents feba582 + 91acabc commit 03a9d8d

3 files changed: +200 -0 lines changed

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
---
bump: patch
---

Fix CI failure: `git push` rejected due to race condition with release workflow (Issue #51)

The "Measure Disk Space and Update README" workflow takes ~18 minutes to run. When a PR
is merged to `main`, the release workflow runs simultaneously and pushes a version bump
commit within seconds. After 18 minutes of valid measurement computation, the push step
failed with:

    ! [rejected] main -> main (fetch first)
    error: failed to push some refs to 'https://github.com/link-foundation/sandbox'

Fix: add `git pull --rebase origin main` before `git push origin main` in the "Commit
and push changes" step. Measurement data (README.md, data/) and version bumps (VERSION)
touch different files, so the rebase always succeeds cleanly.

.github/workflows/measure-disk-space.yml

Lines changed: 6 additions & 0 deletions
@@ -158,6 +158,12 @@ jobs:

 git commit -m "chore: update component disk space measurements (${TOTAL_SIZE}MB total)"

+# Pull-rebase before push to handle concurrent pushes that may have occurred
+# during the ~18 minute measurement (e.g., version bump from release workflow).
+# Measurement data (README.md, data/) and version bumps (VERSION) touch different
+# files, so the rebase always succeeds cleanly. See docs/case-studies/issue-51.
+git pull --rebase origin main
+
 git push origin main

 echo "Changes committed and pushed successfully"
Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
# Case Study: CI Failure (`git push` Rejected Due to Concurrent Push Race Condition)

**Issue**: [#51 - Fix CI/CD](https://github.com/link-foundation/sandbox/issues/51)
**CI Run**: [22267653514, Job 64416730310](https://github.com/link-foundation/sandbox/actions/runs/22267653514/job/64416730310)
**Date**: 2026-02-22
**Status**: Investigation Complete, Fix Applied

## Executive Summary

The "Measure Disk Space and Update README" CI workflow failed at the "Commit and push changes" step with:

```
To https://github.com/link-foundation/sandbox
 ! [rejected] main -> main (fetch first)
error: failed to push some refs to 'https://github.com/link-foundation/sandbox'
hint: Updates were rejected because the remote contains work that you do not
hint: have locally. This is usually caused by another repository pushing to
hint: the same ref. If you want to integrate the remote changes, use
hint: 'git pull' before pushing again.
```

All 18+ measurement steps succeeded. The failure occurred only during the final `git push` step, **after 18 minutes of valid computation**, because another workflow had pushed a version bump commit to `main` within 1 second of this job starting.

## Timeline of Events

| Time (UTC) | Event |
|------------|-------|
| 2026-02-22T00:57:44 | PR #50 merged into `main`, creating merge commit `411b384e` |
| 2026-02-22T00:57:47 | Two workflows triggered simultaneously on push to `main`: <br> &bull; "Build and Release Docker Image" (run `22267653513`) <br> &bull; "Measure Disk Space and Update README" (run `22267653514`) |
| 2026-02-22T00:57:51 | Measure Disk Space job checks out commit `411b384e` (start of long computation) |
| 2026-02-22T00:57:52 | Build and Release "Apply Changesets" job (run time: 6s) pushes version bump commit `feba582c` to `main`, only **1 second** after Measure Disk Space started |
| 2026-02-22T00:57:52–01:16:23 | Measure Disk Space job runs 18 minutes of disk measurement (all steps succeed) |
| 2026-02-22T01:16:23 | Measure Disk Space: `git commit` succeeds locally (26 components, 7545MB total) |
| 2026-02-22T01:16:23 | Measure Disk Space: `git push origin main` **fails** (remote now at `feba582c`, local at `411b384e`) |
| 2026-02-22T01:16:23 | Workflow exits with code 1 |
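
The width of the race window follows directly from the timestamps in the table. The following is a minimal sketch of that arithmetic, assuming GNU `date` (the `-d` option is not available in BSD/macOS `date`); the variable and function names are illustrative only.

```bash
#!/usr/bin/env bash
# Timestamps copied from the timeline table (UTC).
checkout="2026-02-22 00:57:51"   # measure job checks out 411b384e
bump="2026-02-22 00:57:52"       # release workflow pushes feba582c
push="2026-02-22 01:16:23"       # measure job attempts git push

# Convert a timestamp to epoch seconds. Assumes GNU date.
to_epoch() { date -u -d "$1" +%s; }

# How long after checkout the version bump landed.
head_start=$(( $(to_epoch "$bump") - $(to_epoch "$checkout") ))
# How long the local main had been stale when the push was attempted.
stale_for=$(( $(to_epoch "$push") - $(to_epoch "$bump") ))

echo "version bump landed ${head_start}s after checkout"
echo "local main was stale for ${stale_for}s ($(( stale_for / 60 ))m $(( stale_for % 60 ))s) at push time"
```

With the values from the table, the bump lands 1 second after checkout and the local branch is 1111 seconds (18m 31s) behind by the time the push is attempted.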

## Root Cause Analysis

### Primary Cause: No Pull-Before-Push in Long-Running CI Job

In `measure-disk-space.yml`, the "Commit and push changes" step runs `git push origin main` directly, without first integrating any commits that reached `main` after checkout:

```yaml
- name: Commit and push changes
  if: steps.changes.outputs.has_changes == 'true' && steps.validate.outputs.valid == 'true'
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add README.md data/disk-space-measurements.json
    TOTAL_SIZE=$(python3 -c "..." 2>/dev/null || echo "unknown")
    git commit -m "chore: update component disk space measurements (${TOTAL_SIZE}MB total)"
    git push origin main  # <-- FAILS if any other push happened during the 18-min run
    echo "Changes committed and pushed successfully"
```

Because this workflow takes **~18 minutes** to run (package installations for disk measurement), the window for a conflicting push is very wide. Any push to `main` during those 18 minutes causes this step to fail.

### Why the Concurrency Setting Didn't Help

The workflow has a concurrency group configured:

```yaml
concurrency:
  group: measure-disk-space-${{ github.ref }}
  cancel-in-progress: true
```

This only prevents **two instances of the same workflow** from running simultaneously on the same ref. It does **not** prevent other workflows (like "Build and Release Docker Image") from pushing to `main` while the measurement is running.

### Contributing Factor: Release Workflow Pushes to Main Within Seconds

The "Build and Release Docker Image" workflow has an "Apply Changesets" job that runs in ~6 seconds and pushes a version bump commit to `main`. This runs on every push to `main` that includes changeset files. When PR #50 was merged:

1. The merge commit `411b384e` triggered both workflows
2. The release workflow applied the changeset and pushed `feba582c` about 5 seconds after being triggered, 1 second after the measure job's checkout
3. The measure workflow had already checked out `411b384e`, so its local `main` never included the new commit
4. 18 minutes later, the measurement results were ready but the push failed
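
The four steps above can be reproduced without GitHub at all. The following is a rough sketch using a throwaway local bare repository in place of the real remote; every path, file content, and commit message here is an illustrative assumption, and only the `git pull --rebase origin main` / `git push origin main` pair comes from the actual fix.

```bash
#!/usr/bin/env bash
# Reproduce the "fetch first" rejection with a temporary bare repo.
# Illustrative setup only; nothing here touches the real repository.
set -u
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/remote.git"

# Seed the remote with one commit on main (stands in for the merge commit).
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
echo "1.0.0" > "$work/seed/VERSION"
echo "sizes: none" > "$work/seed/README.md"
git -C "$work/seed" add VERSION README.md
git -C "$work/seed" commit -q -m "merge commit"
git -C "$work/seed" push -q "$work/remote.git" main

# Both "workflows" check out the same commit.
git clone -q -b main "$work/remote.git" "$work/release"
git clone -q -b main "$work/remote.git" "$work/measure"

# Release workflow: quick version bump, pushed first.
echo "1.0.1" > "$work/release/VERSION"
git -C "$work/release" commit -q -am "version bump"
git -C "$work/release" push -q origin main

# Measure workflow: commits a different file much later, from a stale main.
cd "$work/measure"
echo "sizes: 7545MB" > README.md
git commit -q -am "chore: update measurements"
if git push -q origin main 2>/dev/null; then first_push=accepted; else first_push=rejected; fi

# The fix: rebase onto the updated remote, then push again.
git pull -q --rebase origin main
if git push -q origin main 2>/dev/null; then second_push=accepted; else second_push=rejected; fi

echo "first push: $first_push, after pull --rebase: $second_push"
```

In this setup the first push is rejected because the remote already holds the version bump, and the second push goes through because the two commits touch different files.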

### Recent Failures of This Workflow

Historical CI runs show that this workflow has failed repeatedly, but at different steps and for different reasons. The push rejection is a new failure mode rather than a repeat of the earlier ones:
- Run `22261112919`: failed at "Run disk space measurement" (permission denied, Issue #46 era)
- Run `22263724056`: failed at "Run disk space measurement" (permission denied, Issue #46 era)
- Run `22265618808`: failed at "Fail on invalid measurements" (Issue #49 era, sed bug)
- Run `22267653514`: failed at "Commit and push changes" **(this issue, git push rejection)**

## Impact

- **Wasted compute**: 18 minutes of CI time thrown away per occurrence
- **Misleading failure**: All measurement steps passed; the failure is at the final push step
- **Data loss**: Valid measurement data (26 components, 7545MB total) never committed to the repository
- **Frequency**: Any push to `main` during the ~18 minute measurement window triggers this failure

## Possible Solutions

### Solution 1: Pull-Then-Rebase Before Push (Recommended ✓)

Add a `git pull --rebase origin main` before the `git push`:

```bash
git pull --rebase origin main
git push origin main
```

**Pros**: Simple and robust: handles the race without data loss. The measurement data (`README.md`, JSON) does not conflict with version bumps, which only change the `VERSION` file.

**Cons**: Small risk of a rebase conflict if another commit touching the same files (for example, another measurement) lands on `main` in the meantime.
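
If the conflict case ever does occur, a bare `git pull --rebase` leaves the checkout in the middle of a rebase. One way to make that failure explicit is sketched below; `push_with_rebase` is a hypothetical helper name, not something in the workflow, and the error message is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rebase onto the remote branch, then push.
# If the rebase stops (for example on a conflict), abort it so the working
# tree returns to the local commit, and return non-zero so the step fails.
push_with_rebase() {
  remote="$1"
  branch="$2"
  if ! git pull --rebase "$remote" "$branch"; then
    # Only meaningful when a rebase is in progress; ignore the status otherwise.
    git rebase --abort 2>/dev/null || true
    echo "rebase onto $remote/$branch failed; not pushing" >&2
    return 1
  fi
  git push "$remote" "$branch"
}
```

In the workflow step this would be called as `push_with_rebase origin main` in place of the two separate commands.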

### Solution 2: Retry Loop with Pull-Rebase

```bash
MAX_RETRIES=3
pushed=false
for i in $(seq 1 "$MAX_RETRIES"); do
  # Rebase onto the latest main, then push; stop at the first success.
  git pull --rebase origin main && git push origin main && pushed=true && break
  # Back off 5s, then 10s, before the next attempt.
  if [ "$i" -lt "$MAX_RETRIES" ]; then sleep $((i * 5)); fi
done
# Fail the step explicitly if every attempt was rejected.
[ "$pushed" = true ] || exit 1
```

**Pros**: Handles the edge case where multiple retries are needed (e.g., multiple concurrent pushes).

**Cons**: Adds complexity; the simple single pull-rebase should be sufficient given only one instance of this workflow runs at a time (via `concurrency`).

### Solution 3: Use a Third-Party Action (e.g., `stefanzweifel/git-auto-commit-action`)

Actions like [`stefanzweifel/git-auto-commit-action`](https://github.com/stefanzweifel/git-auto-commit-action) and [`ad-m/github-push-action`](https://github.com/ad-m/github-push-action) implement retry logic internally.

**Pros**: Battle-tested, handles many edge cases.

**Cons**: Adds a dependency; the problem is simple enough to solve without an additional action.

### Solution 4: Separate the Measurement from the Commit

Run measurement and commit as two separate workflows triggered sequentially. Measurement uploads artifacts; a separate short-lived commit job downloads artifacts and commits.

**Pros**: The commit job would start fresh with the latest `main`.

**Cons**: A much more complex architecture change; overkill for this problem.

### Chosen Fix

**Solution 1**: Add `git pull --rebase origin main` before `git push origin main` in the "Commit and push changes" step. This is the simplest, most direct fix that addresses the root cause without adding complexity.

The measurement data (`README.md` and `data/disk-space-measurements.json`) and the version bump (`VERSION` file) change different files, so the rebase is expected to apply without conflicts. The remaining risk is the one noted under Solution 1: another commit that touches the same measurement files.
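
The "different files" reasoning can also be checked at run time instead of assumed. A small sketch follows, where `overlapping_paths` is a hypothetical helper that lists paths changed on both sides since their merge base. An overlap does not prove the rebase will conflict; it only flags that the disjoint-files assumption no longer holds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print paths changed by both revisions since their
# merge base. Empty output means the two histories touched disjoint files.
overlapping_paths() {
  ours="$1"
  theirs="$2"
  base=$(git merge-base "$ours" "$theirs") || return 2
  tmp=$(mktemp)
  git diff --name-only "$base" "$theirs" | sort > "$tmp"
  # comm -12 keeps only lines present in both sorted inputs.
  git diff --name-only "$base" "$ours" | sort | comm -12 - "$tmp"
  rm -f "$tmp"
}

# Possible use in the push step, after `git fetch origin main`:
#   overlap=$(overlapping_paths HEAD origin/main)
#   [ -z "$overlap" ] || echo "warning: both sides changed: $overlap" >&2
```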

## Other CI Steps Review

Per the issue request to "double check all other steps in the same CI/CD flow," the other steps were reviewed:

| Step | Status | Notes |
|------|--------|-------|
| Set up job | ✓ | Standard GitHub Actions setup |
| Checkout repository | ✓ | `fetch-depth: 0` correctly fetches full history |
| Free up disk space | ✓ | Correctly avoids `apt-get remove` (per issue-29 learnings) |
| Create data directory | ✓ | Simple `mkdir -p data` |
| Run disk space measurement | ✓ | Uses `set -o pipefail` correctly; fixed by issues #35, #46, #49 |
| Update README with component sizes | ✓ | No issues found |
| Check for changes | ✓ | Correctly uses `git diff --quiet` |
| Validate measurements | ✓ | Good validation thresholds |
| **Commit and push changes** | **✗ FIXED** | **Missing `git pull --rebase` before `git push`** |
| Fail on invalid measurements | ✓ | Correct safeguard |
| Upload measurement artifacts | ✓ | No issues found |
| Summary | ✓ | No issues found |

## Related Resources

- [GitHub Community Discussion: Error in git push github actions](https://github.com/orgs/community/discussions/25710)
- [Solution to `error: failed to push some refs` on GitHub Actions](https://jonathansoma.com/everything/git/github-actions-refs-error/)
- [peaceiris/actions-gh-pages Issue #1078: support: action failed with "fetch first" hint](https://github.com/peaceiris/actions-gh-pages/issues/1078)
- [GitHub Docs: Control the concurrency of workflows and jobs](https://docs.github.com/en/actions/using-jobs/using-concurrency)
- [Dealing with flaky GitHub Actions – epiforecasts](https://epiforecasts.io/posts/2022-04-11-robust-actions/)

## Artifacts

- [`ci-run-22267653514-failed.log`](./ci-run-22267653514-failed.log): Failed run log (the git push rejection)
- [`ci-run-22267653514-full.log`](../../ci-run-22267653514.log): Full run log
- [`ci-run-22265618808-failed.log`](./ci-run-22265618808-failed.log): Previous failure (Issue #49 era)
- [`ci-run-22263724056-failed.log`](./ci-run-22263724056-failed.log): Earlier failure (Issue #46 era)
