Commit e513802

Merge pull request #36 from link-foundation/issue-35-9c4d8d6cb0f2
Fix component sizes not calculated or pushed to README.md on push to main
2 parents 52cbffc + 8f8234f commit e513802

5 files changed: +363 -28 lines changed

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
---
bump: patch
---

Fix component sizes not being calculated or pushed to README.md on push to main

- Add measurement scripts to workflow path triggers so fixes re-trigger the workflow
- Replace fragile sed-based JSON manipulation with Python for robustness
- Add pipefail to detect script failures in measurement pipeline

.github/workflows/measure-disk-space.yml

Lines changed: 4 additions & 1 deletion
@@ -6,6 +6,8 @@ on:
       - main
     paths:
       - 'scripts/ubuntu-24-server-install.sh'
+      - 'scripts/measure-disk-space.sh'
+      - 'scripts/update-readme-sizes.sh'
       - 'Dockerfile'
       - '.github/workflows/measure-disk-space.yml'
   # Allow manual trigger
@@ -69,12 +71,13 @@ jobs:
       - name: Run disk space measurement
         id: measure
         run: |
+          set -o pipefail
           echo "=== Starting disk space measurement ==="
 
           # Make script executable
           chmod +x scripts/measure-disk-space.sh
 
-          # Run measurement script with sudo
+          # Run measurement script with sudo (pipefail ensures script failures propagate through tee)
           sudo ./scripts/measure-disk-space.sh --json-output data/disk-space-measurements.json 2>&1 | tee measurement.log
 
           echo "=== Measurement complete ==="

Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
# Case Study: Component Sizes Not Calculated or Pushed to README on Push to Main

**Issue:** [#35 - Components sizes are not calculated or pushed to README.md on push to main branch](https://github.com/link-foundation/sandbox/issues/35)

**Date of Investigation:** 2026-01-31

## Executive Summary

The `measure-disk-space.yml` workflow has never successfully measured all component disk sizes and pushed the results to `README.md`. Three root causes were identified: (1) the workflow's path-based trigger did not include the measurement scripts themselves, so fixes to those scripts never re-triggered the workflow; (2) the `sed`-based JSON manipulation in the measurement script was fragile and failed on component names containing special characters (e.g., `C/C++ Tools`); and (3) piping the script's output through `tee` without `pipefail` hid script failures from the workflow step, allowing it to continue with incomplete data.

## Timeline of Events

| Time (UTC) | Event | Details |
|------------|-------|---------|
| 2026-01-29 14:07:09 | First workflow run (035998b) | Succeeded superficially, but only recorded 1 component (0MB) due to broken apt (issue-29). No validation existed yet, so 0MB data was committed to main. |
| 2026-01-29 ~14:09 | 0MB data committed | `chore: update component disk space measurements (0MB total)` pushed to main (commit 3d75e41) |
| 2026-01-29 ~14:35 | Issue-29 fix merged | PR #30 fixes apt cleanup, adds validation step (commit a646fe6) |
| 2026-01-29 18:21:09 | Second workflow run (a646fe6) | Triggered by issue-29 fix merge. **Failed** with sed error on "C/C++ Tools" component. Only 2 components measured. Validation correctly rejected. |
| 2026-01-29 18:39-18:48 | Issue-31 fix developed | PR #32 changes sed delimiter from `/` to `\|` |
| 2026-01-29 ~18:48 | Issue-31 fix merged (52cbffc) | sed delimiter fixed, but this only changed `scripts/measure-disk-space.sh` |
| 2026-01-29 18:48+ | **No workflow re-trigger** | The workflow path filter only watches `scripts/ubuntu-24-server-install.sh`, `Dockerfile`, and the workflow file itself, NOT `scripts/measure-disk-space.sh` |
| 2026-01-31 | Issue #35 opened | Component sizes still show 0MB in README |

## Root Cause Analysis

### Root Cause 1: Incomplete Workflow Path Triggers

The `measure-disk-space.yml` workflow was configured to trigger only on changes to:
```yaml
paths:
  - 'scripts/ubuntu-24-server-install.sh'
  - 'Dockerfile'
  - '.github/workflows/measure-disk-space.yml'
```

Missing from this list:
- `scripts/measure-disk-space.sh`: the main measurement script
- `scripts/update-readme-sizes.sh`: the README updater script

This meant that fixing the measurement script (issue-31, commit 52cbffc) did **not** trigger a re-run of the measurement workflow. The fixed code was never executed.

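The effect of the path filter can be illustrated with a minimal Python sketch. It assumes the issue-31 commit touched only `scripts/measure-disk-space.sh` (as the timeline states) and uses `fnmatch` as a stand-in for GitHub's path matching; the two differ for glob patterns, but for the literal paths listed here both reduce to an exact comparison. The names `would_trigger`, `PATHS_BEFORE`, and `PATHS_AFTER` are illustrative, not part of the repository.

```python
from fnmatch import fnmatchcase

# Path filter before the fix, copied from the workflow excerpt above.
PATHS_BEFORE = [
    "scripts/ubuntu-24-server-install.sh",
    "Dockerfile",
    ".github/workflows/measure-disk-space.yml",
]

# Path filter after the fix: the two measurement scripts are added.
PATHS_AFTER = PATHS_BEFORE + [
    "scripts/measure-disk-space.sh",
    "scripts/update-readme-sizes.sh",
]


def would_trigger(changed_files, path_filters):
    """Return True if any changed file matches any filter pattern.

    fnmatchcase only approximates GitHub's glob rules; for the literal
    paths used here it behaves like an exact string comparison.
    """
    return any(
        fnmatchcase(changed, pattern)
        for changed in changed_files
        for pattern in path_filters
    )


# Files touched by the issue-31 fix (commit 52cbffc), per the timeline.
issue_31_changes = ["scripts/measure-disk-space.sh"]

print(would_trigger(issue_31_changes, PATHS_BEFORE))  # False
print(would_trigger(issue_31_changes, PATHS_AFTER))   # True
```
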
### Root Cause 2: Fragile sed-Based JSON Manipulation

The `add_measurement()` function used `sed` to manipulate JSON, which is inherently fragile:

```bash
# Even with | delimiter (after issue-31 fix), sed is fragile for JSON:
current_json=$(echo "$current_json" | sed "s|\"components\": \[\]|\"components\": [$new_component]|")
current_json=$(echo "$current_json" | sed "s|\]$|,$new_component]|")
```

While the issue-31 fix changed the delimiter from `/` to `|`, this approach remains vulnerable to:
- Any future component name containing `|`
- Regex metacharacters in component values
- Multi-line JSON formatting changes
- Shell quoting edge cases with special characters

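The same class of failure can be reproduced outside `sed`. The sketch below is an analogy rather than the script's actual code: `append_by_templating` splices a hand-built fragment into the JSON text, as the `sed` substitution did, while `append_with_json` parses and re-serializes. The component name is hypothetical and was chosen to contain characters that text splicing does not escape.

```python
import json


def append_by_templating(document, name):
    """Mimic the sed approach: splice a hand-built fragment into the text."""
    fragment = '{"name": "%s"}' % name
    return document.replace('"components": []', '"components": [%s]' % fragment)


def append_with_json(document, name):
    """Parse the text, modify the data structure, and serialize it again."""
    data = json.loads(document)
    data["components"].append({"name": name})
    return json.dumps(data)


def is_valid_json(text):
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


empty = '{"components": []}'
tricky_name = 'Tools "extra" | C/C++ \\ [beta]'  # hypothetical component name

print(is_valid_json(append_by_templating(empty, tricky_name)))  # False
print(is_valid_json(append_with_json(empty, tricky_name)))      # True
```

The templated version fails because the double quotes and the backslash in the name are copied into the output unescaped; the parser-based version escapes them during serialization.
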
### Root Cause 3: Pipeline Masking Script Failures

The workflow ran the measurement script through a pipe:
```yaml
sudo ./scripts/measure-disk-space.sh ... 2>&1 | tee measurement.log
```

Without `set -o pipefail` in the workflow step, bash reports the exit code of the **last** command in the pipeline (`tee`, which almost always exits with 0), not that of the measurement script. When the script crashed due to the sed error, the step continued as if nothing had happened, producing incomplete JSON data.

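The difference between the two exit-status rules can be sketched in Python without relying on bash. The two child processes below are stand-ins (assumptions for illustration): one prints a line and exits with status 3, like a crashing script, and the other copies stdin to stdout and exits 0, like `tee`. The helper names are hypothetical.

```python
import subprocess
import sys

# Stand-ins for the real commands (assumptions for illustration):
# the producer prints a line and exits with status 3, like a crashing script;
# the consumer copies stdin to stdout and exits 0, like tee.
PRODUCER = [sys.executable, "-c", "import sys; print('partial output'); sys.exit(3)"]
CONSUMER = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def run_pipeline(producer, consumer):
    """Run `producer | consumer` and return both exit codes in pipeline order."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout, stdout=subprocess.DEVNULL)
    first.stdout.close()  # so the producer gets SIGPIPE if the consumer exits early
    second_code = second.wait()
    first_code = first.wait()
    return [first_code, second_code]


def status_without_pipefail(codes):
    """Default bash behaviour: the status of the last command in the pipeline."""
    return codes[-1]


def status_with_pipefail(codes):
    """pipefail: the rightmost non-zero status, or 0 if every stage succeeded."""
    failures = [code for code in codes if code != 0]
    return failures[-1] if failures else 0


codes = run_pipeline(PRODUCER, CONSUMER)
print(status_without_pipefail(codes))  # 0: the failure is masked
print(status_with_pipefail(codes))     # 3: the failure propagates
```
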
### How the Three Root Causes Interacted

```
Issue-29 fix (a646fe6) merged to main
         │
         ▼
measure-disk-space workflow triggered (correct: install script changed)
         │
         ▼
Script crashes on "C/C++ Tools" due to sed / delimiter (issue-31)
         │                          │
         ▼                          ▼
Pipeline masks failure      Only 2 components in JSON
(tee exit code 0)           (total_size_mb: 0)
         │                          │
         ▼                          ▼
Workflow continues          Validation catches incomplete data
                                    │
                                    ▼
                            Workflow fails (correct behavior)
```

```
Issue-31 fix (52cbffc) merged to main
         │
         ▼
measure-disk-space workflow NOT triggered
(scripts/measure-disk-space.sh not in path triggers)
         │
         ▼
README still shows 0MB, issue #35 opened
```

## Solution

### Fix 1: Add Measurement Scripts to Workflow Path Triggers

```yaml
paths:
  - 'scripts/ubuntu-24-server-install.sh'
  - 'scripts/measure-disk-space.sh'      # NEW
  - 'scripts/update-readme-sizes.sh'     # NEW
  - 'Dockerfile'
  - '.github/workflows/measure-disk-space.yml'
```

This ensures any changes to measurement-related scripts will trigger a re-run.

### Fix 2: Add pipefail to Workflow Measurement Step

```yaml
run: |
  set -o pipefail
  # ... measurement commands ...
  sudo ./scripts/measure-disk-space.sh ... 2>&1 | tee measurement.log
```

This ensures script failures propagate through the `tee` pipeline and are detected by the workflow.

### Fix 3: Replace sed-Based JSON Manipulation with Python

Instead of using sed (which is fragile for structured data), use Python's `json` module:

**Before (fragile sed):**
```bash
current_json=$(echo "$current_json" | sed "s|\"components\": \[\]|\"components\": [$new_component]|")
```

**After (robust Python):**
```bash
python3 -c "
import json, sys
with open('$JSON_OUTPUT_FILE', 'r') as f:
    data = json.load(f)
data['components'].append({
    'name': sys.argv[1],
    'category': sys.argv[2],
    'size_bytes': int(sys.argv[3]),
    'size_mb': int(sys.argv[4])
})
with open('$JSON_OUTPUT_FILE', 'w') as f:
    json.dump(data, f)
" "$name" "$category" "$size_bytes" "$size_mb"
```

Python's `json` module handles all special characters correctly and produces valid JSON output regardless of component names.

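One remaining soft spot in the snippet above is that `$JSON_OUTPUT_FILE` is interpolated into the Python source text, so a path containing a single quote would break the command. The following is a hedged sketch of a possible hardening, not what the commit does: the path is passed as an ordinary parameter, and the file is rewritten through a temporary file so that a crash mid-write cannot leave truncated JSON behind. The atomic-write step and the demo path are assumptions added for illustration.

```python
import json
import os
import tempfile


def add_measurement(json_path, name, category, size_bytes, size_mb):
    """Append one component record and rewrite the file atomically."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    data["components"].append({
        "name": name,
        "category": category,
        "size_bytes": int(size_bytes),
        "size_mb": int(size_mb),
    })

    # Write to a temporary file in the same directory, then swap it in,
    # so a crash mid-write cannot leave a truncated JSON file behind.
    directory = os.path.dirname(os.path.abspath(json_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp_path, json_path)
    except BaseException:
        os.unlink(tmp_path)
        raise


# Demo in a throwaway directory; the quote in the file name is deliberate.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "it's a test.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"generated_at": "", "total_size_mb": 0, "components": []}, f)
    add_measurement(path, "C/C++ Tools (CMake, Clang, LLVM, LLD)", "Build Tools", "52428800", "50")
    with open(path, encoding="utf-8") as f:
        result = json.load(f)

print(result["components"][0]["name"])
```

From a shell script, such a function could be invoked with the path and values as command-line arguments, which keeps every piece of untrusted text out of the Python source itself.
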
## Evidence

### Failed Run Logs (Run 21489818730)

The sed error at the C/C++ Tools component:
```
[✓] Recorded: .NET SDK 8.0 - 481MB
[*] Measuring installation: C/C++ Tools (CMake, Clang, LLVM, LLD)
...
sed: -e expression #1, char 20: unknown option to `s'
=== Measurement complete ===
```

Validation failure:
```
Total size: 0MB
Component count: 2
WARNING: Measurements appear incomplete or invalid!
- Total size: 0MB (expected >= 1000MB)
- Components: 2 (expected >= 10)
```

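The validation rule visible in this log can be sketched in Python as follows. The thresholds (1000MB and 10 components) come from the log above; the function name and the placeholder component names are assumptions, and the real validation step in the workflow may be implemented differently.

```python
# Thresholds taken from the validation log above.
MIN_TOTAL_MB = 1000
MIN_COMPONENTS = 10


def validate_measurements(data, min_total_mb=MIN_TOTAL_MB, min_components=MIN_COMPONENTS):
    """Return a list of problems; an empty list means the data looks complete."""
    problems = []
    total = data.get("total_size_mb", 0)
    count = len(data.get("components", []))
    if total < min_total_mb:
        problems.append(f"Total size: {total}MB (expected >= {min_total_mb}MB)")
    if count < min_components:
        problems.append(f"Components: {count} (expected >= {min_components})")
    return problems


# Shape of the failed run: two components recorded, total never finalized.
# The component names are placeholders, not taken from the logs.
failed_run = {
    "total_size_mb": 0,
    "components": [{"name": "component-1"}, {"name": "component-2"}],
}

for problem in validate_measurements(failed_run):
    print("-", problem)
```
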
### "Successful" Run Logs (Run 21481304786)

The earlier run appeared successful but actually failed silently:
```
E: Package 'build-essential' has no installation candidate
E: Unable to locate package expect
[!] Installation of Essential Tools failed
[✓] Recorded: Essential Tools - 0MB
=== Measurement complete ===
```

This run had no validation step, so it committed 0MB data to main.

## Files Modified

- `.github/workflows/measure-disk-space.yml`: Added measurement scripts to path triggers; added `set -o pipefail`
- `scripts/measure-disk-space.sh`: Replaced sed-based JSON manipulation with Python

## Prevention

1. **Include all related scripts in workflow triggers**: When a workflow depends on scripts, ensure those scripts are listed in the `paths` filter
2. **Use language-appropriate tools for data manipulation**: Use Python/jq for JSON, not sed/awk
3. **Enable pipefail in CI steps**: Always use `set -o pipefail` when piping command output through `tee` or other tools
4. **Test that script changes trigger workflows**: Verify that the path filters match all relevant files

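The first and last points could be partly automated. The sketch below is one possible check, written under stated assumptions: it scans workflow text with regular expressions instead of a YAML parser, it only recognizes single-quoted path entries and `scripts/*.sh` references, and the abbreviated workflow text (including the job name `measure`) is illustrative rather than copied from the repository.

```python
import re

# Abbreviated workflow text before the fix (structure is illustrative).
WORKFLOW_TEXT = """\
on:
  push:
    paths:
      - 'scripts/ubuntu-24-server-install.sh'
      - 'Dockerfile'
      - '.github/workflows/measure-disk-space.yml'
jobs:
  measure:
    steps:
      - run: |
          chmod +x scripts/measure-disk-space.sh
          sudo ./scripts/measure-disk-space.sh --json-output data/disk-space-measurements.json 2>&1 | tee measurement.log
"""

# Any mention of a shell script under scripts/.
SCRIPT_REF = re.compile(r"scripts/[\w.-]+\.sh")
# A list item consisting only of a single-quoted string, e.g. a paths entry.
PATH_ENTRY = re.compile(r"^\s*-\s*'([^']+)'\s*$", re.MULTILINE)


def unwatched_scripts(workflow_text):
    """Return scripts mentioned in the workflow but absent from quoted path entries."""
    watched = set(PATH_ENTRY.findall(workflow_text))
    referenced = set(SCRIPT_REF.findall(workflow_text))
    return sorted(referenced - watched)


print(unwatched_scripts(WORKFLOW_TEXT))  # ['scripts/measure-disk-space.sh']
```

A check like this only sees scripts that the workflow text mentions directly; a script called from another script would still need to be listed by hand.
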
## Related Issues

- [#29 - Components size update failed](https://github.com/link-foundation/sandbox/issues/29): APT cleanup breaking package installation (fixed)
- [#31 - CI/CD failed](https://github.com/link-foundation/sandbox/issues/31): sed delimiter error with `/` in component names (partially fixed; this issue completes the fix)

## CI Logs

Full CI logs are preserved in:
- `ci-logs/measure-disk-space-failed-21489818730.log`: Failed run with sed error
- `ci-logs/measure-disk-space-success-21481304786.log`: Earlier run with broken apt

Online:
- [GitHub Actions Run 21489818730](https://github.com/link-foundation/sandbox/actions/runs/21489818730): Failed measurement run
- [GitHub Actions Run 21481304786](https://github.com/link-foundation/sandbox/actions/runs/21481304786): Earlier "successful" run with 0MB data

## Conclusion

This issue was caused by a combination of three problems: incomplete workflow path triggers, fragile sed-based JSON manipulation, and pipeline error masking. The issue-31 fix addressed the immediate sed delimiter problem, but the fixed script was never executed because the workflow path triggers didn't include the measurement script. The comprehensive fix adds the missing path triggers, replaces sed with Python for JSON manipulation (eliminating the entire class of special-character bugs), and adds `pipefail` to detect script failures properly. Once merged, the workflow will trigger and should produce accurate component size measurements.

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
#!/usr/bin/env bash
set -euo pipefail

# Test script to verify JSON manipulation functions work correctly
# with component names containing special characters like C/C++

JSON_OUTPUT_FILE="/tmp/test-disk-space-measurements.json"

# Initialize JSON
cat > "$JSON_OUTPUT_FILE" << 'EOF'
{
  "generated_at": "",
  "total_size_mb": 0,
  "components": []
}
EOF

echo "=== Initial JSON ==="
cat "$JSON_OUTPUT_FILE"
echo ""

# Add component measurement (same function from measure-disk-space.sh)
add_measurement() {
  local name="$1"
  local category="$2"
  local size_bytes="$3"
  local size_mb="$4"

  python3 -c "
import json, sys
with open('$JSON_OUTPUT_FILE', 'r') as f:
    data = json.load(f)
data['components'].append({
    'name': sys.argv[1],
    'category': sys.argv[2],
    'size_bytes': int(sys.argv[3]),
    'size_mb': int(sys.argv[4])
})
with open('$JSON_OUTPUT_FILE', 'w') as f:
    json.dump(data, f)
" "$name" "$category" "$size_bytes" "$size_mb"

  echo "[✓] Recorded: $name - ${size_mb}MB"
}

# Finalize JSON
finalize_json_output() {
  local total_mb="$1"

  python3 -c "
import json
from datetime import datetime, timezone
with open('$JSON_OUTPUT_FILE', 'r') as f:
    data = json.load(f)
data['generated_at'] = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
data['total_size_mb'] = int('$total_mb')
with open('$JSON_OUTPUT_FILE', 'w') as f:
    json.dump(data, f)
"

  echo "[✓] Finalized JSON output with total: ${total_mb}MB"
}

# Test with various component names including special characters
echo "=== Adding components ==="
add_measurement "Essential Tools" "System" 737280 0
add_measurement ".NET SDK 8.0" "Runtime" 504913920 481
add_measurement "C/C++ Tools (CMake, Clang, LLVM, LLD)" "Build Tools" 52428800 50
add_measurement "Assembly Tools (NASM, FASM)" "Build Tools" 10485760 10
add_measurement "R Language" "Runtime" 314572800 300
add_measurement "NVM + Node.js 20" "Runtime" 209715200 200
add_measurement "Pyenv + Python (latest)" "Runtime" 524288000 500
add_measurement "Go (latest)" "Runtime" 524288000 500
add_measurement "Rust (via rustup)" "Runtime" 1073741824 1024
add_measurement "SDKMAN + Java 21" "Runtime" 419430400 400
add_measurement "Kotlin (via SDKMAN)" "Runtime" 104857600 100
add_measurement "Homebrew" "Package Manager" 524288000 500
add_measurement "PHP 8.3 (via Homebrew)" "Runtime" 209715200 200

# Finalize
echo ""
echo "=== Finalizing ==="
finalize_json_output 4265

echo ""
echo "=== Final JSON ==="
python3 -m json.tool "$JSON_OUTPUT_FILE"

echo ""
echo "=== Validation ==="
TOTAL=$(python3 -c "import json; print(json.load(open('$JSON_OUTPUT_FILE'))['total_size_mb'])")
COUNT=$(python3 -c "import json; print(len(json.load(open('$JSON_OUTPUT_FILE'))['components']))")
echo "Total size: ${TOTAL}MB"
echo "Component count: ${COUNT}"

if [ "$TOTAL" -ge 1000 ] && [ "$COUNT" -ge 10 ]; then
  echo "[✓] PASS: Measurements valid (total >= 1000MB, components >= 10)"
else
  echo "[✗] FAIL: Measurements invalid"
  exit 1
fi

# Cleanup
rm -f "$JSON_OUTPUT_FILE"
echo ""
echo "=== All tests passed ==="
