Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions .changeset/fix-sed-special-characters.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
bump: patch
---

Fix sed command error when component names contain forward slashes

The disk space measurement script failed when measuring components with `/` in their names
(e.g., "C/C++ Tools (CMake, Clang, LLVM, LLD)") because sed used `/` as the delimiter.
Changed the sed delimiter from `/` to `|` to avoid conflicts with component names.

Fixes #31
204 changes: 204 additions & 0 deletions docs/case-studies/issue-31/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
# Case Study: CI/CD Failed - sed Command Error with Special Characters

**Issue:** [#31 - CI/CD failed](https://github.com/link-foundation/sandbox/issues/31)

**Date of Investigation:** 2026-01-29

## Executive Summary

The `measure-disk-space.yml` workflow failed with exit code 1 because the disk space measurement script terminated early due to a `sed` command error. The root cause is that component names containing forward slashes (`/`) break the sed substitution command that uses `/` as the delimiter. This caused only 2 of ~25 components to be measured, triggering the validation failure.

## Timeline of Events

| Time (UTC) | Event | Details |
|------------|-------|---------|
| 2026-01-29 18:21:13 | Workflow started | Run ID: 21489818730 |
| 2026-01-29 18:21:15 | Checkout completed | Repository checked out successfully |
| 2026-01-29 18:23:20 | Disk cleanup completed | ~20GB freed, 42GB available |
| 2026-01-29 18:23:20 | Measurement started | Baseline: 30428MB |
| 2026-01-29 18:24:01 | Essential Tools measured | 0MB (already installed) |
| 2026-01-29 18:24:11 | .NET SDK 8.0 measured | 481MB |
| 2026-01-29 18:24:11 | C/C++ Tools measurement started | Installation began |
| 2026-01-29 18:24:16 | **sed error occurred** | Script terminated |
| 2026-01-29 18:24:16 | Validation failed | 0MB total, 2 components |
| 2026-01-29 18:24:16 | Workflow failed | Exit code 1 |

## Root Cause Analysis

### The Bug

The `add_measurement()` function in `scripts/measure-disk-space.sh` uses `sed` with forward slash (`/`) as the delimiter:

```bash
# Line 114 (main script) and line 344 (embedded sandbox script)
current_json=$(echo "$current_json" | sed "s/\"components\": \[\]/\"components\": [$new_component]/")

# Line 117 (main script) and line 346 (embedded sandbox script)
current_json=$(echo "$current_json" | sed "s/\]$/,$new_component]/")
```

The `$new_component` variable contains JSON like:
```json
{"name": "C/C++ Tools (CMake, Clang, LLVM, LLD)", "category": "Build Tools", "size_bytes": ..., "size_mb": ...}
```

When the component name contains `/` (as in `C/C++ Tools`), it's interpreted as a sed delimiter, causing:
```
sed: -e expression #1, char 20: unknown option to `s'
```

### Evidence from Logs

From [Run 21489818730](https://github.com/link-foundation/sandbox/actions/runs/21489818730/job/61909023165):

**Successful measurements before the error:**
```
[✓] Recorded: Essential Tools - 0MB
[*] Measuring installation: .NET SDK 8.0
...
[✓] Recorded: .NET SDK 8.0 - 481MB
[*] Measuring installation: C/C++ Tools (CMake, Clang, LLVM, LLD)
```

**The sed error:**
```
sed: -e expression #1, char 20: unknown option to `s'
=== Measurement complete ===
```

**Resulting corrupted JSON:**
```json
{
"generated_at": "",
"total_size_mb": 0,
"components": [
{"name": "Essential Tools", "category": "System", "size_bytes": 737280, "size_mb": 0},
{"name": ".NET SDK 8.0", "category": "Runtime", "size_bytes": 504913920, "size_mb": 481}
]
}
```

Note:
- `generated_at` is empty (never finalized)
- `total_size_mb` is 0 (never calculated)
- Only 2 components instead of ~25

**Validation failure:**
```
Total size: 0MB
Component count: 2
WARNING: Measurements appear incomplete or invalid!
- Total size: 0MB (expected >= 1000MB)
- Components: 2 (expected >= 10)
```

### Why Only 2 Components Were Recorded

The script order was:
1. Essential Tools - **SUCCESS** (no `/` in name)
2. .NET SDK 8.0 - **SUCCESS** (no `/` in name)
3. C/C++ Tools (CMake, Clang, LLVM, LLD) - **FAILED** (contains `/`)

The sed error caused the script to exit due to `set -euo pipefail`, preventing all subsequent measurements.

### Flow Diagram

```
┌─────────────────────────────────────┐
│ 1. Measure "Essential Tools" │
│ Name: "Essential Tools" │
│ No / in name - sed works ✓ │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ 2. Measure ".NET SDK 8.0" │
│ Name: ".NET SDK 8.0" │
│ No / in name - sed works ✓ │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ 3. Measure "C/C++ Tools (...)" │
│ Name contains "/" ◄───────────────── Problem!
│ sed "s/.../...$new_component/" │
│ Interpreted as: s/.../...C/C++... │
│ "C" becomes invalid sed flag ◄───── sed error!
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ 4. set -euo pipefail │
│ Script exits immediately │
│ finalize_json_output() never runs │
│ total_size_mb stays 0 │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ 5. Validation fails │
│ 0MB < 1000MB threshold │
│ 2 components < 10 threshold │
│ Workflow exits with error │
└─────────────────────────────────────┘
```

## Solution

### The Fix

Replace the sed delimiter `/` with a character that won't appear in component names. The pipe character `|` is a good choice since it's unlikely to appear in component names or JSON values:

**Before (broken):**
```bash
current_json=$(echo "$current_json" | sed "s/\"components\": \[\]/\"components\": [$new_component]/")
current_json=$(echo "$current_json" | sed "s/\]$/,$new_component]/")
```

**After (fixed):**
```bash
current_json=$(echo "$current_json" | sed "s|\"components\": \[\]|\"components\": [$new_component]|")
current_json=$(echo "$current_json" | sed "s|\]$|,$new_component]|")
```

This change needs to be made in:
1. `scripts/measure-disk-space.sh` - main script (lines 114, 117, 133, 134)
2. The embedded sandbox-measure.sh script (lines 344, 346)

### Alternative Solutions Considered

1. **Escape forward slashes in the variable** - Complex and error-prone
2. **Use a different JSON manipulation tool (jq)** - Would add a dependency
3. **Change the component name** - Would hide useful information
4. **Use Python instead of sed** - More complex, but more robust

The chosen solution (using `|` as delimiter) is the simplest and most direct fix.

## Files Modified

- `scripts/measure-disk-space.sh` - Fixed sed delimiter from `/` to `|`

## Prevention

To prevent similar issues in the future:

1. **Use robust delimiters**: When using sed with variable content, use delimiters that won't appear in the data (like `|`, `#`, or `@`)

2. **Consider using jq for JSON manipulation**: The jq tool is designed for JSON and handles special characters properly

3. **Add unit tests**: Test the measurement script with component names containing special characters

4. **Validate component names**: If using sed with `/`, ensure component names don't contain that character

## Related Issues

- [#29 - Components size update failed](https://github.com/link-foundation/sandbox/issues/29) - Different root cause (APT cleanup issue), but same symptom (0MB measurements)

## CI Logs

The full CI logs are available at:
- [GitHub Actions Run 21489818730](https://github.com/link-foundation/sandbox/actions/runs/21489818730/job/61909023165) - Complete workflow output

## Conclusion

The CI/CD failure was caused by using `/` as the sed delimiter when the data being substituted (component names) could contain `/` characters. The fix is straightforward: change the sed delimiter to a character that won't appear in the data, such as `|`. This is a common pitfall when using sed with user-provided or variable data.
76 changes: 76 additions & 0 deletions experiments/test-sed-fix.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
#!/usr/bin/env bash
# Test script to verify the sed fix for component names containing /

set -euo pipefail

echo "=== Testing sed fix for component names with forward slashes ==="
echo ""

# Test the problematic component name
COMPONENT_NAME="C/C++ Tools (CMake, Clang, LLVM, LLD)"
CATEGORY="Build Tools"
SIZE_BYTES=51908608
SIZE_MB=49

# Create test JSON
JSON_FILE=$(mktemp)
cat > "$JSON_FILE" << 'EOF'
{
"generated_at": "",
"total_size_mb": 0,
"components": []
}
EOF

echo "Initial JSON:"
cat "$JSON_FILE"
echo ""

# Create new component entry
new_component="{\"name\": \"$COMPONENT_NAME\", \"category\": \"$CATEGORY\", \"size_bytes\": $SIZE_BYTES, \"size_mb\": $SIZE_MB}"

echo "Component to add: $new_component"
echo ""

# Old method (with / as delimiter) - this would fail
echo "Testing OLD method (/ delimiter) - should fail:"
if echo '{"components": []}' | sed "s/\"components\": \[\]/\"components\": [$new_component]/" 2>&1; then
echo "OLD method unexpectedly succeeded"
else
echo "OLD method FAILED as expected (sed error)"
fi
echo ""

# New method (with | as delimiter) - this should work
echo "Testing NEW method (| delimiter) - should succeed:"
if echo '{"components": []}' | sed "s|\"components\": \[\]|\"components\": [$new_component]|" 2>&1; then
echo "NEW method SUCCEEDED as expected"
else
echo "NEW method failed (unexpected!)"
exit 1
fi
echo ""

# Full test - add component to JSON file
echo "Full test - adding component to JSON:"
current_json=$(cat "$JSON_FILE")
current_json=$(echo "$current_json" | sed "s|\"components\": \[\]|\"components\": [$new_component]|")
echo "$current_json" > "$JSON_FILE"

echo "Updated JSON:"
cat "$JSON_FILE"
echo ""

# Verify the JSON is valid
if python3 -c "import json; json.load(open('$JSON_FILE'))" 2>/dev/null; then
echo "JSON is valid!"
else
echo "JSON is invalid!"
exit 1
fi

# Clean up
rm -f "$JSON_FILE"

echo ""
echo "=== All tests passed! ==="
16 changes: 10 additions & 6 deletions scripts/measure-disk-space.sh
Original file line number Diff line number Diff line change
Expand Up @@ -111,10 +111,12 @@ add_measurement() {
# Check if components array is empty
if echo "$current_json" | grep -q '"components": \[\]'; then
# First component - replace empty array
current_json=$(echo "$current_json" | sed "s/\"components\": \[\]/\"components\": [$new_component]/")
# Use | as sed delimiter to avoid issues with / in component names (e.g., "C/C++ Tools")
current_json=$(echo "$current_json" | sed "s|\"components\": \[\]|\"components\": [$new_component]|")
else
# Append to existing array
current_json=$(echo "$current_json" | sed "s/\]$/,$new_component]/")
# Use | as sed delimiter to avoid issues with / in component names
current_json=$(echo "$current_json" | sed "s|\]$|,$new_component]|")
fi

echo "$current_json" > "$JSON_OUTPUT_FILE"
Expand All @@ -128,10 +130,11 @@ finalize_json_output() {
timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

# Update timestamp and total
# Use | as sed delimiter for consistency with add_measurement function
local current_json
current_json=$(cat "$JSON_OUTPUT_FILE")
current_json=$(echo "$current_json" | sed "s/\"generated_at\": \"\"/\"generated_at\": \"$timestamp\"/")
current_json=$(echo "$current_json" | sed "s/\"total_size_mb\": 0/\"total_size_mb\": $total_mb/")
current_json=$(echo "$current_json" | sed "s|\"generated_at\": \"\"|\"generated_at\": \"$timestamp\"|")
current_json=$(echo "$current_json" | sed "s|\"total_size_mb\": 0|\"total_size_mb\": $total_mb|")

echo "$current_json" > "$JSON_OUTPUT_FILE"
log_success "Finalized JSON output with total: ${total_mb}MB"
Expand Down Expand Up @@ -340,10 +343,11 @@ add_measurement() {
current_json=$(cat "$JSON_OUTPUT_FILE")
local new_component="{\"name\": \"$name\", \"category\": \"$category\", \"size_bytes\": $size_bytes, \"size_mb\": $size_mb}"

# Use | as sed delimiter to avoid issues with / in component names (e.g., "C/C++ Tools")
if echo "$current_json" | grep -q '"components": \[\]'; then
current_json=$(echo "$current_json" | sed "s/\"components\": \[\]/\"components\": [$new_component]/")
current_json=$(echo "$current_json" | sed "s|\"components\": \[\]|\"components\": [$new_component]|")
else
current_json=$(echo "$current_json" | sed "s/\]$/,$new_component]/")
current_json=$(echo "$current_json" | sed "s|\]$|,$new_component]|")
fi

echo "$current_json" > "$JSON_OUTPUT_FILE"
Expand Down
Loading