feat(agent): overlay GC and mount namespace cleanup #28

Merged
miguelgila merged 3 commits into main from agent-enhanced
Mar 3, 2026

Conversation

@miguelgila
Owner

Summary

  • Overlay GC: Reconciles on-disk overlay namespaces against the Kubernetes API, removing overlays for namespaces that no longer exist (merged from previous commits)
  • Mount namespace cleanup: Detects and removes stale /run/reaper/ns/* bind-mount files that occur when helper processes crash, nodes partially reboot, or overlay dirs are manually cleaned
  • Deploy fix: Adds mountPropagation: HostToContainer to the agent's run-reaper volume mount for mount-point visibility

Mount namespace cleanup details

  • Detection via /proc/self/mountinfo with /proc/1/mountinfo fallback (hostPID)
  • Non-blocking flock prevents races with the runtime's enter_overlay()
  • Running container safety check prevents removing ns files for active workloads
  • Runs before each overlay GC pass and reuses the existing --overlay-gc-enabled and --overlay-gc-interval flags, so no new flags are required
  • New metrics: reaper_agent_ns_cleanup_runs_total, reaper_agent_ns_cleaned_total
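
The detection step above can be illustrated with a minimal sketch. It assumes a ns file counts as stale when its path is not listed as a mount point in mountinfo-formatted text; the PR does not spell out the exact criterion, and the function names `mount_points`, `unescape_octal`, and `stale_ns_files` are hypothetical, not taken from the agent's code. The caller would read `/proc/self/mountinfo` (or `/proc/1/mountinfo` as the fallback) and pass the contents in as a string.

```rust
use std::collections::HashSet;

/// Decodes the octal escapes (for example `\040` for a space) that the
/// kernel uses for special characters in mountinfo path fields.
fn unescape_octal(field: &str) -> String {
    let bytes = field.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' && i + 3 < bytes.len() {
            let digits = &bytes[i + 1..i + 4];
            if digits.iter().all(|b| (b'0'..=b'7').contains(b)) {
                let value = digits
                    .iter()
                    .fold(0u32, |acc, b| acc * 8 + u32::from(*b - b'0'));
                if value <= 0xFF {
                    out.push(value as u8);
                    i += 4;
                    continue;
                }
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Collects the mount-point column (fifth whitespace-separated field)
/// from mountinfo-formatted text.
fn mount_points(mountinfo: &str) -> HashSet<String> {
    mountinfo
        .lines()
        .filter_map(|line| line.split_whitespace().nth(4))
        .map(unescape_octal)
        .collect()
}

/// Returns the ns file paths that are NOT mount points.
/// Assumption: "not mounted" is what makes a ns file stale.
fn stale_ns_files<'a>(ns_files: &[&'a str], mountinfo: &str) -> Vec<&'a str> {
    let mounted = mount_points(mountinfo);
    ns_files
        .iter()
        .copied()
        .filter(|path| !mounted.contains(*path))
        .collect()
}
```

Keeping the parser a pure function over a string makes it easy to unit test without a real mount table; the lock and running-container checks described in the list would still have to pass before anything is removed.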

Test plan

  • cargo clippy --target x86_64-unknown-linux-gnu --all-targets — clean
  • cargo test — all unit tests pass
  • ./scripts/run-integration-tests.sh --agent-only — 13/13 agent tests pass
    • Agent ns cleanup stale file — verifies stale ns files are removed
    • Agent ns cleanup preserves active — verifies running container protection
    • Agent ns cleanup metrics — verifies metrics are exposed

🤖 Generated with Claude Code

miguelgila and others added 2 commits March 3, 2026 11:44
Add periodic overlay GC to reaper-agent that lists K8s namespaces via
the API, compares against on-disk overlay directories, and removes
artifacts for namespaces that no longer exist. This prevents orphaned
overlay dirs from accumulating on nodes after namespace deletion.

Key behaviors:
- Safety check: skips cleanup if running containers reference the namespace
- Handles named overlay groups (<ns>--<name> pattern)
- Unmounts namespace bind-mounts before removal (umount2 MNT_DETACH)
- Configurable via --overlay-gc-interval (default 300s) and --overlay-gc-enabled
- Exposes 3 new Prometheus metrics (runs, cleaned, namespaces gauge)
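
As a rough illustration of the reconciliation and the first two behaviors listed above, the sketch below computes which on-disk overlay directories are candidates for removal. It is not the agent's implementation: the function names are hypothetical, the directory listing and the namespace sets are passed in as plain slices and sets, and it assumes the namespace is everything before the first `--` in a named overlay group.

```rust
use std::collections::HashSet;

/// Returns the namespace an overlay directory belongs to.
/// Named overlay groups follow `<ns>--<name>`; any other name is
/// treated as the namespace itself.
/// Assumption: the first `--` is the separator.
fn owning_namespace(dir_name: &str) -> &str {
    match dir_name.split_once("--") {
        Some((ns, _group)) => ns,
        None => dir_name,
    }
}

/// Selects overlay directories whose namespace no longer exists in the
/// cluster and is not referenced by any running container.
fn orphaned_overlays<'a>(
    on_disk: &[&'a str],
    live_namespaces: &HashSet<&str>,
    namespaces_in_use: &HashSet<&str>,
) -> Vec<&'a str> {
    on_disk
        .iter()
        .copied()
        .filter(|dir| {
            let ns = owning_namespace(dir);
            // Safety check: keep the overlay if a running container
            // still references the namespace, even if the API no
            // longer lists it.
            !live_namespaces.contains(ns) && !namespaces_in_use.contains(ns)
        })
        .collect()
}
```

With live namespaces `default` and `team-a`, and `team-c` still referenced by a running container, the directories `team-b` and `team-b--cache` would be selected while `team-a--cache` and `team-c` are kept.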

Also fixes a bug in container GC where it would remove overlay
infrastructure directories (overlay/, merged/, ns/) as "orphaned state
dirs" because they lack state.json.
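
One way such a fix could look is sketched below, under the assumption that container GC treats a directory without state.json as orphaned. The helper names and the `(name, has_state_json)` input shape are hypothetical; only the three directory names come from the commit message.

```rust
/// True for directories that hold overlay infrastructure rather than
/// per-container state.
fn is_overlay_infra(name: &str) -> bool {
    matches!(name, "overlay" | "merged" | "ns")
}

/// Picks the state directories container GC may remove.
/// Each entry pairs a directory name with whether it contains state.json.
fn orphaned_state_dirs<'a>(entries: &[(&'a str, bool)]) -> Vec<&'a str> {
    let mut orphans = Vec::new();
    for &(name, has_state_json) in entries {
        // Infrastructure dirs never contain state.json, so they must be
        // excluded explicitly instead of being treated as orphans.
        if !has_state_json && !is_overlay_infra(name) {
            orphans.push(name);
        }
    }
    orphans
}
```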

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nd-mounts

Add periodic detection and cleanup of stale `/run/reaper/ns/*` bind-mount
files to the reaper-agent. Stale ns files occur when helper processes crash,
nodes partially reboot, or overlay dirs are manually cleaned.

Detection uses `/proc/self/mountinfo` with a `/proc/1/mountinfo` fallback
(via hostPID) to reliably identify mount points even in nested container
environments. Safety checks prevent removing ns files referenced by running
containers, and non-blocking flock prevents races with the runtime.
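
The non-blocking lock can be sketched as follows. This is an illustration of the general technique, not the agent's code: it declares the C `flock` function directly and hard-codes the Linux values of `LOCK_EX` and `LOCK_NB`, whereas a real implementation may well use a crate instead. The name `try_lock_exclusive` is hypothetical, and which file the runtime's enter_overlay() actually locks is not stated in this PR.

```rust
use std::fs::File;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;

// Linux values from <sys/file.h>; assumed here rather than imported.
const LOCK_EX: c_int = 2;
const LOCK_NB: c_int = 4;

extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

/// Tries to take an exclusive advisory lock without blocking.
/// Returns Ok(true) when the lock was acquired and Ok(false) when
/// another open file description already holds it.
fn try_lock_exclusive(file: &File) -> io::Result<bool> {
    // SAFETY: the descriptor is valid for as long as `file` is borrowed.
    let rc = unsafe { flock(file.as_raw_fd(), LOCK_EX | LOCK_NB) };
    if rc == 0 {
        return Ok(true);
    }
    let err = io::Error::last_os_error();
    if err.kind() == io::ErrorKind::WouldBlock {
        Ok(false)
    } else {
        Err(err)
    }
}
```

On Ok(false) a cleanup pass would skip the ns file and retry on the next interval instead of waiting, which is what keeps it from stalling behind the runtime. The lock is released when the file is closed.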

Cleanup runs before each overlay GC pass and reuses the existing
`--overlay-gc-enabled` and `--overlay-gc-interval` flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.83%. Comparing base (78dae3a) to head (63d981f).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #28      +/-   ##
==========================================
- Coverage   87.18%   86.83%   -0.36%     
==========================================
  Files           6        6              
  Lines         320      319       -1     
==========================================
- Hits          279      277       -2     
- Misses         41       42       +1     

@miguelgila miguelgila merged commit cac7fef into main Mar 3, 2026
8 checks passed
@miguelgila miguelgila deleted the agent-enhanced branch March 3, 2026 21:57