feat: add reaper-agent per-node DaemonSet #27

Merged
miguelgila merged 10 commits into main from feat/reaper-agent
Mar 3, 2026

@miguelgila
Owner

Summary

Adds reaper-agent, a per-node Kubernetes DaemonSet that provides a Kubernetes-native operational layer for Reaper:

  • Config sync: Watches a ConfigMap (reaper-config in reaper-system) and writes /etc/reaper/reaper.conf to the host — replaces Ansible-managed config
  • Stale state GC: Scans /run/reaper/*/state.json, detects dead PIDs, cleans up orphaned state
  • Health endpoint: /healthz and /readyz on :9100 — checks binary presence and overlay accessibility
  • Prometheus metrics: /metrics on :9100 — container counts, GC stats, config sync counters
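
The stale state GC described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual `gc` module: it assumes `state.json` carries a numeric `"pid"` field, that a PID is dead when `/proc/<pid>` is missing (Linux only), and the helper names `extract_pid`, `pid_is_alive`, and `is_stale` are hypothetical.

```rust
use std::path::Path;

/// Naive lookup of a numeric "pid" field in a state.json body.
/// Assumption: the real file format is not shown in this PR.
fn extract_pid(state_json: &str) -> Option<u32> {
    let key = "\"pid\"";
    let after_key = &state_json[state_json.find(key)? + key.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let digits: String = after_colon
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Linux-only liveness probe: a running process has a /proc/<pid> entry.
fn pid_is_alive(pid: u32) -> bool {
    Path::new("/proc").join(pid.to_string()).exists()
}

/// A state file is stale when it names a PID that is no longer alive.
/// Files without a recognizable PID are left alone.
fn is_stale(state_json: &str, alive: impl Fn(u32) -> bool) -> bool {
    match extract_pid(state_json) {
        Some(pid) => !alive(pid),
        None => false,
    }
}

fn main() {
    let sample = r#"{"id": "demo", "pid": 4242}"#;
    println!("stale: {}", is_stale(sample, pid_is_alive));
}
```

Passing the liveness check as a closure keeps the staleness decision testable without touching `/proc`.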

New files

  • src/bin/reaper-agent/ — 5 modules (main, config_sync, gc, health, metrics)
  • deploy/kubernetes/reaper-agent.yaml — Namespace, ConfigMap, ServiceAccount, RBAC, DaemonSet
  • Dockerfile.agent — Multi-stage rust-musl-cross to distroless
  • scripts/build-agent-image.sh — Cross-compile, package, and load into Kind
  • docs/BUGS.md — Documents known flaky tests

Integration test improvements

  • 5 new agent tests (Phase 4a): deployment, config sync, healthz, metrics, stale GC
  • --agent-only flag for fast iteration on agent tests
  • Dedicated KUBECONFIG per Kind cluster to avoid context conflicts
  • Agent build and tests are mandatory (fail the suite if broken)

Test results

  • 39/40 PASS (1 pre-existing DNS annotation flake, documented in BUGS.md)
  • All 5 agent tests PASS

Test plan

  • cargo test — 152 unit/integration tests pass
  • cargo clippy — clean
  • ./scripts/run-integration-tests.sh --agent-only — 5/5 agent tests pass
  • Full integration suite — 39/40 pass (1 pre-existing flake)
  • CI workflow (GitHub Actions)

🤖 Generated with Claude Code

miguelgila and others added 10 commits March 2, 2026 21:41
Introduces reaper-agent, a per-node Kubernetes DaemonSet that provides
operational capabilities for Reaper: ConfigMap-based config sync to host,
stale state GC with dead PID detection, health checks, and Prometheus
metrics on :9100. Gated behind the "agent" cargo feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
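
For readers unfamiliar with what a `/metrics` endpoint returns, here is a minimal sketch of the Prometheus text exposition format. It is not the agent's `metrics` module; the metric names and the `Metric`, `Kind`, and `render` items are hypothetical, and a real implementation would more likely use a metrics crate.

```rust
/// Metric types used in this sketch.
enum Kind {
    Counter,
    Gauge,
}

/// One sample with its metadata. Names and help strings are illustrative.
struct Metric {
    name: &'static str,
    help: &'static str,
    kind: Kind,
    value: u64,
}

/// Render metrics as Prometheus text: a HELP line, a TYPE line, then the sample.
fn render(metrics: &[Metric]) -> String {
    let mut out = String::new();
    for m in metrics {
        let kind = match m.kind {
            Kind::Counter => "counter",
            Kind::Gauge => "gauge",
        };
        out.push_str(&format!("# HELP {} {}\n", m.name, m.help));
        out.push_str(&format!("# TYPE {} {}\n", m.name, kind));
        out.push_str(&format!("{} {}\n", m.name, m.value));
    }
    out
}

fn main() {
    let metrics = [
        Metric { name: "example_gc_runs_total", help: "GC passes completed.", kind: Kind::Counter, value: 3 },
        Metric { name: "example_containers", help: "Containers currently tracked.", kind: Kind::Gauge, value: 2 },
    ];
    print!("{}", render(&metrics));
}
```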
Add --with-agent flag to install-reaper.sh for optional reaper-agent
DaemonSet deployment. Add Phase 4a integration tests that verify agent
deployment, ConfigMap sync, /healthz, /metrics, and stale state GC.
Agent tests auto-skip if the image isn't loaded into Kind.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add scripts/build-agent-image.sh that cross-compiles the agent binary
via musl, packages it into a distroless container image, and loads it
into Kind. Wire it into Phase 2 setup so Phase 4a agent tests run
automatically instead of being skipped. Build failure is non-fatal
(tests gracefully skip if image is unavailable).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agent image build failure now aborts the test suite instead of
silently skipping. Phase 4a agent tests fail hard if the image
isn't found — the agent is core infrastructure, not optional.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Export a per-cluster KUBECONFIG file so all kubectl commands target
the correct Kind cluster, even when the user has other clusters or
contexts active. Previously, bare kubectl defaulted to localhost:8080
if another context was selected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
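
The per-cluster KUBECONFIG idea can be illustrated as below. The PR does this in the integration test scripts; this Rust sketch only shows the principle of pinning every `kubectl` call to one file. The path layout and the `kubeconfig_for` and `kubectl` helpers are assumptions, not names from the repository.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical location of a cluster-specific kubeconfig file.
fn kubeconfig_for(cluster: &str) -> PathBuf {
    std::env::temp_dir().join(format!("kubeconfig-{}", cluster))
}

/// Build a kubectl invocation whose KUBECONFIG points at one cluster only,
/// so the user's currently selected context cannot leak in.
fn kubectl(cluster: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new("kubectl");
    cmd.env("KUBECONFIG", kubeconfig_for(cluster)).args(args);
    cmd
}

fn main() {
    // Only constructs the command; nothing is executed here.
    let cmd = kubectl("reaper-test", &["get", "nodes"]);
    println!("{:?}", cmd);
}
```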
The :latest tag defaults imagePullPolicy to Always, causing kubelet to
attempt pulling from ghcr.io instead of using the locally loaded image.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
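
The defaulting rule behind this fix can be written out as a small function. When `imagePullPolicy` is omitted, Kubernetes picks `Always` for an image tagged `:latest` or with no tag, and `IfNotPresent` for any other tag or a digest. The sketch below models only that rule; the image names are made up and the parsing is deliberately simplified.

```rust
#[derive(Debug, PartialEq)]
enum PullPolicy {
    Always,
    IfNotPresent,
}

/// Default pull policy Kubernetes applies when the field is omitted.
fn default_pull_policy(image: &str) -> PullPolicy {
    // A digest reference (name@sha256:...) pins the content exactly.
    if image.contains('@') {
        return PullPolicy::IfNotPresent;
    }
    // Look for a tag only in the last path segment, so a registry
    // port such as localhost:5000 is not mistaken for a tag.
    let last_segment = image.rsplit('/').next().unwrap_or(image);
    match last_segment.split_once(':') {
        Some((_, tag)) if tag != "latest" => PullPolicy::IfNotPresent,
        _ => PullPolicy::Always,
    }
}

fn main() {
    for image in [
        "ghcr.io/example/reaper-agent:latest",
        "ghcr.io/example/reaper-agent:dev",
        "ghcr.io/example/reaper-agent",
    ] {
        println!("{} -> {:?}", image, default_pull_policy(image));
    }
}
```

With a locally loaded image, either an explicit `imagePullPolicy` or a tag other than `latest` keeps kubelet from going to the registry.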
The --agent-only flag skips cargo tests and Phase 4 integration tests,
running only infrastructure setup and Phase 4a agent tests for fast iteration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent needs root to write /etc/reaper/reaper.conf and clean up
/run/reaper/ on the host. Switch from distroless:nonroot to distroless
root image with securityContext.runAsUser: 0.

Also fix healthz/metrics tests to use kubectl port-forward instead of
kubectl exec (distroless has no shell or wget).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
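
Writing the config file onto the host is the step that needs root. One common way to do such a write, so that a reader never sees a half-written file, is to write a temporary file in the same directory and rename it into place. Whether the agent's `config_sync` module does this is not stated in the PR; the `write_atomically` helper and the demo path below are assumptions.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to a sibling temp file, flush it to disk, then rename
/// it over `dest`. A rename within one filesystem replaces the target atomically.
fn write_atomically(dest: &Path, contents: &str) -> io::Result<()> {
    let tmp = dest.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?;
    }
    fs::rename(&tmp, dest)
}

fn main() -> io::Result<()> {
    // Demo target under the temp dir so the sketch runs without root.
    let dir = std::env::temp_dir().join("reaper-agent-sketch");
    fs::create_dir_all(&dir)?;
    let dest = dir.join("reaper.conf");
    write_atomically(&dest, "# synced from ConfigMap\n")?;
    println!("wrote {}", dest.display());
    Ok(())
}
```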
Documents the intermittent containerd sandbox teardown race that causes
the DNS mode annotation override test to fail under load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.83%. Comparing base (78dae3a) to head (f97bc2b).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
- Coverage   87.18%   86.83%   -0.36%     
==========================================
  Files           6        6              
  Lines         320      319       -1     
==========================================
- Hits          279      277       -2     
- Misses         41       42       +1     


@miguelgila miguelgila merged commit 9fc4cc7 into main Mar 3, 2026
10 checks passed
@miguelgila miguelgila deleted the feat/reaper-agent branch March 3, 2026 05:16