feat: add reaper-agent per-node DaemonSet #27
Merged
miguelgila merged 10 commits into main on Mar 3, 2026
Introduces reaper-agent, a per-node Kubernetes DaemonSet that provides operational capabilities for Reaper: ConfigMap-based config sync to host, stale state GC with dead PID detection, health checks, and Prometheus metrics on :9100. Gated behind the "agent" cargo feature. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
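The stale state GC with dead PID detection described in this commit could look roughly like the sketch below. It is not the code from the PR: the function names, the `proc_root` parameter, and the assumption that each `state.json` carries a top-level numeric `"pid"` field are all hypothetical. The sketch only reports stale directories and leaves deletion to the caller.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if `<proc_root>/<pid>` exists, i.e. the process is still running.
/// `proc_root` is a parameter (normally `/proc`) so the check can be tested.
fn pid_is_alive(proc_root: &Path, pid: u32) -> bool {
    proc_root.join(pid.to_string()).exists()
}

/// Naive extraction of a numeric `"pid"` field from a JSON document.
/// ASSUMPTION: the state file has a `"pid": <number>` entry; the real
/// schema is not shown in this PR.
fn extract_pid(json: &str) -> Option<u32> {
    let rest = &json[json.find("\"pid\"")? + 5..];
    let rest = rest.trim_start().strip_prefix(':')?.trim_start();
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Lists state directories under `state_root` whose recorded PID is gone.
fn find_stale(state_root: &Path, proc_root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut stale = Vec::new();
    for entry in fs::read_dir(state_root)? {
        let dir = entry?.path();
        let Ok(json) = fs::read_to_string(dir.join("state.json")) else {
            continue; // no readable state file: leave the entry alone
        };
        match extract_pid(&json) {
            Some(pid) if !pid_is_alive(proc_root, pid) => stale.push(dir),
            _ => {} // live process, or no PID recorded: keep the state
        }
    }
    stale.sort();
    Ok(stale)
}
```

On a node this would be called as `find_stale(Path::new("/run/reaper"), Path::new("/proc"))`. A plain existence check cannot tell a reused PID from the original process, so a production GC would likely compare something extra, such as the process start time.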
Add --with-agent flag to install-reaper.sh for optional reaper-agent DaemonSet deployment. Add Phase 4a integration tests that verify agent deployment, ConfigMap sync, /healthz, /metrics, and stale state GC. Agent tests auto-skip if the image isn't loaded into Kind. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add scripts/build-agent-image.sh that cross-compiles the agent binary via musl, packages it into a distroless container image, and loads it into Kind. Wire it into Phase 2 setup so Phase 4a agent tests run automatically instead of being skipped. Build failure is non-fatal (tests gracefully skip if image is unavailable). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agent image build failure now aborts the test suite instead of silently skipping. Phase 4a agent tests fail hard if the image isn't found — the agent is core infrastructure, not optional. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Export a per-cluster KUBECONFIG file so all kubectl commands target the correct Kind cluster, even when the user has other clusters or contexts active. Previously, bare kubectl defaulted to localhost:8080 if another context was selected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The :latest tag defaults imagePullPolicy to Always, causing kubelet to attempt pulling from ghcr.io instead of using the locally loaded image. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
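The defaulting rule behind this commit can be written out as a small function. The sketch below mirrors the rule as documented by Kubernetes for a container that omits `imagePullPolicy`; it is an illustration, not code from the PR, and the image names in the usage note are hypothetical.

```rust
/// Pull policy Kubernetes assigns when `imagePullPolicy` is omitted.
#[derive(Debug, PartialEq, Eq)]
enum PullPolicy {
    Always,
    IfNotPresent,
}

/// Digest or an explicit tag other than `latest` gives IfNotPresent;
/// `:latest` or no tag at all gives Always.
fn default_pull_policy(image: &str) -> PullPolicy {
    if image.contains('@') {
        return PullPolicy::IfNotPresent; // pinned by digest
    }
    // Only look for a tag in the last path segment, so a registry port
    // such as `localhost:5000/app` is not mistaken for a tag.
    let last_segment = image.rsplit('/').next().unwrap_or(image);
    match last_segment.split_once(':') {
        Some((_, tag)) if tag != "latest" => PullPolicy::IfNotPresent,
        _ => PullPolicy::Always,
    }
}
```

For example, `default_pull_policy("ghcr.io/example/reaper-agent:latest")` yields `Always`, while the same image tagged `:dev` yields `IfNotPresent`. For an image that only exists locally in Kind, either an explicit `imagePullPolicy` or a tag other than `latest` keeps kubelet from attempting the pull.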
Skips cargo tests and Phase 4 integration tests, running only infrastructure setup and Phase 4a agent tests for fast iteration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent needs root to write /etc/reaper/reaper.conf and clean up /run/reaper/ on the host. Switch from distroless:nonroot to distroless root image with securityContext.runAsUser: 0. Also fix healthz/metrics tests to use kubectl port-forward instead of kubectl exec (distroless has no shell or wget). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
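Because a distroless image has no shell, the endpoints have to be probed from outside the container. Assuming a `kubectl port-forward` to an agent pod already maps a local port to `:9100`, a probe can be as small as the sketch below. The helper name `http_status` is hypothetical, and the actual tests in this PR are shell scripts, so this only illustrates the idea.

```rust
use std::io::{Error, ErrorKind, Read, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Sends a plain HTTP/1.1 GET and returns the response status code.
fn http_status(addr: impl ToSocketAddrs, path: &str) -> std::io::Result<u16> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;

    // Build the request first so it goes out in a single write.
    let request = format!("GET {path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n");
    stream.write_all(request.as_bytes())?;

    // `Connection: close` makes the server end the stream after responding.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;

    // The status line looks like "HTTP/1.1 200 OK"; take the second token.
    response
        .split_whitespace()
        .nth(1)
        .and_then(|code| code.parse().ok())
        .ok_or_else(|| Error::new(ErrorKind::InvalidData, "malformed status line"))
}
```

With the port-forward running on local port 9100, `http_status("127.0.0.1:9100", "/healthz")` would return `Ok(200)` for a healthy agent, and the same call with `/metrics` checks the metrics endpoint.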
Documents the intermittent containerd sandbox teardown race that causes the DNS mode annotation override test to fail under load. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
- Coverage   87.18%   86.83%   -0.36%
==========================================
  Files           6        6
  Lines         320      319       -1
==========================================
- Hits          279      277       -2
- Misses         41       42       +1
Summary
Adds reaper-agent, a per-node Kubernetes DaemonSet that provides a Kubernetes-native operational layer for Reaper:
- Config sync: reads the ConfigMap (reaper-config in reaper-system) and writes /etc/reaper/reaper.conf to the host, replacing Ansible-managed config
- Stale state GC: scans /run/reaper/*/state.json, detects dead PIDs, and cleans up orphaned state
- Health checks: /healthz and /readyz on :9100 check binary presence and overlay accessibility
- Prometheus metrics: /metrics on :9100 exposes container counts, GC stats, and config sync counters

New files
- src/bin/reaper-agent/: 5 modules (main, config_sync, gc, health, metrics)
- deploy/kubernetes/reaper-agent.yaml: Namespace, ConfigMap, ServiceAccount, RBAC, DaemonSet
- Dockerfile.agent: multi-stage build, rust-musl-cross to distroless
- scripts/build-agent-image.sh: cross-compile, package, and load into Kind
- docs/BUGS.md: documents known flaky tests

Integration test improvements
- --agent-only flag for fast iteration on agent tests

Test results
Test plan
- cargo test: 152 unit/integration tests pass
- cargo clippy: clean
- ./scripts/run-integration-tests.sh --agent-only: 5/5 agent tests pass

🤖 Generated with Claude Code