feat: ReaperOverlay CRD for overlay lifecycle management #47
Merged
miguelgila merged 14 commits into main on Mar 21, 2026
PVC-like CRD that decouples overlay lifecycle from pod lifecycle. Covers CRD design, controller reconciliation, agent reset endpoints, PVC-like blocking for ReaperPods, and integration test strategy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a PVC-like CRD that decouples overlay lifecycle from pod lifecycle,
enabling Kubernetes-native overlay creation, reset, and deletion without
requiring direct node access.
Changes:
- ReaperOverlay CRD types (spec: resetPolicy, resetGeneration; status:
phase, observedResetGeneration, per-node status)
- Overlay controller reconciler with finalizer-based cleanup and
reset via generation counter
- PVC-like blocking: ReaperPods with overlayName stay Pending until
a matching ReaperOverlay is Ready
- Agent HTTP endpoints: GET/DELETE /api/v1/overlays/{namespace}/{name}
for overlay inspection and reset
- RBAC updates for controller to manage reaperoverlays resources
- CRD generation script updated for both CRDs
- reqwest added as controller dependency for agent communication
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
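The reset-via-generation-counter idea in this commit can be illustrated with a small sketch. It is not the code in `src/crds/reaper_overlay.rs`: the struct names, the `u64` counter type, and the helpers `reset_pending` and `mark_reset_done` are assumptions made for illustration, and the real types presumably carry CRD derive attributes that are omitted here.

```rust
/// Hypothetical stand-in for the `resetPolicy` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResetPolicy {
    Manual,
    OnFailure,
    OnDelete,
}

/// Hypothetical, simplified spec: only the two fields named in the commit.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OverlaySpec {
    pub reset_policy: ResetPolicy,
    /// Counter the user bumps to request a reset (assumed to be unsigned).
    pub reset_generation: u64,
}

/// Hypothetical, simplified status; per-node status is left out.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct OverlayStatus {
    pub phase: Option<String>,
    /// Last reset generation the controller has acted on.
    pub observed_reset_generation: u64,
}

/// A reset is pending while the requested generation is ahead of the observed one.
pub fn reset_pending(spec: &OverlaySpec, status: &OverlayStatus) -> bool {
    spec.reset_generation > status.observed_reset_generation
}

/// After a successful reset, record the generation that was handled.
pub fn mark_reset_done(spec: &OverlaySpec, status: &mut OverlayStatus) {
    status.observed_reset_generation = spec.reset_generation;
}
```

Comparing two counters rather than toggling a flag makes the request idempotent under this assumption: reconciling the same object twice does not trigger a second reset unless the counter is bumped again.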
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main      #47   +/-   ##
=======================================
  Coverage   85.01%   85.01%
=======================================
  Files           6        6
  Lines         307      307
=======================================
  Hits          261      261
  Misses         46       46
- Unit tests for ReaperOverlaySpec/Status/NodeStatus serialization and defaults (12 tests in src/crds/reaper_overlay.rs)
- Unit tests for overlay_api list/get functions (7 tests in src/bin/reaper-agent/overlay_api.rs)
- Kind integration tests (Phase 4c): CRD install, status, kubectl columns, PVC-like blocking, reset generation, delete cleanup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Controller sets ReaperOverlay phase to Ready immediately on creation (overlays are lazily created by runtime, not pre-provisioned)
- Remove unused update_overlay_status and check_agent_overlay_exists
- Fix annotations test: create ReaperOverlay before ReaperPod with overlayName (PVC-like blocking requires matching overlay)
- Fix shell arithmetic error in delete test (tr -d '[:space:]')
- Defer controller cleanup to after Phase 4c so overlay tests have a running controller
All 14 Kind integration tests pass (Phase 4b + 4c).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
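The PVC-like blocking rule referenced in this commit (a ReaperPod with `overlayName` waits until a matching ReaperOverlay is Ready) can be sketched as a pure decision function. This is an illustration only: `PodDecision`, `decide`, the in-memory map of overlays, and the assumption that the overlay is looked up in the pod's own namespace are all hypothetical, and the real reconciler presumably queries the Kubernetes API instead.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of the blocking check for one ReaperPod.
#[derive(Debug, PartialEq, Eq)]
pub enum PodDecision {
    /// No overlay requested, or the overlay is Ready: go ahead and create the Pod.
    CreatePod,
    /// Keep the ReaperPod Pending and report why.
    StayPending(String),
}

/// `overlays` maps (namespace, name) to the overlay's phase string.
/// Assumption: the overlay must live in the same namespace as the pod.
pub fn decide(
    overlays: &HashMap<(String, String), String>,
    namespace: &str,
    overlay_name: Option<&str>,
) -> PodDecision {
    // Pods that do not name an overlay are never blocked.
    let Some(name) = overlay_name else {
        return PodDecision::CreatePod;
    };
    match overlays.get(&(namespace.to_string(), name.to_string())) {
        Some(phase) if phase.as_str() == "Ready" => PodDecision::CreatePod,
        Some(phase) => PodDecision::StayPending(format!(
            "overlay {namespace}/{name} is {phase}, not Ready"
        )),
        None => PodDecision::StayPending(format!("overlay {namespace}/{name} not found")),
    }
}
```

Because the controller in this commit marks a new ReaperOverlay as Ready immediately, under this sketch the only way a pod stays Pending is when the named overlay does not exist yet, which matches the test fix of creating the overlay before the pod.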
- Add shortName "rovl" to ReaperOverlay (kubectl get rovl)
- Add shortName "rpod" to ReaperPod (kubectl get rpod)
- Regenerate CRD YAML with shortNames
- Update docs/book CRDs reference page with ReaperOverlay section
- Rename SUMMARY.md entry to "Custom Resource Definitions"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add shortName verification to kubectl column tests:
- kubectl get rpod returns expected columns
- kubectl get rovl returns expected columns
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add slurm-overlay.yaml (ReaperOverlay CRD for the shared slurm overlay)
- Update README with overlay creation step and troubleshooting section for resetting corrupt overlays via kubectl patch rovl
- Update slurmd-daemonset.yaml comment (issue #41 is fixed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All example setup scripts now print `export KUBECONFIG=...` in their summary output so users know how to connect to the cluster from a fresh shell.
Also:
- Slurm setup script installs ReaperOverlay CRD (idempotent) to support the slurm-overlay.yaml resource
- Slurm README lists ReaperOverlay CRD as prerequisite
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace undefined if_log with LOG_FILE redirect. This ensures the ReaperOverlay CRD is installed even when using --release with a published Helm chart that predates the CRD.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The static slurm-config.yaml with placeholder values (CPU_COUNT, COMPUTE_NODE_LIST) was being applied by `kubectl apply -f examples/10-slurm-hpc/`, overwriting the correctly generated ConfigMap from setup.sh.
- Rename slurm-config.yaml → slurm-config.yaml.template
- Update setup.sh summary to list individual files instead of directory
- Update README deploy instructions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use Kubernetes Job instead of Pod for the test
- Use sbatch --wait so the job blocks until completion
- Print job stdout/stderr output directly in the logs
- Show clear PASSED/FAILED verdict
- Update README and setup summary with job/test-slurm-job log command
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sbatch writes output to the compute node's filesystem, not the submitter's. srun streams output back directly so job results are visible in kubectl logs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
srun hangs because it needs bidirectional communication between the submitter and compute node. Use sbatch --parsable + scontrol polling instead, which works reliably across node boundaries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
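The "sbatch --parsable + scontrol polling" approach comes down to two parsing steps: read the job id that `sbatch --parsable` prints, then repeatedly read the `JobState=` field from `scontrol show job <id>` until the job reaches a terminal state. The test in this PR is presumably a shell script; the Rust helpers below (`parse_job_id`, `parse_job_state`, `is_terminal`) are hypothetical names used only to make the logic concrete, and the list of terminal states is a subset chosen for illustration.

```rust
/// Extract the job id from `sbatch --parsable` output,
/// which is either "<id>" or "<id>;<cluster>".
pub fn parse_job_id(stdout: &str) -> Option<u64> {
    stdout.trim().split(';').next()?.parse().ok()
}

/// Find the value of the `JobState=` field in `scontrol show job <id>` output.
pub fn parse_job_state(scontrol_output: &str) -> Option<&str> {
    scontrol_output
        .split_whitespace()
        .find_map(|field| field.strip_prefix("JobState="))
}

/// True once polling can stop (illustrative subset of Slurm job states).
pub fn is_terminal(state: &str) -> bool {
    matches!(
        state,
        "COMPLETED" | "FAILED" | "CANCELLED" | "TIMEOUT" | "NODE_FAIL" | "OUT_OF_MEMORY"
    )
}
```

A polling loop built on these helpers would sleep between `scontrol` calls and treat `COMPLETED` as the only passing verdict. Unlike `srun`, it needs no long-lived connection back to the submitter.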
Summary
- `ReaperOverlay` CRD: a PVC-like resource that decouples overlay lifecycle from pod lifecycle
- ReaperPods with `overlayName` now block (stay Pending) until a matching `ReaperOverlay` exists and is `Ready`, like Pods with unbound PVCs

Changes
CRD (src/crds/reaper_overlay.rs)
- `ReaperOverlaySpec`: `resetPolicy` (Manual/OnFailure/OnDelete), `resetGeneration` (monotonic counter)
- `ReaperOverlayStatus`: `phase`, `observedResetGeneration`, per-node status array, `message`
- CRD YAML in `deploy/kubernetes/crds/` and `deploy/helm/reaper/crds/`

Controller (src/bin/reaper-controller/)
- `overlay_reconciler.rs`: watches ReaperOverlay objects, manages finalizer for cleanup, handles reset via generation counter, discovers and calls agents
- `reconciler.rs`: PVC-like blocking; checks for a Ready ReaperOverlay before creating Pods
- `main.rs`: runs both ReaperPod and ReaperOverlay controllers concurrently
- `reqwest` dependency for controller-to-agent HTTP communication

Agent (src/bin/reaper-agent/)
- `overlay_api.rs`: `list_overlays()`, `get_overlay()`, `delete_overlay()` functions
- Endpoints: `GET /api/v1/overlays`, `GET/DELETE /api/v1/overlays/{namespace}/{name}`
- `overlay_gc.rs` for reusing existing cleanup logic

Helm & RBAC
- `reaperoverlays` and `reaperoverlays/status` permissions
- `deploy/kubernetes/reaper-controller.yaml` updated

Usage

Related
- `reaper.io` → `reaper.giar.dev`

Test plan
- `cargo clippy --workspace --all-targets -- -D warnings` passes
- `cargo clippy --target x86_64-unknown-linux-gnu --all-targets -- -D warnings` passes
- `cargo test --workspace`: all 163 tests pass

🤖 Generated with Claude Code
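For reviewers who want a concrete picture of the agent endpoints listed in this description, the following is a minimal sketch of how the three routes could be told apart. It is not the agent's actual handler code: `OverlayRoute` and `route` are hypothetical names, the agent presumably uses an HTTP framework rather than manual path matching, and ignoring empty path segments is a simplification.

```rust
/// Hypothetical classification of the overlay endpoints.
#[derive(Debug, PartialEq, Eq)]
pub enum OverlayRoute<'a> {
    /// GET /api/v1/overlays
    List,
    /// GET /api/v1/overlays/{namespace}/{name}
    Get { namespace: &'a str, name: &'a str },
    /// DELETE /api/v1/overlays/{namespace}/{name}
    Delete { namespace: &'a str, name: &'a str },
}

/// Map an HTTP method and path to an overlay route, or `None` if nothing matches.
pub fn route<'a>(method: &str, path: &'a str) -> Option<OverlayRoute<'a>> {
    let rest = path.strip_prefix("/api/v1/overlays")?;
    // Reject paths such as "/api/v1/overlaysx/y" that merely share the prefix.
    if !rest.is_empty() && !rest.starts_with('/') {
        return None;
    }
    // Empty segments (for example a trailing slash) are ignored.
    let segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    match (method, segments.as_slice()) {
        ("GET", []) => Some(OverlayRoute::List),
        ("GET", [namespace, name]) => Some(OverlayRoute::Get {
            namespace: *namespace,
            name: *name,
        }),
        ("DELETE", [namespace, name]) => Some(OverlayRoute::Delete {
            namespace: *namespace,
            name: *name,
        }),
        _ => None,
    }
}
```

In this sketch `DELETE` on the collection path is deliberately unmatched, so a reset always targets exactly one namespaced overlay.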