This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
CNPG Storage Manager is a Kubernetes controller that prevents database outages caused by storage exhaustion in CloudNativePG (CNPG) PostgreSQL clusters. It provides automated monitoring, alerting, and remediation for storage-related issues including WAL file accumulation and PVC expansion.
- Language: Go 1.21+
- Framework: controller-runtime v0.16+
- Scaffolding: kubebuilder v3.12+
- Testing: Ginkgo v2 + Gomega + envtest
- Deployment: Helm chart + Kustomize
# Generate manifests (CRDs, RBAC, webhooks)
make manifests
# Generate code (DeepCopy, etc.)
make generate
# Build controller binary
make build
# Run tests
make test
# Run unit tests only
make test-unit
# Run integration tests (envtest)
make test-integration
# Run E2E tests (requires kind cluster)
make test-e2e
# Build Docker image
make docker-build IMG=ghcr.io/supporttools/cnpg-storage-manager:dev
# Push Docker image
make docker-push IMG=ghcr.io/supporttools/cnpg-storage-manager:dev
# Install CRDs to cluster
make install
# Deploy controller to cluster
make deploy IMG=ghcr.io/supporttools/cnpg-storage-manager:dev
# Run controller locally (outside cluster)
make run
# Run linter
make lint
# Full validation (mirrors CI/CD)
make validate-pipeline-local
| Decision | Choice |
|---|---|
| API Group | cnpg.supporttools.io |
| CNPG Coordination | Annotation-based (non-invasive) |
| WAL Cleanup | pg_archivecleanup via pod exec |
| Storage Backend | Any CSI with allowVolumeExpansion=true |
| Multi-Instance | Expand all PVCs when any hits threshold |
| Policy Conflicts | Validation webhook rejects overlapping |
| State Persistence | Controller uses PVC for metrics history |
| Resize Failure | Alert and wait (no auto pod restart) |
| Kubernetes Version | 1.25+ |
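The "Multi-Instance" row above can be illustrated with a small sketch. This is not the project's implementation: the `pvcUsage` type, the `pvcsToExpand` function, and the 80% threshold are hypothetical names and values chosen only to show the "expand all PVCs when any hits threshold" rule.

```go
package main

import "fmt"

// pvcUsage is a hypothetical snapshot of one instance's PVC usage.
type pvcUsage struct {
	Name        string
	UsedPercent int
}

// pvcsToExpand returns the names of every PVC in the cluster when at least
// one of them is at or above thresholdPercent, and nil otherwise.
func pvcsToExpand(pvcs []pvcUsage, thresholdPercent int) []string {
	breached := false
	for _, p := range pvcs {
		if p.UsedPercent >= thresholdPercent {
			breached = true
			break
		}
	}
	if !breached {
		return nil
	}
	names := make([]string, 0, len(pvcs))
	for _, p := range pvcs {
		names = append(names, p.Name)
	}
	return names
}

func main() {
	pvcs := []pvcUsage{{"db-1", 62}, {"db-2", 87}, {"db-3", 59}}
	// Only db-2 breaches the assumed 80% threshold, yet all three are selected.
	fmt.Println(pvcsToExpand(pvcs, 80)) // prints "[db-1 db-2 db-3]"
}
```

Expanding every PVC together keeps the instances of a cluster at the same size, which is presumably the motivation behind this decision.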
- StoragePolicy (cnpg.supporttools.io/v1alpha1): Defines per-cluster or per-namespace storage management policies, including thresholds, expansion settings, WAL cleanup config, circuit breaker, and alerting channels.
- StorageEvent (cnpg.supporttools.io/v1alpha1): Records storage-related events (expansion, WAL cleanup) with detailed per-PVC status tracking.
- CNPGStorageController: Main reconciliation loop watching CNPG clusters and PVCs
- Metrics Collector: Integrates with kubelet volume stats API, persists to PVC
- Policy Engine: Evaluates policies with validation webhook for conflict detection
- Remediation Engine: Executes PVC expansion (all PVCs in cluster)
- WAL Cleanup Engine: Executes pg_archivecleanup via client-go pod exec
- Alert Manager: Prometheus Alertmanager, Slack, PagerDuty integration
- Circuit Breaker: Per-cluster failure tracking with auto-reset
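As an illustration of the Circuit Breaker component listed above, the following is a minimal sketch of per-cluster failure tracking with auto-reset. The type and method names (`breaker`, `Allow`, `RecordFailure`, `RecordSuccess`), the failure limit, and the reset interval are all assumptions made for this example; the project's actual design may differ.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// clusterState holds the failure history for one cluster (hypothetical type).
type clusterState struct {
	failures    int
	lastFailure time.Time
}

// breaker blocks remediation for a cluster after maxFailures consecutive
// failures, and resets automatically once resetAfter has elapsed.
type breaker struct {
	mu          sync.Mutex
	maxFailures int
	resetAfter  time.Duration
	now         func() time.Time // injectable clock, handy in tests
	clusters    map[string]*clusterState
}

func newBreaker(maxFailures int, resetAfter time.Duration) *breaker {
	return &breaker{
		maxFailures: maxFailures,
		resetAfter:  resetAfter,
		now:         time.Now,
		clusters:    make(map[string]*clusterState),
	}
}

// Allow reports whether remediation may run for the given cluster.
func (b *breaker) Allow(cluster string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	s, ok := b.clusters[cluster]
	if !ok || s.failures < b.maxFailures {
		return true
	}
	if b.now().Sub(s.lastFailure) >= b.resetAfter {
		delete(b.clusters, cluster) // auto-reset after the cool-down
		return true
	}
	return false
}

// RecordFailure counts one more failure for the cluster.
func (b *breaker) RecordFailure(cluster string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	s, ok := b.clusters[cluster]
	if !ok {
		s = &clusterState{}
		b.clusters[cluster] = s
	}
	s.failures++
	s.lastFailure = b.now()
}

// RecordSuccess clears the failure history for the cluster.
func (b *breaker) RecordSuccess(cluster string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.clusters, cluster)
}

func main() {
	b := newBreaker(2, 10*time.Minute)
	b.RecordFailure("prod/db")
	b.RecordFailure("prod/db")
	fmt.Println(b.Allow("prod/db"), b.Allow("prod/other")) // prints "false true"
}
```

Keying the state by cluster means one misbehaving cluster does not pause remediation for the others.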
The controller coordinates with CNPG through annotations on the CNPG Cluster CR:
storage.cnpg.supporttools.io/managed: "true"
storage.cnpg.supporttools.io/paused: "false"
storage.cnpg.supporttools.io/target-size: "15Gi"
storage.cnpg.supporttools.io/current-usage-percent: "87"
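A minimal sketch of reading these annotations from a Cluster's annotation map follows. The annotation keys come from the list above; the `coordination` struct, the `parseCoordination` function, and the choice to treat a missing annotation as false or empty are assumptions for illustration, not the behavior of pkg/annotations.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys as listed in this document.
const (
	annManaged      = "storage.cnpg.supporttools.io/managed"
	annPaused       = "storage.cnpg.supporttools.io/paused"
	annTargetSize   = "storage.cnpg.supporttools.io/target-size"
	annUsagePercent = "storage.cnpg.supporttools.io/current-usage-percent"
)

// coordination is a hypothetical parsed view of the annotations.
type coordination struct {
	Managed      bool
	Paused       bool
	TargetSize   string // kept as a raw quantity string such as "15Gi"
	UsagePercent int
}

// parseCoordination reads the coordination annotations from a Cluster's
// annotation map. Missing keys are treated as false or empty (an assumption).
func parseCoordination(ann map[string]string) (coordination, error) {
	c := coordination{
		Managed:    ann[annManaged] == "true",
		Paused:     ann[annPaused] == "true",
		TargetSize: ann[annTargetSize],
	}
	if raw, ok := ann[annUsagePercent]; ok {
		n, err := strconv.Atoi(raw)
		if err != nil {
			return c, fmt.Errorf("invalid %s %q: %w", annUsagePercent, raw, err)
		}
		c.UsagePercent = n
	}
	return c, nil
}

func main() {
	c, err := parseCoordination(map[string]string{
		annManaged:      "true",
		annPaused:       "false",
		annTargetSize:   "15Gi",
		annUsagePercent: "87",
	})
	fmt.Printf("%+v %v\n", c, err)
}
```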
.
├── api/v1alpha1/ # CRD type definitions
│ ├── storagepolicy_types.go
│ ├── storageevent_types.go
│ └── groupversion_info.go
├── cmd/main.go # Controller entrypoint
├── config/ # Kubernetes manifests
│ ├── crd/bases/ # Generated CRD YAML
│ ├── rbac/ # RBAC configuration
│ ├── manager/ # Controller deployment
│ └── default/ # Kustomize overlays
├── controllers/ # Controller implementations
│ ├── storagepolicy_controller.go
│ └── suite_test.go
├── pkg/ # Internal packages
│ ├── metrics/ # Prometheus metrics collector
│ ├── policy/ # Policy evaluation engine
│ ├── remediation/ # Expansion and cleanup logic
│ ├── alerting/ # Alert integrations
│ └── exec/ # Pod exec for pg_archivecleanup
├── charts/ # Helm chart
│ └── cnpg-storage-manager/
├── test/e2e/ # E2E tests
└── hack/ # Build scripts
# Create new API
kubebuilder create api --group cnpg --version v1alpha1 --kind NewResource
# Implement types in api/v1alpha1/newresource_types.go
# Implement controller in controllers/newresource_controller.go
# Regenerate
make manifests generate
- Unit Tests (./pkg/...): Standard Go tests for business logic
- Integration Tests (./controllers/...): Use envtest for controller tests
- E2E Tests (./test/e2e/...): Use kind cluster with real CNPG operator
# All tests
make test
# Unit tests with coverage
go test -v -race -coverprofile=coverage.out ./pkg/...
# Integration tests
make test-integration
# Specific test
go test -v ./controllers/... -run TestStoragePolicyController
# E2E (setup kind first)
kind create cluster --config=test/e2e/kind-config.yaml
make test-e2e
- Test coverage: >85% for core components
- Memory usage: <100MB
- API response time: <50ms
- Clean golangci-lint and gosec results
Always run before pushing:
make validate-pipeline-local
This runs: fmt, vet, lint, unit tests, integration tests.
This project uses a multi-agent development approach. See .claude/AGENT_GUIDE.md for details.
- go-kubernetes-skeptic: Review Go/K8s code quality
- kubernetes-specialist: K8s troubleshooting and best practices
- database-skeptic: PostgreSQL and data integrity concerns
- qa-engineer: Test strategy and coverage
- devops-engineer: CI/CD and deployment
All changes must pass:
- make validate-pipeline-local (technical validation)
- QA Engineer review (test coverage)
- Devil's Advocate review (edge cases)
- Skeptic review (domain-specific concerns)
- CloudNativePG Operator (v1.20+): Cluster status via CR, coordination via annotations
- Any CSI Driver: Storage expansion via allowVolumeExpansion: true
- Prometheus: Metrics exposition (/metrics)
- Alertmanager: Alert routing
- Define in pkg/metrics/metrics.go
- Register in controller setup
- Update in reconcile loop
- Document in design.md
- Implement interface in pkg/alerting/
- Add configuration to StoragePolicy spec
- Update Helm values.yaml
- Add tests
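To make the alert-channel steps above concrete, here is a hedged sketch of what "implement interface" might look like. The `Channel` interface, the `Alert` struct, the in-memory `memoryChannel`, and the `dispatch` helper are hypothetical; the real interface in pkg/alerting/ may have a different shape and would likely also accept a context.Context.

```go
package main

import (
	"errors"
	"fmt"
)

// Alert is a hypothetical alert payload.
type Alert struct {
	Cluster  string
	Severity string
	Message  string
}

// Channel is a hypothetical interface that each alert integration satisfies.
type Channel interface {
	Name() string
	Send(a Alert) error
}

// memoryChannel records alerts in memory; useful as a test double when
// adding tests for a new channel.
type memoryChannel struct {
	name string
	fail bool
	sent []Alert
}

func (m *memoryChannel) Name() string { return m.name }

func (m *memoryChannel) Send(a Alert) error {
	if m.fail {
		return fmt.Errorf("channel %s unavailable", m.name)
	}
	m.sent = append(m.sent, a)
	return nil
}

// dispatch sends the alert to every channel and joins any errors, so one
// failing integration does not prevent delivery to the others.
func dispatch(channels []Channel, a Alert) error {
	var errs []error
	for _, c := range channels {
		if err := c.Send(a); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c.Name(), err))
		}
	}
	return errors.Join(errs...)
}

func main() {
	ok := &memoryChannel{name: "slack"}
	bad := &memoryChannel{name: "pagerduty", fail: true}
	alert := Alert{Cluster: "db", Severity: "critical", Message: "volume 95% full"}
	err := dispatch([]Channel{bad, ok}, alert)
	fmt.Println(len(ok.sent), err)
}
```

A new integration would then only need to provide `Name` and `Send`, and could be exercised in unit tests alongside a recording channel like the one shown.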
- Update types in api/v1alpha1/
- Run make manifests generate
- Update validation webhook if needed
- Update Helm CRD templates
- Add migration notes
This project uses TaskForge for task management (Project ID: 82).
- Project Foundation & Infrastructure - kubebuilder scaffolding, CRD types, build system
- Core Monitoring Infrastructure - CNPG discovery, kubelet metrics, threshold evaluation
- Automated Storage Expansion - PVC expansion engine, storage class validation
- Configuration & Policy Management - Policy selectors, validation webhook, conflict detection
- Alerting & Observability - Alertmanager, Slack, PagerDuty integration
- WAL Cleanup & Emergency Actions - pg_archivecleanup, circuit breaker
- Testing & Quality Assurance - Unit, integration, E2E test suites
- Deployment & Operations - Helm chart, security scanning, release automation
- design.md: Technical design document with full CRD specifications
- docs/development/: Development guides
- docs/workflows/: Process workflows
- .claude/AGENT_GUIDE.md: Agent usage guide
Phase 1: Project Foundation & Infrastructure - COMPLETE
- kubebuilder scaffolding initialized
- StoragePolicy and StorageEvent CRDs implemented
- Build infrastructure and CI/CD configured
- All tests passing
Phase 2: Core Monitoring Infrastructure - COMPLETE
- CNPG cluster discovery (pkg/cnpg/discovery.go)
- Kubelet metrics collector (pkg/metrics/collector.go)
- Prometheus metrics exposition (pkg/metrics/prometheus.go)
- Threshold evaluation engine (pkg/policy/evaluation.go) - 85.6% coverage
- Cluster annotation management (pkg/annotations/annotations.go) - 97.1% coverage
- StoragePolicy controller reconciler (internal/controller/storagepolicy_controller.go)
Next Phases:
- Phase 3: Automated Storage Expansion - In progress (PVC expansion logic implemented)
- Phase 4: Configuration & Policy Management
- Phase 5: Alerting & Observability
- Phase 6: WAL Cleanup & Emergency Actions
- Phase 7: Testing & Quality Assurance
- Phase 8: Deployment & Operations
| File | Description |
|---|---|
| api/v1alpha1/storagepolicy_types.go | StoragePolicy CRD with thresholds, expansion, WAL cleanup, circuit breaker |
| api/v1alpha1/storageevent_types.go | StorageEvent audit trail CRD |
| internal/controller/storagepolicy_controller.go | Main controller reconciliation logic |
| pkg/cnpg/discovery.go | CNPG cluster discovery via unstructured client |
| pkg/metrics/collector.go | Kubelet volume stats collection |
| pkg/metrics/prometheus.go | Prometheus metrics registration and helpers |
| pkg/policy/evaluation.go | Threshold evaluation and action recommendations |
| pkg/annotations/annotations.go | Cluster annotation helpers |