|
| 1 | +## Purpose |
| 2 | + |
| 3 | +This file defines **non-negotiable engineering standards** for anyone (human or AI agent) making changes to this repository. It is optimized for **high Go code quality**, long-term maintainability, and safety when interacting with Kubernetes clusters. |
| 4 | + |
| 5 | +Fortsa is a Kubernetes operator. Small mistakes (reconcile loops, incorrect patching, unsafe retries, noisy logs) can have outsized impact in production. Treat this repo as **production-grade control-plane code**. |
| 6 | + |
| 7 | +## Non-negotiables (read first) |
| 8 | + |
| 9 | +- You **MUST** keep the operator **safe-by-default**: |
| 10 | + - **MUST NOT** cause uncontrolled reconcile storms. |
| 11 | + - **MUST NOT** introduce hot loops (tight polling, immediate requeues without backoff/jitter). |
| 12 | + - **MUST NOT** patch workloads repeatedly without cooldown/guards. |
| 13 | + - **MUST NOT** broaden RBAC needs without explicit justification and review. |
| 14 | +- You **MUST** follow communication and documentation hygiene: |
| 15 | + - **NEVER** use emoji, or Unicode characters that emulate emoji (e.g. ✓, ✗). |
| 16 | + - **MUST** avoid redundant comments that are tautological/self-demonstrating (e.g. restating what a clearly named function does, or narrating code that is obvious at a glance). |
| 17 | +- You **MUST** preserve the project’s architectural constraints: |
| 18 | + - **MUST NOT** introduce CRDs (this project explicitly aims for “no CRDs”). |
| 19 | + - **MUST** keep dependencies minimal and appropriate for controller-runtime/client-go ecosystems. |
| 20 | + - **MUST** keep the final artifact as a **single Go binary** suitable for a minimal container image. |
| 21 | +- You **MUST** keep code formatted and lint-clean: |
| 22 | + - **MUST** run `make fmt`. |
| 23 | + - **MUST** run `make vet`. |
| 24 | + - **MUST** run `make lint` (golangci-lint). |
| 25 | + - **MUST** ensure tests relevant to your change pass (`make test`, and `make test-integration` / `make test-e2e*` when appropriate). |
| 26 | + |
| 27 | +## Repository workflows (canonical commands) |
| 28 | + |
| 29 | +Use the Makefile targets as the canonical workflow: |
| 30 | + |
| 31 | +- **Formatting**: `make fmt` |
| 32 | +- **Static analysis**: `make vet` |
| 33 | +- **Lint**: `make lint` (and `make lint-fix` only for safe, mechanical fixes) |
| 34 | +- **Unit tests**: `make test` |
| 35 | +- **Integration tests**: `make test-integration` |
| 36 | +- **E2E tests**: `make test-e2e` or `make test-e2e-istio` |
| 37 | +- **Build**: `make build` |
| 38 | + |
| 39 | +**MUST** prefer `make <target>` over ad-hoc `go test` / `golangci-lint` invocations so that versioned tooling and repo defaults are applied consistently. |
| 40 | + |
| 41 | +## Go version and language features |
| 42 | + |
| 43 | +- **MUST** assume Go version requirements defined by the repo (see README; currently Go `v1.26.0+`). |
| 44 | +- **MUST** write idiomatic Go: |
| 45 | + - Prefer clarity over cleverness. |
| 46 | + - Keep functions small and single-purpose. |
| 47 | + - Avoid unnecessary generics; use generics only when they clearly reduce duplication without harming readability. |
| 48 | + |
| 49 | +## Project structure and dependency boundaries |
| 50 | + |
| 51 | +The repo uses a layered package structure under `internal/`. Follow it. |
| 52 | + |
| 53 | +- **MUST** keep new code in the most appropriate existing package (`internal/controller`, `internal/podscanner`, `internal/annotator`, etc.). |
| 54 | +- **MUST** avoid cyclic dependencies; prefer “controller → lower layers” flow. |
| 55 | +- **SHOULD** keep packages cohesive: |
| 56 | + - `controller`: orchestration and reconcile routing. |
| 57 | + - `podscanner`: cluster reads and detection logic. |
| 58 | + - `annotator`: workload patching/updates. |
| 59 | + - `webhook`: calling Istio injection webhook and parsing. |
| 60 | + - `cache`: change-detection state. |
| 61 | +- **MUST** keep `cmd/main.go` focused on wiring, flags, and manager startup. |
| 62 | + |
| 63 | +## Formatting, naming, and API surface |
| 64 | + |
| 65 | +### Formatting |
| 66 | + |
| 67 | +- **MUST** use `gofmt` (via `make fmt`). |
| 68 | +- **MUST NOT** commit code that depends on “manual formatting discipline.” |
| 69 | + |
| 70 | +### Naming |
| 71 | + |
| 72 | +- **MUST** follow Go naming conventions: |
| 73 | + - Package names: short, lower-case, no underscores. |
| 74 | + - Exported identifiers: only when needed by other packages; otherwise keep unexported. |
| 75 | + - Avoid stutter: e.g., `podscanner.Scanner` is better than `podscanner.PodScannerScanner`. |
| 76 | +- **MUST** choose names that reflect intent and domain: |
| 77 | + - Use Kubernetes terms consistently: “workload” (Deployment/StatefulSet/DaemonSet), “pod template”, “owner reference”, “namespace label/annotation”, “reconcile request”. |
| 78 | + |
| 79 | +### Public surface area |
| 80 | + |
| 81 | +- **MUST** minimize exported APIs. |
| 82 | +- If you export something: |
| 83 | + - **MUST** document it with a complete Go doc comment. |
| 84 | + - **MUST** keep it stable (expect downstream usage). |
| 85 | + |
| 86 | +## Error handling (Go and Kubernetes-specific) |
| 87 | + |
| 88 | +### General rules |
| 89 | + |
| 90 | +- **MUST** wrap errors with context using `%w`: |
| 91 | + |
| 92 | +```go |
| 93 | +return fmt.Errorf("parse istio values JSON: %w", err) |
| 94 | +``` |
| 95 | + |
| 96 | +- **MUST NOT** swallow errors unless the behavior is explicitly desired and safe; when intentionally ignoring an error, you **MUST** justify it in code with a short intent comment. |
| 97 | +- **MUST** preserve sentinel error checks when used (e.g., `apierrors.IsNotFound(err)`). |
| 98 | + |
| 99 | +### Controller-runtime reconciliation semantics |
| 100 | + |
| 101 | +- **MUST** treat reconciliation as **idempotent**: |
| 102 | + - Re-running reconcile with the same cluster state must not change anything. |
| 103 | + - Reconcile should converge, not oscillate. |
| 104 | +- **MUST** distinguish between: |
| 105 | + - transient errors (requeue is appropriate), |
| 106 | + - terminal/expected conditions (return success, no requeue), |
| 107 | + - “not found” (usually success; the object is gone). |
| 108 | +- **MUST NOT** requeue immediately in a tight loop. Prefer returning errors (controller-runtime backoff) or deliberate `RequeueAfter` with a reasoned delay. |
| 109 | + |
| 110 | +### Kubernetes API interactions |
| 111 | + |
| 112 | +- **MUST** avoid unnecessary writes: |
| 113 | + - Prefer read/compare before patching. |
| 114 | + - Patch only when a change is required. |
| 115 | +- **MUST** use patching correctly: |
| 116 | + - Prefer minimal merge patches to avoid clobbering fields. |
| 117 | + - **MUST** ensure patches are scoped only to fields you own (e.g., annotations under `spec.template.metadata.annotations` for workloads). |
| 118 | +- **MUST** be careful with object mutations: |
| 119 | + - Do not mutate cached objects in-place in ways that leak across reconciles. |
| 120 | + - When constructing objects to send externally, prefer deep copies or fresh structs as appropriate. |
| 121 | + |
| 122 | +## Context, timeouts, and cancellations |
| 123 | + |
| 124 | +- **MUST** thread `context.Context` through all IO-bound operations. |
| 125 | +- **MUST** respect cancellation: |
| 126 | + - If `ctx.Done()` is triggered, return promptly. |
| 127 | +- **MUST** set bounded timeouts for outbound calls (e.g., webhook requests) unless the caller already enforces a timeout. |
| 128 | +- **SHOULD** avoid “infinite” waits and sleeps. If sleeps are needed (e.g., Istiod config read delay), keep them explicit and configurable. |
| 129 | + |
| 130 | +## Logging and observability |
| 131 | + |
| 132 | +- **MUST** keep logs actionable: |
| 133 | + - Include enough identifying data to debug (namespace/name, workload kind, revision/tag). |
| 134 | + - Avoid dumping huge objects or full pod specs. |
| 135 | +- **MUST** treat log volume as a production cost: |
| 136 | + - Frequent paths should use lower verbosity or be rate-limited by behavior (not a logging hack). |
| 137 | +- **MUST NOT** log secrets or sensitive data (webhook CA bundles, tokens, raw admission payloads). |
| 138 | +- **SHOULD** prefer structured logging fields over string concatenation. |
| 139 | + |
| 140 | +## Concurrency and shared state |
| 141 | + |
| 142 | +- **MUST** assume reconciles can run concurrently. |
| 143 | +- **MUST** protect shared state with synchronization (mutex/atomic) and design for safe access. |
| 144 | +- **MUST** avoid deadlocks and lock contention: |
| 145 | + - Keep lock scope minimal. |
| 146 | + - Do not call external IO while holding locks. |
| 147 | +- **MUST** be explicit about thread-safety in types that are shared between reconciles. |
| 148 | + |
| 149 | +## Determinism, ordering, and retries |
| 150 | + |
| 151 | +- **MUST** design behavior to be deterministic under re-ordering: |
| 152 | + - Watch events can arrive in any order and may be duplicated. |
| 153 | + - Caches can be stale. |
| 154 | +- **MUST** handle API conflicts (resourceVersion changes) gracefully: |
| 155 | + - Patch operations should be resilient. |
| 156 | + - If updates conflict, retry in a controlled manner. |
| 157 | +- **SHOULD** avoid relying on object list ordering from the API server. |
| 158 | + |
| 159 | +## Domain-specific safety rules (Fortsa/Istio) |
| 160 | + |
| 161 | +- ReplicaSets and ControllerRevisions are traversed as intermediate owner-chain nodes only. |
| 162 | +- Fortsa’s core action is to add/update `fortsa.scaffidi.net/restartedAt` on workloads. |
| 163 | + - **MUST** keep this mechanism correct and minimal. |
| 164 | + - **MUST** preserve cooldown semantics when present; do not add “helpful” restarts that can lead to churn. |
| 165 | +- Fortsa compares desired vs actual proxy sidecar images. |
| 166 | + - **MUST** keep comparisons precise and well-defined (including behavior for hub comparison flags). |
| 167 | + - **MUST** avoid false positives that could restart large numbers of workloads unnecessarily. |
| 168 | +- Fortsa depends on Istio injection webhook behavior. |
| 169 | + - **MUST** treat webhook calls as untrusted external input: |
| 170 | + - Validate response shapes. |
| 171 | + - Handle malformed patches gracefully. |
| 172 | + - Fail safely (no mutation if uncertain). |
| 173 | + |
| 174 | +## Testing standards |
| 175 | + |
| 176 | +### Minimum expectations |
| 177 | + |
| 178 | +- **MUST** add or update tests when changing behavior. |
| 179 | +- **MUST** ensure tests validate the important properties: |
| 180 | + - idempotency, |
| 181 | + - correct skip/cooldown behavior, |
| 182 | + - correct revision/tag mapping, |
| 183 | + - correct workload selection (Deployment/StatefulSet/DaemonSet), |
| 184 | + - correct patch payloads. |
| 185 | + |
| 186 | +### Table-driven tests |
| 187 | + |
| 188 | +- **SHOULD** use table-driven tests for logic-heavy helpers and parsing functions. |
| 189 | +- **MUST** keep test cases readable: |
| 190 | + - Name each case with the behavior being tested. |
| 191 | + - Keep fixtures minimal; avoid massive YAML/JSON blobs unless the parsing logic requires realism. |
| 192 | + |
| 193 | +### Fake clients and envtest |
| 194 | + |
| 195 | +- **MUST** prefer unit tests for pure logic and parsing. |
| 196 | +- **SHOULD** use envtest (via `make test` / `make test-integration`) when behavior depends on controller-runtime/client-go semantics. |
| 197 | +- **MUST** keep tests deterministic: |
| 198 | + - Avoid sleeps; use polling with timeouts only when required. |
| 199 | + - Avoid reliance on real cluster behavior in unit/integration tests. |
| 200 | + |
| 201 | +## Code style for Kubernetes operators |
| 202 | + |
| 203 | +### Reconcile design |
| 204 | + |
| 205 | +- **MUST** keep reconcile handlers small and composable: |
| 206 | + - Parse inputs → compute desired actions → apply minimal changes. |
| 207 | +- **MUST** separate “what should happen” from “apply it”: |
| 208 | + - Business logic should be testable without a live API server. |
| 209 | +- **MUST** avoid “do everything in reconcile” designs that grow unbounded: |
| 210 | + - Keep scanning and annotation steps isolated and individually testable. |
| 211 | + |
| 212 | +### Resource ownership and field management |
| 213 | + |
| 214 | +- **MUST** only modify fields that this controller owns. |
| 215 | +- **MUST NOT** overwrite existing user annotations/labels unrelated to Fortsa’s function. |
| 216 | +- **MUST** treat `managedFields` and server-side apply behavior carefully: |
| 217 | + - Do not depend on `managedFields` being present or stable for logic unless guarded with robust fallback. |
| 218 | + |
| 219 | +## Security and supply chain |
| 220 | + |
| 221 | +- **MUST** avoid adding large or risky dependencies. |
| 222 | +- **MUST** prefer standard library and well-known Kubernetes libraries already in use. |
| 223 | +- **MUST** validate external data (webhook responses, JSON/YAML parsing) and handle failure modes safely. |
| 224 | +- **MUST NOT** introduce shelling-out or executing external binaries from the controller unless explicitly required and reviewed. |
| 225 | + |
| 226 | +## Performance and scalability |
| 227 | + |
| 228 | +- **MUST** treat cluster-wide scans as expensive: |
| 229 | + - Avoid repeated full scans on small changes. |
| 230 | + - Cache and short-circuit when safe (as the repo already does). |
| 231 | +- **MUST** avoid per-pod expensive operations when a per-workload or cached approach is sufficient. |
| 232 | +- **SHOULD** batch or deduplicate work: |
| 233 | + - Do not annotate the same workload multiple times in the same reconcile run. |
| 234 | + - Avoid repeated webhook calls for identical expected pod templates where caching is safe. |
| 235 | + |
| 236 | +## Documentation expectations |
| 237 | + |
| 238 | +- **MUST** update docs when changing user-facing behavior: |
| 239 | + - README for flags, build/run behavior, prerequisites. |
| 240 | + - ARCHITECTURE.md if control flow / package boundaries materially change. |
| 241 | +- **SHOULD** add small examples to docs when changes are subtle (e.g., new skip logic). |
| 242 | +- **MUST** keep code comments high-signal: |
| 243 | + - **MUST NOT** add redundant comments that simply restate what the code does. |
| 244 | + - Comments should explain intent, constraints, non-obvious trade-offs, or “why”, not narrate “what”. |
| 245 | + |
| 246 | +## Review checklist (before considering work “done”) |
| 247 | + |
| 248 | +- **Correctness** |
| 249 | + - **MUST** be idempotent. |
| 250 | + - **MUST** avoid reconcile storms/hot loops. |
| 251 | + - **MUST** handle NotFound and transient errors correctly. |
| 252 | +- **Quality** |
| 253 | + - **MUST** be `gofmt`’d. |
| 254 | + - **MUST** pass `make vet` and `make lint`. |
| 255 | + - **MUST** have tests for behavior changes. |
| 256 | +- **Operational safety** |
| 257 | + - **MUST** avoid unnecessary writes to the API server. |
| 258 | + - **MUST** keep log volume and verbosity appropriate. |
| 259 | + - **MUST** respect timeouts and cancellations. |
| 260 | +- **Architecture** |
| 261 | + - **MUST NOT** introduce CRDs. |
| 262 | + - **MUST** keep dependencies minimal and package boundaries clean. |
| 263 | + |
0 commit comments