|
| 1 | +# Locust K8s Operator — Go Rewrite Polish |
| 2 | + |
| 3 | +## What This Is |
| 4 | + |
| 5 | +A comprehensive fix pass on the Go rewrite of the Locust Kubernetes Operator (PR #274, `feat/go` -> `master`). The operator manages distributed Locust load tests on Kubernetes via a LocustTest CRD. Eight expert reviewers identified 9 critical, ~25 major, and 50+ minor issues. This project addresses every single one, moves the Go code from `locust-k8s-operator-go/` to repo root, and removes the legacy Java operator code — making the PR merge-ready and production-perfect. |
| 6 | + |
| 7 | +## Core Value |
| 8 | + |
| 9 | +Every API field the operator exposes must actually work, and the operator must never crash or silently ignore user configuration. |
| 10 | + |
| 11 | +## Requirements |
| 12 | + |
| 13 | +### Validated |
| 14 | + |
| 15 | +- ✓ Kubernetes operator using controller-runtime v0.21.0 — existing |
| 16 | +- ✓ LocustTest CRD with v2 (storage) and v1 (deprecated) API versions — existing |
| 17 | +- ✓ Conversion webhooks for v1 <-> v2 — existing |
| 18 | +- ✓ Validation webhooks for CR creation/update — existing |
| 19 | +- ✓ Master/worker Job-based test execution — existing |
| 20 | +- ✓ Status subresource with phase tracking (Pending/Running/Succeeded/Failed) — existing |
| 21 | +- ✓ Owner references for automatic garbage collection — existing |
| 22 | +- ✓ OpenTelemetry support for native Locust metrics export — existing |
| 23 | +- ✓ Volume mounting with target filtering — existing |
| 24 | +- ✓ Environment variable and secret injection — existing |
| 25 | +- ✓ Helm chart with webhook support and security hardening — existing |
| 26 | +- ✓ Comprehensive test pyramid (~150 unit, ~20 integration, ~30 E2E) — existing |
| 27 | +- ✓ CI/CD pipelines for build, test, and release — existing |
| 28 | +- ✓ Multi-arch container images via ko — existing |
| 29 | +- ✓ Documentation with migration guide, API reference, and developer docs — existing |
| 30 | + |
| 31 | +### Active |
| 32 | + |
| 33 | +- [ ] Wire `extraArgs` into master/worker command builders (C1) |
| 34 | +- [ ] Wire CR-level `resources` with fallback to operator config (C2) |
| 35 | +- [ ] Replace `resource.MustParse()` with safe parsing and error handling (C3) |
| 36 | +- [ ] Unify volume name constants across packages (C4) |
| 37 | +- [ ] Fix `ko build` path in release pipeline (C5) |
| 38 | +- [ ] Remove fragile `Generation > 1` reconciliation guard (C6) |
| 39 | +- [ ] Fix Helm CRD conversion webhook architecture (C7) |
| 40 | +- [ ] Verify/fix `readOnlyRootFilesystem` with webhook cert writes (C8) |
| 41 | +- [ ] Keep `failurePolicy: Fail` but ensure operator availability during upgrades (C9) |
| 42 | +- [ ] Remove dead code: `isJobComplete()`/`isJobFailed()` (M1) |
| 43 | +- [ ] Fix mount path trailing slash inconsistency (M2) |
| 44 | +- [ ] Fix error message typo + duplicate logging in main (M3) |
| 45 | +- [ ] Align `ko` action versions across workflows (M4) |
| 46 | +- [ ] Add `timeout-minutes` to all CI/CD jobs (M5) |
| 47 | +- [ ] Fix `ct.yaml` timeout (600s -> reasonable value) (M6) |
| 48 | +- [ ] Address chart-releaser fork supply chain risk (M7) |
| 49 | +- [ ] Add `permissions:` block to `go-test-e2e.yml` (M8) |
| 50 | +- [ ] Pin Kind download with checksum verification (M9) |
| 51 | +- [ ] Replace `go mod tidy` mutation with verification check (M10) |
| 52 | +- [ ] Remove nested `.github/workflows/` under Go subdir (M11) |
| 53 | +- [ ] Fix `imagePullSecrets` format in v2 doc examples (M12) |
| 54 | +- [ ] Add missing `image` field in getting started example (M13) |
| 55 | +- [ ] Fix status example with non-existent fields (M14) |
| 56 | +- [ ] Fix operator CPU limit mismatch in migration docs (M15) |
| 57 | +- [ ] Fix wrong working directory in dev docs (M16) |
| 58 | +- [ ] Fix namespace inconsistency across docs (M17) |
| 59 | +- [ ] Add CRD conversion webhook config to Helm chart (M18) |
| 60 | +- [ ] Tighten Helm RBAC (remove unnecessary ConfigMap/Secret CRUD) (M19) |
| 61 | +- [ ] Add recovery when owned resources are externally deleted (M20) |
| 62 | +- [ ] Fix backward compatibility helpers in Helm templates (M21) |
| 63 | +- [ ] Remove or wire `crd.install` value (M22) |
| 64 | +- [ ] Fix `replicaCount` default (2 -> 1 for backward compat) (M23) |
| 65 | +- [ ] Fix CRD symlink for `helm package` in CI (M24) |
| 66 | +- [ ] Suppress Prometheus scrape annotations when OTel enabled (M25) |
| 67 | +- [ ] Fix E2E conversion script label selectors (M26) |
| 68 | +- [ ] Fix `os.Chdir()` global side effect in test utils (M27) |
| 69 | +- [ ] All 56 minor issues from reviewer feedback (see pr_review.md Minor Issues section) |
| 70 | +- [ ] Move Go code from `locust-k8s-operator-go/` to repo root |
| 71 | +- [ ] Remove legacy Java operator code |
| 72 | + |
| 73 | +### Out of Scope |
| 74 | + |
| 75 | +- New features beyond what reviewers identified — this is a polish pass, not feature work |
| 76 | +- Kubernetes version upgrades or dependency bumps beyond what's needed for fixes |
| 77 | +- Redesigning the operator architecture — it's already approved by reviewers |
| 78 | +- Python locust test files or example load test scripts |
| 79 | + |
| 80 | +## Context |
| 81 | + |
| 82 | +- **Origin**: Complete rewrite from Java (Micronaut + JOSDK) to Go (controller-runtime) |
| 83 | +- **PR #274**: `feat/go` -> `master`, 238 files changed, +56,350 / -959 lines |
| 84 | +- **Review team**: 8 expert reviewers (Architecture, Go, GitHub Actions, Documentation, Kubernetes, Locust, Helm, Testing) |
| 85 | +- **Aggregate verdict**: REQUEST CHANGES — architecturally sound, targeted fixes needed |
| 86 | +- **Existing test coverage**: ~150 unit, ~20 integration, ~30 E2E scenarios |
| 87 | +- **Codebase location**: Currently in `locust-k8s-operator-go/` subdirectory, moving to repo root |
| 88 | +- **Individual reviews**: Available in `reviews/` directory for detailed context per domain |
| 89 | +- **Compiled review**: `pr_review.md` has the full prioritized issue list |
| 90 | + |
| 91 | +## Constraints |
| 92 | + |
| 93 | +- **Branch**: All work on `feat/go` branch — PR must stay mergeable to `master` |
| 94 | +- **No breaking API changes**: v2 CRD API surface is established; wire existing fields, don't redesign |
| 95 | +- **Webhook policy**: Keep `failurePolicy: Fail` per maintainer decision — ensure operator HA instead |
| 96 | +- **Backward compatibility**: v1 API must continue to work through conversion webhooks |
| 97 | +- **Test pyramid**: Maintain existing test coverage; add tests for new behavior (wired fields, error handling) |
| 98 | +- **Review alignment**: Fixes must match reviewer recommendations — individual review files in `reviews/` are the source of truth |
| 99 | + |
| 100 | +## Key Decisions |
| 101 | + |
| 102 | +| Decision | Rationale | Outcome | |
| 103 | +|----------|-----------|---------| |
| 104 | +| Wire `extraArgs` and `resources` (not remove) | Users expect API fields to work; removing would be a breaking change | — Pending | |
| 105 | +| Keep `failurePolicy: Fail` for webhooks | Strict validation is more important than availability during operator downtime | — Pending | |
| 106 | +| Move Go code to repo root | Operator is the primary codebase now; subdirectory adds unnecessary complexity | — Pending | |
| 107 | +| Remove Java code | Go rewrite replaces it entirely; keeping it adds confusion | — Pending | |
| 108 | +| All review items in one pass | Comprehensive fix before merge; no tech debt carried forward | — Pending | |
| 109 | + |
| 110 | +--- |
| 111 | +*Last updated: 2026-02-06 after initialization* |
0 commit comments