|
| 1 | +# F1 Load Testing Library — Codebase Review |
| 2 | + |
| 3 | +**Review date:** March 11, 2025 |
| 4 | +**Scope:** v3 branch, pre-release preparation |
| 5 | +**Focus:** Improvements suitable for v3 vs deferred |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## 1. Package Structure |
| 10 | + |
| 11 | +### Current State |
| 12 | + |
| 13 | +| Location | Purpose | |
| 14 | +|----------|---------| |
| 15 | +| `pkg/f1` | Core API: `New`, `AddScenario`, `Run`, `Execute`, root cmd, profiling, completions | |
| 16 | +| `pkg/f1/f1testing` | `T`, `ScenarioFn`, `RunFn`, test lifecycle helpers | |
| 17 | +| `pkg/f1/scenarios` | Scenario registry, `Scenario` struct, `scenarios` CLI command | |
| 18 | +| `internal/run` | Run orchestration, test runner, result, views, scenario logger | |
| 19 | +| `internal/workers` | Pool manager, trigger pool, continuous pool, active scenario | |
| 20 | +| `internal/trigger` | Trigger builders (constant, staged, ramp, gaussian, users, file) | |
| 21 | +| `internal/trigger/api` | API types, iteration worker, distribution | |
| 22 | +| `internal/options` | RunOptions functional options | |
| 23 | +| `internal/log` | slog config, attrs, logger factory | |
| 24 | +| `internal/ui` | Output, printer, messages | |
| 25 | +| `internal/metrics` | Prometheus metrics | |
| 26 | +| `internal/envsettings` | Environment-based settings | |
| 27 | +| `internal/raterun` | Progress runner with adaptive intervals | |
| 28 | +| `internal/progress` | Stats, snapshot aggregation | |
| 29 | +| `internal/xcontext` | `Detach` for teardown context | |
| 30 | +| `internal/xtime` | `NanoTime` for timing | |
| 31 | +| `internal/triggerflags` | Shared flag constants | |
| 32 | +| `benchcmd` | Benchmark binary (main package) | |
| 33 | + |
| 34 | +**Organization:** Clear separation between public API (`pkg/`) and internal implementation. `internal/` is well-scoped; no leakage of internal types into public API. |
| 35 | + |
| 36 | +### Suggested Improvements |
| 37 | + |
| 38 | +| Priority | Issue | Rationale | |
| 39 | +|----------|-------|-----------| |
| 40 | +| **Medium** | `pkg/f1/scenarios` has no tests (0% coverage) | Scenario registry is used by all runs; add unit tests for `AddScenario`, `GetScenario`, `GetScenarioNames` | |
| 41 | +| **Low** | Consider `internal/triggerflags` → `internal/trigger/flags` | Co-locate with trigger package; minor refactor | |
| 42 | +| **Low** | `benchcmd` in repo root | Could move to `cmd/bench` for consistency with Go conventions | |
| 43 | + |
| 44 | +--- |
| 45 | + |
| 46 | +## 2. API Design |
| 47 | + |
| 48 | +### Current State |
| 49 | + |
| 50 | +- **F1:** `New(opts ...Option)` with unified configuration: `WithSettings(Settings)`, `WithLogger`, `WithStaticMetrics`, and fine-grained overrides (`WithPrometheusPushGateway`, `WithPrometheusNamespace`, `WithPrometheusLabelID`, `WithLogFilePath`, `WithLogLevel(slog.Level)`, `WithLogFormat(LogFormat)`); `AddScenario(name, fn, ...ScenarioOption)`; `Run(ctx, args) error`; `Execute()`. |
| 51 | +- **Public types:** `Settings`, `PrometheusSettings`, `LoggingSettings`, `LogFormat` (strongly typed, no string-key API). `DefaultSettings()` loads from env vars. |
| 52 | +- **Scenarios:** `WithDescription`, `WithParameter`; `Scenario.RunFn` populated by framework during setup. |
| 53 | +- **f1testing.T:** `NewTWithOptions(scenarioName, ...TOption)`; `WithLogger`, `WithIteration`, `WithVUID`; `Error`/`Fatal`/`Log` with `args ...any` (testing.T compatible). |
| 54 | +- **Options:** All consolidated in `options.go`; types in `settings.go`. Precedence: programmatic options > env vars > defaults. `WithLogger` takes precedence over log level/format options. |
| 55 | +- **Design decisions:** No `Config` struct (bundles unrelated concerns), no `SettingsProvider` (lazy eval adds no benefit). `WithSettings(Settings{})` replaces `WithoutEnvSettings()` — explicit, no order-dependence. Typed logging APIs (`slog.Level`, `LogFormat`) instead of strings. |
| 56 | + |
| 57 | +### Suggested Improvements |
| 58 | + |
| 59 | +| Priority | Issue | Rationale | |
| 60 | +|----------|-------|-----------| |
| 61 | +| **Medium** | `AddScenario` returns `*F1` but `Run` is not called on `*F1` in chain | Fluent chaining is inconsistent; either document `AddScenario(...).Execute()` or drop return | |
| 62 | +| **Low** | `Scenario.RunFn` is populated by framework; not obvious from struct | V3_PLAN §8: Document that `RunFn` is set during setup; consider making it private or renaming to clarify | |
| 63 | +| **Low** | `GetScenarios` returns `*scenarios.Scenarios`; internal type exposed | Consider `ScenarioNames() []string` or keep `GetScenarios` for `scenarios` command | |
| 64 | +| **Deferred** | Scenario as interface | V3_PLAN: optional; current struct is sufficient | |
| 65 | + |
| 66 | +--- |
| 67 | + |
| 68 | +## 3. Error Handling |
| 69 | + |
| 70 | +### Current State |
| 71 | + |
| 72 | +- **Wrapping:** `fmt.Errorf("... %w", err)` used consistently for wrapped errors. |
| 73 | +- **Return paths:** `Run(ctx, args) error` returns; `Execute()` exits on error. |
| 74 | +- **User-facing:** `result.Error()` for explicit errors; `result.Failed()` for failure conditions; `"load test failed - see log for details"` when failed but no explicit error. |
| 75 | +- **Result errors:** `errors.Join` used in `Result.Error()` for multiple errors. |
| 76 | + |
| 77 | +### Bug |
| 78 | + |
| 79 | +**File:** `pkg/f1/f1.go`, lines 166–174 |
| 80 | + |
| 81 | +```go |
| 82 | +err = rootCmd.ExecuteContext(execCtx) |
| 83 | +profilingErr := f.profiling.stop() |
| 84 | +errs := errors.Join(err, profilingErr) |
| 85 | + |
| 86 | +if errs != nil { |
| 87 | + return fmt.Errorf("command execution: %w", err) // BUG: should be errs |
| 88 | +} |
| 89 | +``` |
| 90 | + |
| 91 | +When both `err` and `profilingErr` are non-nil, the returned error wraps only `err`, losing profiling errors. **Fix:** use `errs` instead of `err`. |
| 92 | + |
| 93 | +### Suggested Improvements |
| 94 | + |
| 95 | +| Priority | Issue | Rationale | |
| 96 | +|----------|-------|-----------| |
| 97 | +| **High** | Fix `err` → `errs` in `f1.go` | Bug: profiling errors are dropped | |
| 98 | +| **Medium** | `run_cmd.go:211`: `errors.New("load test failed - see log for details")` loses context | Consider wrapping with `err` if `result.Error() != nil` | |
| 99 | +| **Low** | `scenario_logger.Open` returns `""` on error but displays ErrorMessage | Silently falls back; consider returning error or documenting behavior | |
| 100 | +| **Low** | `setEnvs`/`unsetEnvs` in file trigger: Display error but continue | May be intentional; document or consider failing fast | |
| 101 | + |
| 102 | +--- |
| 103 | + |
| 104 | +## 4. Logging |
| 105 | + |
| 106 | +### Current State |
| 107 | + |
| 108 | +- **slog:** All logging uses `log/slog`; handlers support text and JSON. |
| 109 | +- **Structured attrs:** `log.IterationAttr`, `log.VUIDAttr`, `log.ScenarioAttr`, `log.ErrorAttr`, `log.StackTraceAttr`, `log.IterationStatsGroup`. |
| 110 | +- **Levels:** `Error` for failures; `Info` for progress; `Warn` for warnings; `Debug` via `F1_LOG_LEVEL`. |
| 111 | +- **T:** `Error`/`Fatal` → ERROR; `Log`/`Logf` → INFO. |
| 112 | +- **Output:** Interactive vs non-interactive; `Outputable` interface with `Print`/`Log`. |
| 113 | + |
| 114 | +### Suggested Improvements |
| 115 | + |
| 116 | +| Priority | Issue | Rationale | |
| 117 | +|----------|-------|-----------| |
| 118 | +| **Low** | `T.Error`/`T.Fatal` use `fmt.Sprintf`/`fmt.Sprintln` then string | Slog prefers key-value; consider `logger.Error("msg", slog.Any("args", args))` for complex cases | |
| 119 | +| **Low** | No trace-level logging in core | Could add for debugging iteration lifecycle | |
| 120 | +| **Deferred** | Log sampling for high-volume runs | Avoid log spam at high iteration rates | |
| 121 | + |
| 122 | +--- |
| 123 | + |
| 124 | +## 5. Testing |
| 125 | + |
| 126 | +### Current State |
| 127 | + |
| 128 | +| Package | Coverage | Notes | |
| 129 | +|---------|----------|-------| |
| 130 | +| `internal/run/views` | 95.7% | Strong | |
| 131 | +| `internal/raterun` | 88.9% | Strong | |
| 132 | +| `pkg/f1/f1testing` | 82.5% | Good | |
| 133 | +| `internal/metrics` | 72.7% | Good | |
| 134 | +| `internal/trigger/api` | 67.1% | Adequate | |
| 135 | +| `pkg/f1` | 65.0% | Adequate | |
| 136 | +| `internal/run` | 58.7% | Adequate | |
| 137 | +| `internal/trigger/file` | 66.5% | Adequate | |
| 138 | +| `internal/trigger/staged` | 47.6% | Low | |
| 139 | +| `internal/trigger/gaussian` | 37.7% | Low | |
| 140 | +| `internal/xtime` | 100% | Excellent | |
| 141 | + |
| 142 | +**0% coverage:** `envsettings`, `gaussian`, `log`, `logutils`, `options`, `progress`, `trigger`, `constant`, `ramp`, `rate`, `users`, `triggerflags`, `ui`, `workers`, `xcontext`, `benchcmd`. ~~`scenarios`~~ — now has unit tests. |
| 143 | + |
| 144 | +**Patterns:** Stage-style tests (`RunTestStage`, `newF1Stage`); `given/when/then`; integration tests with fake Prometheus; signal tests for SIGTERM/SIGINT. |
| 145 | + |
| 146 | +### Suggested Improvements |
| 147 | + |
| 148 | +| Priority | Issue | Rationale | |
| 149 | +|----------|-------|-----------| |
| 150 | +| ~~**Medium**~~ | ~~Add unit tests for `scenarios`~~ Done | Low-hanging fruit; parsing logic is critical | |
| 151 | +| **Medium** | Add unit tests for `trigger/rate`, `trigger/constant` | Low-hanging fruit; parsing logic is critical | |
| 152 | +| **Medium** | Test `workers` package | Pool logic is core; currently only integration tests | |
| 153 | +| **Low** | Test `xcontext.Detach` | Simple but important for teardown | |
| 154 | +| **Low** | Test `log` package | Config, attrs | |
| 155 | +| **Deferred** | `envsettings`, `options` | Low risk; can defer | |
| 156 | + |
| 157 | +--- |
| 158 | + |
| 159 | +## 6. Documentation |
| 160 | + |
| 161 | +### Current State |
| 162 | + |
| 163 | +- **doc.go:** `pkg/f1` and `pkg/f1/f1testing` have package docs with examples. |
| 164 | +- **README:** Usage, trigger modes, environment variables, output format. |
| 165 | +- **V3_PLAN.md:** Migration checklist, open issues. |
| 166 | +- **MIGRATION.md:** v2 → v3 migration guide. |
| 167 | + |
| 168 | +### Gaps (from V3_PLAN §11) |
| 169 | + |
| 170 | +| Item | Status | |
| 171 | +|------|--------| |
| 172 | +| Installation instructions | Missing | |
| 173 | +| Prometheus push gateway | Done (README: env vars + programmatic options + precedence) | |
| 174 | +| Output mechanisms (Prometheus, logs) | Partial | |
| 175 | +| Grafana dashboard example | Missing | |
| 176 | +| Screenshots/screencast | Missing | |
| 177 | +| `--wait-for-completion-timeout` and run flags | Not documented | |
| 178 | + |
| 179 | +### Suggested Improvements |
| 180 | + |
| 181 | +| Priority | Issue | Rationale | |
| 182 | +|----------|-------|-----------| |
| 183 | +| **High** | Add installation (build from source) | #245 | |
| 184 | +| ~~**Medium**~~ | ~~Document Prometheus env vars and programmatic options~~ Done | #4 | |
| 185 | +| **Medium** | Document run flags (e.g. `--wait-for-completion-timeout`) | #297 | |
| 186 | +| **Low** | Add Grafana dashboard example | #149 | |
| 187 | +| **Low** | Screenshots/screencast | #16 | |
| 188 | + |
| 189 | +--- |
| 190 | + |
| 191 | +## 7. CLI |
| 192 | + |
| 193 | +### Current State |
| 194 | + |
| 195 | +- **Cobra:** Root cmd with `run`, `scenarios`, `completions`; run has trigger subcommands. |
| 196 | +- **Flags:** Grouped (Output, Duration & limits, Concurrency, Failure handling, Shutdown, Trigger options); kebab-case for long flags. |
| 197 | +- **Help:** `groupedFlagUsages` custom template; `Long` for short-flag meanings. |
| 198 | + |
| 199 | +### Suggested Improvements |
| 200 | + |
| 201 | +| Priority | Issue | Rationale | |
| 202 | +|----------|-------|-----------| |
| 203 | +| ~~**Medium**~~ | ~~`scenarios ls` has no `Short`/`Long`~~ Done | Help text could describe purpose | |
| 204 | +| ~~**Low**~~ | ~~`scenarios ls` prints to stdout~~ Done | Now uses `cmd.OutOrStdout()` for testability; unit tests verify ls output ordering and help text | |
| 205 | +| **Deferred** | `--interactive` for verbose output | #280 | |
| 206 | + |
| 207 | +--- |
| 208 | + |
| 209 | +## 8. Concurrency |
| 210 | + |
| 211 | +### Current State |
| 212 | + |
| 213 | +- **Context:** `ctx` propagated from `Run` → `newSignalContext` → `execCtx` → `run.Do` → workers and triggers. |
| 214 | +- **Workers:** `TriggerPool` (rate-based) and `ContinuousPool` (users mode); both use `atomic.Bool` for stop to avoid mutex in hot path. |
| 215 | +- **Teardown:** `xcontext.Detach(ctx)` for teardown so it runs even when parent is cancelled. |
| 216 | +- **WaitForCompletion:** `poolManager.WaitForCompletion()` with timeout; `WaitGroup` for worker shutdown. |
| 217 | + |
| 218 | +### Suggested Improvements |
| 219 | + |
| 220 | +| Priority | Issue | Rationale | |
| 221 | +|----------|-------|-----------| |
| 222 | +| **Low** | `ContinuousPool` busy-loop on `!stopWorkers.Load()` | Acceptable; could add `time.Sleep` for very short iterations to reduce CPU | |
| 223 | +| **Deferred** | Worker pool size tuning | No evidence of issues; monitor | |
| 224 | +| **Deferred** | Context cancellation propagation in file trigger stages | `stageCtx` used; `ctx.Done()` checked between stages | |
| 225 | + |
| 226 | +--- |
| 227 | + |
| 228 | +## 9. Code Quality |
| 229 | + |
| 230 | +### Duplication |
| 231 | + |
| 232 | +| Location | Pattern | Suggestion | |
| 233 | +|----------|---------|------------| |
| 234 | +| Trigger constructors | `flags.GetString(flagX)` + `fmt.Errorf("getting flag: %w", err)` repeated | Extract helper `getFlagString(flags, name) (string, error)` | |
| 235 | +| Trigger constructors | Similar flag parsing logic | Shared `parseFlags` helper for common flags | |
| 236 | +| `run_stage_test.go` | `build_trigger` switch | Acceptable; trigger-specific setup | |
| 237 | + |
| 238 | +### Complexity |
| 239 | + |
| 240 | +- **run_cmd.go `runCmdExecute`:** Long; ~120 lines. Could extract flag parsing into `parseRunFlags` and `buildRunOptions`. |
| 241 | +- **Result.Failed():** Multiple conditions; consider extracting to helper for readability. |
| 242 | + |
| 243 | +### Naming |
| 244 | + |
| 245 | +- **Minor:** `outputer` vs `output` — `outputer` used in `test_runner.go`; `output` elsewhere. Prefer `output` consistently. |
| 246 | +- **Minor:** `stages_worker.go` `setEnvs`/`unsetEnvs` typo: "unable set" → "unable to set" |
| 247 | + |
| 248 | +### Suggested Improvements |
| 249 | + |
| 250 | +| Priority | Issue | Rationale | |
| 251 | +|----------|-------|-----------| |
| 252 | +| **Medium** | Extract flag-getting helper | Reduce ~30+ repetitive `fmt.Errorf("getting flag: %w", err)` | |
| 253 | +| **Low** | Fix "unable set" / "unable unset" typos | User-facing messages | |
| 254 | +| **Low** | Extract `runCmdExecute` flag parsing | Improve readability | |
| 255 | +| **Deferred** | Refactor `Result.Failed()` | Low priority | |
| 256 | + |
| 257 | +--- |
| 258 | + |
| 259 | +## 10. Missing Features / Gaps vs Plan |
| 260 | + |
| 261 | +### V3_PLAN Checklist |
| 262 | + |
| 263 | +| Item | Status | |
| 264 | +|------|--------| |
| 265 | +| Module path v3 | Done | |
| 266 | +| Remove deprecated APIs | Done | |
| 267 | +| Rename testing → f1testing | Done | |
| 268 | +| Context in scenario/run | Done | |
| 269 | +| Builder-style API | Done | |
| 270 | +| T Error/Fatal/Log compatible | Done | |
| 271 | +| Run(ctx, args) error | Done | |
| 272 | +| Options structs | Done | |
| 273 | +| Unified configuration API | Done (Settings types, WithSettings, typed WithLogLevel/WithLogFormat) | |
| 274 | +| Fluent chaining | Partial (AddScenario returns *F1) | |
| 275 | +| Document Scenario.RunFn | Not done | |
| 276 | +| Document env vars | Not done | |
| 277 | +| Custom flags for scenarios | Deferred (#246) | |
| 278 | +| Programmatic runner | Descoped | |
| 279 | +| Documentation | In progress | |
| 280 | +| Install instructions | Not done | |
| 281 | +| Grafana dashboard | Not done | |
| 282 | +| `--interactive` | Not done (#280) | |
| 283 | +| Staged chart transaction count | Not done (#38) | |
| 284 | + |
| 285 | +### Suggested Improvements |
| 286 | + |
| 287 | +| Priority | Issue | Rationale | |
| 288 | +|----------|-------|-----------| |
| 289 | +| **High** | Complete documentation items | Blocking for v3 release | |
| 290 | +| **Medium** | Document Scenario.RunFn population | V3_PLAN §8 | |
| 291 | +| **Medium** | Document env vars for config | #246 | |
| 292 | +| **Low** | Staged chart transaction count | #38 | |
| 293 | +| **Deferred** | Custom scenario flags | #246 | |
| 294 | +| **Deferred** | `--interactive` | #280 | |
| 295 | + |
| 296 | +--- |
| 297 | + |
| 298 | +## Summary: Prioritized Action Items |
| 299 | + |
| 300 | +### Do in v3 (before release) |
| 301 | + |
| 302 | +1. **Fix bug:** `f1.go:173` — use `errs` instead of `err` to preserve joined errors. |
| 303 | +2. **Documentation:** Installation, Prometheus env vars, run flags (--wait-for-completion-timeout). |
| 304 | +3. **Typos:** "unable set" → "unable to set", "unable unset" → "unable to unset" in `stages_worker.go`. |
| 305 | + |
| 306 | +### Consider for v3 |
| 307 | + |
| 308 | +4. ~~**Tests:** Add unit tests for `scenarios`.~~ Done |
| 309 | +5. **Tests:** Add unit tests for `trigger/rate`, `trigger/constant`. |
| 310 | +6. **Documentation:** Document Scenario.RunFn population, output mechanisms. |
| 311 | +7. **Refactor:** Extract flag-getting helper to reduce duplication. |
| 312 | +8. ~~**CLI:** Add `Short`/`Long` for `scenarios ls`.~~ Done |
| 313 | + |
| 314 | +### Defer to v3.1 or later |
| 315 | + |
| 316 | +8. **Custom scenario flags** (#246). |
| 317 | +9. **`--interactive` verbose output** (#280). |
| 318 | +10. **Staged chart transaction count** (#38). |
| 319 | +11. **Grafana dashboard example** (#149). |
| 320 | +12. **T Helper/Skip/TempDir** (V3_PLAN §5a). |
| 321 | +13. **Programmatic runner** (descoped). |
| 322 | + |
| 323 | +--- |
| 324 | + |
| 325 | +## Appendix: File Structure |
| 326 | + |
| 327 | +``` |
| 328 | +pkg/f1/ |
| 329 | + f1.go, settings.go, options.go, root_cmd.go, doc.go, profiling.go, completions.go |
| 330 | + f1testing/ |
| 331 | + api.go, t.go, doc.go |
| 332 | + scenarios/ |
| 333 | + scenario_builder.go, scenarios_cmd.go |
| 334 | +
|
| 335 | +internal/ |
| 336 | + run/ # run orchestration |
| 337 | + workers/ # pools, active scenario |
| 338 | + trigger/ # constant, staged, ramp, gaussian, users, file |
| 339 | + options/ # RunOptions |
| 340 | + log/ # slog config, attrs |
| 341 | + ui/ # output, printer |
| 342 | + metrics/ # Prometheus |
| 343 | + envsettings/ # env-based config |
| 344 | + raterun/ # progress runner |
| 345 | + progress/ # stats |
| 346 | + xcontext/ # Detach |
| 347 | + xtime/ # NanoTime |
| 348 | + triggerflags/ # flag constants |
| 349 | +
|
| 350 | +benchcmd/ # benchmark main |
| 351 | +``` |
0 commit comments