Issue: Q1 — Performance Benchmark Plan (CI Smoke + MonitorState Memory)[1]
This issue defines a minimal, repeatable benchmark plan to validate probe throughput, DB write latency, rollup performance, and the memory envelope of in-memory MonitorState at representative scales, with results published into docs and enforced via CI smoke runs.[2]
Goals
- Add an automated CI smoke benchmark that exercises probes, outage flow, and rollups on a small synthetic dataset and publishes core latency/throughput metrics.[3]
- Verify the MonitorState memory envelope at 1k/5k/10k endpoints versus spec expectations, and document the measured per-endpoint overhead and total memory footprint.[1]
Scope
- Synthetic load generation for ICMP/TCP/HTTP probes with mixed success/failure profiles, recording probe latency, success rate, and error codes; a deterministic generator sketch follows this list.[2]
- End-to-end check-path timing from probe dispatch through DB write and state transition, plus a single rollup cycle on generated data.[4]
- Memory profiling that captures resident set size and per-endpoint MonitorState cost at multiple scales, under steady-state probing.[1]
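A minimal sketch of what deterministic outcome generation could look like, assuming hypothetical `ProbeType` and `SyntheticOutcome` types (the engine's real probe abstractions may differ); seeding from the probe type, endpoint id, and tick index makes every run reproducible, which keeps CI metrics comparable across runs:

```csharp
using System;

// Hypothetical types for illustration; not the engine's actual abstractions.
public enum ProbeType { Icmp, Tcp, Http }

public sealed record SyntheticOutcome(bool Success, TimeSpan Latency, int? ErrorCode);

public static class SyntheticProfile
{
    // A seed derived from (type, endpointId, tick) yields the same outcome
    // sequence on every run, so failure patterns and latencies are repeatable.
    public static SyntheticOutcome Next(ProbeType type, int endpointId, long tick, double failureRate)
    {
        var rng = new Random(HashCode.Combine((int)type, endpointId, tick));
        bool success = rng.NextDouble() >= failureRate;
        // Simulated RTT: 5-50 ms on success; a timeout-like 1000 ms on failure.
        var latency = TimeSpan.FromMilliseconds(success ? 5 + rng.NextDouble() * 45 : 1000);
        return new SyntheticOutcome(success, latency, success ? null : 110 /* illustrative error code */);
    }
}
```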
Tasks
- Create a lightweight BenchmarkHarness (console or test host) that spins the monitoring engine with synthetic endpoints (e.g., 1k/5k/10k), configurable intervals/timeouts, and deterministic outcomes.[2]
- Capture metrics: median/P95 probe latency per type, insert latency for CheckResult, outage open/close timing, 15-minute rollup time on synthetic data, and overall CPU/memory snapshot.[4]
- Implement a memory-measure helper to report per-endpoint MonitorState footprint and total memory at each scale, asserting thresholds in a “smoke” mode (e.g., 1k endpoints) to prevent regressions; a sketch follows this list.[1]
- Add a GitHub Actions job benchmarks.yml that builds, runs the smoke benchmark profile (e.g., 1k endpoints for 2–3 cycles), uploads metrics as artifacts, and fails on regression thresholds.[3]
- Publish results into docs/benchmarks.md with current version, environment notes, and key metrics; link from README for discoverability.[5]
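One shape the memory-measure helper could take, as a hedged sketch: the real `MonitorState` is created through a factory delegate (its actual type lives in the engine), and a forced full collection before and after allocation isolates the per-endpoint delta:

```csharp
using System;

public static class MemoryEnvelope
{
    public static (long TotalBytes, double PerEndpointBytes) Measure(
        int endpointCount, Func<int, object> createMonitorState)
    {
        long before = StableHeapSize();
        var states = new object[endpointCount];          // roots keep states alive
        for (int i = 0; i < endpointCount; i++)
            states[i] = createMonitorState(i);
        long after = StableHeapSize();
        GC.KeepAlive(states);                            // hold until after measurement
        long delta = after - before;
        return (delta, (double)delta / endpointCount);
    }

    // Full blocking collection so GetTotalMemory reflects live objects only.
    private static long StableHeapSize()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return GC.GetTotalMemory(forceFullCollection: true);
    }
}
```

In smoke mode the caller would assert `PerEndpointBytes` against the spec-driven threshold and fail the run on regression.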
Acceptance Criteria
- CI runs a smoke benchmark on PRs to main/develop, uploads metrics, and enforces failure on exceeded latency or memory thresholds (configurable); a workflow sketch follows this list.[3]
- MonitorState memory report shows measured per-endpoint cost and total memory at 1k/5k/10k, with at least one scale executed locally and summarized in docs.[1]
- docs/benchmarks.md includes probe latency distribution, DB write latency, rollup timing, and memory envelope tables with date/version stamps.[5]
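A hedged sketch of benchmarks.yml; the project path, harness flags, and .NET version are assumptions to adapt to the repo's layout and the conventions in quality-gates.md:

```yaml
name: benchmarks
on:
  pull_request:
    branches: [main, develop]
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.0.x'   # assumed target framework
      # Hypothetical harness CLI: 1k endpoints, 3 cycles, JSON metrics out.
      - run: dotnet run -c Release --project tools/BenchmarkHarness -- --smoke --endpoints 1000 --cycles 3 --out metrics.json
      - uses: actions/upload-artifact@v4
        with:
          name: benchmark-metrics
          path: metrics.json
```

Regression enforcement is simplest if the harness itself exits non-zero when a threshold is exceeded, so the workflow needs no separate parsing step.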
Success Metrics (initial thresholds)
- CheckResult insert time: median < 10 ms, P95 < 25 ms on the smoke dataset; an enforcement sketch follows this list.[4]
- 15-minute rollup compute on smoke dataset completes within 2 minutes on CI-standard runners.[4]
- MonitorState memory: document measured per-endpoint overhead and keep within spec-driven target band; add guardrails for smoke scale to prevent regressions.[1]
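A minimal sketch of how the harness could enforce these guardrails in smoke mode, using nearest-rank percentiles; the 10/25 ms limits mirror the targets above and would be read from configuration in practice:

```csharp
using System;
using System.Collections.Generic;

public static class Thresholds
{
    // Nearest-rank percentile on a pre-sorted sample (p in 0-100).
    public static double Percentile(IReadOnlyList<double> sortedMs, double p)
    {
        int rank = (int)Math.Ceiling(p / 100.0 * sortedMs.Count) - 1;
        return sortedMs[Math.Clamp(rank, 0, sortedMs.Count - 1)];
    }

    public static void AssertInsertLatency(List<double> samplesMs)
    {
        samplesMs.Sort();
        double median = Percentile(samplesMs, 50);
        double p95 = Percentile(samplesMs, 95);
        if (median >= 10 || p95 >= 25)
            throw new InvalidOperationException(
                $"CheckResult insert latency regression: median={median:F1} ms, P95={p95:F1} ms");
    }
}
```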
References
- Probes and budgets to exercise (ICMP/TCP/HTTP): probes-spec.md.[2]
- Rollup algorithms and expected timing characteristics: rollup-spec.md.[4]
- Performance targets and product scope guardrails: scope-v1.md.[1]
- CI/quality pipeline conventions for adding a new workflow: quality-gates.md.[3]
- Surface benchmark docs in project docs index: README.md.[5]
The draft below expands the plan with concrete objectives, components under test, and metrics to capture:
🎯 Objectives
- Confirm CPU < 2% and RAM < 200 MB @ 100 endpoints / 10 s intervals.
- Measure sustained check_result_raw write throughput (10–50 rows/sec expected).
- Validate rollup job timing (≤ 2 min for 10k records).
- Observe jitter effectiveness (no burst spikes).
- Collect comparative metrics for SQLite vs PostgreSQL.
🧩 Components Under Test
| Area | Focus | Metrics |
|---|---|---|
| MonitoringBackgroundService | Scheduler loop, concurrency | Tick drift, backlog, probe/sec |
| ProbeService (ICMP/TCP/HTTP) | RTT, timeout/retry behavior | Success %, avg/min/max RTT |
| EF Core Writes | Raw inserts, batching | Inserts/sec, transaction latency |
| RollupService | 15 min & daily aggregation speed | Rows/sec, memory usage |
| Resource footprint | Host process stats | CPU %, RAM MB (avg/peak) |
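For the scheduler-loop row, tick drift could be sampled along these lines, assuming the loop is built on `PeriodicTimer` (a sketch, not the service's actual implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class DriftProbe
{
    // Compares each tick to its ideal wall-clock slot (i * interval);
    // positive values indicate scheduler lag or probe backlog.
    public static async Task<List<double>> SampleAsync(TimeSpan interval, int ticks, CancellationToken ct)
    {
        var drift = new List<double>(ticks);
        var clock = Stopwatch.StartNew();
        using var timer = new PeriodicTimer(interval);
        for (int i = 1; i <= ticks && await timer.WaitForNextTickAsync(ct); i++)
            drift.Add(clock.Elapsed.TotalMilliseconds - i * interval.TotalMilliseconds);
        return drift;
    }
}
```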
📈 Metrics to Capture
- Probes executed/sec
- Avg/peak RTT per type
- Insert latency (ms)
- Rollup duration (sec)
- CPU % (avg/peak)
- Memory MB (avg/peak)
- GC collections (#/min)
Export logs to /perf/results/YYYYMMDD/.
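A minimal export sketch matching that layout; the per-run file naming is an assumption:

```csharp
using System;
using System.IO;
using System.Text.Json;

public static class ResultExporter
{
    public static void Export(object metrics)
    {
        // /perf/results/YYYYMMDD/ per the convention above; CreateDirectory is idempotent.
        string dir = Path.Combine("perf", "results", DateTime.UtcNow.ToString("yyyyMMdd"));
        Directory.CreateDirectory(dir);
        string path = Path.Combine(dir, $"run-{DateTime.UtcNow:HHmmss}.json");
        File.WriteAllText(path, JsonSerializer.Serialize(
            metrics, new JsonSerializerOptions { WriteIndented = true }));
    }
}
```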