docs: Improve benchmark explanation clarity

ntucker · ntucker · commit e6f05d727bc7 · 2026-03-23T18:56:31.000-04:00
diff --git a/docs/core/concepts/performance.md b/docs/core/concepts/performance.md
@@ -11,13 +11,12 @@ import useBaseUrl from '@docusaurus/useBaseUrl';
 </head>
 
 
-[Normalized caching](./normalization.md) with entity-level memoization enables
+In addition to the data integrity benefits, [normalized caching](./normalization.md) with entity-level memoization enables
 significant performance gains for rich interactive applications.
 
-
 ## React rendering benchmarks
 
-Full rendering pipeline (fetch through paint) measured in a real browser via Playwright.
+Full rendering pipeline (fetch through DOM commit) measured in a real browser via Playwright.
 React baseline uses useEffect + useState from the React docs.
 
 <center>
@@ -39,7 +38,10 @@ sources={{
 - **Mutation Propagation**: One store write updates every view that references the entity.
 - **Scaling**: Mutations with 10k items in the list rendered.
 
-
+These benchmarks measure the framework's impact within the larger system. That
+makes them most useful as comparisons between approaches, rather than as
+absolute measurements of an application's overall performance. We use them to
+guide library optimizations and catch performance regressions over time.
 
 ## Normalization benchmarks
 
diff --git a/examples/benchmark-react/AGENTS.md b/examples/benchmark-react/AGENTS.md
@@ -0,0 +1,75 @@
+# React Rendering Benchmark
+
+Browser benchmark comparing `@data-client/react`, TanStack Query, SWR, and plain React. Webpack build, Playwright runner. See `README.md` for methodology and running instructions.
+
+## Build & Run
+
+```bash
+yarn build:benchmark-react                           # from repo root
+yarn workspace example-benchmark-react preview &     # serve dist/ on port 5173
+cd examples/benchmark-react && yarn bench            # all libs (local) or data-client only (CI)
+```
+
+Filtering: `yarn bench --lib data-client --size small --action update`
+
+## Architecture
+
+**Runner → `window.__BENCH__` → React**: `bench/runner.ts` opens `localhost:5173/<lib>/` in Playwright, calls `BenchAPI` methods on `window.__BENCH__`, waits for `[data-bench-complete]` attribute, then collects `performance.measure` entries. This is the only runner↔app channel.
+
+**Web Worker server**: All "network" goes to an in-memory Worker (`server.worker.ts` via `server.ts` RPC) with configurable latency. Keeps fake-server work off main thread.
+
+**Shared vs library-specific**: `src/shared/` (harness, components, fixtures, resources, server) is identical across all apps. Each `src/<lib>/index.tsx` only contains data-layer wiring. Divergence from shared code breaks fairness.
+
+**Webpack multi-entry**: `webpack.config.cjs` produces four apps at `dist/<lib>/index.html`. `@shared` path alias configured in Webpack + `tsconfig.json`.
+
+## Key Design Decisions
+
+- **MutationObserver timing**: `measureMount`/`measureUpdate` in `benchHarness.tsx` use `MutationObserver` on `[data-bench-harness]`, not React lifecycle. Mount waits for `[data-bench-item]`/`[data-sorted-list]`. Update triggers on first mutation batch, or waits for `isReady` predicate on multi-phase updates.
+- **Proxy API**: `window.__BENCH__` is a `Proxy` → `apiRef.current`. `registerAPI` merges library actions with shared defaults. Methods always reflect current React state; adding new `BenchAPI` methods needs no registration boilerplate.
+- **renderLimit**: Update scenarios store 1000 items but render only 100 — isolates cache-propagation cost from DOM reconciliation.
+- **Expensive UserView**: `components.tsx` `UserView` does deliberate hash/string/date work. Libraries preserving referential equality skip it on unrelated updates; others pay per row.
+- **BenchGCPolicy**: data-client's custom `GCPolicy` — zero expiry, no interval timer. Prevents GC during timing; `sweep()` called explicitly for memory scenarios.
+
+## Scenario System
+
+`BASE_SCENARIOS` in `bench/scenarios.ts` × `LIBRARIES` via `flatMap`. `onlyLibs` restricts to specific libs. CI runs data-client hot-path only (no memory/startup/deterministic). Memory is opt-in locally (`--action memory`). Convergent timing uses single page load with adaptive iterations and early stopping on statistical convergence. Ref-stability scenarios run once (deterministic count, not ops/s).
+
+## Update Data Flow
+
+1. Runner calls `window.__BENCH__.updateEntity(1)`
+2. `measureUpdate` marks `update-start`, invokes action, `MutationObserver` detects DOM change, marks `update-end` + sets `data-bench-complete`
+3. Runner reads `performance.measure('update-duration')`
+4. **Core asymmetry**: data-client propagates via one store write; TanStack Query/SWR/baseline invalidate + re-fetch from Worker
+
+## Adding / Modifying
+
+**New scenario**: Add to `BASE_SCENARIOS` → add action to `BenchAPI` in `types.ts` if new → implement in each `src/<lib>/index.tsx` (or use `onlyLibs`) → set `preMountAction`/`mountCount` if setup needed.
+
+**New library**: `src/<lib>/index.tsx` using `registerAPI` → add to `LIBRARIES` in `scenarios.ts` → webpack entry + `HtmlWebpackPlugin` → `package.json` dep.
+
+**Shared components**: Changes to `components.tsx` or `resources.ts` shift all four libraries equally (by design).
+
+## Data Attributes
+
+| Attribute | Flow | Purpose |
+|---|---|---|
+| `data-app-ready` | harness → runner | `__BENCH__` available |
+| `data-bench-harness` | lib → runner | Container for MutationObserver |
+| `data-bench-complete` | harness → runner | Iteration finished |
+| `data-bench-timeout` | harness → runner | 30s timeout (error) |
+| `data-bench-item` | components → harness | Mount detection |
+| `data-sorted-list` | lib views → harness | Sorted-view mount detection |
+| `data-detail-view` | lib views → harness | Multi-view detection |
+| `data-issue-number` | components → runner/harness | Item identity assertion |
+| `data-title` | components → lib views | Text content assertion |
+| `data-state-list` | lib views → harness | Move-item verification |
+
+## Environment Variables
+
+| Variable | Effect |
+|---|---|
+| `CI` | data-client hot-path only; tighter convergence |
+| `REACT_COMPILER=false` | Disables React Compiler at build |
+| `BENCH_LABEL=<tag>` | Appends `[<tag>]` to result names |
+| `BENCH_PORT` | Preview port (default 5173) |
+| `BENCH_TRACE=true` | Chrome tracing for duration scenarios |
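The "Proxy API" design decision in the new `AGENTS.md` can be illustrated with a small, framework-free sketch. This is not the code from `benchHarness.tsx`: the trimmed `BenchAPI` interface, the string return values, and the body of `registerAPI` are all assumptions made for illustration. It only shows why a `Proxy` that forwards to a mutable ref always reflects the most recently registered methods.

```typescript
// Hypothetical, trimmed-down BenchAPI; the real interface lives in types.ts.
interface BenchAPI {
  updateEntity(id: number): string;
  unmountAll(): string;
}

// Stand-in for a React ref whose `.current` is reassigned as state changes.
const apiRef: { current: Partial<BenchAPI> } = { current: {} };

// Assumed behaviour: library actions are merged over shared defaults.
function registerAPI(libraryActions: Partial<BenchAPI>): void {
  const sharedDefaults: Partial<BenchAPI> = {
    unmountAll: () => 'shared:unmountAll',
  };
  apiRef.current = { ...sharedDefaults, ...libraryActions };
}

// Every property access is resolved against apiRef.current at call time,
// so a caller holding `bench` never sees a stale method.
// In the browser this object would be assigned to window.__BENCH__.
const bench = new Proxy({} as BenchAPI, {
  get(_target, prop) {
    return Reflect.get(apiRef.current, prop);
  },
});

registerAPI({ updateEntity: id => `v1:${id}` });
const first = bench.updateEntity(1); // 'v1:1'

// Re-registering swaps the implementation without touching `bench`.
registerAPI({ updateEntity: id => `v2:${id}` });
const second = bench.updateEntity(1); // 'v2:1'
```

Because the lookup happens per access, a method added to the interface becomes reachable through the proxy as soon as some `registerAPI` call supplies it, which matches the "no registration boilerplate" point above.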
diff --git a/examples/benchmark-react/README.md b/examples/benchmark-react/README.md
@@ -1,25 +1,33 @@
 # React Rendering Benchmark
 
-Browser-based benchmark comparing `@data-client/react`, TanStack Query, and SWR on mount/update scenarios. Built with Webpack via `@anansi/webpack-config`. Results are reported to CI via `rhysd/github-action-benchmark`.
+Browser-based benchmark for `@data-client/react` measuring mount/update scenarios. Includes TanStack Query, SWR, and a plain-React baseline for reference. Built with Webpack via `@anansi/webpack-config`. Results are reported to CI via `rhysd/github-action-benchmark`.
 
 ## Comparison to Node benchmarks
 
 The repo has two benchmark suites:
 
 - **`examples/benchmark`** (Node) — Measures the JS engine only: `normalize`/`denormalize`, `Controller.setResponse`/`getResponse`, reducer throughput. No browser, no React. Use it to validate core and normalizr changes.
-- **`examples/benchmark-react`** (this app) — Measures the full React rendering pipeline: same operations driven in a real browser, with layout and paint. Use it to validate `@data-client/react` and compare against other data libraries.
+- **`examples/benchmark-react`** (this app) — Measures the full React rendering pipeline: same operations driven in a real browser, with layout and paint. Use it to validate `@data-client/react` changes; other libraries are included for reference.
 
 ## Methodology
 
 - **What we measure:** Wall-clock time from triggering an action (e.g. `init(100)` or `updateUser('user0')`) until a MutationObserver detects the expected DOM change in the benchmark container. Optionally we also record React Profiler commit duration and, with `BENCH_TRACE=true`, Chrome trace duration.
-- **Why:** Normalized caching should show wins on shared-entity updates (one store write, many components update), ref stability (fewer new object references), and derived-view memoization (`Query` schema avoids re-sorting when entities haven't changed). See [js-framework-benchmark "How the duration is measured"](https://github.com/krausest/js-framework-benchmark/wiki/How-the-duration-is-measured) for a similar timeline-based approach.
+- **Why:** Scenarios are chosen to exercise areas where caching strategies differ: shared-entity updates, referential stability, and derived-view memoization. See [js-framework-benchmark "How the duration is measured"](https://github.com/krausest/js-framework-benchmark/wiki/How-the-duration-is-measured) for a similar timeline-based approach.
 - **Statistical:** Warmup runs are discarded; we report median and 95% CI (as percentage of median). Timing scenarios (navigation and mutation) use **convergent mode**: a single page load per scenario, with warmup iterations followed by adaptive measurement iterations where each iteration produces one sample and convergence is checked inline. This eliminates page-reload overhead between samples for faster, lower-variance results. Deterministic scenarios (ref-stability) run once. Memory scenarios use a separate outer loop with a fresh page per round.
 - **No CPU throttling:** Runs at native speed with more samples for statistical significance rather than artificial slowdown. Convergent timing scenarios use 5 warmup + up to 50 measurement iterations (small) or 3 warmup + up to 40 (large). Early stopping triggers when 95% CI margin drops below the target percentage.
 
+## Comparison philosophy
+
+The primary purpose is to track data-client's own performance — catch regressions and validate improvements. Other libraries are included for context; CI runs data-client only.
+
+Scenarios are designed to isolate the data framework layer: fetching, caching, update propagation, and rendering in response to data changes. Real-world applications will have additional performance considerations (routing, animation, third-party scripts, etc.) beyond what is measured here.
+
+All implementations share presentational components, fixture data, fetch functions, and the `useBenchState` harness. They only diverge where each library's data layer requires it, using idiomatic patterns from that library's documentation. No implementation builds custom state management on top of its library.
+
 ## Scenario categories
 
-- **Hot path (in CI, data-client only)** — JS-only: init (fetch + render), update propagation, ref-stability, sorted-view. No simulated network. CI runs only `data-client` scenarios to track our own regressions; competitor libraries are benchmarked locally for comparison.
-- **With network (local comparison)** — Same shared-author update but with simulated network delay (consistent ms per "request"). Used to compare overfetching: data-client needs one store update (1 × delay); non-normalized libs typically invalidate/refetch multiple queries (N × delay). **Not run in CI** — run locally with `yarn bench` (no `CI` env) to include these.
+- **Hot path (in CI, data-client only)** — JS-only: init (fetch + render), update propagation, ref-stability, sorted-view. No simulated network. CI runs only data-client scenarios to track regressions; other libraries are benchmarked locally.
+- **With network (local)** — Same shared-author update but with simulated network delay (consistent ms per "request"). Normalized caches propagate via a single store update; query-keyed caches invalidate and refetch affected queries. **Not run in CI** — run locally with `yarn bench` (no `CI` env) to include these.
 - **Memory (local only)** — Heap delta after repeated mount/unmount cycles.
 - **Startup (local only)** — FCP and task duration via CDP `Performance.getMetrics`.
 
@@ -28,17 +36,17 @@ The repo has two benchmark suites:
 **Hot path (CI)**
 
 - **Get list** (`getlist-100`, `getlist-500`) — Time to show a ListView component that auto-fetches 100 or 500 issues from the list endpoint, then renders (unit: ops/s). Exercises the full fetch + normalization + render pipeline.
-- **Get list sorted** (`getlist-500-sorted`) — Mount 500 issues through a sorted/derived view. data-client uses `useQuery(sortedIssuesQuery)` with `Query` schema memoization; competitors use `useMemo` + sort.
+- **Get list sorted** (`getlist-500-sorted`) — Mount 500 issues through a sorted/derived view. data-client uses `useQuery(sortedIssuesQuery)` with `Query` schema memoization; other libraries use `useMemo` + sort.
 - **Update entity** (`update-entity`) — Time to update one issue and propagate to the UI (unit: ops/s).
 - **Update entity sorted** (`update-entity-sorted`) — After mounting a sorted view, update one entity. data-client's `Query` memoization avoids re-sorting when sort keys are unchanged.
-- **Update entity multi-view** (`update-entity-multi-view`) — Update one issue that appears simultaneously in a list, a detail panel, and a pinned-cards strip. Exercises cross-query entity propagation: normalized cache updates once and all three views reflect the change; non-normalized libraries must invalidate and refetch each query independently.
-- **Update user (scaling)** (`update-user`, `update-user-10000`) — Update one shared user with 1,000 or 10,000 mounted issues to test subscriber scaling. Normalized cache: one store update, all views of that user update.
-- **Ref-stability** (`ref-stability-issue-changed`, `ref-stability-user-changed`) — Count of components that received a **new** object reference after an update (unit: count; smaller is better). Normalization keeps referential equality for unchanged entities.
+- **Update entity multi-view** (`update-entity-multi-view`) — Update one issue that appears simultaneously in a list, a detail panel, and a pinned-cards strip. Normalized caches propagate via a single store write; query-keyed caches invalidate and refetch each query.
+- **Update user (scaling)** (`update-user`, `update-user-10000`) — Update one shared user with 1,000 or 10,000 mounted issues to test subscriber scaling.
+- **Ref-stability** (`ref-stability-issue-changed`, `ref-stability-user-changed`) — Count of components that received a **new** object reference after an update (unit: count; smaller is better).
 - **Invalidate and resolve** (`invalidate-and-resolve`) — data-client only; invalidates a cached endpoint and immediately re-resolves. Measures Suspense boundary round-trip.
 
 **With network (local comparison)**
 
-- **Update shared user with network** (`update-shared-user-with-network`) — Same as above with a simulated delay (e.g. 50 ms) per "request." data-client propagates via normalization (no extra request); other libs invalidate/refetch the list endpoint.
+- **Update shared user with network** (`update-shared-user-with-network`) — Same as above with a simulated delay (e.g. 50 ms) per "request."
 
 **Memory (local only)**
 
@@ -99,7 +107,7 @@ Regressions >5% on stable scenarios or >15% on volatile scenarios are worth inve
 ## Interpreting results
 
 - **Higher is better** for throughput (ops/s). **Lower is better** for ref-stability counts and heap delta (bytes).
-- **Ref-stability:** data-client's normalized cache keeps referential equality for unchanged entities, so `issueRefChanged` and `userRefChanged` should stay low. Non-normalized libs typically show higher counts because they create new object references for every cache write.
+- **Ref-stability:** `issueRefChanged` and `userRefChanged` count how many components received a new object reference. Normalized caches preserve referential equality for unchanged entities; query-keyed caches typically create new references on each cache write.
 - **React commit:** Reported as `(react commit)` suffix entries. These measure React Profiler `actualDuration` and isolate React reconciliation cost from layout/paint.
 - **Report viewer:** Toggle the "Base metrics", "React commit", and "Trace" checkboxes to filter the comparison table. Use "Load history" to compare multiple runs over time.
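The convergent mode described under Methodology in the README (warmup iterations discarded, then adaptive measurement with early stopping once the 95% CI margin falls below a target percentage of the median) might be sketched as follows. The function names, the `minSamples` option, and the normal-approximation CI formula are assumptions; the actual runner in `bench/runner.ts` may compute the interval differently.

```typescript
// Median of a sample; does not mutate the input array.
function median(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// 95% CI half-width expressed as a percentage of the median.
// ASSUMPTION: normal approximation, 1.96 x standard error of the mean.
function ciMarginPercent(samples: number[]): number {
  const n = samples.length;
  if (n < 2) return Infinity;
  const mean = samples.reduce((sum, x) => sum + x, 0) / n;
  const variance =
    samples.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / (n - 1);
  const margin = 1.96 * Math.sqrt(variance / n);
  return (margin / median(samples)) * 100;
}

interface ConvergentOptions {
  warmup: number; // iterations whose results are discarded
  maxIterations: number; // upper bound on measured samples
  targetMarginPercent: number; // stop once the CI margin is below this
  minSamples: number; // hypothetical guard against stopping too early
}

// One sample per iteration; convergence is checked inline after each one.
function runConvergent(measureOnce: () => number, opts: ConvergentOptions) {
  for (let i = 0; i < opts.warmup; i++) measureOnce();
  const samples: number[] = [];
  while (samples.length < opts.maxIterations) {
    samples.push(measureOnce());
    if (
      samples.length >= opts.minSamples &&
      ciMarginPercent(samples) < opts.targetMarginPercent
    ) {
      break; // early stop: statistically converged
    }
  }
  return {
    samples,
    median: median(samples),
    marginPercent: ciMarginPercent(samples),
  };
}
```

With the README's small-scenario settings this would be called with `warmup: 5` and `maxIterations: 50`. A stable measurement stops after a handful of samples, while a noisy one runs to the iteration cap and reports a wide margin.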