134 changes: 134 additions & 0 deletions bench/README.md
@@ -0,0 +1,134 @@
Strfry Benchmark Suite — Plan and Structure

Purpose
- Measure strfry performance across DB sizes and workloads with NIP-50 disabled and enabled.
- Produce comparable Markdown reports that include sanitized system info (no PII).

Outcomes
- Repeatable, automated runs that generate:
  - Per-scenario Markdown report with metrics and system profile
  - Aggregated summary Markdown table across scenarios
- Clear separation of preparation (DB build) vs execution (load + measure)

High-Level Flow
1) Prepare: build test DB for each scenario
   - Generate cryptographically valid nostr events via nak
   - Ingest into a fresh strfry DB (separate directory per scenario)
   - Optionally pre-compress dictionaries and warm caches
2) Run: execute workload against the prepared DB
   - Start strfry with scenario config (NIP-50 on/off)
   - Run benchmark client to drive REQ/EVENT traffic
   - Capture server logs, resource stats, and client timings
3) Report: aggregate results
   - Parse logs and client output
   - Emit per-scenario report and a combined summary table

Repository Layout (bench/)
- bench/
  - README.md: this plan and usage
  - SCENARIOS.md: curated scenarios list and guidance
  - scenarios/
    - small.yml: ~100k events
    - medium.yml: ~1M events
    - large.yml: ~10M events (example)
  - scripts/
    - prepare.sh: build DB(s) for scenarios
    - run.sh: run benchmark(s) and gather metrics
    - sysinfo.sh: collect sanitized system profile
    - report.py: generate per-scenario + summary Markdown
  - client/: load generator (future; optional)
  - results/
    - raw/: JSON and logs per run
    - summary.md: aggregated Markdown table
  - work/: ephemeral DBs and run artifacts

External Dependencies
- nak: event generator that produces valid signed nostr events
  - Provide the binary path via env `NAK_BIN` or place it on PATH
- Tools: `bash`, `jq`, `awk`, `sed`, `python3` (for report.py)
- Optional: `mpstat`/`pidstat` (sysstat), `lsblk`, `lscpu` for system profiling

Scenario Format (YAML)
- name: human-readable name
- db:
  - events: integer (total events to generate)
  - kinds: pattern (e.g., "1,30023")
  - avg_event_size: bytes (approx)
  - keyword_inject_rate: float 0..1 (probability to inject keywords into generated event content)
  - keywords: list of { term, weight } to control frequency distribution and enable realistic search results
  - distribution: optional mix (e.g., replies/hashtags)
- server:
  - search_enabled: true|false
  - search_backend: lmdb|noop
  - config_overrides: map of strfry config keys/values
- workload:
  - duration_s: 120
  - warmup_s: 15
  - connections: 100
  - subscriptions_per_conn: 3
  - writers_per_sec: 200 # events/sec sent to relay
  - req_mix:
    - type: read
      filter: { kinds: [1], limit: 200 }
      weight: 3
    - type: search
      # if the client runner supports it, the search term is sampled from db.keywords, biased by weight
      filter: { kinds: [1], search: "best nostr apps", limit: 100 }
      weight: 1
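
The weighted `req_mix` and the keyword-biased search term described above could be sampled roughly as follows. This is a minimal sketch, not the benchmark client itself: `SCENARIO` is a hypothetical in-memory form of a parsed scenario file and `pick_request` is an illustrative helper name.

```python
import random

# Hypothetical parsed scenario; field names follow the format described above.
SCENARIO = {
    "db": {"keywords": [
        {"term": "nostr", "weight": 8},
        {"term": "best nostr apps", "weight": 1},
    ]},
    "workload": {"req_mix": [
        {"type": "read", "filter": {"kinds": [1], "limit": 200}, "weight": 3},
        {"type": "search",
         "filter": {"kinds": [1], "search": "best nostr apps", "limit": 100},
         "weight": 1},
    ]},
}


def pick_request(scenario, rng):
    """Pick one req_mix entry by weight and return (type, filter).

    For search entries the term is resampled from db.keywords by weight
    (assumed behaviour of a client runner that supports it).
    """
    mix = scenario["workload"]["req_mix"]
    entry = rng.choices(mix, weights=[e["weight"] for e in mix], k=1)[0]
    flt = dict(entry["filter"])  # copy so the scenario itself is not mutated
    if entry["type"] == "search":
        keywords = scenario["db"]["keywords"]
        flt["search"] = rng.choices(
            [k["term"] for k in keywords],
            weights=[k["weight"] for k in keywords],
            k=1,
        )[0]
    return entry["type"], flt
```

With weights 3 and 1, about three out of four sampled requests are plain reads. Passing a seeded `random.Random` keeps a run reproducible.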

Metrics
- Throughput: events/s sent; events/s delivered
- Latency: p50/p95/p99 for:
  - Initial REQ scan to EOSE
  - EVENT -> OK roundtrip
  - EVENT observe-to-deliver (from ingest to delivery to live subscribers)
- Resource:
  - strfry RSS/CPU (sampled), total system CPU/mem
  - DB size on disk
- Search (when enabled):
  - Search query latency p50/p95/p99
  - Index catch-up state at run start/end
  - Result cardinality across term classes (common vs rare), using weighted keywords
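
The p50/p95/p99 figures above can be derived from raw latency samples in a few lines. The sketch below uses the nearest-rank definition of a percentile, which is an assumption; the plan does not say which definition report.py will use, and interpolating definitions give slightly different values on small samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarise a list of latencies (ms) as p50/p95/p99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

For example, `latency_summary(list(range(1, 101)))` gives `{"p50": 50, "p95": 95, "p99": 99}`.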

System Profile (sanitized)
- OS: kernel version, distro
- CPU: model, sockets, cores, MHz
- Memory: total
- Storage: device type (NVMe/SATA), rotational flag, filesystem
- Notes:
  - Do not record hostnames, usernames, IP addresses, or MAC addresses
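
One way to enforce the rule above is to filter the collected profile before it is written to disk. The following is a rough sketch only: the key names in `BLOCKED_KEYS` are assumptions about what a collector might emit, the regular expressions cover IPv4 and colon-separated MAC addresses but not IPv6, and the IPv4 pattern would also redact any four-part dotted version string.

```python
import re

# Assumed key names; a real sysinfo collector may use different fields.
BLOCKED_KEYS = {"hostname", "user", "users", "username", "ip", "ips", "mac", "macs"}
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")


def sanitize(profile):
    """Return a copy of a nested profile dict with identifying data removed."""
    clean = {}
    for key, value in profile.items():
        if key.lower() in BLOCKED_KEYS:
            continue  # drop the whole field
        if isinstance(value, dict):
            value = sanitize(value)
        elif isinstance(value, str):
            # Redact addresses embedded in free-form strings.
            value = MAC.sub("[redacted]", IPV4.sub("[redacted]", value))
        clean[key] = value
    return clean
```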

Execution
- Prepare one or more scenarios:
  - `bench/scripts/prepare.sh -s scenarios/small.yml [--workers N] [--nak /path/to/nak]`
  - `--workers N` controls parallel generators (defaults to min(4, nproc) or env GEN_PAR)
  - Output DB at `bench/work/small/db/`
- Run benchmark(s):
  - `bench/scripts/run.sh -s scenarios/small.yml --out bench/results/raw/small-$(date +%s)`
  - Produces: client.json, server.log, sysinfo.json
- Report:
  - `bench/scripts/report.py bench/results/raw/* > bench/results/summary.md`

Output: Markdown Table (example)

| Scenario | DB events | NIP-50 | Conns | Subs/Conn | Writers/s | EOSE p50/p95/p99 (ms) | Search p50/p95/p99 (ms) | OK p50/p95/p99 (ms) | Delivered/s | RSS max (MB) | CPU avg (%) |
|---------:|----------:|:------:|------:|----------:|----------:|-----------------------:|-------------------------:|--------------------:|-----------:|-------------:|------------:|
| small | 100k | off | 100 | 3 | 200 | 8 / 19 / 42 | — | 5 / 12 / 29 | 12,300 | 620 | 240 |
| small | 100k | on | 100 | 3 | 200 | 9 / 21 / 47 | 14 / 31 / 66 | 5 / 12 / 30 | 12,100 | 640 | 252 |
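
A row of the table above could be rendered from a per-run result record roughly like this. The record layout (`eose_ms`, `search_ms`, and so on) is hypothetical; the plan does not define the schema of client.json or the interface of report.py.

```python
def fmt_triplet(p):
    """Render p50/p95/p99 as '8 / 19 / 42'; use the dash placeholder when absent."""
    if p is None:
        return "\u2014"  # same placeholder the example table uses for NIP-50 off
    return " / ".join(str(p[k]) for k in ("p50", "p95", "p99"))


def summary_row(r):
    """Format one result record (hypothetical schema) as a Markdown table row."""
    cells = [
        r["scenario"],
        r["db_events"],
        "on" if r["nip50"] else "off",
        str(r["conns"]),
        str(r["subs_per_conn"]),
        str(r["writers_per_sec"]),
        fmt_triplet(r["eose_ms"]),
        fmt_triplet(r.get("search_ms")),
        fmt_triplet(r["ok_ms"]),
        f"{r['delivered_per_sec']:,}",  # thousands separator, e.g. 12,300
        str(r["rss_max_mb"]),
        str(r["cpu_avg_pct"]),
    ]
    return "| " + " | ".join(cells) + " |"
```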

Methodology Notes
- The warm-up period is excluded from measurements
- Each scenario is run twice and the better of the two runs is reported (helps mitigate jitter)
- REQ scan latency is measured per subscription, from REQ send to EOSE receipt
- Search latency is measured from REQ send to the first EVENT, and from REQ send to EOSE
- OK roundtrip is measured per EVENT
- Logs are parsed for dbScan perf lines if `relay__logging__dbScanPerf = true`
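
The warm-up exclusion and best-of-two rules above might look like the sketch below. Two points are assumptions: samples are taken to be `(timestamp_s, latency_ms)` pairs, and "best" is taken to mean the lowest value of one chosen metric (here a hypothetical `eose_p95_ms` field), since the notes do not say which metric decides.

```python
def drop_warmup(samples, run_start_s, warmup_s):
    """Keep only (timestamp_s, latency_ms) samples taken after the warm-up window."""
    cutoff = run_start_s + warmup_s
    return [(t, ms) for t, ms in samples if t >= cutoff]


def best_of(runs, key="eose_p95_ms"):
    """Return the run with the lowest value for `key` (assumed tie-break metric)."""
    return min(runs, key=lambda r: r[key])
```

For a run that starts at t=1000 with `warmup_s: 15`, every sample stamped before t=1015 is discarded before percentiles are computed.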

PII and Safety
- Scripts must not record hostnames, usernames, or IP/MAC addresses
- Sanitize all system data before writing it to artifacts

Future Extensions
- k6-like scenario runner for WebSocket; distributed load generation
- Flame graphs and CPU profiling (perf/pprof) under opt-in
- Additional NIP scenarios (negentropy sync under load)
32 changes: 32 additions & 0 deletions bench/SCENARIOS.md
@@ -0,0 +1,32 @@
Benchmark Scenarios

This doc describes the standard scenarios and how to create new ones.

Standard Scenarios
- small.yml
  - ~100k events
  - Mixed kinds: "1, 30023"
  - NIP-50 on and off runs
  - Connections: 100, subs/conn: 3, writers/s: 200
- medium.yml
  - ~1M events
  - Same workload profile as small
- large.yml (example)
  - ~10M events (requires ample disk/RAM)
  - Lower writers/s initially to avoid IO bottlenecks

Creating a Scenario
- Copy an existing YAML file under bench/scenarios/ and edit:
  - db.events, db.kinds, db.avg_event_size
  - server.search_enabled and server.search_backend
  - workload parameters (duration, warmup, connections, writers)
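
After editing a copied scenario, a quick sanity check can catch typos before a long prepare step. The sketch below validates an already-parsed scenario dict; `validate_scenario` is an illustrative helper, not part of the planned scripts, and the specific rules are assumptions drawn from the field descriptions in bench/README.md.

```python
def validate_scenario(s):
    """Return a list of problems found in an already-parsed scenario dict."""
    errors = []
    db = s.get("db", {})
    server = s.get("server", {})
    workload = s.get("workload", {})

    events = db.get("events")
    if not isinstance(events, int) or events <= 0:
        errors.append("db.events must be a positive integer")

    rate = db.get("keyword_inject_rate", 0.0)
    if not 0.0 <= rate <= 1.0:
        errors.append("db.keyword_inject_rate must be within 0..1")

    if not isinstance(server.get("search_enabled"), bool):
        errors.append("server.search_enabled must be true or false")
    if server.get("search_backend") not in ("lmdb", "noop"):
        errors.append("server.search_backend must be lmdb or noop")

    for key in ("duration_s", "warmup_s", "connections", "writers_per_sec"):
        value = workload.get(key)
        if not isinstance(value, int) or value < 0:
            errors.append(f"workload.{key} must be a non-negative integer")

    for i, entry in enumerate(workload.get("req_mix", [])):
        if entry.get("weight", 0) <= 0:
            errors.append(f"workload.req_mix[{i}].weight must be positive")
    return errors
```

An empty list means no problems were found by these checks.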

Tips
- For NIP-50 enabled runs, ensure the catch-up indexer has finished before measuring
- For very large DBs, consider increasing warmup and run duration
- Keep maxCandidateDocs and overfetchFactor balanced to avoid excessive scoring costs
- Choose search terms strategically:
  - Common terms (e.g., "nostr", "bitcoin") to stress high-DF scoring paths
  - Rare terms (e.g., project-specific tokens) to test low-DF paths and index lookups
  - Multi-term phrases (e.g., "best nostr apps") to test multi-token scoring
- Inject keywords into generated content via db.keywords and db.keyword_inject_rate so searches return realistic results
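
Keyword injection driven by db.keywords and db.keyword_inject_rate could work roughly as follows. This is a sketch of the idea only: the planned prepare step generates events with nak, and where in the content a term is placed (here it is simply appended) is an assumption.

```python
import random


def inject_keyword(content, keywords, rate, rng):
    """With probability `rate`, append one keyword chosen by weight."""
    if rng.random() >= rate:
        return content  # this event is left without an injected term
    term = rng.choices(
        [k["term"] for k in keywords],
        weights=[k["weight"] for k in keywords],
        k=1,
    )[0]
    return f"{content} {term}"
```

With the weights from small.yml, "nostr" (weight 8) is injected eight times as often as "best nostr apps" (weight 1), which yields the common and rare term classes the tips above rely on.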
42 changes: 42 additions & 0 deletions bench/scenarios/medium.yml
@@ -0,0 +1,42 @@
name: medium
db:
  events: 1000000
  kinds: "1, 30023"
  avg_event_size: 600
  keyword_inject_rate: 0.7
  keywords:
    - term: nostr
      weight: 10
    - term: bitcoin
      weight: 7
    - term: lightning
      weight: 4
    - term: federation
      weight: 2
    - term: "nostr developers"
      weight: 2
    - term: "best nostr apps"
      weight: 1
server:
  search_enabled: true
  search_backend: lmdb
  config_overrides:
    relay__logging__dbScanPerf: true
workload:
  duration_s: 180
  warmup_s: 20
  connections: 200
  subscriptions_per_conn: 3
  writers_per_sec: 400
  req_mix:
    - type: read
      filter:
        kinds: [1]
        limit: 200
      weight: 2
    - type: search
      filter:
        kinds: [1]
        search: "nostr developers"
        limit: 100
      weight: 1
40 changes: 40 additions & 0 deletions bench/scenarios/small.yml
@@ -0,0 +1,40 @@
name: small
db:
  events: 100000
  kinds: "1, 30023"
  avg_event_size: 600
  keyword_inject_rate: 0.6
  keywords:
    - term: nostr
      weight: 8
    - term: bitcoin
      weight: 5
    - term: lightning
      weight: 3
    - term: federation
      weight: 2
    - term: "best nostr apps"
      weight: 1
server:
  search_enabled: false
  search_backend: lmdb
  config_overrides:
    relay__logging__dbScanPerf: true
workload:
  duration_s: 120
  warmup_s: 15
  connections: 100
  subscriptions_per_conn: 3
  writers_per_sec: 200
  req_mix:
    - type: read
      filter:
        kinds: [1]
        limit: 200
      weight: 3
    - type: search
      filter:
        kinds: [1]
        search: "best nostr apps"
        limit: 100
      weight: 1