Tonic Benchmark Suite

This suite profiles representative Tonic workloads and can enforce latency/performance contracts.

Inputs

  • Legacy suite manifest: benchmarks/suite.toml
  • Interpreter vs compiled suite: benchmarks/interpreter-vs-compiled-suite.toml
  • Native compiler contract suite: benchmarks/native-compiler-suite.toml
  • Rust/Go baseline data: benchmarks/native-compiler-baselines.json
  • Runner: src/bin/benchsuite.rs

Each workload defines:

  • name
  • target (interpreter default, or compiled)
  • command
    • for target = "interpreter": argv passed to tonic
    • for target = "compiled": argv passed to the compiled executable (optional)
  • source (required for target = "compiled": source/project path to compile)
  • mode (warm or cold, default warm; cold mode clears .tonic/cache)
  • threshold_p50_ms
  • threshold_p95_ms
  • optional threshold_rss_kb
  • optional weight
  • optional category
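
A workload entry might look like the following. This is a hypothetical sketch: the field names follow the list above, but the table name, the argv-as-array encoding of command, and all values are assumptions, not the actual manifest schema.

```toml
# Illustrative workload entry (hypothetical values and table name)
[[workload]]
name = "fib-recursive"
target = "interpreter"            # or "compiled" (then source is required)
command = ["run", "examples/fib.tn"]
mode = "warm"                     # "cold" clears .tonic/cache first
threshold_p50_ms = 25.0
threshold_p95_ms = 40.0
threshold_rss_kb = 65536          # optional
weight = 1.0                      # optional
category = "control-flow"        # optional
```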

Native compiler suites can also define [performance_contract] with:

  • native SLOs (startup, runtime, rss, artifact_size, compile_latency)
  • weighted scoring (metric_weights, pass_threshold)
  • reference baseline targets (strict gate currently uses tonic_interpreter_baseline) + relative_budget_pct
  • optional additional targets (e.g. Rust/Go) in the baseline JSON for reporting/reference
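
Assuming a TOML layout similar to the workload entries, a [performance_contract] section could be sketched roughly as below. The key names beyond those listed above (and the nesting) are assumptions; consult benchmarks/native-compiler-suite.toml for the real schema.

```toml
# Hypothetical sketch of a performance contract (not the actual schema)
[performance_contract]
pass_threshold = 0.9
relative_budget_pct = 15.0
reference_baseline = "tonic_interpreter_baseline"

[performance_contract.metric_weights]
startup = 0.2
runtime = 0.4
rss = 0.2
artifact_size = 0.1
compile_latency = 0.1
```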

Run

Build a release binary first:

cargo build --release

Run the legacy suite:

cargo run --bin benchsuite -- --bin target/release/tonic

Run the interpreter-vs-compiled comparison suite:

cargo run --bin benchsuite -- \
  --bin target/release/tonic \
  --manifest benchmarks/interpreter-vs-compiled-suite.toml \
  --json-out benchmarks/interpreter-vs-compiled-summary.json \
  --markdown-out benchmarks/interpreter-vs-compiled-summary.md

target = "compiled" workloads are prepared by compiling each source with tonic compile --out .tonic/bench-compiled/<workload-name> and then executing the produced binary directly.

Wrapper script (recommended):

./scripts/bench-interpreter-vs-compiled.sh

Run the native compiler contract suite (strict regression gate against Tonic baseline):

cargo run --bin benchsuite -- \
  --bin target/release/tonic \
  --manifest benchmarks/native-compiler-suite.toml \
  --target-name interpreter \
  --compile-latency-ms 2600 \
  --json-out benchmarks/native-compiler-summary.json \
  --markdown-out benchmarks/native-compiler-summary.md

Calibrate workload thresholds:

cargo run --bin benchsuite -- --bin target/release/tonic --calibrate --calibrate-margin-pct 20

Enforce performance requirements

Exits non-zero if any configured threshold or contract gate fails:

./scripts/bench-enforce.sh

Or manually:

cargo run --bin benchsuite -- \
  --bin target/release/tonic \
  --manifest benchmarks/native-compiler-suite.toml \
  --target-name interpreter \
  --compile-latency-ms 2600 \
  --enforce

In contract mode, --enforce checks:

  • absolute workload thresholds (p50, p95, optional RSS)
  • the weighted overall competitiveness score
  • native SLO thresholds

and records deterministic failure reasons in the JSON/Markdown reports.

Native CI/Repro Gate Commands

Run the exact local gate sequence used by CI:

./scripts/native-gates.sh

Benchmark + policy gate only:

TONIC_BENCH_ENFORCE=0 \
TONIC_BENCH_JSON_OUT=.tonic/native-gates/native-compiler-summary.json \
TONIC_BENCH_MARKDOWN_OUT=.tonic/native-gates/native-compiler-summary.md \
./scripts/bench-native-contract-enforce.sh

./scripts/native-regression-policy.sh \
  .tonic/native-gates/native-compiler-summary.json --mode strict

TONIC_BENCH_ENFORCE=0 \
TONIC_BENCH_JSON_OUT=.tonic/native-gates/native-compiled-summary.json \
TONIC_BENCH_MARKDOWN_OUT=.tonic/native-gates/native-compiled-summary.md \
TONIC_BENCH_TARGET_NAME=compiled \
./scripts/bench-native-contract-enforce.sh benchmarks/native-compiled-suite.toml

./scripts/native-regression-policy.sh \
  .tonic/native-gates/native-compiled-summary.json --mode strict

Policy verdicts:

  • pass (exit 0)
  • quarantine (exit 2)
  • rollback (exit 3)

Reproducibility metadata

The runner persists host metadata in every report:

  • OS/arch
  • kernel
  • CPU model
  • rustc version
  • go version
  • capture timestamp

Reference baseline metadata is also included in contract report output.

p95 stabilization

The benchsuite computes p95_ms via Tukey upper-fence winsorization: before taking the 95th percentile, a single isolated sample above Q3 + 1.5 × IQR is capped at the fence value. If multiple samples exceed the fence, the distribution is left unchanged to preserve true tail-regression signal.

Why this helps: short-lived processes are susceptible to OS scheduling jitter — a single run that lands in a cache-cold or scheduler-preempted slot inflates the raw p95 without representing actual runtime performance. Capping isolated spikes at the fence eliminates that noise.

Signal preservation: for a sustained regression (the entire latency distribution shifts up), Q3 and the fence rise with it, so the elevated p95 is faithfully reported. The strict gate continues to catch real regressions; only isolated one-off outliers are damped.

p50 is unaffected: the median is computed via the plain compute_percentile path; it is inherently robust to tail outliers and needs no winsorization.
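
The stabilization rule above can be sketched in Python. This is illustrative only: the benchsuite implements it in Rust, and `percentile` here is a simple nearest-rank stand-in for its compute_percentile.

```python
def percentile(sorted_samples, pct):
    """Nearest-rank percentile over an ascending-sorted list (illustrative)."""
    idx = int(round(pct / 100 * (len(sorted_samples) - 1)))
    return sorted_samples[min(idx, len(sorted_samples) - 1)]

def stabilized_p95(samples_ms):
    """p95 with Tukey upper-fence winsorization of a single isolated outlier."""
    xs = sorted(samples_ms)
    q1 = percentile(xs, 25)
    q3 = percentile(xs, 75)
    fence = q3 + 1.5 * (q3 - q1)  # Tukey upper fence: Q3 + 1.5 * IQR
    above = [x for x in xs if x > fence]
    if len(above) == 1:
        # A single isolated spike is capped at the fence value.
        xs = [min(x, fence) for x in xs]
    # With zero or multiple samples above the fence, the distribution is
    # left unchanged to preserve true tail-regression signal.
    return percentile(xs, 95)
```

With one spike among otherwise stable samples the spike is damped; with a sustained shift (or multiple exceedances) the raw p95 is reported.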

Profiling hotspots

If a workload regresses, profile the specific command:

cargo flamegraph --release --bin tonic -- run examples/parity/06-control-flow/for_multi_generator.tn

Or with perf:

perf record -g target/release/tonic run examples/parity/06-control-flow/for_multi_generator.tn
perf report

For built-in phase timing (frontend/codegen/runtime), set a profile sink:

TONIC_PROFILE_OUT=benchmarks/phase-profile.jsonl \
  target/release/tonic compile examples/parity/06-control-flow/for_multi_generator.tn --out .tonic/build/for_multi_generator

Each command appends one JSON line with command, total_ms, and per-phase elapsed_ms so regressions can be localized before full flamegraph runs.
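
The exact line layout is not documented here, but based on the fields named above, one appended line might look roughly like this (illustrative shape only; the real field nesting and phase names may differ):

```json
{"command": "compile", "total_ms": 41.2, "phases": {"frontend": {"elapsed_ms": 12.3}, "codegen": {"elapsed_ms": 18.7}, "runtime": {"elapsed_ms": 10.2}}}
```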