This suite profiles representative Tonic workloads and can enforce latency/performance contracts.
- Legacy suite manifest: `benchmarks/suite.toml`
- Interpreter vs compiled suite: `benchmarks/interpreter-vs-compiled-suite.toml`
- Native compiler contract suite: `benchmarks/native-compiler-suite.toml`
- Rust/Go baseline data: `benchmarks/native-compiler-baselines.json`
- Runner: `src/bin/benchsuite.rs`
Each workload defines:
- `name`
- `target` (`interpreter` by default, or `compiled`)
- `command` - for `target = "interpreter"`: argv passed to `tonic`; for `target = "compiled"`: argv passed to the compiled executable (optional)
- `source` (required for `target = "compiled"`: source/project path to compile)
- `mode` (`warm` or `cold`, default `warm`; cold mode clears `.tonic/cache`)
- `threshold_p50_ms` / `threshold_p95_ms` - optional
- `threshold_rss_kb` - optional
- `weight` - optional
- `category`
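Putting the fields above together, a workload entry might be sketched as follows. This is illustrative only: the `[[workload]]` table name and the specific paths and values are assumptions, not taken from the real manifests; check `benchmarks/suite.toml` for the authoritative shape.

```toml
# Hypothetical workload entry; key names come from the list above,
# the table name and values are illustrative assumptions.
[[workload]]
name = "fib-interpreter"
target = "interpreter"              # default; argv below is passed to `tonic`
command = ["run", "examples/fib.tn"]
mode = "warm"                       # "cold" would clear .tonic/cache first
threshold_p50_ms = 40               # optional latency gates
threshold_p95_ms = 80
weight = 1.0                        # optional scoring weight
```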
Native compiler suites can also define `[performance_contract]` with:

- native SLOs (`startup`, `runtime`, `rss`, `artifact_size`, `compile_latency`)
- weighted scoring (`metric_weights`, `pass_threshold`)
- reference baseline targets (the strict gate currently uses `tonic_interpreter_baseline`) + `relative_budget_pct`
- optional additional targets (e.g. Rust/Go) in the baseline JSON for reporting/reference
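A contract block using those keys might be sketched like this. Only the key names (`pass_threshold`, `metric_weights`, `relative_budget_pct` and the SLO metric names) come from the description above; the nesting and numbers are assumptions.

```toml
# Hypothetical [performance_contract] sketch; see
# benchmarks/native-compiler-suite.toml for the real structure.
[performance_contract]
pass_threshold = 0.85          # minimum weighted competitiveness score
relative_budget_pct = 10       # allowed slack vs. the reference baseline

[performance_contract.metric_weights]
startup = 0.25
runtime = 0.40
rss = 0.15
artifact_size = 0.10
compile_latency = 0.10
```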
Build a release binary first:

```shell
cargo build --release
```

Run the legacy suite:

```shell
cargo run --bin benchsuite -- --bin target/release/tonic
```

Run the interpreter vs compiled target comparison suite:

```shell
cargo run --bin benchsuite -- \
  --bin target/release/tonic \
  --manifest benchmarks/interpreter-vs-compiled-suite.toml \
  --json-out benchmarks/interpreter-vs-compiled-summary.json \
  --markdown-out benchmarks/interpreter-vs-compiled-summary.md
```

`target = "compiled"` workloads are prepared by compiling each source with
`tonic compile --out .tonic/bench-compiled/<workload-name>` and then
executing the produced binary directly.
Wrapper script (recommended):

```shell
./scripts/bench-interpreter-vs-compiled.sh
```

Run the native compiler contract suite (strict regression gate against the Tonic baseline):

```shell
cargo run --bin benchsuite -- \
  --bin target/release/tonic \
  --manifest benchmarks/native-compiler-suite.toml \
  --target-name interpreter \
  --compile-latency-ms 2600 \
  --json-out benchmarks/native-compiler-summary.json \
  --markdown-out benchmarks/native-compiler-summary.md
```

Calibrate workload thresholds:

```shell
cargo run --bin benchsuite -- --bin target/release/tonic --calibrate --calibrate-margin-pct 20
```

Fail with a non-zero exit code if any configured threshold or contract gate fails:
```shell
./scripts/bench-enforce.sh
```

Or manually:

```shell
cargo run --bin benchsuite -- \
  --bin target/release/tonic \
  --manifest benchmarks/native-compiler-suite.toml \
  --target-name interpreter \
  --compile-latency-ms 2600 \
  --enforce
```

In contract mode, `--enforce` checks:

- absolute workload thresholds (`p50`, `p95`, optional `rss`)
- the weighted overall competitiveness score
- native SLO thresholds
- deterministic failure reasons in JSON/Markdown reports
Run the exact local gate sequence used by CI:

```shell
./scripts/native-gates.sh
```

Benchmark + policy gate only:

```shell
TONIC_BENCH_ENFORCE=0 \
TONIC_BENCH_JSON_OUT=.tonic/native-gates/native-compiler-summary.json \
TONIC_BENCH_MARKDOWN_OUT=.tonic/native-gates/native-compiler-summary.md \
./scripts/bench-native-contract-enforce.sh
./scripts/native-regression-policy.sh \
  .tonic/native-gates/native-compiler-summary.json --mode strict

TONIC_BENCH_ENFORCE=0 \
TONIC_BENCH_JSON_OUT=.tonic/native-gates/native-compiled-summary.json \
TONIC_BENCH_MARKDOWN_OUT=.tonic/native-gates/native-compiled-summary.md \
TONIC_BENCH_TARGET_NAME=compiled \
./scripts/bench-native-contract-enforce.sh benchmarks/native-compiled-suite.toml
./scripts/native-regression-policy.sh \
  .tonic/native-gates/native-compiled-summary.json --mode strict
```

Policy verdicts:

- `pass` (exit 0)
- `quarantine` (exit 2)
- `rollback` (exit 3)
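A caller can dispatch on those exit codes. The sketch below encodes the verdict mapping documented above as a standalone function (`handle_verdict` is a hypothetical helper name, not part of the repository's scripts):

```shell
# Map a native-regression-policy exit code to its verdict.
# The 0/2/3 mapping comes from the docs above; the function is illustrative.
handle_verdict() {
  case "$1" in
    0) echo "pass" ;;
    2) echo "quarantine" ;;
    3) echo "rollback" ;;
    *) echo "unknown" ;;
  esac
}

handle_verdict 2   # prints "quarantine"
```

In a CI wrapper you would run the policy script, capture `$?`, and pass it to a dispatcher like this to decide whether to promote, hold, or revert.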
Runner output now persists host metadata in every report:
- OS/arch
- kernel
- CPU model
- rustc version
- go version
- capture timestamp
Reference baseline metadata is also included in contract report output.
The benchsuite computes `p95_ms` via Tukey upper-fence winsorization: before
taking the 95th percentile, a single isolated sample above Q3 + 1.5 × IQR
is capped at the fence value. If multiple samples exceed the fence, the
distribution is left unchanged to preserve the true tail-regression signal.
Why this helps: short-lived processes are susceptible to OS scheduling jitter — a single run that lands in a cache-cold or scheduler-preempted slot inflates the raw p95 without representing actual runtime performance. Capping isolated spikes at the fence eliminates that noise.
Signal preservation: for a sustained regression (the entire latency distribution shifts up), Q3 and the fence rise with it, so the elevated p95 is faithfully reported. The strict gate continues to catch real regressions; only isolated one-off outliers are damped.
`p50` is unaffected: the median is computed via the plain `compute_percentile`
path; it is inherently robust to tail outliers and needs no winsorization.
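The behavior described above can be sketched in a few lines of Rust. This is a minimal illustration, not the benchsuite's actual code: it assumes nearest-rank percentiles, while the real `compute_percentile` path may interpolate differently.

```rust
// Sketch of Tukey upper-fence winsorization for p95, assuming
// nearest-rank percentiles (the real benchsuite may differ in detail).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let idx = ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted[idx]
}

fn winsorized_p95(samples: &[f64]) -> f64 {
    let mut s = samples.to_vec();
    s.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let q3 = percentile(&s, 75.0);
    let iqr = q3 - percentile(&s, 25.0);
    let fence = q3 + 1.5 * iqr;
    // Cap exactly one isolated spike; two or more exceedances indicate a
    // genuine tail shift and are reported unchanged.
    let above: Vec<usize> = (0..s.len()).filter(|&i| s[i] > fence).collect();
    if above.len() == 1 {
        s[above[0]] = fence;
    }
    percentile(&s, 95.0)
}

fn main() {
    // One isolated spike (100.0) is capped at the fence (16.0 here).
    let spike = vec![10.0, 10.0, 11.0, 11.0, 12.0, 12.0, 13.0, 13.0, 14.0, 100.0];
    println!("isolated spike -> p95 = {}", winsorized_p95(&spike));
    // A sustained shift (two samples above the fence) is reported as-is.
    let shift = vec![10.0, 10.0, 11.0, 11.0, 12.0, 12.0, 13.0, 13.0, 100.0, 101.0];
    println!("sustained shift -> p95 = {}", winsorized_p95(&shift));
}
```

Note how the second case demonstrates the signal-preservation property: because two samples sit above the fence, nothing is capped and the elevated p95 survives.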
If a workload regresses, profile the specific command:

```shell
cargo flamegraph --release --bin tonic -- run examples/parity/06-control-flow/for_multi_generator.tn
```

Or with perf:

```shell
perf record -g target/release/tonic run examples/parity/06-control-flow/for_multi_generator.tn
perf report
```

For built-in phase timing (frontend/codegen/runtime), set a profile sink:

```shell
TONIC_PROFILE_OUT=benchmarks/phase-profile.jsonl \
target/release/tonic compile examples/parity/06-control-flow/for_multi_generator.tn --out .tonic/build/for_multi_generator
```

Each command appends one JSON line with `command`, `total_ms`, and per-phase `elapsed_ms`, so regressions can be localized before full flamegraph runs.