This suite profiles representative Tonic workloads and can enforce latency/performance contracts.
- Legacy suite manifest: `benchmarks/suite.toml`
- Interpreter vs compiled suite: `benchmarks/interpreter-vs-compiled-suite.toml`
- Native compiler contract suite: `benchmarks/native-compiler-suite.toml`
- Rust/Go baseline data: `benchmarks/native-compiler-baselines.json`
- Runner: `src/bin/benchsuite.rs`
Each workload defines:
- `name`
- `target` (`interpreter` by default, or `compiled`)
- `command` - for `target = "interpreter"`: argv passed to `tonic`; for `target = "compiled"`: argv passed to the compiled executable (optional)
- `source` (required for `target = "compiled"`: source/project path to compile)
- `mode` (`warm` or `cold`, default `warm`; cold mode clears `.tonic/cache`)
- `threshold_p50_ms` / `threshold_p95_ms` - optional
- `threshold_rss_kb` - optional
- `weight` - optional
- `category`
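Putting the fields above together, a workload entry might be sketched as follows. This is illustrative only: the `[[workload]]` table name and the specific paths and values are assumptions, not taken from the real manifests; check `benchmarks/suite.toml` for the authoritative shape.

```toml
# Hypothetical workload entry; key names come from the list above,
# the table name and values are illustrative assumptions.
[[workload]]
name = "fib-interpreter"
target = "interpreter"              # default; argv below is passed to `tonic`
command = ["run", "examples/fib.tn"]
mode = "warm"                       # "cold" would clear .tonic/cache first
threshold_p50_ms = 40               # optional latency gates
threshold_p95_ms = 80
weight = 1.0                        # optional scoring weight
```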
Native compiler suites can also define `[performance_contract]` with:

- native SLOs (`startup`, `runtime`, `rss`, `artifact_size`, `compile_latency`)
- weighted scoring (`metric_weights`, `pass_threshold`)
- reference baseline targets (the strict gate currently uses `tonic_interpreter_baseline`) + `relative_budget_pct`
- optional additional targets (e.g. Rust/Go) in the baseline JSON for reporting/reference
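A contract block using those keys might be sketched like this. Only the key names (`pass_threshold`, `metric_weights`, `relative_budget_pct` and the SLO metric names) come from the description above; the nesting and numbers are assumptions.

```toml
# Hypothetical [performance_contract] sketch; see
# benchmarks/native-compiler-suite.toml for the real structure.
[performance_contract]
pass_threshold = 0.85          # minimum weighted competitiveness score
relative_budget_pct = 10       # allowed slack vs. the reference baseline

[performance_contract.metric_weights]
startup = 0.25
runtime = 0.40
rss = 0.15
artifact_size = 0.10
compile_latency = 0.10
```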
Build a release binary first:

```shell
cargo build --release
```

Run the legacy suite:

```shell
cargo run --bin benchsuite -- --bin target/release/tonic
```

Run the interpreter vs compiled target comparison suite:

```shell
cargo run --bin benchsuite -- \
  --bin target/release/tonic \
  --manifest benchmarks/interpreter-vs-compiled-suite.toml \
  --json-out benchmarks/interpreter-vs-compiled-summary.json \
  --markdown-out benchmarks/interpreter-vs-compiled-summary.md
```

`target = "compiled"` workloads are prepared by compiling each source with
`tonic compile --out .tonic/bench-compiled/<workload-name>` and then
executing the produced binary directly.
Wrapper script (recommended):

```shell
./scripts/bench-interpreter-vs-compiled.sh
```

Run the native compiler contract suite (strict regression gate against the Tonic baseline):

```shell
cargo run --bin benchsuite -- \
  --bin target/release/tonic \
  --manifest benchmarks/native-compiler-suite.toml \
  --target-name interpreter \
  --compile-latency-ms 2600 \
  --json-out benchmarks/native-compiler-summary.json \
  --markdown-out benchmarks/native-compiler-summary.md
```

Calibrate workload thresholds:

```shell
cargo run --bin benchsuite -- --bin target/release/tonic --calibrate --calibrate-margin-pct 20
```

Fail with a non-zero exit code if any configured threshold or contract gate fails:
```shell
./scripts/bench-enforce.sh
```

Or manually:

```shell
cargo run --bin benchsuite -- \
  --bin target/release/tonic \
  --manifest benchmarks/native-compiler-suite.toml \
  --target-name interpreter \
  --compile-latency-ms 2600 \
  --enforce
```

In contract mode, `--enforce` checks:

- absolute workload thresholds (`p50`, `p95`, optional `rss`)
- the weighted overall competitiveness score
- native SLO thresholds
- deterministic failure reasons in JSON/Markdown reports
Run the exact local gate sequence used by CI:

```shell
./scripts/native-gates.sh
```

Benchmark + policy gate only:

```shell
TONIC_BENCH_ENFORCE=0 \
TONIC_BENCH_JSON_OUT=.tonic/native-gates/native-compiler-summary.json \
TONIC_BENCH_MARKDOWN_OUT=.tonic/native-gates/native-compiler-summary.md \
./scripts/bench-native-contract-enforce.sh
./scripts/native-regression-policy.sh \
  .tonic/native-gates/native-compiler-summary.json --mode strict

TONIC_BENCH_ENFORCE=0 \
TONIC_BENCH_JSON_OUT=.tonic/native-gates/native-compiled-summary.json \
TONIC_BENCH_MARKDOWN_OUT=.tonic/native-gates/native-compiled-summary.md \
TONIC_BENCH_TARGET_NAME=compiled \
./scripts/bench-native-contract-enforce.sh benchmarks/native-compiled-suite.toml
./scripts/native-regression-policy.sh \
  .tonic/native-gates/native-compiled-summary.json --mode strict
```

Policy verdicts:

- `pass` (exit 0)
- `quarantine` (exit 2)
- `rollback` (exit 3)
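A caller can dispatch on those exit codes. The sketch below encodes the verdict mapping documented above as a standalone function (`handle_verdict` is a hypothetical helper name, not part of the repository's scripts):

```shell
# Map a native-regression-policy exit code to its verdict.
# The 0/2/3 mapping comes from the docs above; the function is illustrative.
handle_verdict() {
  case "$1" in
    0) echo "pass" ;;
    2) echo "quarantine" ;;
    3) echo "rollback" ;;
    *) echo "unknown" ;;
  esac
}

handle_verdict 2   # prints "quarantine"
```

In a CI wrapper you would run the policy script, capture `$?`, and pass it to a dispatcher like this to decide whether to promote, hold, or revert.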
Runner output now persists host metadata in every report:
- OS/arch
- kernel
- CPU model
- rustc version
- go version
- capture timestamp
Reference baseline metadata is also included in contract report output.
The benchsuite computes `p95_ms` via Tukey upper-fence winsorization: before
taking the 95th percentile, a single isolated sample above Q3 + 1.5 × IQR
is capped at the fence value. If multiple samples exceed the fence, the
distribution is left unchanged to preserve the true tail-regression signal.
Why this helps: short-lived processes are susceptible to OS scheduling jitter — a single run that lands in a cache-cold or scheduler-preempted slot inflates the raw p95 without representing actual runtime performance. Capping isolated spikes at the fence eliminates that noise.
Signal preservation: for a sustained regression (the entire latency distribution shifts up), Q3 and the fence rise with it, so the elevated p95 is faithfully reported. The strict gate continues to catch real regressions; only isolated one-off outliers are damped.
`p50` is unaffected: the median is computed via the plain `compute_percentile`
path; it is inherently robust to tail outliers and needs no winsorization.
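The behavior described above can be sketched in a few lines of Rust. This is a minimal illustration, not the benchsuite's actual code: it assumes nearest-rank percentiles, while the real `compute_percentile` path may interpolate differently.

```rust
// Sketch of Tukey upper-fence winsorization for p95, assuming
// nearest-rank percentiles (the real benchsuite may differ in detail).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let idx = ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted[idx]
}

fn winsorized_p95(samples: &[f64]) -> f64 {
    let mut s = samples.to_vec();
    s.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let q3 = percentile(&s, 75.0);
    let iqr = q3 - percentile(&s, 25.0);
    let fence = q3 + 1.5 * iqr;
    // Cap exactly one isolated spike; two or more exceedances indicate a
    // genuine tail shift and are reported unchanged.
    let above: Vec<usize> = (0..s.len()).filter(|&i| s[i] > fence).collect();
    if above.len() == 1 {
        s[above[0]] = fence;
    }
    percentile(&s, 95.0)
}

fn main() {
    // One isolated spike (100.0) is capped at the fence (16.0 here).
    let spike = vec![10.0, 10.0, 11.0, 11.0, 12.0, 12.0, 13.0, 13.0, 14.0, 100.0];
    println!("isolated spike -> p95 = {}", winsorized_p95(&spike));
    // A sustained shift (two samples above the fence) is reported as-is.
    let shift = vec![10.0, 10.0, 11.0, 11.0, 12.0, 12.0, 13.0, 13.0, 100.0, 101.0];
    println!("sustained shift -> p95 = {}", winsorized_p95(&shift));
}
```

Note how the second case demonstrates the signal-preservation property: because two samples sit above the fence, nothing is capped and the elevated p95 survives.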
If a workload regresses, profile the specific command:

```shell
cargo flamegraph --release --bin tonic -- run examples/parity/06-control-flow/for_multi_generator.tn
```

Or with perf:

```shell
perf record -g target/release/tonic run examples/parity/06-control-flow/for_multi_generator.tn
perf report
```

For built-in phase timing (frontend/codegen/runtime), set a profile sink:

```shell
TONIC_PROFILE_OUT=benchmarks/phase-profile.jsonl \
target/release/tonic compile examples/parity/06-control-flow/for_multi_generator.tn --out .tonic/build/for_multi_generator
```

Each command appends one JSON line with `command`, `total_ms`, and per-phase `elapsed_ms`, so regressions can be localized before full flamegraph runs.