ONNX import via burn-onnx is able to tackle more and more models. As support grows, it unlocks seeing how well these models perform. In the NVIDIA ecosystem, TensorRT is generally the fastest inference available, to my knowledge.
Tables

| Model | Shape | GPU | Burn (ms) | TRT (ms) | TRT speedup | burn SHA | burn-onnx SHA |
|---|---|---|---|---|---|---|---|
| RF-DETR (large) | [1,3,560,560] | RTX 4090 | 25.97 | 2.62 | ~10x | 8bfa8f75 | 1262150 |
| RetinaFace | [1,3,768,1024] | RTX 4090 | 2.45 | 0.22 | ~11x | d63bd6a2 | 1262150 |
| FCN-ResNet50 | [1,3,520,924] | RTX 4090 | 19.31 | 1.12 | ~17x | d63bd6a2 | 1262150 |
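The speedup column is just the ratio of the two latency columns; as a quick arithmetic check:

```rust
fn main() {
    // (model, burn_ms, trt_ms) taken from the table above
    let rows = [
        ("RF-DETR (large)", 25.97_f64, 2.62),
        ("RetinaFace", 2.45, 0.22),
        ("FCN-ResNet50", 19.31, 1.12),
    ];
    for (name, burn_ms, trt_ms) in rows {
        // prints ~10x, ~11x, ~17x, matching the table
        println!("{name}: ~{:.0}x", burn_ms / trt_ms);
    }
}
```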
How to check Burn
This is more involved and hard to generalize, but here is a suggestion based on making a standalone crate.
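Concretely, the setup below can be driven like this (crate name and model path are illustrative, not prescriptive):

```shell
# Create the standalone crate, then add the Cargo.toml, build.rs and
# src/main.rs shown below.
cargo new burn-bench && cd burn-bench

# Point the build script at the ONNX file and run the benchmark.
ONNX_MODEL=/path/to/model.onnx cargo run --release
```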
Cargo.toml
```toml
[package]
name = "burn-bench"
edition = "2024"
publish = false

[dependencies]
burn = { git = "https://github.com/tracel-ai/burn", features = ["cuda"] }
burn-store = { git = "https://github.com/tracel-ai/burn" }
nvtx = "1.3" # for human-friendly report regions in profilers such as Nsight Systems

[build-dependencies]
burn-onnx = { git = "https://github.com/tracel-ai/burn-onnx" }
```

build.rs, generates the model's Rust code and makes git revisions available for report printing
```rust
use burn_onnx::ModelGen;
use std::path::Path;

fn main() {
    let onnx = std::env::var("ONNX_MODEL").expect("set ONNX_MODEL=/path/to/model.onnx");
    let stem = Path::new(&onnx)
        .file_stem()
        .unwrap()
        .to_str()
        .unwrap()
        .to_owned();
    println!("cargo:rerun-if-env-changed=ONNX_MODEL");
    println!("cargo:rerun-if-changed={onnx}");
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed=Cargo.lock");
    println!("cargo:rustc-env=MODEL_STEM={stem}");

    // Extract git revisions from Cargo.lock so the binary can report them.
    let lock = std::fs::read_to_string("Cargo.lock").unwrap_or_default();
    println!("cargo:rustc-env=BURN_REV={}", git_rev(&lock, "tracel-ai/burn"));
    println!(
        "cargo:rustc-env=BURN_ONNX_REV={}",
        git_rev(&lock, "tracel-ai/burn-onnx")
    );

    ModelGen::new()
        .input(&onnx)
        .out_dir("model/")
        .run_from_script();
}

/// Find the first `source = "git+https://.../<repo>#<rev>"` and return the short rev.
fn git_rev(lock: &str, repo: &str) -> String {
    for line in lock.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("source = \"git+https://github.com/") {
            // Require the repo name to be followed by `#` or `?` so that
            // `tracel-ai/burn` does not also match `tracel-ai/burn-onnx`.
            let exact = rest
                .strip_prefix(repo)
                .is_some_and(|r| r.starts_with('#') || r.starts_with('?'));
            if exact {
                if let Some((_, hash)) = rest.rsplit_once('#') {
                    let rev = hash.trim_end_matches('"');
                    return rev[..rev.len().min(10)].to_owned();
                }
            }
        }
    }
    "unknown".to_owned()
}
```

main.rs
```rust
use burn::prelude::*;
use std::time::{Duration, Instant};

type B = burn::backend::Cuda;

#[allow(warnings)]
mod model {
    include!(concat!(
        env!("OUT_DIR"),
        "/model/",
        env!("MODEL_STEM"),
        ".rs"
    ));
}

// Adjust these to match your model's input shape.
const BATCH: usize = 1;
const CHANNELS: usize = 3;
const HEIGHT: usize = 560;
const WIDTH: usize = 560;
const WARMUP: usize = 3;
const ITERATIONS: usize = 20;

fn main() {
    let device = Default::default();
    let model = model::Model::<B>::from_file(
        concat!(env!("OUT_DIR"), "/model/", env!("MODEL_STEM"), ".bpk"),
        &device,
    );
    println!("warmup={WARMUP} iterations={ITERATIONS}");

    // Warm up so one-time costs (kernel compilation, allocations) don't skew timings.
    for _ in 0..WARMUP {
        let input = Tensor::<B, 4>::zeros([BATCH, CHANNELS, HEIGHT, WIDTH], &device);
        let _ = model.forward(input);
        let _ = B::sync(&device);
    }

    let mut times = Vec::with_capacity(ITERATIONS);
    let _iter_range = nvtx::range!("iterations");
    for i in 0..ITERATIONS {
        let input = Tensor::<B, 4>::zeros([BATCH, CHANNELS, HEIGHT, WIDTH], &device);
        let start = Instant::now();
        let _fwd_range = nvtx::range!("forward i={i}");
        let _ = model.forward(input);
        // Sync so the elapsed time covers the full GPU execution, not just the launch.
        let _ = B::sync(&device);
        drop(_fwd_range);
        times.push(start.elapsed());
    }
    drop(_iter_range);
    report(&times);
}

fn report(times: &[Duration]) {
    println!(
        "burn={} burn-onnx={}",
        env!("BURN_REV"),
        env!("BURN_ONNX_REV")
    );
    println!(
        "model={} input=[{BATCH}, {CHANNELS}, {HEIGHT}, {WIDTH}]",
        env!("MODEL_STEM")
    );
    let mut sorted = times.to_vec();
    sorted.sort();
    let n = sorted.len();
    let median = sorted[n / 2];
    let mean = sorted.iter().sum::<Duration>() / n as u32;
    let min = sorted[0];
    let max = sorted[n - 1];
    println!("median {median:>10.2?}");
    println!("mean {mean:>10.2?}");
    println!("min {min:>10.2?}");
    println!("max {max:>10.2?}");
}
```

The above would report:
```
warmup=3 iterations=20
burn=502910e2a7 burn-onnx=19eedf7141
model=rf_detr input=[1, 3, 560, 560]
median 25.97ms
mean 25.99ms
min 25.30ms
max 27.57ms
```
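The burn=…/burn-onnx=… revisions in that header come from the Cargo.lock parsing in build.rs. That extraction can be sanity-checked standalone; the sketch below uses made-up hashes and matches the repo name exactly, so `tracel-ai/burn` cannot accidentally match `tracel-ai/burn-onnx`:

```rust
/// Return the short rev from the first `source = "git+https://.../<repo>#<rev>"` line.
fn git_rev(lock: &str, repo: &str) -> String {
    for line in lock.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("source = \"git+https://github.com/") {
            // The repo name must be followed by `#` or `?` (exact-repo match).
            let exact = rest
                .strip_prefix(repo)
                .is_some_and(|r| r.starts_with('#') || r.starts_with('?'));
            if exact {
                if let Some((_, hash)) = rest.rsplit_once('#') {
                    let rev = hash.trim_end_matches('"');
                    return rev[..rev.len().min(10)].to_owned();
                }
            }
        }
    }
    "unknown".to_owned()
}

fn main() {
    // Hypothetical Cargo.lock snippet; the hashes are made up.
    let lock = r#"
[[package]]
name = "burn"
source = "git+https://github.com/tracel-ai/burn#0123456789abcdef"

[[package]]
name = "burn-onnx"
source = "git+https://github.com/tracel-ai/burn-onnx#fedcba9876543210"
"#;
    assert_eq!(git_rev(lock, "tracel-ai/burn"), "0123456789");
    assert_eq!(git_rev(lock, "tracel-ai/burn-onnx"), "fedcba9876");
    println!("ok");
}
```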
How to check TensorRT
`trtexec --onnx=my_model.onnx --best --useCudaGraph` will optimize the model and display bench numbers after a while.

- `my_model.onnx` is the model you are interested in benchmarking
- `--best` lets TensorRT optimize the model using any appropriate precision (fp16, bf16, int4, ...)
- `--useCudaGraph` lets TensorRT, once optimized, launch the entire model as a planned graph so that it doesn't need to communicate with the CPU side until done, eliminating many small launch overheads
Links
- RF-DETR (large) onnx export
- RetinaFace from tiny-face-pytorch
- FCN-ResNet50 from torchvision (pretrained `FCN_ResNet50_Weights.DEFAULT`, exported with `torch.onnx.export`)