Feature Description
Add execution provider selection (CPU/CUDA) and multi-iteration inference benchmarking to the existing examples/onnx example.
Feature Category
Testing Infrastructure
Motivation
The current examples/onnx example only runs inference using the default CPU execution provider with a single pass. This doesn't reflect real-world applications where backend selection and accurate latency measurement are critical. When evaluating models for deployment on different hardware (CPU vs GPU), having a quick way to compare execution providers within the same example is essential.
Proposed Solution
- Add a `--device` CLI flag to select the execution provider (`cpu` or `cuda`)
- Add a `--num-iterations` CLI flag for multi-run benchmarking (default: 10)
- Add a warm-up run before timed iterations to exclude cold-start overhead (memory allocation, graph optimization, kernel compilation)
- Print latency statistics (mean, min, max) across iterations
- Update README with usage examples for both CPU and CUDA execution
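The warm-up and timed-loop logic above can be sketched without touching the ONNX session itself. A minimal std-only sketch follows; the `Device` enum and `benchmark` helper are illustrative names, not part of the existing example, and the real implementation would pass the `ort` session's run call as the closure:

```rust
use std::time::Instant;

/// Execution provider requested via the proposed `--device` flag.
#[derive(Debug, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Cuda,
}

impl std::str::FromStr for Device {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "cpu" => Ok(Device::Cpu),
            "cuda" => Ok(Device::Cuda),
            other => Err(format!("unknown device: {other} (expected cpu or cuda)")),
        }
    }
}

/// Latency statistics over the timed iterations, in milliseconds.
#[derive(Debug)]
pub struct LatencyStats {
    pub mean_ms: f64,
    pub min_ms: f64,
    pub max_ms: f64,
}

/// Runs `warmup` untimed passes to absorb cold-start costs (allocations,
/// graph optimization, kernel compilation), then `iters` timed passes,
/// and returns mean/min/max latency across the timed passes.
pub fn benchmark<F: FnMut()>(mut infer: F, warmup: usize, iters: usize) -> LatencyStats {
    for _ in 0..warmup {
        infer();
    }
    let mut samples_ms = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        infer();
        samples_ms.push(start.elapsed().as_secs_f64() * 1e3);
    }
    let mean_ms = samples_ms.iter().sum::<f64>() / samples_ms.len() as f64;
    let min_ms = samples_ms.iter().copied().fold(f64::INFINITY, f64::min);
    let max_ms = samples_ms.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    LatencyStats { mean_ms, min_ms, max_ms }
}
```

With the proposed flags, the example would then be invoked along the lines of `cargo run --release --example onnx -- --device cuda --num-iterations 100` (exact invocation depends on the final implementation).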
Library Reference
- ONNX Runtime Execution Providers: https://onnxruntime.ai/docs/execution-providers/
- `ort` crate CUDA EP: https://docs.rs/ort/latest/ort/execution_providers/cuda/
- Current `examples/onnx` implementation in this repo
Alternatives Considered
Considered creating a completely new example, but improving the existing one keeps the codebase lean and provides immediate value to current users of the ONNX example.
Use Cases
- Comparing inference latency across CPU and CUDA backends for model deployment decisions
- Providing a starting point for users who want to run ONNX models with GPU acceleration
- Laying groundwork for TensorRT execution provider support (related: [Feature]: Add ONNXRuntime-based Vision-Language Model (VLM) inference example to kornia-vlm #634, GSoC 2026 VLM Inference project)
Additional Context
This is a small, focused improvement to an existing example. The changes are backward compatible: if no extra flags are provided, the default behavior remains a single run on the CPU execution provider.
Contribution Intent
- I plan to submit a PR to implement this feature
- I'm requesting this feature but not planning to implement it