Feature Description
Add execution provider selection (CPU/CUDA) and multi-iteration inference benchmarking to the existing examples/onnx example.
Feature Category
Testing Infrastructure
Motivation
The current examples/onnx example only runs inference using the default CPU execution provider with a single pass. This doesn't reflect real-world applications where backend selection and accurate latency measurement are critical. When evaluating models for deployment on different hardware (CPU vs GPU), having a quick way to compare execution providers within the same example is essential.
Proposed Solution
- Add a `--device` CLI flag to select the execution provider (`cpu` or `cuda`)
- Add a `--num-iterations` CLI flag for multi-run benchmarking (default: 10)
- Add a warm-up run before timed iterations to exclude cold-start overhead (memory allocation, graph optimization, kernel compilation)
- Print latency statistics (mean, min, max) across iterations
- Update README with usage examples for both CPU and CUDA execution
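The warm-up and timed-loop logic above can be sketched without touching the ONNX session itself. A minimal std-only sketch follows; the `Device` enum and `benchmark` helper are illustrative names, not part of the existing example, and the real implementation would pass the `ort` session's run call as the closure:

```rust
use std::time::Instant;

/// Execution provider requested via the proposed `--device` flag.
#[derive(Debug, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Cuda,
}

impl std::str::FromStr for Device {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "cpu" => Ok(Device::Cpu),
            "cuda" => Ok(Device::Cuda),
            other => Err(format!("unknown device: {other} (expected cpu or cuda)")),
        }
    }
}

/// Latency statistics over the timed iterations, in milliseconds.
#[derive(Debug)]
pub struct LatencyStats {
    pub mean_ms: f64,
    pub min_ms: f64,
    pub max_ms: f64,
}

/// Runs `warmup` untimed passes to absorb cold-start costs (allocations,
/// graph optimization, kernel compilation), then `iters` timed passes,
/// and returns mean/min/max latency across the timed passes.
pub fn benchmark<F: FnMut()>(mut infer: F, warmup: usize, iters: usize) -> LatencyStats {
    for _ in 0..warmup {
        infer();
    }
    let mut samples_ms = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        infer();
        samples_ms.push(start.elapsed().as_secs_f64() * 1e3);
    }
    let mean_ms = samples_ms.iter().sum::<f64>() / samples_ms.len() as f64;
    let min_ms = samples_ms.iter().copied().fold(f64::INFINITY, f64::min);
    let max_ms = samples_ms.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    LatencyStats { mean_ms, min_ms, max_ms }
}
```

With the proposed flags, the example would then be invoked along the lines of `cargo run --release --example onnx -- --device cuda --num-iterations 100` (exact invocation depends on the final implementation).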
Library Reference
- ONNX Runtime Execution Providers: https://onnxruntime.ai/docs/execution-providers/
- `ort` crate CUDA EP: https://docs.rs/ort/latest/ort/execution_providers/cuda/
- Current `examples/onnx` implementation in this repo
Alternatives Considered
Considered creating a completely new example, but improving the existing one keeps the codebase lean and provides immediate value to current users of the ONNX example.
Use Cases
- Comparing inference latency across CPU and CUDA backends for model deployment decisions
- Providing a starting point for users who want to run ONNX models with GPU acceleration
- Laying groundwork for TensorRT execution provider support (related: [Feature]: Add ONNXRuntime-based Vision-Language Model (VLM) inference example to kornia-vlm #634, GSoC 2026 VLM Inference project)
Additional Context
This is a small, focused improvement to an existing example. The changes are backward compatible: if no extra flags are provided, the default behavior remains a single run on the CPU execution provider.
Contribution Intent
- I plan to submit a PR to implement this feature
- I'm requesting this feature but not planning to implement it