Description
Feature Description
Add throughput benchmarks for the 11 unbenched operations in kornia-tensor-ops:
add, sub, mul, div, min, mul_scalar, abs, powf, powi, mean, and sum_elements.
Currently only dot_product and cosine_similarity are benchmarked.
Feature Category
Performance Optimization
Motivation
The GSoC 2026 GPU acceleration project requires CPU performance baselines before
GPU kernels can be written and speedups verified. Without benchmarks for these ops,
there is no way to measure the impact of GPU acceleration when it lands.
Proposed Solution
Add ~220 lines to crates/kornia-tensor-ops/benches/bench_ops.rs covering all 11
unbenched ops. Use Throughput::Bytes (not Elements) so results are directly
comparable to hardware bandwidth specs and roofline analysis. Test sizes:
[64, 512, 4096, 65536], spanning the sub-cache through bandwidth-bound regimes.
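To make the byte counts concrete, here is a minimal std-only sketch of one plausible accounting convention. It assumes f32 elements and assumes each input or output tensor is streamed exactly once per call. `OpKind` and `bytes_per_iter` are hypothetical helpers, not part of kornia-tensor-ops; they group the 11 ops by memory traffic.

```rust
use std::mem::size_of;

/// Memory-traffic class of an op (hypothetical grouping for byte accounting).
#[derive(Clone, Copy, Debug)]
enum OpKind {
    /// Two input tensors, one output tensor: add, sub, mul, div, min.
    Binary,
    /// One input tensor, one output tensor: mul_scalar, abs, powf, powi.
    Unary,
    /// One input tensor, scalar output: mean, sum_elements.
    Reduction,
}

/// Bytes moved per call for `n` elements of type `T`, counting each tensor
/// read or written exactly once. Ignores cache effects and the scalar result.
fn bytes_per_iter<T>(kind: OpKind, n: usize) -> u64 {
    let tensor_bytes = (n * size_of::<T>()) as u64;
    let streams = match kind {
        OpKind::Binary => 3,    // read a, read b, write out
        OpKind::Unary => 2,     // read a, write out
        OpKind::Reduction => 1, // read a only
    };
    streams * tensor_bytes
}

fn main() {
    // The four test sizes from the proposal, assumed to be f32 element counts.
    for &n in &[64usize, 512, 4096, 65536] {
        println!(
            "n = {:>6}: binary {:>7} B, unary {:>7} B, reduction {:>7} B",
            n,
            bytes_per_iter::<f32>(OpKind::Binary, n),
            bytes_per_iter::<f32>(OpKind::Unary, n),
            bytes_per_iter::<f32>(OpKind::Reduction, n),
        );
    }
}
```

In a Criterion benchmark, a value such as `bytes_per_iter::<f32>(OpKind::Binary, n)` would be passed to `group.throughput(Throughput::Bytes(...))` before `bench_with_input`. Under this convention a binary op's working set grows from 768 B at n = 64 to 768 KiB at n = 65536, which is what lets the size sweep cross cache levels.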
Library Reference
Criterion.rs benchmarking library (already a dependency). The pattern follows the existing
bench_dot_product1 and bench_cosine_similarity benchmarks in bench_ops.rs.
Alternatives Considered
Throughput::Elements could be used instead of Bytes, but Bytes allows direct comparison
with DRAM bandwidth in MB/s, which is the standard language of roofline modeling.
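The two metrics differ only by a per-op constant factor, so the choice is about which number can be set against a hardware spec. A minimal sketch, with hypothetical helper names and a made-up timing (not a measurement), converting one time per iteration into elements/s and into bytes/s in both decimal MB and binary MiB:

```rust
/// Elements processed per second: easy to compute, but says nothing about
/// how much memory traffic each element costs for a given op.
fn elements_per_sec(n: usize, secs_per_iter: f64) -> f64 {
    n as f64 / secs_per_iter
}

/// Bytes moved per second, returned as (decimal MB/s, binary MiB/s).
fn bandwidth(bytes_per_iter: u64, secs_per_iter: f64) -> (f64, f64) {
    let bytes_per_sec = bytes_per_iter as f64 / secs_per_iter;
    (bytes_per_sec / 1e6, bytes_per_sec / (1024.0 * 1024.0))
}

fn main() {
    // Made-up example: a binary f32 op on 65536 elements
    // (3 streams x 65536 x 4 B = 786432 B) taking 100 microseconds.
    let n: usize = 65_536;
    let bytes = 3 * n as u64 * 4;
    let secs = 100e-6;
    let (mb_s, mib_s) = bandwidth(bytes, secs);
    println!("{:.2} Melem/s", elements_per_sec(n, secs) / 1e6);
    println!("{:.2} MB/s = {:.2} MiB/s", mb_s, mib_s);
}
```

With these numbers the same run reads as about 7864 MB/s or 7500 MiB/s; the two units differ by roughly 4.9%, so whichever one the harness prints should be matched to the unit of the hardware spec being compared against.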
Use Cases
When GPU backends are added to kornia-tensor, the corresponding GPU benchmarks can reuse
this metric, and speedups will be immediately visible in MB/s on the same roofline chart.
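Because a CPU and a GPU run of the same op share the same bytes-per-iteration count, the bandwidth ratio equals the raw time ratio, and each bandwidth can also be placed against a device's peak. A minimal sketch, with hypothetical function names and made-up numbers (not measurements):

```rust
/// Speedup of a GPU run over the CPU baseline. With identical
/// bytes-per-iteration accounting this equals cpu_secs / gpu_secs.
fn speedup(cpu_bytes_per_sec: f64, gpu_bytes_per_sec: f64) -> f64 {
    gpu_bytes_per_sec / cpu_bytes_per_sec
}

/// Fraction of peak memory bandwidth achieved: how close a memory-bound
/// kernel sits to the bandwidth roof on a roofline chart.
fn fraction_of_peak(achieved_bytes_per_sec: f64, peak_bytes_per_sec: f64) -> f64 {
    achieved_bytes_per_sec / peak_bytes_per_sec
}

fn main() {
    // Made-up numbers for illustration only.
    let bytes = 786_432.0_f64; // one binary f32 op on 65536 elements
    let cpu_secs = 100e-6;
    let gpu_secs = 4e-6;
    let cpu_bw = bytes / cpu_secs;
    let gpu_bw = bytes / gpu_secs;
    println!("speedup from bandwidth: {:.1}x", speedup(cpu_bw, gpu_bw));
    println!("speedup from time:      {:.1}x", cpu_secs / gpu_secs);
    // Hypothetical 50 GB/s CPU memory peak.
    println!("CPU at {:.1}% of peak", 100.0 * fraction_of_peak(cpu_bw, 50e9));
}
```

Reporting bytes/s rather than only the speedup ratio keeps each backend's absolute position on the roofline visible, so a large speedup over a CPU baseline that sits far below its own bandwidth roof is easy to spot.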
Additional Context
I am a GSoC 2026 applicant targeting the GPU acceleration project. @edgarriba
Contribution Intent
- I plan to submit a PR to implement this feature
- I'm requesting this feature but not planning to implement it