Thanks for sharing your benchmarks and your thoughts on your blog.
Another Rust framework you may want to consider is Paralight, which offers the best of both worlds: a functional API similar to Rayon's, together with a choice of strategies along the lightweight vs. work-stealing dimension. Rayon always operates in work-stealing mode, which can add overhead depending on the workload. See also this blog post that explains the inner workings of Rayon and the simpler parallelism strategy that ultimately became Paralight.
With Paralight, the following thread pool parameters should have low overhead and come close to the "manual" partitioning of the array items that you've described in your post:
use paralight::{CpuPinningPolicy, RangeStrategy, ThreadCount, ThreadPoolBuilder};
use paralight::iter::{IntoParallelRefSource, ParallelIteratorExt, ParallelSourceExt};

let mut thread_pool = ThreadPoolBuilder {
    num_threads: ThreadCount::AvailableParallelism, // Or a manual fixed number if you prefer
    range_strategy: RangeStrategy::Fixed, // Fixed, low-overhead partitioning of the input array among worker threads
    cpu_pinning: CpuPinningPolicy::No, // I don't think it matters much for your example
}
.build();

// Functional style sum, nothing more.
let sum = data.par_iter().with_thread_pool(&mut thread_pool).sum::<f32>();

I'm curious how it fares in your benchmarks :)
Incidentally, Paralight already has sum reduction benchmarks: https://github.com/gendx/paralight/blob/main/benches/criterion.rs#L150.
Note: Paralight is still in alpha because some of the more advanced APIs aren't fleshed out yet, but reducing the values of an array is already available and fairly straightforward to use.
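
For comparison, here is a rough sketch of what the "manual" fixed partitioning mentioned above could look like with only the standard library. This is not how Paralight is implemented, and it spawns fresh threads on every call instead of reusing a pool; the function name `fixed_partition_sum` and its `num_threads` parameter are hypothetical and only there for illustration.

```rust
use std::thread;

/// Hypothetical helper: sums `data` by splitting it into at most
/// `num_threads` contiguous chunks and summing each chunk on its own
/// scoped thread. Standard library only, no thread pool reuse.
fn fixed_partition_sum(data: &[f32], num_threads: usize) -> f32 {
    if data.is_empty() {
        return 0.0;
    }
    let num_threads = num_threads.max(1);
    // Ceiling division, so every element belongs to exactly one chunk
    // and the chunk length is never zero for non-empty input.
    let chunk_len = (data.len() + num_threads - 1) / num_threads;

    thread::scope(|scope| {
        // Spawn one worker per chunk; each returns its partial sum.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<f32>()))
            .collect();

        // Combine the partial sums on the calling thread.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<f32>()
    })
}
```

Calling `fixed_partition_sum(&data, 4)` splits the slice into up to four chunks. Because floating-point addition is not associative, the result may differ slightly from a sequential sum for general inputs.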