## Results
Different hardware will yield different results, but the general trends and observations are:
```
test rayon ... bench: 42,649 ns/iter (+/- 4,220)
test tokio ... bench: 83,644 ns/iter (+/- 3,684)
test smol ... bench: 3,346 ns/iter (+/- 86)
```
## Build & Run
### Rust
Several basic kernels and CPU-oriented parallel reductions are also implemented in Rust.
To build and run the Rust code, you need to have the Rust toolchain installed. You can use `rustup` to install it:
```sh
rustup toolchain install nightly
cargo +nightly test --release
cargo +nightly bench
```
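
As a sketch of what such a CPU-oriented parallel reduction can look like, here is a minimal chunked sum over scoped threads. The `parallel_sum` helper and the thread count are illustrative only, not this repository's actual kernels:

```rust
use std::thread;

// Illustrative sketch: split the slice into one contiguous chunk
// per thread, sum each chunk in its own thread, then sum the
// partial results on the calling thread.
fn parallel_sum(data: &[f64], threads: usize) -> f64 {
    let chunk = ((data.len() + threads - 1) / threads).max(1); // ceil, never zero
    thread::scope(|s| {
        let workers: Vec<_> = data
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().sum::<f64>()))
            .collect();
        workers.into_iter().map(|w| w.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<f64> = (1..=1_000).map(f64::from).collect();
    println!("{}", parallel_sum(&data, 8)); // prints 500500
}
```

Note that chunked summation associates the additions differently from a strictly sequential loop, so floating-point results can differ in the last bits; the benchmarked kernels face the same trade-off.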
### C++
This repository is a CMake project designed to be built on Linux with GCC, Clang, or NVCC.
You may need to install the following dependencies for complete functionality:
```sh
sudo apt install libblas-dev # For OpenBLAS on Linux
sudo apt install libnuma1 libnuma-dev # For NUMA allocators on Linux
sudo apt install cuda-toolkit # This may not be as easy 😈
```
The following script will, by default, generate a 1GB array of numbers and reduce them using every available backend.
All the classical Google Benchmark arguments are supported, including `--benchmark_filter=opencl`.
All the library dependencies, including GTest, GBench, Intel oneTBB, FMT, and Thrust with CUB, will be automatically fetched.
You are expected to build this on an x86 machine with CUDA drivers installed.
```sh
cmake -B build_release -D CMAKE_BUILD_TYPE=Release # Generate the build files
cmake --build build_release --config Release -j # Build the project
build_release/reduce_bench # Run all benchmarks
build_release/reduce_bench --benchmark_filter="cuda" # Only CUDA-related
PARALLEL_REDUCTIONS_LENGTH=1024 build_release/reduce_bench # Set a different input size
```
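
The `PARALLEL_REDUCTIONS_LENGTH` override follows a common pattern: read the environment variable, parse it, and fall back to a default when it is unset or malformed. A minimal Rust sketch of that pattern (the helper name and the default value are illustrative, not the benchmark's actual code):

```rust
use std::env;

// Illustrative only: parse an optional override such as
// PARALLEL_REDUCTIONS_LENGTH, falling back to `default`
// when the value is absent or not a valid number.
fn parse_length(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.parse().ok()).unwrap_or(default)
}

fn main() {
    let raw = env::var("PARALLEL_REDUCTIONS_LENGTH").ok();
    let length = parse_length(raw.as_deref(), 1 << 28); // hypothetical default
    println!("reducing {} elements", length);
}
```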
Need more fine-grained control to run only CUDA-based backends?
- To enable [Intel OpenCL](https://github.com/intel/compute-runtime/blob/master/README.md) on CPUs: `apt-get install intel-opencl-icd`.
- To run on an integrated Intel GPU, follow [this guide](https://www.intel.com/content/www/us/en/develop/documentation/installation-guide-for-intel-oneapi-toolkits-linux/top/prerequisites.html).