Commit 4b877b9

Docs: H200 benchmarks
1 parent fad513b commit 4b877b9

1 file changed: +13 -2 lines changed
README.md

Lines changed: 13 additions & 2 deletions
````diff
@@ -52,8 +52,8 @@ All the library dependencies, including GTest, GBench, Intel oneTBB, FMT, and Th
 You are expected to build this on an x86 machine with CUDA drivers installed.
 
 ```sh
-cmake -B build_release
-cmake --build build_release --config Release
+cmake -B build_release -D CMAKE_BUILD_TYPE=Release # Generate the build files
+cmake --build build_release --config Release # Build the project
 build_release/reduce_bench # Run all benchmarks
 build_release/reduce_bench --benchmark_filter="cuda" # Only CUDA-related
 PARALLEL_REDUCTIONS_LENGTH=1024 build_release/reduce_bench # Set a different input size
````
````diff
@@ -136,6 +136,17 @@ Observations:
 - 2.2 TB/s using vanilla CUDA approaches.
 - 3 TB/s using CUB.
 
+On Nvidia H200 GPUs, the numbers are even higher:
+
+```sh
+-------------------------------------------------------------------------------------------------------------
+Benchmark                                                   Time             CPU   Iterations UserCounters...
+-------------------------------------------------------------------------------------------------------------
+cuda/cub/min_time:10.000/real_time                     254609 ns       254607 ns        54992 bytes/s=4.21723T/s error,%=0
+cuda/thrust/min_time:10.000/real_time                  319709 ns       316368 ns        43846 bytes/s=3.3585T/s error,%=0
+cuda/thrust/interleaving/min_time:10.000/real_time     318598 ns       314996 ns        43956 bytes/s=3.37021T/s error,%=0
+```
+
 ### AWS Zen4 `m7a.metal-48xl`
 
 On AWS Zen4 `m7a.metal-48xl` instances with GCC 12, one may expect the following results:
````
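
For readers checking the H200 table added in this commit, the `bytes/s` counter can be cross-checked against the `Time` column. The sketch below is only a sanity check, not part of the repository: the names `H200_RESULTS`, `implied_bytes`, and `throughput` are hypothetical, and it assumes that `T/s` means 10^12 bytes per second and that the counter is derived from the reported real time.

```python
# Figures copied from the H200 benchmark table in the diff:
# benchmark name -> (real time per iteration in ns, reported bytes per second).
# Assumption: "T/s" in the counter means 1e12 bytes per second.
H200_RESULTS = {
    "cuda/cub": (254609, 4.21723e12),
    "cuda/thrust": (319709, 3.3585e12),
    "cuda/thrust/interleaving": (318598, 3.37021e12),
}

# The "3 TB/s using CUB" figure from the unchanged context lines of the README.
PREVIOUS_CUB_BYTES_PER_SECOND = 3e12


def implied_bytes(real_time_ns: float, bytes_per_second: float) -> float:
    """Bytes processed per iteration, implied by time and throughput."""
    return bytes_per_second * real_time_ns * 1e-9


def throughput(bytes_processed: float, real_time_ns: float) -> float:
    """Throughput in bytes per second for a given input size and time."""
    return bytes_processed / (real_time_ns * 1e-9)


if __name__ == "__main__":
    for name, (ns, rate) in H200_RESULTS.items():
        size_gib = implied_bytes(ns, rate) / 2**30
        print(f"{name}: about {size_gib:.3f} GiB per iteration")

    cub_rate = H200_RESULTS["cuda/cub"][1]
    ratio = cub_rate / PREVIOUS_CUB_BYTES_PER_SECOND
    print(f"H200 CUB vs. the earlier 3 TB/s CUB figure: {ratio:.2f}x")
```

Under these assumptions, all three rows imply close to 1 GiB of input per iteration, and the H200 CUB result is roughly 1.4x the earlier 3 TB/s CUB figure. The excerpt does not state the default value of `PARALLEL_REDUCTIONS_LENGTH`, so the implied input size is an inference from the table rather than a documented setting.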
