Commit c923a60

Docs: Single-socket Graviton4 results
1 parent c40b7f3 commit c923a60

1 file changed: +46 -0 lines changed

README.md

Lines changed: 46 additions & 0 deletions
@@ -213,3 +213,49 @@ The reason this happens is that on Zen4:
- Fused-Multiply-Add instructions like `vfmadd132ps zmm, zmm, zmm` execute on ports 0 and 1.

So if the CPU can fetch enough data in time, we can have at least 4 ports simultaneously busy, and the latency of the operation is hidden.
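
To illustrate the latency-hiding idea in scalar form, here is a minimal sketch that keeps four independent accumulators so that consecutive additions do not wait on each other. The function name `sum_four_accumulators` is hypothetical and is not taken from this repository's kernels; it also changes the summation order, so floating-point results may differ slightly from a strictly sequential sum.

```cpp
#include <cstddef>

// Hypothetical helper (not from the benchmark sources): sums `count` floats
// using four independent dependency chains, so several additions can be
// in flight at once instead of each one waiting for the previous result.
constexpr float sum_four_accumulators(float const *data, std::size_t count) {
    float acc0 = 0.f, acc1 = 0.f, acc2 = 0.f, acc3 = 0.f;
    std::size_t i = 0;
    // Main loop: each accumulator only depends on its own previous value.
    for (; i + 4 <= count; i += 4) {
        acc0 += data[i + 0];
        acc1 += data[i + 1];
        acc2 += data[i + 2];
        acc3 += data[i + 3];
    }
    // Tail: fold the remaining 0..3 elements into the first accumulator.
    for (; i < count; ++i) acc0 += data[i];
    // Combine the partial sums pairwise at the very end.
    return (acc0 + acc1) + (acc2 + acc3);
}
```

The same structure carries over to SIMD registers, where each accumulator would be a vector rather than a scalar; how many accumulators are needed depends on the instruction latency and the number of ports, so four here is an assumption for illustration only.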

### AWS Graviton4 `c8g.metal-24xl`

On AWS Graviton4 `c8g.metal-24xl` instances with GCC 12, one may expect the following results:

```sh
$ build_release/reduce_bench
You did not feed the size of arrays, so we will use a 1GB array!
Page size: 4096 bytes
Cache line size: 64 bytes
Dataset size: 268435456 elements
Dataset alignment: 64 bytes
Dataset allocation type: mmap
Dataset NUMA nodes: 1
2025-05-03T20:50:16+00:00
Running build_release/reduce_bench
Run on (96 X 2000 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x96)
  L1 Instruction 64 KiB (x96)
  L2 Unified 2048 KiB (x96)
  L3 Unified 36864 KiB (x1)
Load Average: 5.76, 6.38, 2.75
---------------------------------------------------------------------------------------------------------------
Benchmark                                                     Time           CPU  Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------
unrolled/f32/min_time:10.000/real_time                 38034000 ns   38033650 ns         368 bytes/s=28.2311G/s error,%=50
unrolled/f64/min_time:10.000/real_time                 72851731 ns   72852189 ns         192 bytes/s=14.7387G/s error,%=0
std::accumulate/f32/min_time:10.000/real_time         192162701 ns  192164003 ns          73 bytes/s=5.58767G/s error,%=93.75
std::accumulate/f64/min_time:10.000/real_time         192266754 ns  192268708 ns          73 bytes/s=5.58465G/s error,%=0
serial/f32/av::fork_union/min_time:10.000/real_time     1889686 ns    1889604 ns        7320 bytes/s=568.212G/s error,%=0
serial/f64/av::fork_union/min_time:10.000/real_time     1935453 ns    1935360 ns        7309 bytes/s=554.775G/s error,%=0
serial/f32/openmp/min_time:10.000/real_time             2244099 ns    2108568 ns        4723 bytes/s=478.473G/s error,%=71.5256u
std::reduce<par>/f32/min_time:10.000/real_time          1950894 ns    1950842 ns        7129 bytes/s=550.384G/s error,%=0
std::reduce<par>/f64/min_time:10.000/real_time          1959062 ns    1953907 ns        7121 bytes/s=548.09G/s error,%=0
std::reduce<par_unseq>/f32/min_time:10.000/real_time    1956428 ns    1949906 ns        7139 bytes/s=548.828G/s error,%=0
std::reduce<par_unseq>/f64/min_time:10.000/real_time    1953465 ns    1952599 ns        7117 bytes/s=549.66G/s error,%=0
neon/f32/min_time:10.000/real_time                     48248562 ns   48249488 ns         290 bytes/s=22.2544G/s error,%=75
neon/f32/av::fork_union/min_time:10.000/real_time       1890173 ns    1887574 ns        7354 bytes/s=568.065G/s error,%=0
neon/f32/std::threads/min_time:10.000/real_time         3321599 ns    3181368 ns        4221 bytes/s=323.261G/s error,%=1.04167
neon/f32/openmp/min_time:10.000/real_time               1901684 ns    1899327 ns        7263 bytes/s=564.627G/s error,%=23.8419u
sve/f32/min_time:10.000/real_time                      50048126 ns   50049059 ns         280 bytes/s=21.4542G/s error,%=75
sve/f32/av::fork_union/min_time:10.000/real_time        1898117 ns    1897862 ns        7329 bytes/s=565.688G/s error,%=0
sve/f32/std::threads/min_time:10.000/real_time          3347690 ns    3203386 ns        4190 bytes/s=320.741G/s error,%=1.04167
sve/f32/openmp/min_time:10.000/real_time                1909972 ns    1901816 ns        7274 bytes/s=562.177G/s error,%=23.8419u
```
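
As a sanity check on the `bytes/s` column, the reported figures appear consistent with dividing the dataset size by the real time per iteration. The sketch below assumes the dataset is 268435456 four-byte elements (1 GiB), as the log header suggests; the names `bytes_per_second` and `kDatasetBytes` are hypothetical and are not part of the benchmark sources.

```cpp
#include <cstddef>

// Assumption taken from the log header: 268435456 elements of 4 bytes each.
constexpr std::size_t kDatasetBytes = 268435456ull * 4;

// Hypothetical helper: throughput in bytes per second for one pass over
// `bytes_processed` bytes that took `elapsed_ns` nanoseconds.
constexpr double bytes_per_second(std::size_t bytes_processed, double elapsed_ns) {
    return static_cast<double>(bytes_processed) / (elapsed_ns * 1e-9);
}

// Ratio of two throughputs, e.g. a parallel run versus a single-threaded one.
constexpr double speedup(double faster_bytes_per_second, double slower_bytes_per_second) {
    return faster_bytes_per_second / slower_bytes_per_second;
}
```

Under these assumptions, `bytes_per_second(kDatasetBytes, 38034000.0)` comes out near the 28.2 G/s reported for `unrolled/f32`, and `bytes_per_second(kDatasetBytes, 1889686.0)` near the 568 G/s reported for `serial/f32/av::fork_union`, a ratio of roughly 20x between the two rows.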
