@@ -56,7 +56,7 @@ Works with any GPU in Windows, Linux, macOS and Android.
 | Device Name | NVIDIA H100 80GB HBM3 |
 | Device Vendor | NVIDIA Corporation |
 | Device Driver | 565.57.01 (Linux) |
-| OpenCL Version | OpenCL C 1.2 |
+| OpenCL Version | OpenCL C 3.0 |
 | Compute Units | 132 at 1980 MHz (16896 cores, 66.908 TFLOPs/s) |
 | Memory, Cache | 81105 MB VRAM, 4224 KB global / 48 KB local |
 | Buffer Limits | 20276 MB global, 64 KB constant |
@@ -80,30 +80,30 @@ Works with any GPU in Windows, Linux, macOS and Android.
 ```
 ```
 |----------------.------------------------------------------------------------|
-| Device ID | 2 |
-| Device Name | AMD Instinct MI210 |
+| Device ID | 0 |
+| Device Name | AMD Instinct MI300X |
 | Device Vendor | Advanced Micro Devices, Inc. |
-| Device Driver | 3625.0 (HSA1.1,LC) (Linux) |
+| Device Driver | 3635.0 (HSA1.1,LC) (Linux) |
 | OpenCL Version | OpenCL C 2.0 |
-| Compute Units | 104 at 1700 MHz (6656 cores, 22.630 TFLOPs/s) |
-| Memory, Cache | 65520 MB VRAM, 16 KB global / 64 KB local |
-| Buffer Limits | 65520 MB global, 67092480 KB constant |
+| Compute Units | 304 at 2100 MHz (19456 cores, 81.715 TFLOPs/s) |
+| Memory, Cache | 196592 MB VRAM, 32 KB global / 64 KB local |
+| Buffer Limits | 196592 MB global, 201310208 KB constant |
 |----------------'------------------------------------------------------------|
 | Info: OpenCL C code successfully compiled. |
-| FP64 compute 17.681 TFLOPs/s (2/3 ) |
-| FP32 compute 20.007 TFLOPs/s ( 1x ) |
-| FP16 compute 39.594 TFLOPs/s ( 2x ) |
-| INT64 compute 1.515 TIOPs/s (1/16 ) |
-| INT32 compute 9.877 TIOPs/s (1/2 ) |
-| INT16 compute 19.532 TIOPs/s ( 1x ) |
-| INT8 compute 36.307 TIOPs/s ( 2x ) |
-| Memory Bandwidth ( coalesced read ) 993.82 GB/s |
-| Memory Bandwidth ( coalesced write) 999.76 GB/s |
-| Memory Bandwidth (misaligned read ) 1325.91 GB/s |
-| Memory Bandwidth (misaligned write) 635.20 GB/s |
-| PCIe Bandwidth (send ) 28.72 GB/s |
-| PCIe Bandwidth ( receive ) 28.51 GB/s |
-| PCIe Bandwidth ( bidirectional) (Gen4 x16) 28.61 GB/s |
+| FP64 compute 54.944 TFLOPs/s (2/3 ) |
+| FP32 compute 130.000 TFLOPs/s ( 2x ) |
+| FP16 compute 141.320 TFLOPs/s ( 2x ) |
+| INT64 compute 3.666 TIOPs/s (1/24 ) |
+| INT32 compute 47.736 TIOPs/s (2/3 ) |
+| INT16 compute 69.022 TIOPs/s ( 1x ) |
+| INT8 compute 106.178 TIOPs/s ( 1x ) |
+| Memory Bandwidth ( coalesced read ) 3756.64 GB/s |
+| Memory Bandwidth ( coalesced write) 4686.31 GB/s |
+| Memory Bandwidth (misaligned read ) 3881.24 GB/s |
+| Memory Bandwidth (misaligned write) 2491.25 GB/s |
+| PCIe Bandwidth (send ) 54.57 GB/s |
+| PCIe Bandwidth ( receive ) 55.79 GB/s |
+| PCIe Bandwidth ( bidirectional) (Gen4 x16) 55.21 GB/s |
 |-----------------------------------------------------------------------------|
 ```
 ```
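
The "Compute Units" rows in this diff are consistent with a simple peak-throughput estimate: cores × clock × 2 FLOPs per cycle. The factor of 2 (one fused multiply-add per core per cycle) is an assumption inferred from the listed numbers, not something the diff states, and `peak_tflops` is a hypothetical helper name used only for illustration. A minimal sketch that reproduces the three listed values:

```python
def peak_tflops(cores: int, clock_mhz: int, flops_per_cycle: int = 2) -> float:
    """Theoretical peak in TFLOPs/s.

    Assumes each core retires `flops_per_cycle` floating-point operations per
    clock (2 = one fused multiply-add). This is an assumption, not a value
    taken from the benchmark's source code.
    """
    # cores * MHz * FLOPs/cycle gives MFLOPs/s; divide by 1e6 for TFLOPs/s.
    return cores * clock_mhz * flops_per_cycle / 1e6


# (cores, clock in MHz) copied from the "Compute Units" rows in the diff.
devices = {
    "NVIDIA H100 80GB HBM3": (16896, 1980),
    "AMD Instinct MI210": (6656, 1700),
    "AMD Instinct MI300X": (19456, 2100),
}

for name, (cores, mhz) in devices.items():
    print(f"{name}: {peak_tflops(cores, mhz):.3f} TFLOPs/s")
```

Under that assumption the printed values match the 66.908, 22.630 and 81.715 TFLOPs/s figures in the tables, which suggests the ratios in the compute rows (for example `2/3` or `2x`) are relative to this estimate.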