This is a work-in-progress, lightweight thread-pool variant targeting OpenMP-like use cases.
It doesn't yet match OpenMP performance on small inputs and is still a long way from our goal #7.
README.md: 1 addition & 1 deletion
@@ -195,7 +195,7 @@ Observations:
 - 370 GB/s can be reached in dual-socket DDR5 setups with 12 channel memory.
 - Using Kahan-like schemes is 3x slower than pure `float` and 2x slower than `double`.
 
-One of the interesting observations is the effect of latency hiding, interleaving the operations executing on different ports of the same CPU.
+One of the interesting observations is the effect of [latency hiding, interleaving the operations executing on different ports of the same CPU](https://ashvardanian.com/posts/cpu-ports).
 It is evident when benchmarking AVX-512 kernels on very small arrays: