Commit 32aff03

Remove incorrect gen1 Ryzen speculation from README.
1 parent 89395b7 commit 32aff03

1 file changed: +0 -2 lines changed

README.md

Lines changed: 0 additions & 2 deletions
@@ -120,8 +120,6 @@ julia> @btime myselfdotavx($b)
 For this reason, the `@avx` version is roughly twice as fast. The `@inbounds @simd` version, however, is not, because it runs into the problem of loop carried dependencies: to add `a[i]*b[i]` to `s_new = s_old + a[i-j]*b[i-j]`, we must have first finished calculating `s_new`, but -- while two `fma` instructions can be initiated per cycle -- they each take several clock cycles to complete.
 For this reason, we need to unroll the operation to run several independent instances concurrently. The `@avx` macro models this cost to try and pick an optimal unroll factor.

-Note that 14 and 12 nm Ryzen chips can only do 1 full width `fma` per clock cycle (and 2 loads), so they should see similar performance with the dot and selfdot. I haven't verified this, but would like to hear from anyone who can.
-
 </p>
 </details>
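
The context lines in this hunk describe a loop-carried dependency on the accumulator and the unrolling that works around it. The sketch below illustrates the idea by hand, assuming plain `Vector` inputs. The function names `dot_single` and `dot_unrolled4` and the unroll factor of 4 are hypothetical choices for illustration; they are not part of the package, and this is not how `@avx` is implemented.

```julia
# Hypothetical helpers for illustration only; not part of the package's API.

# One accumulator: each fused multiply-add must wait for the previous
# value of `s`, so the loop is limited by the latency of the dependency chain.
function dot_single(a::Vector{T}, b::Vector{T}) where {T}
    s = zero(T)
    @inbounds for i in eachindex(a, b)  # throws if the lengths differ
        s = muladd(a[i], b[i], s)
    end
    return s
end

# Four accumulators: the four chains are independent of one another,
# so their multiply-adds can be in flight at the same time.
function dot_unrolled4(a::Vector{T}, b::Vector{T}) where {T}
    length(a) == length(b) || throw(DimensionMismatch("a and b must have equal length"))
    s1 = s2 = s3 = s4 = zero(T)
    n = length(a)
    i = 1
    @inbounds while i + 3 <= n
        s1 = muladd(a[i],     b[i],     s1)
        s2 = muladd(a[i + 1], b[i + 1], s2)
        s3 = muladd(a[i + 2], b[i + 2], s3)
        s4 = muladd(a[i + 3], b[i + 3], s4)
        i += 4
    end
    # Remainder loop for lengths that are not a multiple of 4.
    @inbounds while i <= n
        s1 = muladd(a[i], b[i], s1)
        i += 1
    end
    return (s1 + s2) + (s3 + s4)
end
```

Splitting the sum across accumulators changes the order of floating-point additions, so the two functions can differ in the last bits for general inputs. Whether the unrolled version is actually faster depends on the hardware and on what the compiler already does with the simple loop.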
