docs/src/examples/multithreading.md
…relatively primitive arithmetic operations (e.g. `+`, `/`, or `log`), and not, f…

I'll make comparisons with OpenMP through the rest of this, starting with a simple dot product to focus on threading overhead:
```julia
function dot_tturbo(a::AbstractArray{T}, b::AbstractArray{T}) where {T <: Real}
    s = zero(T)
    @tturbo for i ∈ eachindex(a, b)
        s += a[i] * b[i]
    end
    s
end
```
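The benchmarks below also call a `dot_baseline` function whose definition falls outside this excerpt. A plausible plain-Julia baseline (an assumption for illustration, not necessarily the definition used in the docs) would rely on `@simd` rather than LoopVectorization:

```julia
# Hypothetical baseline: plain @inbounds @simd loop, no LoopVectorization.
# This is a guess at what `dot_baseline` looks like; the actual definition
# is not shown in this excerpt.
function dot_baseline(a::AbstractArray{T}, b::AbstractArray{T}) where {T <: Real}
    s = zero(T)
    @inbounds @simd for i ∈ eachindex(a, b)
        s += a[i] * b[i]
    end
    s
end

dot_baseline([1.0, 2.0], [3.0, 4.0])  # → 11.0
```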
Trying out one size to give a perspective on scale:
```julia
julia> N = 10_000; x = rand(N); y = rand(N);

julia> @btime dot($x, $y)  # LinearAlgebra
  1.114 μs (0 allocations: 0 bytes)
2480.296446711209

julia> @btime dot_turbo($x, $y)
  761.621 ns (0 allocations: 0 bytes)
2480.296446711209

julia> @btime dot_tturbo($x, $y)
  622.723 ns (0 allocations: 0 bytes)
2480.296446711209

julia> @btime dot_baseline($x, $y)
  1.294 μs (0 allocations: 0 bytes)
2480.2964467112097
```
All these times are fairly fast; `wait(Threads.@spawn 1+1)` will typically take…

Now let's look at a more complex example:
```julia
function dot_tturbo(ca::AbstractVector{Complex{T}}, cb::AbstractVector{Complex{T}}) where {T}
    a = reinterpret(reshape, T, ca)
    b = reinterpret(reshape, T, cb)
    re = zero(T); im = zero(T)
    @tturbo for i ∈ axes(a, 2) # adjoint(a[i]) * b[i]
        re += a[1, i] * b[1, i] + a[2, i] * b[2, i]
        im += a[1, i] * b[2, i] - a[2, i] * b[1, i]
    end
    Complex(re, im)
end
```
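To see what the `reinterpret`-based kernel is computing, here is a plain-Julia reference version that needs no packages. `reinterpret(reshape, T, ca)` views `ca` as a 2×N matrix whose first row holds real parts and whose second row holds imaginary parts, so the two accumulators below are the real and imaginary parts of `sum(conj(ca[i]) * cb[i])` (the name `dot_ref` is ours, introduced for illustration):

```julia
# Reference implementation of the conjugating complex dot product,
# written without reinterpret so the arithmetic is explicit.
function dot_ref(ca::AbstractVector{Complex{T}}, cb::AbstractVector{Complex{T}}) where {T}
    re = zero(T); im = zero(T)
    for i in eachindex(ca, cb)
        a, b = ca[i], cb[i]
        re += real(a) * real(b) + imag(a) * imag(b)   # real part of conj(a) * b
        im += real(a) * imag(b) - imag(a) * real(b)   # imag part of conj(a) * b
    end
    Complex(re, im)
end

ca = [1.0 + 2.0im, 3.0 - 1.0im]
cb = [0.5 - 1.0im, 2.0 + 0.5im]
dot_ref(ca, cb)  # → 4.0 + 1.5im, same as sum(conj.(ca) .* cb)
```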
…and as we have an array of structs rather than structs of arrays, we need additi…

If we take this further to the three-argument dot product, which isn't implemented in BLAS, `@tturbo` now holds a substantial advantage over the competition:
```julia
function dot3(x::AbstractVector{Complex{T}}, A::AbstractMatrix{Complex{T}}, y::AbstractVector{Complex{T}}) where {T}
```
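The body of `dot3` is cut off in the excerpt above. As a hedged sketch of what a three-argument dot product computes — a plain-Julia reference, not the documented `@tturbo` implementation — `x' * A * y` can be written as a double loop (the name `dot3_ref` is ours):

```julia
# Reference three-argument dot product: sum of conj(x[i]) * A[i,j] * y[j].
# This is NOT the @tturbo kernel from the docs (which is truncated above);
# it only illustrates the quantity being computed.
function dot3_ref(x::AbstractVector{Complex{T}}, A::AbstractMatrix{Complex{T}},
                  y::AbstractVector{Complex{T}}) where {T}
    s = zero(Complex{T})
    for j in axes(A, 2), i in axes(A, 1)
        s += conj(x[i]) * A[i, j] * y[j]
    end
    s
end
```

For checking results, Julia's LinearAlgebra stdlib also provides a three-argument `dot(x, A, y)` computing the same quantity.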