Commit a7e9cba

Use Ref's in README microbenchmark to avoid overoptimization
Julia 1.2 will optimize some of these benchmarks away, so use dereferencing of Ref's to at least prevent that particular error. Update the README a bit with these results.
1 parent b315670 commit a7e9cba

2 files changed: 50 additions and 22 deletions

README.md

Lines changed: 17 additions & 9 deletions
@@ -27,22 +27,30 @@ Full documentation can be found [here](https://JuliaArrays.github.io/StaticArray
 ## Speed
 
 The speed of *small* `SVector`s, `SMatrix`s and `SArray`s is often > 10 × faster
-than `Base.Array`. See this simplified benchmark:
+than `Base.Array`. For example, here's a
+[microbenchmark](perf/README_benchmarks.jl) showing some common operations.
 
 ```
 ============================================
 Benchmarks for 3×3 Float64 matrices
 ============================================
-Matrix multiplication -> 5.1x speedup
-Matrix multiplication (mutating) -> 1.6x speedup
-Matrix addition -> 14.0x speedup
-Matrix addition (mutating) -> 2.1x speedup
-Matrix determinant -> 119.3x speedup
-Matrix inverse -> 65.6x speedup
-Matrix symmetric eigendecomposition -> 24.8x speedup
-Matrix Cholesky decomposition -> 12.1x speedup
+Matrix multiplication -> 5.9x speedup
+Matrix multiplication (mutating) -> 1.8x speedup
+Matrix addition -> 33.1x speedup
+Matrix addition (mutating) -> 2.5x speedup
+Matrix determinant -> 112.9x speedup
+Matrix inverse -> 67.8x speedup
+Matrix symmetric eigendecomposition -> 25.0x speedup
+Matrix Cholesky decomposition -> 8.8x speedup
+Matrix LU decomposition -> 6.1x speedup
+Matrix QR decomposition -> 65.0x speedup
 ```
 
+These numbers were generated on an Intel i7-7700HQ using Julia-1.2. As with all
+synthetic benchmarks, the speedups you see here should only be taken as very
+roughly indicative of the speedup you may see in real code. When in doubt,
+benchmark your real application!
+
 Note that in the current implementation, working with large `StaticArray`s puts a
 lot of stress on the compiler, and becomes slower than `Base.Array` as the size
 increases. A very rough rule of thumb is that you should consider using a

perf/README_benchmarks.jl

Lines changed: 33 additions & 13 deletions
@@ -18,22 +18,42 @@ function simple_bench(N, T=Float64)
 ============================================
 """)
     ops = [
-        ("Matrix multiplication ", *, (A, A), (SA, SA)),
-        ("Matrix multiplication (mutating) ", mul!, (B, A, A), (MB, MA, MA)),
-        ("Matrix addition ", +, (A, A), (SA, SA)),
-        ("Matrix addition (mutating) ", add!, (B, A, A), (MB, MA, MA)),
-        ("Matrix determinant ", det, A, SA),
-        ("Matrix inverse ", inv, A, SA),
-        ("Matrix symmetric eigendecomposition", eigen, A, SA),
-        ("Matrix Cholesky decomposition ", cholesky, A, SA)
+        ("Matrix multiplication ", *, (A, A), (SA, SA)),
+        ("Matrix multiplication (mutating) ", mul!, (B, A, A), (MB, MA, MA)),
+        ("Matrix addition ", +, (A, A), (SA, SA)),
+        ("Matrix addition (mutating) ", add!, (B, A, A), (MB, MA, MA)),
+        ("Matrix determinant ", det, (A,), (SA,)),
+        ("Matrix inverse ", inv, (A,), (SA,)),
+        ("Matrix symmetric eigendecomposition", eigen, (A,), (SA,)),
+        ("Matrix Cholesky decomposition ", cholesky, (A,), (SA,)),
+        ("Matrix LU decomposition ", lu, (A,), (SA,)),
+        ("Matrix QR decomposition ", qr, (A,), (SA,)),
     ]
     for (name, op, Aargs, SAargs) in ops
-        if Aargs isa Tuple && length(Aargs) == 2
-            speedup = @belapsed($op($Aargs[1], $Aargs[2])) / @belapsed($op($SAargs[1], $SAargs[2]))
-        elseif Aargs isa Tuple && length(Aargs) == 3
-            speedup = @belapsed($op($Aargs[1], $Aargs[2], $Aargs[3])) / @belapsed($op($SAargs[1], $SAargs[2], $SAargs[3]))
+        # We load from Ref's here to avoid the compiler completely removing the
+        # benchmark in some cases.
+        #
+        # Like any microbenchmark, the speedups you see here should only be
+        # taken as roughly indicative of the speedup you may see in real code.
+        if length(Aargs) == 1
+            A1 = Ref(Aargs[1])
+            SA1 = Ref(SAargs[1])
+            speedup = @belapsed($op($A1[])) / @belapsed($op($SA1[]))
+        elseif length(Aargs) == 2
+            A1 = Ref(Aargs[1])
+            A2 = Ref(Aargs[2])
+            SA1 = Ref(SAargs[1])
+            SA2 = Ref(SAargs[2])
+            speedup = @belapsed($op($A1[], $A2[])) / @belapsed($op($SA1[], $SA2[]))
+        elseif length(Aargs) == 3
+            A1 = Ref(Aargs[1])
+            A2 = Ref(Aargs[2])
+            A3 = Ref(Aargs[3])
+            SA1 = Ref(SAargs[1])
+            SA2 = Ref(SAargs[2])
+            SA3 = Ref(SAargs[3])
+            speedup = @belapsed($op($A1[], $A2[], $A3[])) / @belapsed($op($SA1[], $SA2[], $SA3[]))
         else
-            speedup = @belapsed($op($Aargs)) / @belapsed($op($SAargs))
         end
         println(name*" -> $(round(speedup, digits=1))x speedup")
     end
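
The `Ref` pattern used in this change can be tried in isolation. The following is a minimal sketch, assuming the BenchmarkTools and StaticArrays packages are installed; `ref_elapsed` is a hypothetical helper written for illustration and is not part of this commit. It wraps the argument in a `Ref` and dereferences it inside the benchmarked expression, so the compiler has to load the value at run time rather than treating it as a known constant.

```julia
using BenchmarkTools   # third-party package, assumed to be installed
using LinearAlgebra
using StaticArrays     # third-party package, assumed to be installed

# Hypothetical helper (not from the commit): time `op` on a single argument.
# The argument is interpolated as a Ref and dereferenced with `[]` inside the
# benchmark kernel, mirroring the `$A1[]` pattern in the diff above.
function ref_elapsed(op, x)
    r = Ref(x)
    return @belapsed $op($r[])   # minimum elapsed time in seconds
end

A  = rand(3, 3)          # heap-allocated Base.Array
SA = SMatrix{3,3}(A)     # statically sized copy of the same data

t_array  = ref_elapsed(det, A)
t_static = ref_elapsed(det, SA)

println("Matrix determinant -> $(round(t_array / t_static, digits=1))x speedup")
```

The printed ratio depends on the machine and Julia version, so it will likely differ from the figures in the README.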
