
Commit 4c8e95f

Merge pull request #467 from schmrlng/README_0.7
* README tweaks
* Add README microbenchmark script
2 parents 9e623c0 + a7e9cba commit 4c8e95f

2 files changed: +86 -41 lines changed


README.md

Lines changed: 24 additions & 41 deletions
@@ -5,12 +5,12 @@
 [![Build Status](https://travis-ci.org/JuliaArrays/StaticArrays.jl.svg?branch=master)](https://travis-ci.org/JuliaArrays/StaticArrays.jl)
 [![Build status](https://ci.appveyor.com/api/projects/status/xabgh1yhsjxlp30d?svg=true)](https://ci.appveyor.com/project/JuliaArrays/staticarrays-jl)
 [![Coverage Status](https://coveralls.io/repos/github/JuliaArrays/StaticArrays.jl/badge.svg?branch=master)](https://coveralls.io/github/JuliaArrays/StaticArrays.jl?branch=master)
-[![codecov.io](http://codecov.io/github/JuliaArrays/StaticArrays.jl/coverage.svg?branch=master)](http://codecov.io/github/JuliaArrays/StaticArrays.jl?branch=master)
+[![codecov.io](https://codecov.io/github/JuliaArrays/StaticArrays.jl/branch/master/graph/badge.svg)](http://codecov.io/github/JuliaArrays/StaticArrays.jl/branch/master)
 [![](https://img.shields.io/badge/docs-latest-blue.svg)](https://JuliaArrays.github.io/StaticArrays.jl/latest)
 [![](https://img.shields.io/badge/docs-stable-blue.svg)](https://JuliaArrays.github.io/StaticArrays.jl/stable)

 **StaticArrays** provides a framework for implementing statically sized arrays
-in Julia (≥ 0.5), using the abstract type `StaticArray{Size,T,N} <: AbstractArray{T,N}`.
+in Julia, using the abstract type `StaticArray{Size,T,N} <: AbstractArray{T,N}`.
 Subtypes of `StaticArray` will provide fast implementations of common array and
 linear algebra operations. Note that here "statically sized" means that the
 size can be determined from the *type*, and "static" does **not** necessarily
@@ -27,40 +27,42 @@ Full documentation can be found [here](https://JuliaArrays.github.io/StaticArray
 ## Speed

 The speed of *small* `SVector`s, `SMatrix`s and `SArray`s is often > 10 × faster
-than `Base.Array`. See this simplified benchmark (or see the full results [here](https://github.com/andyferris/StaticArrays.jl/blob/master/perf/bench10.txt)):
+than `Base.Array`. For example, here's a
+[microbenchmark](perf/README_benchmarks.jl) showing some common operations.

 ```
 ============================================
 Benchmarks for 3×3 Float64 matrices
 ============================================
-
-Matrix multiplication               -> 8.2x speedup
-Matrix multiplication (mutating)    -> 3.1x speedup
-Matrix addition                     -> 45x speedup
-Matrix addition (mutating)          -> 5.1x speedup
-Matrix determinant                  -> 170x speedup
-Matrix inverse                      -> 125x speedup
-Matrix symmetric eigendecomposition -> 82x speedup
-Matrix Cholesky decomposition       -> 23.6x speedup
+Matrix multiplication               -> 5.9x speedup
+Matrix multiplication (mutating)    -> 1.8x speedup
+Matrix addition                     -> 33.1x speedup
+Matrix addition (mutating)          -> 2.5x speedup
+Matrix determinant                  -> 112.9x speedup
+Matrix inverse                      -> 67.8x speedup
+Matrix symmetric eigendecomposition -> 25.0x speedup
+Matrix Cholesky decomposition       -> 8.8x speedup
+Matrix LU decomposition             -> 6.1x speedup
+Matrix QR decomposition             -> 65.0x speedup
 ```

-These results improve significantly when using `julia -O3` with immutable static
-arrays, as the extra optimization results in surprisingly good SIMD code.
+These numbers were generated on an Intel i7-7700HQ using Julia-1.2. As with all
+synthetic benchmarks, the speedups you see here should only be taken as very
+roughly indicative of the speedup you may see in real code. When in doubt,
+benchmark your real application!

 Note that in the current implementation, working with large `StaticArray`s puts a
 lot of stress on the compiler, and becomes slower than `Base.Array` as the size
 increases. A very rough rule of thumb is that you should consider using a
-normal `Array` for arrays larger than 100 elements. For example, the performance
-crossover point for a matrix multiply microbenchmark seems to be about 11x11 in
-julia 0.5 with default optimizations.
+normal `Array` for arrays larger than 100 elements.


 ## Quick start

+Add *StaticArrays* from the [Pkg REPL](https://docs.julialang.org/en/latest/stdlib/Pkg/#Getting-Started-1), i.e., `pkg> add StaticArrays`. Then:
 ```julia
-Pkg.add("StaticArrays") # or Pkg.clone("https://github.com/JuliaArrays/StaticArrays.jl")
-using StaticArrays
 using LinearAlgebra
+using StaticArrays

 # Use the convenience constructor type `SA` to create vectors and matrices
 SA[1, 2, 3] isa SVector{3,Int}
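
A note on the speedup table in the hunk above: each figure is simply a ratio of two timings, the `Base.Array` time divided by the static-array time. A minimal standalone sketch of one such comparison (illustrative only, not part of this commit) might look like:

```julia
# Illustrative sketch: reproduce one row of the table above by timing a 3×3
# Base.Matrix multiply against its SMatrix counterpart with BenchmarkTools.
using BenchmarkTools
using StaticArrays

A  = rand(3, 3)
SA = SMatrix{3,3}(A)

t_base   = @belapsed $A * $A    # interpolate with $ so globals aren't part of the timing
t_static = @belapsed $SA * $SA
println("Matrix multiplication -> $(round(t_base / t_static, digits=1))x speedup")
```

The script added by this commit (see below) does the same for every operation in the table, loading the arguments from `Ref`s so the compiler cannot elide the work being measured.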
@@ -111,7 +113,8 @@ rand(MMatrix{20,20}) * rand(MMatrix{20,20}) # large matrices can use BLAS
 eigen(m3) # eigen(), etc uses specialized algorithms up to 3×3, or else LAPACK

 # Static arrays stay statically sized, even when used by Base functions, etc:
-typeof(eigen(m3)) == Eigen{Float64,Float64,SArray{Tuple{3,3},Float64,2,9},SArray{Tuple{3},Float64,1,3}}
+typeof(eigen(m3).vectors) == SMatrix{3,3,Float64,9}
+typeof(eigen(m3).values) == SVector{3,Float64}

 # similar() returns a mutable container, while similar_type() returns a constructor:
 typeof(similar(m3)) == MArray{Tuple{3,3},Int64,2,9} # (final parameter is length = 9)
@@ -145,27 +148,7 @@ performance optimizations may be made when the size of the array is known to the
 compiler. One example of this is by loop unrolling, which has a substantial
 effect on small arrays and tends to automatically trigger LLVM's SIMD
 optimizations. Another way performance is boosted is by providing specialized
-methods for `det`, `inv`, `eig` and `chol` where the algorithm depends on the
+methods for `det`, `inv`, `eigen` and `cholesky` where the algorithm depends on the
 precise dimensions of the input. In combination with intelligent fallbacks to
 the methods in Base, we seek to provide a comprehensive support for statically
 sized arrays, large or small, that hopefully "just works".
-
-## Relationship to *FixedSizeArrays* and *ImmutableArrays*
-
-Several existing packages for statically sized arrays have been developed for
-Julia, noteably *FixedSizeArrays* and *ImmutableArrays* which provided signficant
-inspiration for this package. Upon consultation, it has been decided to move
-forward with *StaticArrays* which has found a new home in the *JuliaArrays*
-github organization. It is recommended that new users use this package, and
-that existing dependent packages consider switching to *StaticArrays* sometime
-during the life-cycle of Julia v0.5.
-
-You can try `using StaticArrays.FixedSizeArrays` to add some compatibility
-wrappers for the most commonly used features of the *FixedSizeArrays* package,
-such as `Vec`, `Mat`, `Point` and `@fsa`. These wrappers do not provide a
-perfect interface, but may help in trying out *StaticArrays* with pre-existing
-code.
-
-Furthermore, `using StaticArrays.ImmutableArrays` will let you use the typenames
-from the *ImmutableArrays* package, which does not include the array size as a
-type parameter (e.g. `Vector3{T}` and `Matrix3x3{T}`).
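
The "specialized methods" mentioned in the last hunk (`det`, `inv`, `eigen`, `cholesky`) are ordinary exported functions that dispatch on the static size; a short usage sketch (illustrative, not part of this commit) under that assumption:

```julia
# Usage sketch for the size-specialized routines referenced above.
using LinearAlgebra
using StaticArrays

m = @SMatrix [2.0 1.0; 1.0 3.0]

det(m)               # 5.0, with the 2×2 shape known from the type
inv(m)               # another 2×2 SMatrix
inv(m) * m ≈ one(m)  # true
```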

perf/README_benchmarks.jl

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+using BenchmarkTools
+using LinearAlgebra
+using StaticArrays
+
+add!(C, A, B) = (C .= A .+ B)
+
+function simple_bench(N, T=Float64)
+    A = rand(T,N,N)
+    A = A'*A
+    B = copy(A)
+    SA = SMatrix{N,N}(A)
+    MA = MMatrix{N,N}(A)
+    MB = copy(MA)
+
+    print("""
+        ============================================
+        Benchmarks for $N×$N $T matrices
+        ============================================
+        """)
+    ops = [
+        ("Matrix multiplication              ", *, (A, A), (SA, SA)),
+        ("Matrix multiplication (mutating)   ", mul!, (B, A, A), (MB, MA, MA)),
+        ("Matrix addition                    ", +, (A, A), (SA, SA)),
+        ("Matrix addition (mutating)         ", add!, (B, A, A), (MB, MA, MA)),
+        ("Matrix determinant                 ", det, (A,), (SA,)),
+        ("Matrix inverse                     ", inv, (A,), (SA,)),
+        ("Matrix symmetric eigendecomposition", eigen, (A,), (SA,)),
+        ("Matrix Cholesky decomposition      ", cholesky, (A,), (SA,)),
+        ("Matrix LU decomposition            ", lu, (A,), (SA,)),
+        ("Matrix QR decomposition            ", qr, (A,), (SA,)),
+    ]
+    for (name, op, Aargs, SAargs) in ops
+        # We load from Ref's here to avoid the compiler completely removing the
+        # benchmark in some cases.
+        #
+        # Like any microbenchmark, the speedups you see here should only be
+        # taken as roughly indicative of the speedup you may see in real code.
+        if length(Aargs) == 1
+            A1 = Ref(Aargs[1])
+            SA1 = Ref(SAargs[1])
+            speedup = @belapsed($op($A1[])) / @belapsed($op($SA1[]))
+        elseif length(Aargs) == 2
+            A1 = Ref(Aargs[1])
+            A2 = Ref(Aargs[2])
+            SA1 = Ref(SAargs[1])
+            SA2 = Ref(SAargs[2])
+            speedup = @belapsed($op($A1[], $A2[])) / @belapsed($op($SA1[], $SA2[]))
+        elseif length(Aargs) == 3
+            A1 = Ref(Aargs[1])
+            A2 = Ref(Aargs[2])
+            A3 = Ref(Aargs[3])
+            SA1 = Ref(SAargs[1])
+            SA2 = Ref(SAargs[2])
+            SA3 = Ref(SAargs[3])
+            speedup = @belapsed($op($A1[], $A2[], $A3[])) / @belapsed($op($SA1[], $SA2[], $SA3[]))
+        else
+        end
+        println(name*" -> $(round(speedup, digits=1))x speedup")
+    end
+end
+
+simple_bench(3)
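
To run the new script locally one needs its dependencies (notably BenchmarkTools) in the active environment; a plausible invocation from the repository root (not prescribed by this commit) is:

```julia
# Hypothetical way to exercise perf/README_benchmarks.jl from the Julia REPL;
# assumes BenchmarkTools, LinearAlgebra and StaticArrays are installed.
include("perf/README_benchmarks.jl")   # defines simple_bench(N, T=Float64) and runs simple_bench(3)

# Other sizes or element types can then be benchmarked directly:
simple_bench(4)
simple_bench(3, Float32)
```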
