
Commit 424ed29

Need for Speed fixed
1 parent 48a63e9 commit 424ed29

File tree

1 file changed: +62 -47 lines changed


lectures/software_engineering/need_for_speed.md

Lines changed: 62 additions & 47 deletions

````diff
@@ -131,15 +131,15 @@ For example, we can't at present add an integer and a string in Julia (i.e. `100
 
 This is sensible behavior, but if you want to change it there's nothing to stop you.
 
-```{code-cell} julia
+```{code-cell} none
 import Base: + # enables adding methods to the + function
-
 +(x::Integer, y::String) = x + parse(Int, y)
-
 @show +(100, "100")
 @show 100 + "100"; # equivalent
 ```
 
+The above code is not executed to avoid any chance of [method invalidations](https://julialang.org/blog/2020/08/invalidations/), which are a source of compile-time latency.
+
 ### Understanding the Compilation Process
 
 We can now be a little bit clearer about what happens when you call a function on given types.
````
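
As a hedged aside (not part of this commit): the invalidation risk comes from adding a method to `+` for types Base already owns. A sketch of the safe variant of the same trick, using a hypothetical wrapper type `Stringy`, which pirates nothing and can run freely:

```julia
# Sketch: extending + via your own type avoids touching methods that
# existing code depends on, so there is no invalidation risk.
import Base: +

struct Stringy      # hypothetical wrapper, used only for this sketch
    s::String
end

+(x::Integer, y::Stringy) = x + parse(Int, y.s)

@show 100 + Stringy("100");
```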
````diff
@@ -535,8 +535,9 @@ To illustrate, consider this code, where `b` is global
 b = 1.0
 function g(a)
     global b
+    tmp = a
     for i in 1:1_000_000
-        tmp = a + b
+        tmp = tmp + a + b
     end
     return tmp
 end
````
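
A hedged aside (not from the commit): `@code_warntype` is the standard tool for spotting exactly this kind of instability. A minimal sketch:

```julia
# Sketch: diagnosing the global-induced instability directly.
b = 1.0
function g(a)
    global b
    tmp = a
    for i in 1:1_000_000
        tmp = tmp + a + b
    end
    return tmp
end

@code_warntype g(1.0)  # the non-const global `b` (and hence `tmp`) show up as ::Any
```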
````diff
@@ -560,24 +561,26 @@ If we eliminate the global variable like so
 
 ```{code-cell} julia
 function g(a, b)
+    tmp = a
     for i in 1:1_000_000
-        tmp = a + b
+        tmp = tmp + a + b
     end
-    return tmp
+    return tmp
 end
 ```
 
-then execution speed improves dramatically
+then execution speed improves dramatically. Furthermore, the number of allocations has dropped to zero.
 
 ```{code-cell} julia
 @btime g(1.0, 1.0)
 ```
 
-Note that the second run was dramatically faster than the first.
+Note that if you called `@time` instead, the first call would be slower since it includes compilation time. In contrast, `@btime` discards the first call and runs the function many times.
 
-That's because the first call included the time for JIT compilation.
-
-Notice also how small the memory footprint of the execution is.
+More detailed information is available with `@benchmark`,
+```{code-cell} julia
+@benchmark g(1.0, 1.0)
+```
 
 Also, the machine code is simple and clean
 
````
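
For reference (an aside, not part of the commit): `@time` ships with Julia, while `@btime` and `@benchmark` come from the BenchmarkTools.jl package. A minimal comparison sketch, assuming `using BenchmarkTools` has been run:

```julia
using BenchmarkTools

@time g(1.0, 1.0)       # first call: includes JIT compilation
@time g(1.0, 1.0)       # second call: execution only
@btime g(1.0, 1.0)      # many samples; reports the minimum time
@benchmark g(1.0, 1.0)  # full timing distribution and allocation stats
```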

````diff
@@ -588,20 +591,19 @@ Also, the machine code is simple and clean
 Now the compiler is certain of types throughout execution of the function and
 hence can optimize accordingly.
 
-#### The `const` keyword
-
-Another way to stabilize the code above is to maintain the global variable but
-prepend it with `const`
+If global variables are strictly needed (and they almost never are) then you can declare them with `const`, which tells Julia that the type never changes (the value can). For example,
 
 ```{code-cell} julia
 const b_const = 1.0
-function g(a)
+function g_const(a)
     global b_const
+    tmp = a
     for i in 1:1_000_000
-        tmp = a + b_const
+        tmp = tmp + a + b_const
     end
     return tmp
 end
+@btime g_const(1)
 ```
 
 Now the compiler can again generate efficient machine code.
````
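
A hedged aside (not in the commit): on Julia 1.8 and later, annotating the global's type is an alternative to `const` that gives the compiler the same guarantee while still allowing the value to be reassigned (to another `Float64`). The names below are hypothetical:

```julia
# Sketch, assuming Julia 1.8+: a typed global removes the instability too.
b_typed::Float64 = 1.0   # reassigning a Float64 is fine; other types error
function g_typed(a)
    global b_typed
    tmp = a
    for i in 1:1_000_000
        tmp = tmp + a + b_typed
    end
    return tmp
end
```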
````diff
@@ -626,7 +628,7 @@ As we'll see, the last of these options gives us the best performance, while sti
 Here's the untyped case
 
 ```{code-cell} julia
-struct Foo_generic
+struct Foo_any
     a
 end
 ```
````
````diff
@@ -639,7 +641,7 @@ struct Foo_abstract
 end
 ```
 
-Finally, here's the parametrically typed case
+Finally, here's the parametrically typed case (where the `{T <: Real}` is not necessary for performance, and could simply be `{T}`)
 
 ```{code-cell} julia
 struct Foo_concrete{T <: Real}
````
````diff
@@ -650,7 +652,7 @@ end
 Now we generate instances
 
 ```{code-cell} julia
-fg = Foo_generic(1.0)
+fg = Foo_any(1.0)
 fa = Foo_abstract(1.0)
 fc = Foo_concrete(1.0)
 ```
````
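
A quick way to see what the compiler is given in each case (a sketch, not part of the commit): `fieldtype` reports the declared type of a field.

```julia
# Sketch: the declared field type is all the compiler knows about `a`.
@show fieldtype(Foo_any, :a)                # Any
@show fieldtype(Foo_abstract, :a)           # an abstract type
@show fieldtype(Foo_concrete{Float64}, :a); # Float64: fully concrete
```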
````diff
@@ -669,14 +671,15 @@ Here's a function that uses the field `a` of our objects
 
 ```{code-cell} julia
 function f(foo)
+    tmp = foo.a
     for i in 1:1_000_000
         tmp = i + foo.a
     end
     return tmp
 end
 ```
 
-Let's try timing our code, starting with the generic case:
+Let's try timing our code, starting with the case without any constraints:
 
 ```{code-cell} julia
 @btime f($fg)
````
````diff
@@ -690,7 +693,7 @@ Here's the nasty looking machine code
 @code_native f(fg)
 ```
 
-The abstract case is similar
+The abstract case is almost identical,
 
 ```{code-cell} julia
 @btime f($fa)
````
````diff
@@ -706,14 +709,18 @@ Finally, let's look at the parametrically typed version
 @btime f($fc)
 ```
 
-Some of this time is JIT compilation, and one more execution gets us down to.
+This is improbably small: a runtime of 1-2 nanoseconds without any allocations suggests that no computation really took place.
 
-Here's the corresponding machine code
+A hint is in the simplicity of the corresponding machine code
 
 ```{code-cell} julia
 @code_native f(fc)
 ```
 
+This machine code has none of the hallmark assembly instructions associated with a loop; in particular, loops (e.g. `for` in Julia) end up as jumps in the machine code (e.g. `jne`).
+
+Here, the compiler was smart enough to realize that only the final step in the loop matters, i.e. it could generate the equivalent of `f(foo) = 1_000_000 + foo.a`, return that directly, and skip the loop. These sorts of code optimizations are only possible if the compiler is able to use a great deal of information about the types.
+
 
 Finally, note that if we compile a slightly different version of the function, which doesn't actually return the value
 ```{code-cell} julia
@@ -722,26 +729,11 @@ function f_no_return(foo)
         tmp = i + foo.a
     end
 end
-```
-That
-```{code-cell} julia
-@btime f_no_return($fc)
-```
-Which seems improbably small. The machine code gives a hint,
-```{code-cell} julia
 @code_native f_no_return(fc)
 ```
+We see that the code is even simpler. In effect, the compiler figured out that because `tmp` wasn't returned, and `foo.a` could have no side effects (since it knows the type of `a`), it doesn't even need to execute any of the code in the function.
 
-Note that in this case, the machine code is doing nothing. The compiler was able to prove that the function had no side effects and hence it could simply ignore the inputs.
-
-This is not the case with the abstract case, because the compiler is unable to prove this invariant.
-
-```{code-cell} julia
-@btime f_no_return($fa)
-```
-### Abstract Containers
-
-Another way we can run into trouble is with abstract container types.
+### Type Inference
 
 Consider the following function, which essentially does the same job as Julia's `sum()` function but acts only on floating point data
 
````
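
As a hedged aside (not part of this commit), both claims above are easy to check by hand: the loop-free closed form the compiler effectively derives for `f`, and the near-empty code it generates for `f_no_return`. The name `f_closed` is hypothetical, used only for this sketch:

```julia
# Sketch: the loop-free equivalent the compiler reduces f to for Foo_concrete.
f_closed(foo) = 1_000_000 + foo.a
@btime f_closed($fc)   # nanosecond-scale and allocation-free, like f($fc)

# For the no-return variant, the LLVM IR tells the same story as the assembly:
# the body is optimized away to an (essentially) empty function.
@code_llvm f_no_return(fc)
```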

````diff
@@ -758,22 +750,45 @@ end
 Calls to this function run very quickly
 
 ```{code-cell} julia
-x = range(0, 1, length = Int(1e6))
-x = collect(x)
+x_range = range(0, 1, length = 100_000)
+x = collect(x_range)
 typeof(x)
 ```
 
 ```{code-cell} julia
 @btime sum_float_array($x)
 ```
 
-When Julia compiles this function, it knows that the data passed in as `x` will be an array of 64 bit floats.
+When Julia compiles this function, it knows that the data passed in as `x` will be an array of 64 bit floats. Hence it's known to the compiler that the relevant method for `+` is always addition of floating point numbers.
+
+But consider a version without that type annotation
+
+```{code-cell} julia
+function sum_array(x)
+    sum = 0.0
+    for i in eachindex(x)
+        sum += x[i]
+    end
+    return sum
+end
+@btime sum_array($x)
+```
+
+Note that this has the same running time as the one with the explicit types. In Julia, there is (almost) never a performance gain from declaring types, and if anything they can make things worse by limiting the potential for specialized algorithms. See {doc}`generic programming <../more_julia/generic_programming>` for more.
+
+As an example within Julia code, look at the built-in `sum` for arrays
 
-Hence it's known to the compiler that the relevant method for `+` is always addition of floating point numbers.
+```{code-cell} julia
+@btime sum($x)
+```
+
+Versus the same `sum` on the underlying range
 
-Moreover, the data can be arranged into continuous 64 bit blocks of memory to simplify memory access.
+```{code-cell} julia
+@btime sum($x_range)
+```
 
-Finally, data types are stable --- for example, the local variable `sum` starts off as a float and remains a float throughout.
+Note that the difference in speed is enormous---suggesting it is better to keep things in their more structured forms as long as possible. You can check the underlying source with `@which sum(x_range)` to see the specialized algorithm used.
 
 #### Type Inferences
 
````
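
For the curious, a sketch (not part of the commit) confirming the specialization mentioned above:

```julia
# Sketch: ranges carry enough structure that `sum` can avoid the loop entirely.
@which sum(x_range)          # points at the specialized method for ranges
@show sum(x_range) ≈ sum(x); # same answer, very different cost
```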
