lectures/software_engineering/need_for_speed.md
@@ -131,15 +131,15 @@ For example, we can't at present add an integer and a string in Julia (i.e. `100
This is sensible behavior, but if you want to change it there's nothing to stop you.
```{code-cell} none
import Base: + # enables adding methods to the + function
+(x::Integer, y::String) = x + parse(Int, y)
@show +(100, "100")
@show 100 + "100"; # equivalent
```
The above code is not executed to avoid any chance of a [method invalidation](https://julialang.org/blog/2020/08/invalidations/), which is a source of compile-time latency.
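Extending `Base.+` for argument types you do not own (here `Integer` and `String`, both from `Base`) is known as type piracy, which is exactly the pattern that risks invalidations. As a sketch (not part of the original lecture), extending `+` for a type you define yourself is the safer idiom:

```{code-cell} julia
# Hypothetical example: adding a method of + for a type you own is not type piracy
struct Dollars
    amount::Int
end
import Base: +
+(x::Dollars, y::Dollars) = Dollars(x.amount + y.amount)
@show Dollars(3) + Dollars(4);
```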
### Understanding the Compilation Process
We can now be a little bit clearer about what happens when you call a function on given types.
@@ -535,8 +535,9 @@ To illustrate, consider this code, where `b` is global
```{code-cell} julia
b = 1.0
function g(a)
    global b
    tmp = a
    for i in 1:1_000_000
        tmp = tmp + a + b
    end
    return tmp
end
```
@@ -560,24 +561,26 @@ If we eliminate the global variable like so
```{code-cell} julia
function g(a, b)
    tmp = a
    for i in 1:1_000_000
        tmp = tmp + a + b
    end
    return tmp
end
```
then execution speed improves dramatically. Furthermore, the number of allocations has dropped to zero.
```{code-cell} julia
@btime g(1.0, 1.0)
```
Note that if you called `@time` instead, the first call would be slower, as it would include the time needed to compile the function. By contrast, `@btime` discards the first call and runs the function multiple times.
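To see this (a sketch, not in the original lecture), call the function with a new argument type twice and time each call:

```{code-cell} julia
@time g(1, 1)  # first call with Int arguments: includes JIT compilation
@time g(1, 1)  # already compiled for Int arguments: far faster
```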
More information is available with `@benchmark`:
```{code-cell} julia
@benchmark g(1.0, 1.0)
```
Also, the machine code is simple and clean
@@ -588,20 +591,19 @@ Also, the machine code is simple and clean
Now the compiler is certain of types throughout execution of the function and hence can optimize accordingly.
If global variables are strictly needed (and they almost never are) then you can declare them with `const`, which tells Julia that the type will never change (though the value can). For example,
```{code-cell} julia
const b_const = 1.0
function g_const(a)
    global b_const
    tmp = a
    for i in 1:1_000_000
        tmp = tmp + a + b_const
    end
    return tmp
end
@btime g_const(1)
```
Now the compiler can again generate efficient machine code.
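As a quick check (a sketch, not part of the original lecture), the generated code can be inspected just as before:

```{code-cell} julia
@code_native g_const(1)
```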
@@ -626,7 +628,7 @@ As we'll see, the last of these options gives us the best performance, while sti
Here's the untyped case
```{code-cell} julia
struct Foo_any
    a
end
```
```{code-cell} julia
struct Foo_abstract
    a::Real
end
```
Finally, here's the parametrically typed case (where the `{T <: Real}` is not necessary for performance, and could simply be `{T}`)
```{code-cell} julia
struct Foo_concrete{T <: Real}
    a::T
end
```
Now we generate instances
```{code-cell} julia
fg = Foo_any(1.0)
fa = Foo_abstract(1.0)
fc = Foo_concrete(1.0)
```
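One way to see the difference (a sketch, not in the original lecture) is to ask whether the type of the field `a` is concrete in each case:

```{code-cell} julia
@show isconcretetype(fieldtype(Foo_any, :a))                 # Any: not concrete
@show isconcretetype(fieldtype(Foo_abstract, :a))            # Real: not concrete
@show isconcretetype(fieldtype(Foo_concrete{Float64}, :a));  # Float64: concrete
```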
@@ -669,14 +671,15 @@ Here's a function that uses the field `a` of our objects
```{code-cell} julia
function f(foo)
    tmp = foo.a
    for i in 1:1_000_000
        tmp = i + foo.a
    end
    return tmp
end
```
Let's try timing our code, starting with the case without any constraints:
```{code-cell} julia
@btime f($fg)
```
@@ -690,7 +693,7 @@ Here's the nasty looking machine code
```{code-cell} julia
@code_native f(fg)
```
The abstract case is almost identical,
```{code-cell} julia
@btime f($fa)
```
@@ -706,14 +709,18 @@ Finally, let's look at the parametrically typed version
```{code-cell} julia
@btime f($fc)
```
This is improbably small: a runtime of 1-2 nanoseconds without any allocations suggests that no computation really took place.
A hint is in the simplicity of the corresponding machine code
```{code-cell} julia
@code_native f(fc)
```
This machine code has none of the hallmark assembly instructions associated with a loop; in particular, loops (e.g. `for` in Julia) typically end up as conditional jumps in the machine code (e.g. `jne`).
Here, the compiler was smart enough to realize that only the final step of the loop matters, i.e. it could generate the equivalent of `f(foo) = 1_000_000 + foo.a`, return that value directly, and skip the loop entirely. These sorts of optimizations are only possible if the compiler can use a great deal of information about the types.
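To see this more directly (a sketch, not in the original lecture), the LLVM IR, an intermediate representation that is often easier to read than assembly, shows the same thing:

```{code-cell} julia
@code_llvm f(fc)
```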
Finally, note that if we compile a slightly different version of the function, which doesn't actually return the value
```{code-cell} julia
function f_no_return(foo)
    for i in 1:1_000_000
        tmp = i + foo.a
    end
end
@code_native f_no_return(fc)
```
We see that the code is even simpler. In effect, the compiler figured out that because `tmp` is not returned, and because reading `foo.a` can have no side effects (it knows the type of `a`), it doesn't need to execute any of the code in the function body.
### Type Inference
Consider the following function, which essentially does the same job as Julia's `sum()` function but acts only on floating point data
@@ -758,22 +750,45 @@ end
Calls to this function run very quickly
```{code-cell} julia
x_range = range(0, 1, length = 100_000)
x = collect(x_range)
typeof(x)
```
```{code-cell} julia
@btime sum_float_array($x)
```
When Julia compiles this function, it knows that the data passed in as `x` will be an array of 64-bit floats. Hence it's known to the compiler that the relevant method for `+` is always addition of floating point numbers.
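One way to confirm this (a sketch, not in the original lecture) is `@code_warntype`, which reports the types the compiler infers for each variable; nothing here should be flagged as `Any`:

```{code-cell} julia
@code_warntype sum_float_array(x)
```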
But consider a version without that type annotation
```{code-cell} julia
function sum_array(x)
    sum = 0.0
    for i in eachindex(x)
        sum += x[i]
    end
    return sum
end
@btime sum_array($x)
```
Note that this has the same running time as the version with explicit types. In Julia, there is (almost) never a performance gain from declaring types, and if anything they can make things worse by limiting the potential for specialized algorithms. See {doc}`generic programming <../more_julia/generic_programming>` for more.
As an example within Julia code, look at the built-in `sum` for arrays
```{code-cell} julia
@btime sum($x)
```
Versus the underlying range
```{code-cell} julia
@btime sum($x_range)
```
Note that the difference in speed is enormous, suggesting it is better to keep data in its more structured form for as long as possible (the sum of a range can be computed in closed form, with no loop at all). You can check the underlying source with `@which sum(x_range)` to see the specialized algorithm used.
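For example, running the check suggested above:

```{code-cell} julia
@which sum(x_range)
```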