Commit c7340ef

improved instructor notes on setup, vectorisation analogy
1 parent 1ccb49f commit c7340ef

2 files changed: +21 -2 lines changed

episodes/optimisation-numpy.md

Lines changed: 12 additions & 2 deletions
@@ -180,7 +180,6 @@ array([ 1. , 2.71828183, 7.3890561 , 20.08553692,
However, broadcasting is not just a nicer way to write mathematical expressions—it can also give a significant performance boost:
Most modern processors are able to apply one instruction across multiple variables simultaneously, instead of sequentially. (In computer science, this is also referred to as "vectorisation".) The manner by which NumPy stores data in arrays enables it to vectorise mathematical operations that are broadcast across arrays.
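
Review note: the storage claim in the paragraph above can be checked with a short sketch. It assumes NumPy is installed. The names `values`, `arr`, `add_ten_loop` and `add_ten_numpy` are illustrative and do not appear in the lesson, and the array size and repeat count are arbitrary. The sketch first inspects how the array is laid out in memory, then times the same addition on a Python list and on a NumPy array. Timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

# Illustrative data: the size and dtype are arbitrary choices for this sketch.
values = list(range(100_000))           # Python list: references to separate int objects
arr = np.array(values, dtype=np.int64)  # NumPy array: one block of 8-byte integers

# The array's elements sit back to back in memory.
print(arr.flags["C_CONTIGUOUS"])  # True
print(arr.strides)                # (8,)


def add_ten_loop(data):
    """Add 10 to each element one at a time, in the interpreter."""
    return [x + 10 for x in data]


def add_ten_numpy(data):
    """Add 10 to every element with a single broadcast operation."""
    return data + 10


# Time both versions; absolute numbers vary by machine.
loop_time = timeit.timeit(lambda: add_ten_loop(values), number=20)
numpy_time = timeit.timeit(lambda: add_ten_numpy(arr), number=20)
print(f"list comprehension: {loop_time:.4f} s, NumPy broadcast: {numpy_time:.4f} s")
```

Both functions return the same values, so the comparison isolates how the data is stored and processed rather than what is computed.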

-<!-- Analogy: If you're baking cookies, the oven (CPU register) is big enough to operate on multiple cookies (numbers) simultaneously. So whether you bake 1 cookie or 10, it'll take exactly the same amount of time. -->

```sh
> python -m timeit -s "import numpy; ar = numpy.arange(1)" "ar + 10"
@@ -193,6 +192,18 @@ Most modern processors are able to apply one instruction across multiple variabl
If we were to use a regular `for` loop, the time to perform this operation would increase with the length of the array.
However, using NumPy broadcasting we can apply the addition to 1, 10 or 100 elements, all in the same amount of time!

+::::::::::::::::::::::::::::::::::::: instructor
+
+A simple analogy:
+
+If you're baking cookies, the oven (CPU register) is big enough to operate on multiple cookies (numbers) simultaneously. So whether you bake 1 cookie or 10, it'll take exactly the same amount of time.
+However, this requires that the cookies are neatly arranged on a baking tray (in a contiguous chunk of memory).
+
+Basic ints/floats in NumPy arrays are arranged like that, so this works great.
+In contrast, numbers in a Python list [are spread across memory in a fairly complex arrangement](https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/#3.-Python's-object-model-can-lead-to-inefficient-memory-access), so cannot benefit from this unless you convert them to a NumPy array first.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
Earlier it was demonstrated that using core Python methods over a list will outperform a loop, performing the same calculation faster. The below example takes this a step further by demonstrating the calculation of a dot product.

<!-- Inspired by High Performance Python Chapter 6 example
@@ -324,7 +335,6 @@ To vectorise this efficiently, the logic of the code had to be changed slightly:
::::::::::::::::::::::::::::::::::::: instructor

The following code snippet demonstrates how this works for a simplified example.
-If you want to run this as a live demo, you need to `pip install shapely` first.

```Python
>>> from shapely import Point, Polygon

learners/setup.md

Lines changed: 9 additions & 0 deletions
@@ -33,6 +33,15 @@ pip install pytest snakeviz line_profiler[all] numpy pandas matplotlib

To complete some of the exercises you will need to use a text-editor or Python IDE, so make sure you have your favourite available.

+:::::::::::::: instructor
+
+As the instructor, you should additionally install the `shapely` package, which you may need for a brief demo during the episode on scientific Python packages.
+
+```sh
+pip install shapely
+```
+
+:::::::::::::::::::::::::


:::::::::::::::: spoiler
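
Review note on the `learners/setup.md` change: instructors may want a quick way to confirm their environment before the demo. The sketch below is one possible check and is not part of the lesson. It uses only the standard library, and the helper `missing_packages` and both list names are hypothetical. The import names are assumed to match the package names in the `pip install` commands shown in this file.

```python
from importlib.util import find_spec

# Import names assumed to match the packages installed in learners/setup.md.
LEARNER_PACKAGES = ["pytest", "snakeviz", "line_profiler", "numpy", "pandas", "matplotlib"]
# Extra package the instructor note asks for.
INSTRUCTOR_PACKAGES = ["shapely"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(LEARNER_PACKAGES + INSTRUCTOR_PACKAGES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All lesson packages were found.")
```

`find_spec` only locates a package without importing it, so the check stays fast even when large libraries are installed.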

0 commit comments
