episodes/optimisation-numpy.md
24 additions & 24 deletions
@@ -18,7 +18,7 @@ exercises: 0
18
18
19
19
::::::::::::::::::::::::::::::::::::::::::::::::
20
20
21
-
Earlier, we saw that builtin functions in Python, like `sum()`, are often faster than manually looping over a list. This is because those high-level functions are able to do most of the work in the C backend
21
+
Earlier, we saw that built-in Python functions, like `sum()`, are often faster than manually looping over a list. This is because those high-level functions are able to do most of the work in the C backend.
22
22
23
23
Packages like NumPy and Pandas work similarly: They have been written in compiled languages to expose this performance across a wide range of scientific workloads.
24
24
@@ -28,17 +28,17 @@ Packages like NumPy and Pandas work similarly: They have been written in compile
28
28
29
29
[NumPy](https://numpy.org/) is a commonly used package for scientific computing, which provides a wide variety of methods.
30
30
31
-
It adds restriction via it's own [basic numeric types](https://numpy.org/doc/stable/user/basics.types.html), and static arrays to enable even greater performance than that of core Python. However if these restrictions are ignored, the performance can become significantly worse.
31
+
It adds restrictions via its own [basic numeric types](https://numpy.org/doc/stable/user/basics.types.html) and static arrays to enable even greater performance than that of core Python. However, if these restrictions are ignored, performance can become significantly worse.
32
32
33
33
<!--
34
34
TODO: It might be nice to use the figure from https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/#3.-Python's-object-model-can-lead-to-inefficient-memory-access here for illustration?
35
35
36
36
{alt="A diagram illustrating the difference between a NumPy array and a Python list. The NumPy array is a raw block of memory containing numerical values. A Python list contains a header with metadata and multiple items, each of which is a reference to another Python object with its own header and value."}
37
37
-->
38
38
39
-
### NumPy arrays and Python lists live in two separate worlds
39
+
### NumPy Arrays and Python Lists Live in Two Separate Worlds
40
40
41
-
NumPy's arrays (not to be confused with the core Python `array` package) are static arrays. Unlike core Python's lists, they do not dynamically resize. Therefore if you wish to append to a NumPy array, you must call `resize()` first. If you treat this like `append()` for a Python list, resizing for each individual append you will be performing significantly more copies and memory allocations than a Python list.
41
+
NumPy's arrays (not to be confused with the core Python `array` module) are static arrays. Unlike core Python's lists, they do not dynamically resize. Therefore, if you wish to append to a NumPy array, you must call `resize()` first. If you treat this like `append()` for a Python list, resizing for each individual append, you will perform significantly more copies and memory allocations than a Python list would.
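
The cost described above can be seen in a minimal sketch. This is not the lesson's benchmark: the size `N` and the variable names are illustrative assumptions, and the function `numpy.resize()` stands in for the resize call. It contrasts growing an array one element at a time with allocating it once.

```python
import numpy

N = 1000  # Illustrative size (an assumption), kept small on purpose

# Anti-pattern: grow the array by one element per iteration.
# numpy.resize() returns a new array each time, copying all existing data.
ar_resized = numpy.zeros(0, dtype=numpy.int64)
for i in range(N):
    ar_resized = numpy.resize(ar_resized, ar_resized.size + 1)
    ar_resized[-1] = i

# Better: allocate the full array once, then fill it in place.
ar_prealloc = numpy.zeros(N, dtype=numpy.int64)
for i in range(N):
    ar_prealloc[i] = i

# Usually best: let NumPy construct the array in a single call.
ar_direct = numpy.arange(N, dtype=numpy.int64)

# All three approaches produce the same data; only the cost differs.
same_result = (numpy.array_equal(ar_resized, ar_prealloc)
               and numpy.array_equal(ar_prealloc, ar_direct))
```

The first loop performs `N` allocations and copies, while the other two approaches allocate only once.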
42
42
43
43
The below example sees lists and arrays constructed from `range(100000)`.
For Python lists, we’ve seen earlier that list comprehensions are more efficient, so we prefer to avoid using a large number of `append` operations if possible. Similarly, we should try to avoid resizing NumPy arrays, where the overhead is even higher (5.2x slower than a list, probably 10x slower than list comprehension).
67
+
For Python lists, we've seen earlier that list comprehensions are more efficient, so we prefer to avoid using a large number of `append` operations if possible. Similarly, we should try to avoid resizing NumPy arrays, where the overhead is even higher (5.2x slower than a list, probably 10x slower than list comprehension).
68
68
69
69
```output
70
70
list_append: 3.50ms
@@ -130,9 +130,9 @@ There is however a trade-off, using `numpy.random.choice()` can be clearer to so
130
130
131
131
:::::::::::::::::::::::::::::::::::::::::::::
132
132
133
-
### Array broadcasting
133
+
### Array Broadcasting
134
134
135
-
NumPy arrays support “[broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)” many mathematical operations or functions.
135
+
NumPy arrays support "[broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)" many mathematical operations or functions.
136
136
This is a shorthand notation, where the operation/function is applied element-wise without having to loop over the array explicitly:
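
As a minimal sketch of this notation (the array contents and variable names are illustrative assumptions):

```python
import numpy

ar = numpy.arange(5)  # Five integers, starting at 0

# Explicit loop: one Python-level operation per element
looped = numpy.empty_like(ar)
for i in range(len(ar)):
    looped[i] = ar[i] + 10

# Broadcasting: the scalar 10 is added to every element in one expression
broadcast = ar + 10

# Functions can be applied element-wise in the same way
roots = numpy.sqrt(ar)
```

Both `looped` and `broadcast` hold the same values; the broadcast version simply leaves the looping to NumPy.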
However, broadcasting is not just a nicer way to write mathematical expressions; it can also give a significant performance boost:
181
-
Most modern processors are able to apply one instruction across multiple variables simultaneously, instead of sequentially. (In computer science, this is also referred to as“vectorisation”.) The manner by which NumPy stores data in arrays enables it to vectorise mathematical operations that are broadcast across arrays.
181
+
Most modern processors are able to apply one instruction across multiple variables simultaneously, instead of sequentially. (In computer science, this is also referred to as "vectorisation".) The manner by which NumPy stores data in arrays enables it to vectorise mathematical operations that are broadcast across arrays.
182
182
183
-
<!-- Analogy: If you’re baking cookies, the oven (CPU register) is big enough to operate on multiple cookies (numbers) simultaneously. So whether you bake 1 cookie or10, it’ll take exactly the same amount of time. -->
183
+
<!-- Analogy: If you're baking cookies, the oven (CPU register) is big enough to operate on multiple cookies (numbers) simultaneously. So whether you bake 1 cookie or 10, it'll take exactly the same amount of time. -->
@@ -193,14 +193,14 @@ Most modern processors are able to apply one instruction across multiple variabl
193
193
If we were to use a regular `for` loop, the time to perform this operation would increase with the length of the array.
194
194
However, using NumPy broadcasting we can apply the addition to 1, 10 or 100 elements, all in the same amount of time!
195
195
196
-
Earlier in this episode it was demonstrated that using core Python methods over a list, will outperform a loop performing the same calculation faster. The below example takes this a step further by demonstrating the calculation of dot product.
196
+
Earlier in this episode, it was demonstrated that using core Python methods over a list outperforms a loop performing the same calculation. The below example takes this a step further by demonstrating the calculation of a dot product.
197
197
198
198
<!-- Inspired by High Performance Python Chapter 6 example
199
199
Added Python sum array, skipped a couple of others-->
200
200
```python
201
201
from timeit import timeit
202
202
203
-
N =1_000_000# Number of elements in list
203
+
N = 1000000  # Number of elements in list
204
204
205
205
gen_list = f"ls = list(range({N}))"
206
206
gen_array = f"import numpy; ar = numpy.arange({N}, dtype=numpy.int64)"
* `python_sum_list` uses a list comprehension to perform the multiplication, followed by the Python core `sum()`. This comes out at 46.93ms.
221
-
*`python_sum_array` instead directly multiplies the two arrays, taking advantage of NumPy's vectorisation. But uses the core Python `sum()`, this comes in slightly faster at 33.26ms.
221
+
* `python_sum_array` instead directly multiplies the two arrays (taking advantage of NumPy's vectorisation) but uses the core Python `sum()`. This comes in slightly faster at 33.26ms.
222
222
* `numpy_sum_array` again takes advantage of NumPy's vectorisation for the multiplication, and additionally uses NumPy's `sum()` implementation. These two rounds of vectorisation provide a much faster 1.44ms completion.
223
223
* `numpy_dot_array` instead uses NumPy's `dot()` to calculate the dot product in a single operation. This comes out the fastest at 0.29ms, 162x faster than `python_sum_list`.
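
The four approaches listed above can be sketched side by side. This is an illustration rather than the lesson's benchmark code: it assumes the dot product of a sequence with itself, uses a much smaller `N`, and checks only that the results agree, without timing them.

```python
import numpy

N = 1000  # Illustrative size (an assumption), far smaller than the benchmark

ls = list(range(N))
ar = numpy.arange(N, dtype=numpy.int64)

# List comprehension for the multiplication, core Python sum()
python_sum_list = sum([x * x for x in ls])

# Vectorised multiplication, but core Python sum() iterates over the array
python_sum_array = sum(ar * ar)

# Vectorised multiplication and vectorised NumPy sum()
numpy_sum_array = numpy.sum(ar * ar)

# Single NumPy operation for the whole dot product
numpy_dot_array = numpy.dot(ar, ar)

results = [python_sum_list, python_sum_array, numpy_sum_array, numpy_dot_array]
```

All four entries of `results` hold the same value, so the choice between them is purely a matter of performance and readability.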
224
224
@@ -248,7 +248,7 @@ However, HPC systems should be primed to take advantage, so try increasing the n
248
248
:::::::::::::::::::::::::::::::::::::
249
249
250
250
251
-
## Other libraries that use NumPy
251
+
## Other Libraries That Use NumPy
252
252
253
253
Across the scientific Python software ecosystem, [many domain-specific packages](https://numpy.org/#:~:text=ECOSYSTEM) are built on top of NumPy arrays.
254
254
Similar to the demos above, we can often gain significant performance boosts by using these libraries well.
@@ -257,7 +257,7 @@ Similar to the demos above, we can often gain significant performance boosts by
257
257
258
258
Take a look at the [list of libraries on the NumPy website](https://numpy.org/#:~:text=ECOSYSTEM). Are you using any of them already?
259
259
260
-
If you’ve brought a project you want to work on: Are there areas of the project where you might benefit from adapting one of these libraries instead of writing your own code from scratch?
260
+
If you've brought a project you want to work on: Are there areas of the project where you might benefit from adapting one of these libraries instead of writing your own code from scratch?
261
261
262
262
:::::::::::::::::::::::: hint
263
263
@@ -268,11 +268,11 @@ These libraries could be specific to your area of research; but they could also
268
268
:::::::::::::::::::::::::::::::::::::::::::::::
269
269
270
270
271
-
Which libraries you may use will depend on your research domain; here, we’ll show two examples from our own experience.
271
+
Which libraries you may use will depend on your research domain; here, we'll show two examples from our own experience.
272
272
273
-
### Example: Image analysis with Shapely
273
+
### Example: Image Analysis with Shapely
274
274
275
-
A colleague had a large data set of images of cells. She had already reconstructed the locations of cell walls and various points of interest and needed to identify which points were located in each cell.
275
+
A bioinformatics researcher had a large data set of images of cells. She had already reconstructed the locations of cell walls and various points of interest and needed to identify which points were located in each cell.
276
276
To do this, she used the [Shapely](https://github.com/shapely/shapely) geometry library.
277
277
278
278
```Python
@@ -345,11 +345,11 @@ And since it’s not a very clean example (mixes np arrays and list comprehensio
345
345
<!--
346
346
### Example: Interpolating astrophysical spectra with AstroPy
347
347
348
-
This is from an open-source package I’m working on, so we can look at the actual pull request where I made this change: https://github.com/SNEWS2/snewpy/pull/310
348
+
This is from an open-source package I'm working on, so we can look at the actual pull request where I made this change: https://github.com/SNEWS2/snewpy/pull/310
349
349
350
350
→ See the first table of benchmark results. Note that using a Python `for` loop to calculate the spectrum in 100 different time bins takes 100 times as long as for a single time bin. In the vectorized version, the computing time increases much more slowly.
351
351
352
-
(Note that energies were already vectorized—that’s another factor of 100 we got “for free”!)
352
+
(Note that energies were already vectorized—that's another factor of 100 we got "for free"!)
@@ -366,7 +366,7 @@ Pandas' methods by default operate on columns. Each column or series can be thou
366
366
367
367
Following the theme of this episode, iterating over the rows of a data frame using a `for` loop is not advised. The pythonic iteration will be slower than other approaches.
368
368
369
-
Pandas allows it's own methods to be applied to rows in many cases by passing `axis=1`, where available these functions should be preferred over manual loops. Where you can't find a suitable method, `apply()` can be used, which is similar to `map()`, to apply your own function to rows.
369
+
Pandas allows its own methods to be applied to rows in many cases by passing `axis=1`; where available, these functions should be preferred over manual loops. Where you can't find a suitable method, `apply()`, which is similar to `map()`, can be used to apply your own function to rows.
`apply()` is 3x faster than the two `for` approaches, as it avoids the Python `for` loop.
416
+
`apply()` is 4x faster than the two `for` approaches, as it avoids the Python `for` loop.
417
417
418
418
419
419
```output
@@ -422,7 +422,7 @@ for_iterrows: 1677.14ms
422
422
pandas_apply: 390.49ms
423
423
```
424
424
425
-
However, rows don't exist in memory as arrays (columns do!), so `apply()` does not take advantage of NumPys vectorisation. You may be able to go a step further and avoid explicitly operating on rows entirely by passing only the required columns to NumPy.
425
+
However, rows don't exist in memory as arrays (columns do!), so `apply()` does not take advantage of NumPy's vectorisation. You may be able to go a step further and avoid explicitly operating on rows entirely by passing only the required columns to NumPy.
264x faster than `apply()`, 1000x faster than `for``iterrows()`!
435
+
264x faster than `apply()`, 1000x faster than the two `for` approaches!
436
436
437
437
```
438
438
vectorize: 1.48ms
@@ -501,6 +501,6 @@ If you can filter your rows before processing, rather than after, you may signif
501
501
- Python is an interpreted language; this adds overhead at runtime to the execution of Python code. Many core Python and NumPy functions are implemented in faster C/C++, free from this overhead.
502
502
- NumPy can take advantage of vectorisation to process arrays, which can greatly improve performance.
503
503
- Many domain-specific packages are built on top of NumPy and can offer similar performance boosts.
504
-
- Pandas' data tables store columns as arrays, therefore operations applied to columns can take advantage of NumPy’s vectorisation.
504
+
- Pandas' data tables store columns as arrays; therefore, operations applied to columns can take advantage of NumPy's vectorisation.
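
The last key point can be illustrated with a minimal sketch. The data frame, its column names `x` and `y`, and the helper `row_hypot()` are hypothetical and not taken from the lesson's benchmark. It computes the same quantity once per row with `apply()` and once on whole columns.

```python
import numpy
import pandas

# Hypothetical data: two numeric columns
df = pandas.DataFrame({"x": [3.0, 5.0, 8.0], "y": [4.0, 12.0, 15.0]})

def row_hypot(row):
    """Hypothetical helper: length of the hypotenuse for a single row."""
    return (row["x"] ** 2 + row["y"] ** 2) ** 0.5

# Row-wise: the Python function is called once per row
via_apply = df.apply(row_hypot, axis=1)

# Column-wise: each operation runs over a whole column at once
via_columns = numpy.sqrt(df["x"] ** 2 + df["y"] ** 2)
```

Both results contain the same values, but the column-wise version never runs Python code per row.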