Commit 3922832

committed
differences for PR #79
1 parent e6cc7e7 commit 3922832

8 files changed: +233 -220 lines changed

acknowledgements.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -16,8 +16,8 @@ Anastasiia Shcherbakova and Mira Sarkis of [ICR-RSE](https://github.com/ICR-RSE-
1616

1717
**Resources**
1818

19-
Most of the content was drawn from the education and experience of the author, however the below resources provided inspiration:
19+
Most of the content was drawn from the education and experience of the authors, however the below resources provided inspiration:
2020

21-
* [High Performance Python, 2nd Edition](https://www.oreilly.com/library/view/high-performance-python/9781492055013/): This excellent book goes far deeper than this short course in explaining how to maximise performance in Python, however it inspired the examples; [memory allocation is not free](optimisation-memory.html#memory-allocation-is-not-free) and [vectorisation](optimisation-memory.html#memory-allocation-is-not-free).
22-
* [What scientists must know about hardware to write fast code](https://viralinstruction.com/posts/hardware/): This notebook provides an array of hardware lessons relevant to programming for performance, which could be similarly found in most undergraduate Computer Science courses. Although the notebook is grounded in Julia, a lower level language than Python, it is referring to hardware so many of same lessons are covered in the [memory episode](optimisation-memory.html).
21+
* [High Performance Python, 2nd Edition](https://www.oreilly.com/library/view/high-performance-python/9781492055013/): This excellent book goes far deeper than this short course in explaining how to maximise performance in Python, however it inspired the examples; [memory allocation is not free](optimisation-latency.html#memory-allocation-is-not-free) and [vectorisation](optimisation-latency.html#memory-allocation-is-not-free).
22+
* [What scientists must know about hardware to write fast code](https://viralinstruction.com/posts/hardware/): This notebook provides an array of hardware lessons relevant to programming for performance, which could be similarly found in most undergraduate Computer Science courses. Although the notebook is grounded in Julia, a lower level language than Python, it is referring to hardware so many of same lessons are covered in the [latency episode](optimisation-latency).
2323
* [Why Python is Slow: Looking Under the Hood](https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/): This blog post looks under the hood of CPython to explain why Python is often slower than C (and NumPy). We reproduced two of its figures in the [optimisation introduction](optimisation-introduction.html) and [numpy](optimisation-numpy) episodes to explain how memory is laid out.

config.yaml

Lines changed: 2 additions & 1 deletion
@@ -70,13 +70,14 @@ episodes:
7070
- long-break1.md
7171
- optimisation-numpy.md
7272
- optimisation-use-latest.md
73-
- optimisation-memory.md
73+
- optimisation-latency.md
7474
- optimisation-conclusion.md
7575

7676
# Information for Learners
7777
learners:
7878
- setup.md
7979
- registration.md
80+
- technical-appendix.md
8081
- acknowledgements.md
8182
- ppp.md
8283
- reference.md

md5sum.txt

Lines changed: 7 additions & 6 deletions
@@ -1,7 +1,7 @@
11
"file" "checksum" "built" "date"
22
"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-03-20"
33
"LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-03-20"
4-
"config.yaml" "56ca7ce668b34aa31b8305aa219c4097" "site/built/config.yaml" "2025-03-12"
4+
"config.yaml" "d4f4ee96aa442feb6ebfa96328375ecb" "site/built/config.yaml" "2025-05-10"
55
"index.md" "2c5fa878c2981562a87585b186fa69e5" "site/built/index.md" "2025-03-24"
66
"links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-03-20"
77
"episodes/profiling-introduction.md" "a9185138a2b3d7e91a4039aa48e573a7" "site/built/profiling-introduction.md" "2025-03-24"
@@ -10,17 +10,18 @@
1010
"episodes/profiling-lines.md" "01ea49015e5860c938e308ed805f773f" "site/built/profiling-lines.md" "2025-03-24"
1111
"episodes/profiling-conclusion.md" "b5687e26387b353ef23c6292f295ca02" "site/built/profiling-conclusion.md" "2024-03-28"
1212
"episodes/optimisation-introduction.md" "aacb6eaab453c48a49727f28ca7620bd" "site/built/optimisation-introduction.md" "2025-03-26"
13-
"episodes/optimisation-using-python.md" "4df4b9e30bdd954b72394ca9ea5adbfd" "site/built/optimisation-using-python.md" "2025-03-24"
14-
"episodes/optimisation-data-structures-algorithms.md" "d1412933ed510b5f5af23cf5e67c4e96" "site/built/optimisation-data-structures-algorithms.md" "2025-03-24"
13+
"episodes/optimisation-using-python.md" "60bc65cfb1d7b23923e9f5d41a2fcf82" "site/built/optimisation-using-python.md" "2025-05-10"
14+
"episodes/optimisation-data-structures-algorithms.md" "61cfabe16872772f82be1d11424cb867" "site/built/optimisation-data-structures-algorithms.md" "2025-05-10"
1515
"episodes/long-break1.md" "19a5c42e45032003c36ad8f413f44528" "site/built/long-break1.md" "2024-03-28"
1616
"episodes/optimisation-numpy.md" "d506aafdf7ffc9d0f2ba1582fb774899" "site/built/optimisation-numpy.md" "2025-03-24"
1717
"episodes/optimisation-use-latest.md" "23898ec5fdcf9a712ed346fb82c0baf7" "site/built/optimisation-use-latest.md" "2025-03-08"
18-
"episodes/optimisation-memory.md" "475ac23d45a4d13887b5f3946080acdd" "site/built/optimisation-memory.md" "2025-03-24"
19-
"episodes/optimisation-conclusion.md" "e69026037459b5585732ee2116b9fd63" "site/built/optimisation-conclusion.md" "2025-03-10"
18+
"episodes/optimisation-latency.md" "f5b0f79195bc682fe657dd14709de081" "site/built/optimisation-latency.md" "2025-05-10"
19+
"episodes/optimisation-conclusion.md" "8bff8b9ecb79cf4a85724b27c3d93ef4" "site/built/optimisation-conclusion.md" "2025-05-10"
2020
"instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2024-03-20"
2121
"learners/setup.md" "a8827b641b5a3691cb00c9ec97678625" "site/built/setup.md" "2025-03-24"
2222
"learners/registration.md" "7f90eb90170fa462cd615546de559895" "site/built/registration.md" "2024-04-08"
23-
"learners/acknowledgements.md" "16008a17b43752c9ba1eebb4b0d86061" "site/built/acknowledgements.md" "2025-03-24"
23+
"learners/technical-appendix.md" "befd1a4f0aab1df5fe080afef39b9724" "site/built/technical-appendix.md" "2025-05-10"
24+
"learners/acknowledgements.md" "dd7d75e74e424adbf1afb88cceb26852" "site/built/acknowledgements.md" "2025-05-10"
2425
"learners/ppp.md" "c06d345f91bbd37b1b554de3c9e810ad" "site/built/ppp.md" "2025-03-27"
2526
"learners/reference.md" "64d0772cb809aa17b51495c296a01270" "site/built/reference.md" "2024-09-20"
2627
"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-03-20"

optimisation-conclusion.md

Lines changed: 2 additions & 3 deletions
@@ -54,10 +54,9 @@ Your feedback enables us to improve the course for future attendees!
5454
- Where feasible, the latest version of Python and packages should be used as they can include significant free improvements to the performance of your code.
5555
- There is a risk that updating Python or packages will not be possible to due to version incompatibilities or will require breaking changes to your code.
5656
- Changes to packages may impact results output by your code, ensure you have a method of validation ready prior to attempting upgrades.
57-
- How the Computer Hardware Affects Performance
58-
- Sequential accesses to memory (RAM or disk) will be faster than random or scattered accesses.
59-
- This is not always natively possible in Python without the use of packages such as NumPy and Pandas
57+
- How Latency Affects Performance
6058
- One large file is preferable to many small files.
59+
- Network requests can be parallelised to reduce the impact of fixed overheads.
6160
- Memory allocation is not free, avoiding destroying and recreating objects can improve performance.
6261

6362
::::::::::::::::::::::::::::::::::::::::::::::::
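
Reviewer note: the new key point "Network requests can be parallelised to reduce the impact of fixed overheads" could be illustrated with a minimal sketch like the one below. It is not taken from the lesson. `fake_request`, `LATENCY`, `run_sequential` and `run_parallel` are hypothetical names, and `time.sleep` stands in for a real network round trip so that the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # assumed fixed per-request overhead, in seconds


def fake_request(i):
    """Stand-in for a network request: wait for the fixed latency, then return."""
    time.sleep(LATENCY)
    return i * 2


def run_sequential(n):
    # Each request waits for the previous one, so the latencies add up.
    return [fake_request(i) for i in range(n)]


def run_parallel(n):
    # The waits overlap across threads, so the total is close to one latency.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(fake_request, range(n)))


start = time.perf_counter()
sequential = run_sequential(8)
sequential_time = time.perf_counter() - start

start = time.perf_counter()
parallel = run_parallel(8)
parallel_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

Threads suit this case because the work is waiting rather than computing. `pool.map` also returns results in the order the requests were submitted, so both versions produce the same list.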

optimisation-data-structures-algorithms.md

Lines changed: 5 additions & 17 deletions
@@ -156,14 +156,13 @@ Since Python 3.6, the items within a dictionary will iterate in the order that t
156156

157157
### Hashing Data Structures
158158

159-
Python's dictionaries are implemented as hashing data structures.
160-
Explaining how these work will get a bit technical, so let's start with an analogy:
159+
Python's dictionaries are implemented as hashing data structures, we can understand these at a high-level with an analogy:
161160

162161
A Python list is like having a single long bookshelf. When you buy a new book (append a new element to the list), you place it at the far end of the shelf, right after all the previous books.
163162

164163
![A bookshelf corresponding to a Python list.](episodes/fig/bookshelf_list.jpg){alt="An image of a single long bookshelf, with a large number of books."}
165164

166-
A hashing data structure is more like a bookcase with several shelves, labelled by genre (sci-fi, romance, children's books, non-fiction, …) and author surname. When you buy a new book by Jules Verne, you might place it on the shelf labelled "Sci-Fi, V–Z".
165+
A Python dictionary is more like a bookcase with several shelves, labelled by genre (sci-fi, romance, children's books, non-fiction, …) and author surname. When you buy a new book by Jules Verne, you might place it on the shelf labelled "Sci-Fi, V–Z".
167166
And if you keep adding more books, at some point you'll move to a larger bookcase with more shelves (and thus more fine-grained sorting), to make sure you don't have too many books on a single shelf.
168167

169168
![A bookshelf corresponding to a Python dictionary.](episodes/fig/bookshelf_dict.jpg){alt="An image of two bookcases, labelled "Sci-Fi" and "Romance". Each bookcase contains shelves labelled in alphabetical order, with zero or few books on each shelf."}
@@ -186,25 +185,14 @@ In practice, therefore, this trade-off between memory usage and speed is usually
186185

187186
::::::::::::::::::::::::::::::::::::::::::::::::
188187

188+
When a value is inserted into a dictionary, its key is hashed to decide on which "shelf" it should be stored. Most items will have a unique shelf, allowing them to be accessed directly. This is typically much faster for locating a specific item than searching a list.
189189

190-
::::::::::::::::::::::::::::::::::::: callout
191-
192-
### Technical explanation
193-
194-
Within a hashing data structure each inserted key is hashed to produce a (hopefully unique) integer key.
195-
The dictionary is pre-allocated to a default size, and the key is assigned the index within the dictionary equivalent to the hash modulo the length of the dictionary.
196-
If that index doesn't already contain another key, the key (and any associated values) can be inserted.
197-
When the index isn't free, a collision strategy is applied. CPython's [dictionary](https://github.com/python/cpython/blob/main/Objects/dictobject.c) and [set](https://github.com/python/cpython/blob/main/Objects/setobject.c) both use a form of open addressing whereby a hash is mutated and corresponding indices probed until a free one is located.
198-
When the hashing data structure exceeds a given load factor (e.g. 2/3 of indices have been assigned keys), the internal storage must grow. This process requires every item to be re-inserted which can be expensive, but reduces the average probes for a key to be found.
199-
200-
![An visual explanation of linear probing, CPython uses an advanced form of this.](episodes/fig/hash_linear_probing.png){alt="A diagram demonstrating how the keys (hashes) 37, 64, 14, 94, 67 are inserted into a hash table with 11 indices. This is followed by the insertion of 59, 80 and 39 which require linear probing to be inserted due to collisions."}
201-
202-
To retrieve or check for the existence of a key within a hashing data structure, the key is hashed again and a process equivalent to insertion is repeated. However, now the key at each index is checked for equality with the one provided. If any empty index is found before an equivalent key, then the key must not be present in the data structure.
203190

191+
::::::::::::::::::::::::::::::::::::: callout
204192

205193
### Keys
206194

207-
Keys will typically be a core Python type such as a number or string. However, multiple of these can be combined as a Tuple to form a compound key, or a custom class can be used if the methods `__hash__()` and `__eq__()` have been implemented.
195+
A dictionary's keys will typically be a core Python type such as a number or string. However, multiple of these can be combined as a tuple to form a compound key, or a custom class can be used if the methods `__hash__()` and `__eq__()` have been implemented.
208196

209197
You can implement `__hash__()` by utilising the ability for Python to hash tuples, avoiding the need to implement a bespoke hash function.
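
Reviewer note: the last two paragraphs of the Keys callout (compound tuple keys, and custom classes implementing `__hash__()` and `__eq__()`) could be sketched as below. This is an illustration only, not code from the lesson. The class `BookKey` and its fields are hypothetical and reuse the bookshelf analogy from this episode.

```python
class BookKey:
    """Hypothetical compound key: a book identified by genre and author surname."""

    def __init__(self, genre, surname):
        self.genre = genre
        self.surname = surname

    def __eq__(self, other):
        # Two keys are equal when both fields match.
        if not isinstance(other, BookKey):
            return NotImplemented
        return (self.genre, self.surname) == (other.genre, other.surname)

    def __hash__(self):
        # Reuse Python's tuple hashing rather than writing a bespoke hash function.
        return hash((self.genre, self.surname))


shelf = {}
# A custom class used as a key.
shelf[BookKey("Sci-Fi", "Verne")] = "Twenty Thousand Leagues Under the Sea"
# A plain tuple works as a compound key without any extra code.
shelf[("Sci-Fi", "Wells")] = "The Time Machine"

# A separately constructed but equal key finds the same entry.
found = shelf[BookKey("Sci-Fi", "Verne")]
print(found)
```

Objects that compare equal must also hash equally, which is why `__hash__()` and `__eq__()` are built from the same tuple of fields.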
210198
