acknowledgements.md (3 additions & 3 deletions)
@@ -16,8 +16,8 @@ Anastasiia Shcherbakova and Mira Sarkis of [ICR-RSE](https://github.com/ICR-RSE-
  **Resources**

- Most of the content was drawn from the education and experience of the author, however the below resources provided inspiration:
+ Most of the content was drawn from the education and experience of the authors; however, the below resources provided inspiration:

- *[High Performance Python, 2nd Edition](https://www.oreilly.com/library/view/high-performance-python/9781492055013/): This excellent book goes far deeper than this short course in explaining how to maximise performance in Python, however it inspired the examples; [memory allocation is not free](optimisation-memory.html#memory-allocation-is-not-free) and [vectorisation](optimisation-memory.html#memory-allocation-is-not-free).
- *[What scientists must know about hardware to write fast code](https://viralinstruction.com/posts/hardware/): This notebook provides an array of hardware lessons relevant to programming for performance, which could be similarly found in most undergraduate Computer Science courses. Although the notebook is grounded in Julia, a lower level language than Python, it is referring to hardware so many of same lessons are covered in the [memory episode](optimisation-memory.html).
+ *[High Performance Python, 2nd Edition](https://www.oreilly.com/library/view/high-performance-python/9781492055013/): This excellent book goes far deeper than this short course in explaining how to maximise performance in Python; however, it inspired the examples [memory allocation is not free](optimisation-latency.html#memory-allocation-is-not-free) and [vectorisation](optimisation-latency.html#memory-allocation-is-not-free).
+ *[What scientists must know about hardware to write fast code](https://viralinstruction.com/posts/hardware/): This notebook provides an array of hardware lessons relevant to programming for performance, which could similarly be found in most undergraduate Computer Science courses. Although the notebook is grounded in Julia, a lower-level language than Python, it refers to hardware, so many of the same lessons are covered in the [latency episode](optimisation-latency.html).

  *[Why Python is Slow: Looking Under the Hood](https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/): This blog post looks under the hood of CPython to explain why Python is often slower than C (and NumPy). We reproduced two of its figures in the [optimisation introduction](optimisation-introduction.html) and [numpy](optimisation-numpy.html) episodes to explain how memory is laid out.
optimisation-conclusion.md (2 additions & 3 deletions)
@@ -54,10 +54,9 @@ Your feedback enables us to improve the course for future attendees!
  - Where feasible, the latest version of Python and packages should be used as they can include significant free improvements to the performance of your code.
  - There is a risk that updating Python or packages will not be possible due to version incompatibilities or will require breaking changes to your code.
  - Changes to packages may impact results output by your code; ensure you have a method of validation ready prior to attempting upgrades.
- - How the Computer Hardware Affects Performance
- - Sequential accesses to memory (RAM or disk) will be faster than random or scattered accesses.
- - This is not always natively possible in Python without the use of packages such as NumPy and Pandas
+ - How Latency Affects Performance
  - One large file is preferable to many small files.
+ - Network requests can be parallelised to reduce the impact of fixed overheads.
  - Memory allocation is not free; avoiding destroying and recreating objects can improve performance.
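To illustrate the newly added point about parallelising network requests, here is a minimal sketch using only Python's standard library; the URLs and worker count are hypothetical placeholders, not taken from the course material.

```python
# A minimal sketch of parallelising network requests; the URLs and
# worker count below are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

urls = [f"https://example.com/data/{i}.json" for i in range(10)]

def fetch(url):
    # Every request pays fixed overheads (DNS lookup, connection setup)
    # before any data arrives; running requests concurrently lets those
    # overheads overlap rather than accumulate.
    with urlopen(url) as response:
        return response.read()

# Threads suit I/O-bound work: the GIL is released while waiting on the network.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))
```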
optimisation-data-structures-algorithms.md (5 additions & 17 deletions)
@@ -156,14 +156,13 @@ Since Python 3.6, the items within a dictionary will iterate in the order that they were inserted
  ### Hashing Data Structures

- Python's dictionaries are implemented as hashing data structures.
- Explaining how these work will get a bit technical, so let's start with an analogy:
+ Python's dictionaries are implemented as hashing data structures; we can understand these at a high level with an analogy:

  A Python list is like having a single long bookshelf. When you buy a new book (append a new element to the list), you place it at the far end of the shelf, right after all the previous books.

  {alt="An image of a single long bookshelf, with a large number of books."}

- A hashing data structure is more like a bookcase with several shelves, labelled by genre (sci-fi, romance, children's books, non-fiction, …) and author surname. When you buy a new book by Jules Verne, you might place it on the shelf labelled "Sci-Fi, V–Z".
+ A Python dictionary is more like a bookcase with several shelves, labelled by genre (sci-fi, romance, children's books, non-fiction, …) and author surname. When you buy a new book by Jules Verne, you might place it on the shelf labelled "Sci-Fi, V–Z".

  And if you keep adding more books, at some point you'll move to a larger bookcase with more shelves (and thus more fine-grained sorting), to make sure you don't have too many books on a single shelf.

  {alt="An image of two bookcases, labelled 'Sci-Fi' and 'Romance'. Each bookcase contains shelves labelled in alphabetical order, with zero or few books on each shelf."}
@@ -186,25 +185,14 @@ In practice, therefore, this trade-off between memory usage and speed is usually
  ::::::::::::::::::::::::::::::::::::::::::::::::

+ When a value is inserted into a dictionary, its key is hashed to decide on which "shelf" it should be stored. Most items will have a unique shelf, allowing them to be accessed directly. This is typically much faster for locating a specific item than searching a list.
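As a quick illustration of why hashed lookup beats a linear search, the sketch below compares membership tests on a list and a set holding the same items; the sizes and setup are invented for illustration, not taken from the course.

```python
import timeit

n = 100_000
items_list = list(range(n))
items_set = set(range(n))  # sets and dictionary keys share the same hashing machinery

# Searching a list scans elements one by one (O(n));
# a hashed container jumps straight to the right "shelf" (O(1) on average).
print(timeit.timeit(lambda: n - 1 in items_list, number=100))
print(timeit.timeit(lambda: n - 1 in items_set, number=100))
```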
- ::::::::::::::::::::::::::::::::::::: callout
-
- ### Technical explanation
-
- Within a hashing data structure each inserted key is hashed to produce a (hopefully unique) integer key.
- The dictionary is pre-allocated to a default size, and the key is assigned the index within the dictionary equivalent to the hash modulo the length of the dictionary.
- If that index doesn't already contain another key, the key (and any associated values) can be inserted.
- When the index isn't free, a collision strategy is applied. CPython's [dictionary](https://github.com/python/cpython/blob/main/Objects/dictobject.c) and [set](https://github.com/python/cpython/blob/main/Objects/setobject.c) both use a form of open addressing whereby a hash is mutated and corresponding indices probed until a free one is located.
- When the hashing data structure exceeds a given load factor (e.g. 2/3 of indices have been assigned keys), the internal storage must grow. This process requires every item to be re-inserted which can be expensive, but reduces the average probes for a key to be found.
-
- {alt="A diagram demonstrating how the keys (hashes) 37, 64, 14, 94, 67 are inserted into a hash table with 11 indices. This is followed by the insertion of 59, 80 and 39 which require linear probing to be inserted due to collisions."}
-
- To retrieve or check for the existence of a key within a hashing data structure, the key is hashed again and a process equivalent to insertion is repeated. However, now the key at each index is checked for equality with the one provided. If any empty index is found before an equivalent key, then the key must not be present in the data structure.
+ ::::::::::::::::::::::::::::::::::::: callout

  ### Keys

- Keys will typically be a core Python type such as a number or string. However, multiple of these can be combined as a Tuple to form a compound key, or a custom class can be used if the methods `__hash__()` and `__eq__()` have been implemented.
+ A dictionary's keys will typically be a core Python type such as a number or string. However, multiple of these can be combined as a tuple to form a compound key, or a custom class can be used if the methods `__hash__()` and `__eq__()` have been implemented.

  You can implement `__hash__()` by utilising the ability for Python to hash tuples, avoiding the need to implement a bespoke hash function.
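For instance, here is a sketch of a custom class usable as a dictionary key, delegating `__hash__()` to a tuple of its fields as suggested above; the class and field names are invented for illustration.

```python
class Book:
    """A hypothetical class usable as a dictionary key."""

    def __init__(self, author, title):
        self.author = author
        self.title = title

    def __hash__(self):
        # Delegate to Python's built-in tuple hashing,
        # avoiding a bespoke hash function.
        return hash((self.author, self.title))

    def __eq__(self, other):
        # Objects that compare equal must hash equally,
        # so compare the same fields used in __hash__().
        if not isinstance(other, Book):
            return NotImplemented
        return (self.author, self.title) == (other.author, other.title)

shelf = {Book("Jules Verne", "From the Earth to the Moon"): "Sci-Fi, V-Z"}
print(shelf[Book("Jules Verne", "From the Earth to the Moon")])  # Sci-Fi, V-Z
```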
0 commit comments