differences for PR #3

actions-user · actions-user · commit c39b0c080527 · 2025-01-29T18:55:12.000Z
diff --git a/fig/python_lists.png b/fig/python_lists.png
diff --git a/md5sum.txt b/md5sum.txt
@@ -5,10 +5,10 @@
 "index.md" "8f0476c27469136028995d6b7c9d4240" "site/built/index.md" "2025-01-14"
 "links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2025-01-08"
 "episodes/optimisation-introduction.md" "4ca162f5e35aa54d9618423d84a200cd" "site/built/optimisation-introduction.md" "2025-01-29"
-"episodes/optimisation-data-structures-algorithms.md" "a7cdce11f55fde5a86e6ae49e4b95645" "site/built/optimisation-data-structures-algorithms.md" "2025-01-29"
+"episodes/optimisation-data-structures-algorithms.md" "0003554a6f306735b8891ce59ec5224d" "site/built/optimisation-data-structures-algorithms.md" "2025-01-29"
 "episodes/optimisation-minimise-python.md" "a4ee08b0ba064aaf4271d8712b29af17" "site/built/optimisation-minimise-python.md" "2025-01-29"
 "episodes/optimisation-use-latest.md" "5948276773890e97b7898292fddbcb39" "site/built/optimisation-use-latest.md" "2025-01-08"
-"episodes/optimisation-memory.md" "dc08f479e4758bcaea243f11251c2464" "site/built/optimisation-memory.md" "2025-01-29"
+"episodes/optimisation-memory.md" "2e3f414bceba47f1a3f814880fbfa20f" "site/built/optimisation-memory.md" "2025-01-29"
 "episodes/optimisation-conclusion.md" "567478d44c721cbf1bc8a71297a54a56" "site/built/optimisation-conclusion.md" "2025-01-08"
 "episodes/long-break1.md" "dea66ed9de52386eebf67722b167a2a8" "site/built/long-break1.md" "2025-01-14"
 "episodes/profiling-introduction.md" "e9fe7f86f9704b3e3655b55c0097ed67" "site/built/profiling-introduction.md" "2025-01-23"
diff --git a/optimisation-data-structures-algorithms.md b/optimisation-data-structures-algorithms.md
@@ -63,6 +63,9 @@ CPython for example uses [`newsize + (newsize >> 3) + 6`](https://github.com/pyt
 
 ![The relationship between the number of appends to an empty list, and the number of internal resizes in CPython.](episodes/fig/cpython_list_allocations.png){alt='A line graph displaying the relationship between the number of calls to append() and the number of internal resizes of a CPython list. It has a logarithmic relationship, at 1 million appends there have been 84 internal resizes.'}
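The resize count in the figure can be approximated with `sys.getsizeof()`. In CPython, the size reported for a list includes the capacity of its internal pointer array, so the value changes only when the list is internally resized. This is a CPython-specific sketch: `count_resizes()` is a hypothetical helper, other Python implementations may report sizes differently, and the exact count can vary between CPython versions because the growth formula has changed over time.

```python
import sys


def count_resizes(n_appends):
    """Count internal resizes of a list while appending (hypothetical helper).

    Relies on CPython's sys.getsizeof() including the list's allocated
    capacity, so the reported size only changes when the list is resized.
    """
    items = []
    resizes = 0
    last_size = sys.getsizeof(items)
    for i in range(n_appends):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # capacity changed, so a resize happened
            resizes += 1
            last_size = size
    return resizes


print(count_resizes(1_000_000))
```

Because each resize over-allocates proportionally, the count grows roughly logarithmically with the number of appends rather than linearly.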
 
+![Visual note on resizing behaviour of Python lists.](episodes/fig/python_lists.png){alt='Cheat sheet summarising how Python lists resize their internal storage.'}
+
+
 This has two implications:
 
 * If you are creating large static lists, they will use up to 12.5% excess memory.
@@ -155,7 +158,6 @@ Python's dictionaries are implemented using hashing as their underlying data str
 
 In CPython's [dictionary](https://github.com/python/cpython/blob/main/Objects/dictobject.c) and [set](https://github.com/python/cpython/blob/main/Objects/setobject.c) implementations, a technique called open addressing is employed. This approach modifies the hash and probes subsequent indices until an empty one is found.
 
-
 When a dictionary or hash table in Python grows, the underlying storage is resized, which necessitates re-inserting every existing item into the new structure. This process can be computationally expensive but is essential for maintaining efficient average probe times when searching for keys.
 ![A visual explanation of linear probing, CPython uses an advanced form of this.](episodes/fig/hash_linear_probing.png){alt="A diagram showing how keys (hashes) 37, 64, 14, 94, 67 are inserted into a hash table with 11 indices. The insertion of 59, 80, and 39 demonstrates linear probing to resolve collisions."}
 To look up or verify the existence of a key in a hashing data structure, the key is re-hashed, and the process mirrors that of insertion. The corresponding index is probed to see if it contains the provided key. If the key at the index matches, the operation succeeds. If an empty index is reached before finding the key, it indicates that the key does not exist in the structure.
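The insertion and lookup process described above can be sketched with a toy hash set that uses plain linear probing. This is not CPython's implementation, which uses a more advanced perturbed probe sequence; the class name `LinearProbingTable`, its `max_load` resize threshold and the doubling growth policy are illustrative assumptions. The example relies on small non-negative integers hashing to themselves in CPython, so each key's starting index is `key % 11` for an 11-slot table, using the keys from the figure.

```python
class LinearProbingTable:
    """Toy hash set using open addressing with linear probing (illustration only)."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8, max_load=2 / 3):
        if not 0 < max_load < 1:
            # Staying below 1 guarantees at least one empty slot, so probing ends.
            raise ValueError("max_load must be between 0 and 1")
        self.slots = [self._EMPTY] * capacity
        self.count = 0
        self.max_load = max_load  # assumed threshold that triggers a resize

    def _probe(self, key):
        """Return the index holding `key`, or the first empty index on its probe path."""
        capacity = len(self.slots)
        index = hash(key) % capacity
        while self.slots[index] is not self._EMPTY and self.slots[index] != key:
            index = (index + 1) % capacity  # collision: step to the next slot
        return index

    def add(self, key):
        if self.count + 1 > len(self.slots) * self.max_load:
            self._resize(2 * len(self.slots))
        index = self._probe(key)
        if self.slots[index] is self._EMPTY:
            self.slots[index] = key
            self.count += 1

    def __contains__(self, key):
        # Same probe path as insertion; reaching an empty slot means "absent".
        return self.slots[self._probe(key)] is not self._EMPTY

    def _resize(self, new_capacity):
        # A key's starting index depends on the table size, so every existing
        # key has to be re-inserted into the larger table.
        old_keys = [k for k in self.slots if k is not self._EMPTY]
        self.slots = [self._EMPTY] * new_capacity
        self.count = 0
        for k in old_keys:
            self.add(k)


table = LinearProbingTable(capacity=11, max_load=0.75)
for key in (37, 64, 14, 94, 67, 59, 80, 39):
    table.add(key)

print(table.slots.index(59), table.slots.index(80), table.slots.index(39))  # 5 7 8
print(59 in table, 42 in table)  # True False
```

Here 59 collides with 37 at index 4, while 80 and 39 are pushed further along by earlier collisions. `_resize()` illustrates why growing a hash table is expensive: every stored key must be re-inserted.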
@@ -284,7 +286,7 @@ uniqueListSort: 2.67ms
 
 Independent of the cost of constructing a unique set (covered in the previous section), it is worth measuring how quickly the data structure can be searched to retrieve an item or check whether it exists.
 
-The performance of a hashing data structure is subject to the load factor and number of collisions. An item that hashes with no collision can be checked almost directly, whereas one with collisions will probe until it finds the correct item or an empty slot. In the worst possible case, whereby all insert items have collided this would mean checking every single item. In practice, hashing data-structures are designed to minimise the chances of this happening and most items should be found or identified as missing with single access.
+The performance of a hashing data structure is subject to the load factor and number of collisions. An item that hashes with no collision can be checked almost directly, whereas one with collisions will probe until it finds the correct item or an empty slot. In the worst possible case, where all inserted items have collided, this would mean checking every single item. In practice, hashing data-structures are designed to minimise the chances of this happening, and most items should be found or identified as missing with a single access, resulting in a constant average time complexity, O(1) (which is very good!).
 
 In contrast, if searching a list or array, the default approach is to start at the first item and check all subsequent items until the correct item has been found. If the correct item is not present, this will require the entire list to be checked. Therefore, the worst case is similar to that of the hashing data-structure; however, for a list it is guaranteed whenever the item is missing. Similarly, on average we would expect an item to be found halfway through the list, meaning that an average search will require checking half of the items, a linear average time complexity of O(n).
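A quick way to observe the difference between a linear list scan and a hash lookup is to time membership tests with the standard library's `timeit` module. This is a minimal sketch; the collection size, the missing value and the repeat count are arbitrary choices, and absolute timings will vary between machines and Python versions.

```python
import timeit

N = 100_000
as_list = list(range(N))
as_set = set(as_list)
missing = -1  # not present, so the list must be scanned from start to end

# Membership tests: linear scan for the list, hash lookup for the set.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time * 1000:.2f} ms, set: {set_time * 1000:.4f} ms (200 lookups each)")
```

Searching for a missing item is the list's worst case, since every element must be compared, whereas the set typically resolves it with a single probe.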
 
diff --git a/optimisation-memory.md b/optimisation-memory.md
@@ -173,7 +173,6 @@ Within Python memory is not explicitly allocated and deallocated, instead it is
 The implementation of the [heat equation](https://en.wikipedia.org/wiki/Heat_equation) below reallocates `out_grid`, a large two-dimensional (512x512) list, each time `update()` is called to progress the model.
 
 ```python
-import time
 grid_shape = (512, 512)
 
 def update(grid, a_dt):
@@ -222,7 +221,6 @@ Line #      Hits         Time  Per Hit   % Time  Line Contents
 Instead, `out_grid` can be double buffered: two buffers are allocated outside the function and swapped after each call to `update()`.
 
 ```python
-import time
 grid_shape = (512, 512)
 
 def update(grid, a_dt, out_grid):