You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: episodes/optimisation-data-structures-algorithms.md
+36-1Lines changed: 36 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -151,8 +151,41 @@ Since Python 3.6, the items within a dictionary will iterate in the order that t
151
151
152
152
### Hashing Data Structures
153
153
154
-
<!-- simple explanation of how a hash-based data structure works -->
155
154
Python's dictionaries are implemented as hashing data structures.
155
+
Explaining how these work will get a bit technical, so let’s start with an analogy:
156
+
157
+
A Python list is like having a single long bookshelf. When you buy a new book (append a new element to the list), you place it at the far end of the shelf, right after all the previous books.
158
+
159
+
{alt="An image of a single long bookshelf, with a large number of books."}
160
+
161
+
A hashing data structure is more like a bookcase with several shelves, labeled by genre (sci-fi, romance, children’s books, non-fiction, …) and author surname. When you buy a new book by Jules Verne, you might place it on the shelf labeled “Sci-Fi, V–Z”.
162
+
And if you keep adding more books, at some point you’ll move to a larger bookcase with more shelves (and thus more fine-grained sorting), to make sure you don’t have too many books on a single shelf.
163
+
164
+
{alt="An image of two bookcases, labelled “Sci-Fi” and “Romance”. Each bookcase contains shelves labelled in alphabetical order, with zero or few books on each shelf."}
165
+
166
+
Now, let's say a friend wanted to borrow the book "'—All You Zombies—'" by Robert Heinlein.
167
+
If I had my books arranged on a single bookshelf (in a list), I would have to look through every book I own in order to find it.
168
+
However, if I had a bookcase with several shelves (a hashing data structure), I know immediately that I need to check the shelf “Sci-Fi, G—J”, so I’d be able to find it much more quickly!
169
+
170
+
::::::::::::::::::::::::::::::::::::: instructor
171
+
172
+
The large bookcases in the second illustration, with many shelves almost empty, take up a lot more space than the single shelf in the first illustration.
173
+
This may also be interpreted as the dictionary using more memory than a list.
174
+
175
+
In principle, this is correct. However:
176
+
177
+
* The actual difference is much less pronounced than in the illustration. (A list requires about 8 bytes to keep track of each item, while a dictionary requires about 30 bytes.)
178
+
* In most cases this net size of the list/dictionary itself is negligibly small compared to the size of the objects stored in the list or dictionary (e.g. 41 bytes for an empty string or 112 bytes for an empty NumPy array).
179
+
180
+
In practice, therefore, this trade-off between memory usage and speed is usually worth it.
181
+
182
+
::::::::::::::::::::::::::::::::::::::::::::::::
183
+
184
+
185
+
::::::::::::::::::::::::::::::::::::: callout
186
+
187
+
### Technical explanation
188
+
156
189
Within a hashing data structure each inserted key is hashed to produce a (hopefully unique) integer key.
157
190
The dictionary is pre-allocated to a default size, and the key is assigned the index within the dictionary equivalent to the hash modulo the length of the dictionary.
158
191
If that index doesn't already contain another key, the key (and any associated values) can be inserted.
The only limitation is that where two objects are equal they must have the same hash, hence all member variables which contribute to `__eq__()` should also contribute to `__hash__()` and vice versa (it's fine to have irrelevant or redundant internal members contribute to neither).
192
225
226
+
:::::::::::::::::::::::::::::::::::::
227
+
193
228
## Sets
194
229
195
230
Sets are dictionaries without the values (both are declared using `{}`), a collection of unique keys equivalent to the mathematical set. *Modern CPython now uses a set implementation distinct from that of it's dictionary, however they still behave much the same in terms of performance characteristics.*
0 commit comments