diff --git a/config.yaml b/config.yaml
index 9886ea23..76397e97 100644
--- a/config.yaml
+++ b/config.yaml
@@ -74,7 +74,6 @@ episodes:
 # Information for Learners
 learners:
 - setup.md
-- registration.md
 - acknowledgements.md
 - ppp.md
 - reference.md
diff --git a/episodes/optimisation-data-structures-algorithms.md b/episodes/optimisation-data-structures-algorithms.md
index 30b1a0c4..5e94630a 100644
--- a/episodes/optimisation-data-structures-algorithms.md
+++ b/episodes/optimisation-data-structures-algorithms.md
@@ -151,16 +151,20 @@ Since Python 3.6, the items within a dictionary will iterate in the order that t

 ### Hashing Data Structures

-Python's dictionaries are implemented as hashing data structures.
-Within a hashing data structure each inserted key is hashed to produce a (hopefully unique) integer key.
-The dictionary is pre-allocated to a default size, and the key is assigned the index within the dictionary equivalent to the hash modulo the length of the dictionary.
-If that index doesn't already contain another key, the key (and any associated values) can be inserted.
-When the index isn't free, a collision strategy is applied. CPython's [dictionary](https://github.com/python/cpython/blob/main/Objects/dictobject.c) and [set](https://github.com/python/cpython/blob/main/Objects/setobject.c) both use a form of open addressing whereby a hash is mutated and corresponding indices probed until a free one is located.
-When the hashing data structure exceeds a given load factor (e.g. 2/3 of indices have been assigned keys), the internal storage must grow. This process requires every item to be re-inserted which can be expensive, but reduces the average probes for a key to be found.
+Python's dictionaries are implemented as hashing data structures. In this structure, each key is hashed to generate an (ideally unique) integer, which serves as the basis for indexing. Dictionaries are initialised with a default size, and the hash value of a key, modulo the dictionary's length, determines its initial index. If this index is available, the key and its associated value are stored there. If the index is already occupied, a collision occurs and a resolution strategy is applied to find an alternate index.

-![An visual explanation of linear probing, CPython uses an advanced form of this.](episodes/fig/hash_linear_probing.png){alt="A diagram demonstrating how the keys (hashes) 37, 64, 14, 94, 67 are inserted into a hash table with 11 indices. This is followed by the insertion of 59, 80 and 39 which require linear probing to be inserted due to collisions."}
+In CPython's [dictionary](https://github.com/python/cpython/blob/main/Objects/dictobject.c) and [set](https://github.com/python/cpython/blob/main/Objects/setobject.c) implementations, a technique called open addressing is employed. This approach mutates the hash and probes subsequent indices until an empty one is found.

-To retrieve or check for the existence of a key within a hashing data structure, the key is hashed again and a process equivalent to insertion is repeated. However, now the key at each index is checked for equality with the one provided. If any empty index is found before an equivalent key, then the key must not be present in the ata structure.
+
+When a dictionary or set exceeds a given load factor (e.g. two thirds of its indices have been assigned keys), the underlying storage must be resized, which requires re-inserting every existing item into the new structure. This process can be computationally expensive, but it is essential for maintaining efficient average probe times when searching for keys.
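+
+This growth can be observed from Python: below is a minimal sketch using `sys.getsizeof()` to print the dictionary's allocated size each time it jumps (the exact sizes and thresholds vary between CPython versions):
+
+```python
+import sys
+
+d = {}
+last = sys.getsizeof(d)
+print(f"{len(d)} items: {last} bytes")
+for i in range(50):
+    d[i] = None
+    size = sys.getsizeof(d)
+    if size != last:  # the dictionary's internal storage has grown
+        print(f"{len(d)} items: {size} bytes")
+        last = size
+```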
+
+![A visual explanation of linear probing, CPython uses an advanced form of this.](episodes/fig/hash_linear_probing.png){alt="A diagram showing how keys (hashes) 37, 64, 14, 94, 67 are inserted into a hash table with 11 indices. The insertion of 59, 80, and 39 demonstrates linear probing to resolve collisions."}
+
+To look up or verify the existence of a key in a hashing data structure, the key is re-hashed and the process mirrors that of insertion. The corresponding index is probed to see if it contains the provided key. If the key at the index matches, the operation succeeds. If an empty index is reached before the key is found, the key does not exist in the structure.
+
+The diagram above shows a hash table with 11 slots, five of which are already occupied:
+
+1. We try to add element k=59. Based on its hash, the intended position is p=4. However, slot 4 is already occupied by the element k=37. This results in a collision.
+2. To resolve the collision, the linear probing mechanism is employed. The algorithm checks each subsequent slot, starting from position p=4. The first available slot is found at position 5.
+3. The number of jumps (or steps) it took to find the available slot is represented by i=1 (since we moved from position 4 to 5).
+
+In this case, i=1 indicates that the algorithm had to probe one slot to find an empty position at index 5.
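+
+The probing logic in the diagram can be sketched in a few lines of Python. This is a simplified illustration only; CPython's real implementation perturbs the probe sequence rather than stepping linearly:
+
+```python
+def insert(table, key):
+    # Initial position: hash modulo table length (small ints hash to themselves).
+    p = key % len(table)
+    i = 0  # number of jumps (probes) needed
+    while table[(p + i) % len(table)] is not None:
+        i += 1  # collision: try the next slot
+    table[(p + i) % len(table)] = key
+
+def contains(table, key):
+    # Lookup mirrors insertion: probe until the key or an empty slot is found.
+    # (Assumes the table always keeps at least one empty slot.)
+    p = key % len(table)
+    i = 0
+    while table[(p + i) % len(table)] is not None:
+        if table[(p + i) % len(table)] == key:
+            return True
+        i += 1
+    return False  # reached an empty slot first: the key is absent
+
+table = [None] * 11
+for key in [37, 64, 14, 94, 67, 59, 80, 39]:
+    insert(table, key)
+print(table)  # 59 lands at index 5 after one jump
+print(contains(table, 59), contains(table, 2))  # True False
+```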

 ### Keys

diff --git a/episodes/optimisation-introduction.md b/episodes/optimisation-introduction.md
index 5f9bafec..a5c55aba 100644
--- a/episodes/optimisation-introduction.md
+++ b/episodes/optimisation-introduction.md
@@ -36,16 +36,16 @@ The remaining content is often abstract knowledge, that is transferable to the v

 > Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: **premature optimisation is the root of all evil**. Yet we should not pass up our opportunities in that critical 3%. - Donald Knuth

-This classic quote among computer scientists states; when considering optimisation it is important to focus on the potential impact, both to the performance and maintainability of the code. Advanced optimisations, mostly outside the scope of this course, can increase the cost of maintenance by obfuscating what code is doing. Even if you are a solo-developer working on private code, your future self should be able to easily comprehend your implementation. Therefore, the balance between the impact to both performance and maintainability should be considered when optimising code.
+This classic quote among computer scientists emphasises the importance of considering both performance and maintainability when optimising code. While advanced optimisations (mostly outside the scope of this course) may boost performance, they often come at the cost of making the code harder to understand and maintain. Even if you're working alone on private code, your future self should be able to easily understand the implementation. Hence, when optimising, always weigh the potential impact on both performance and maintainability.

-This is not to say, don't consider performance when first writing code. The selection of appropriate algorithms and data-structures covered in this course form a good practice, simply don't fret over a need to micro-optimise every small component of the code that you write.
+This doesn't mean you should ignore performance when initially writing code. Choosing the right algorithms and data structures, as covered in this course, is good practice. However, there's no need to obsess over micro-optimising every tiny component of your code; focus on the bigger picture.

 ## Ensuring Reproducible Results when optimising an existing code

-When optimising an existing code, you are making speculative changes. It's easy to make mistakes, many of which can be subtle. Therefore, it's important to have a strategy in place to check that the outputs remain correct.
+When optimising existing code, you're making speculative changes, which can introduce subtle mistakes. To ensure that your optimisations improve performance without introducing errors, it's crucial to have a solid strategy for checking that the results remain correct.

-Testing is hopefully already a seamless part of your research software development process. Test can be used to clarify how your software should perform, ensuring that new features work as intended and protecting against unintended changes to old functionality.
+Testing should already be an integral part of your research software development process. It helps clarify expected behaviour, ensures new features work as intended, and protects against unintended regressions in previously working functionality. Always verify your changes through testing to ensure that optimisations don't compromise the correctness of your code.
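+
+A minimal sketch of such a regression test, written for pytest (introduced below) and assuming a hypothetical `mean()` function in a module `mystats` that is being optimised:
+
+```python
+# test_stats.py -- run with `pytest`
+import pytest
+from mystats import mean  # hypothetical function under optimisation
+
+def test_mean_matches_known_result():
+    # If an optimisation changes this result, the test fails,
+    # flagging that behaviour (not just speed) has changed.
+    assert mean([1, 2, 3, 4]) == pytest.approx(2.5)
+```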

 ## pytest Overview

diff --git a/episodes/optimisation-memory.md b/episodes/optimisation-memory.md
index 27500316..b6f9f4b1 100644
--- a/episodes/optimisation-memory.md
+++ b/episodes/optimisation-memory.md
@@ -24,36 +24,32 @@ exercises: 0

 The storage and movement of data plays a large role in the performance of executing software.

-Modern computer's typically have a single processor (CPU), within this processor there are multiple processing cores each capable of executing different code in parallel.
-
-Data held in memory by running software is exists in RAM, this memory is faster to access than hard drives (and solid-state drives).
-But the CPU has much smaller caches on-board, to make accessing the most recent variables even faster.
+Modern computers typically have a single processor (CPU) containing multiple cores, each capable of executing different code in parallel. Data used by running programs is stored in RAM, which is much faster to access than hard drives or solid-state drives. However, the CPU also has small on-board caches that make accessing the most recently used data even faster.

 ![An annotated photo of a computer's hardware.](episodes/fig/annotated-motherboard.jpg){alt="An annotated photo of inside a desktop computer's case. The CPU, RAM, power supply, graphics cards (GPUs) and harddrive are labelled."}

-When reading a variable, to perform an operation with it, the CPU will first look in it's registers. These exist per core, they are the location that computation is actually performed. Accessing them is incredibly fast, but there only exists enough storage for around 32 variables (typical number, e.g. 4 bytes).
-As the register file is so small, most variables won't be found and the CPU's caches will be searched.
-It will first check the current processing core's L1 (Level 1) cache, this small cache (typically 64 KB per physical core) is the smallest and fastest to access cache on a CPU.
-If the variable is not found in the L1 cache, the L2 cache that is shared between multiple cores will be checked. This shared cache, is slower to access but larger than L1 (typically 1-3MB per core).
-This process then repeats for the L3 cache which may be shared among all cores of the CPU. This cache again has higher latency to access, but increased size (typically slightly larger than the total L2 cache size).
-If the variable has not been found in any of the CPU's cache, the CPU will look to the computer's RAM. This is an order of magnitude slower to access, with several orders of magnitude greater capacity (tens to hundreds of GB are now standard).
+How does the CPU access data? When the CPU needs to read a variable, it searches the following locations in order:

-Correspondingly, the earlier the CPU finds the variable the faster it will be to access.
-However, to fully understand the cache's it's necessary to explain what happens once a variable has been found.
+1. Registers: First, the CPU checks its own small, super-fast storage (registers). These only hold around 32 values per core, so the data usually isn't found here.
+2. L1 Cache: Next, the CPU looks in the L1 cache. It's small (typically 64 KB per physical core) and the fastest cache to access, but it serves a single core.
+3. L2 Cache: If the variable isn't in L1, the CPU checks the larger L2 cache (typically 1-3 MB per core), which may be shared between several cores. It's slower than L1 but still much faster than RAM.
+4. L3 Cache: If the variable isn't in L2, the CPU checks the L3 cache, which is shared among all cores. It's slower than L2 but bigger (typically slightly larger than the total L2 cache size).
+5. RAM: If the variable still hasn't been found, the CPU fetches it from RAM. This is an order of magnitude slower to access, but offers far greater capacity (tens to hundreds of GB are now standard).
+
+The earlier in this hierarchy the CPU finds the data, the faster the access will be, which is why understanding how the caches behave can help you make programs run faster.

-If a variable is not found in the caches, so must be fetched from RAM.
-The full 64 byte cache line containing the variable, will be copied first into the CPU's L3, then L2 and then L1.
-Most variables are only 4 or 8 bytes, so many neighbouring variables are also pulled into the caches.
-Similarly, adding new data to a cache evicts old data.
-This means that reading 16 integers contiguously stored in memory, should be faster than 16 scattered integers
+What happens on a cache miss? When the CPU pulls data from RAM, it loads not just the requested variable but the full 64-byte chunk of memory containing it, called a "cache line". Since most variables are only 4 or 8 bytes, this chunk usually contains neighbouring variables that may be needed soon. Adding new data to a cache also evicts old data.

-Therefore, to **optimally** access variables they should be stored contiguously in memory with related data and worked on whilst they remain in caches.
-If you add to a variable, perform large amount of unrelated processing, then add to the variable again it will likely have been evicted from caches and need to be reloaded from slower RAM again.
+Because of this, reading data that is stored next to each other in memory (like 16 integers in a row) is much faster than reading scattered data, since the CPU can keep more of it in the caches. To make programs run faster, related data should therefore be stored contiguously in memory and worked on while it remains in the caches. If you use a variable, perform a large amount of unrelated processing, then use it again, it will likely have been evicted from the caches and need to be reloaded from slower RAM.
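+
+As a rough illustration, the sketch below (timings will vary between machines) sums the same array twice: once reading elements in the order they sit in memory, and once gathering them in a random order that defeats the cache line behaviour described above:
+
+```python
+import timeit
+import numpy as np
+
+rng = np.random.default_rng(42)
+data = rng.random(10_000_000)
+sequential = np.arange(data.size)        # indices in memory order
+scattered = rng.permutation(sequential)  # the same indices, shuffled
+
+# Sequential access: neighbouring elements share cache lines.
+print(timeit.timeit(lambda: data[sequential].sum(), number=10))
+# Scattered access: most reads miss the caches.
+print(timeit.timeit(lambda: data[scattered].sum(), number=10))
+```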

-It's not necessary to remember this full detail of how memory access work within a computer, but the context perhaps helps understand why memory locality is important.
-
+While you don't need to remember all of these details, it's helpful to know that memory locality (keeping related data together in memory and working on it while it remains cached) is key to making programs run faster.

 ![An abstract diagram showing the path data takes from disk or RAM to be used for computation.](episodes/fig/hardware.png){alt='An abstract representation of a CPU, RAM and Disk, showing their internal caches and the pathways data can pass.'}

 ::::::::::::::::::::::::::::::::::::: callout
diff --git a/episodes/optimisation-minimise-python.md b/episodes/optimisation-minimise-python.md
index 5ab4e983..a88f5f9d 100644
--- a/episodes/optimisation-minimise-python.md
+++ b/episodes/optimisation-minimise-python.md
@@ -20,17 +20,16 @@ exercises: 0

 ::::::::::::::::::::::::::::::::::::::::::::::::

-Python is an interpreted programming language. When you execute your `.py` file, the (default) CPython back-end compiles your Python source code to an intermediate bytecode. This bytecode is then interpreted in software at runtime generating instructions for the processor as necessary. This interpretation stage, and other features of the language, harm the performance of Python (whilst improving its usability).
+Python is an interpreted language. When you run a `.py` file, CPython (the default implementation of Python, written in C) first compiles your source code into an intermediate bytecode. This bytecode is then interpreted at runtime, generating instructions for the CPU as necessary. This interpretation stage makes Python easier to use, but harms its performance.
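+
+You can inspect this bytecode yourself with the built-in `dis` module (the exact instructions shown vary between Python versions):
+
+```python
+import dis
+
+def double(x):
+    return x * 2
+
+# Disassemble the bytecode that CPython interprets at runtime.
+dis.dis(double)
+```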

-In comparison, many languages such as C/C++ compile directly to machine code. This allows the compiler to perform low-level optimisations that better exploit hardware nuance to achieve fast performance. This however comes at the cost of compiled software not being cross-platform.
+In contrast, languages like C/C++ are compiled directly into machine code, allowing the compiler to perform low-level optimisations for better performance. However, this means compiled C/C++ programs aren't as easily portable across different platforms.

-Whilst Python will rarely be as fast as compiled languages like C/C++, it is possible to take advantage of the CPython back-end and packages such as NumPy and Pandas that have been written in compiled languages to expose this performance.
+Although Python will rarely be as fast as compiled languages like C/C++, it can still achieve good performance by using packages such as NumPy and Pandas, which are written in compiled languages and expose that performance to Python.

-A simple example of this would be to perform a linear search of a list (in the previous episode we did say this is not recommended).
-The below example creates a list of 2500 integers in the inclusive-exclusive range `[0, 5000)`.
-It then searches for all the even numbers in that range.
-`searchlistPython()` is implemented manually, iterating `ls` checking each individual item in Python code.
-`searchListC()` in contrast uses the `in` operator to perform each search, which allows CPython to implement the inner loop in it's C back-end.
+
+A simple example of this is performing a linear search on a list (though we noted in the previous episode that this isn't recommended). In the following example, we create a list of 2500 integers in the inclusive-exclusive range `[0, 5000)`. The goal is to search for all even numbers within that range.
+
+The function `searchlistPython()` manually iterates through the list (`ls`), checking each item using Python code. In contrast, `searchListC()` uses the `in` operator, which lets CPython handle the search more efficiently by running the inner loop in its C back-end.

 ```python
 import random
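+
+# What follows is a sketch of how the two functions described above might be
+# implemented; the episode's actual code may differ in detail.
+N = 2500  # number of integers in the list
+M = 5000  # exclusive upper bound of the range searched
+ls = random.sample(range(M), N)  # 2500 unique integers from [0, 5000)
+
+def searchlistPython():
+    # Manual linear search: the inner loop executes as Python bytecode.
+    found = 0
+    for k in range(0, M, 2):  # every even number in [0, 5000)
+        for item in ls:
+            if item == k:
+                found += 1
+                break
+    return found
+
+def searchListC():
+    # `in` performs the same linear search, but the loop over `ls`
+    # runs inside CPython's C back-end.
+    found = 0
+    for k in range(0, M, 2):
+        if k in ls:
+            found += 1
+    return found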