
Commit eb289fa

re-write memory section
1 parent 6375824 commit eb289fa

1 file changed: +17 -21 lines changed


episodes/optimisation-memory.md

Lines changed: 17 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -24,36 +24,32 @@ exercises: 0
24 24
The storage and movement of data plays a large role in the performance of executing software.
25 25

26 26
<!-- Brief summary of hardware -->
27-
Modern computer's typically have a single processor (CPU), within this processor there are multiple processing cores each capable of executing different code in parallel.
28-
29-
Data held in memory by running software is exists in RAM, this memory is faster to access than hard drives (and solid-state drives).
30-
But the CPU has much smaller caches on-board, to make accessing the most recent variables even faster.
27+
Modern computers have a single CPU with multiple cores, each capable of working on tasks at the same time. Data used by programs is stored in RAM, which is faster than hard drives or solid-state drives. However, the CPU has even faster memory called caches to access frequently used data quickly.
31 28

32 29
![An annotated photo of a computer's hardware.](episodes/fig/annotated-motherboard.jpg){alt="An annotated photo of inside a desktop computer's case. The CPU, RAM, power supply, graphics cards (GPUs) and harddrive are labelled."}
33 30

34 31
<!-- Read/operate on variable ram->cpu cache->registers->cpu -->
35-
When reading a variable, to perform an operation with it, the CPU will first look in it's registers. These exist per core, they are the location that computation is actually performed. Accessing them is incredibly fast, but there only exists enough storage for around 32 variables (typical number, e.g. 4 bytes).
36-
As the register file is so small, most variables won't be found and the CPU's caches will be searched.
37-
It will first check the current processing core's L1 (Level 1) cache, this small cache (typically 64 KB per physical core) is the smallest and fastest to access cache on a CPU.
38-
If the variable is not found in the L1 cache, the L2 cache that is shared between multiple cores will be checked. This shared cache, is slower to access but larger than L1 (typically 1-3MB per core).
39-
This process then repeats for the L3 cache which may be shared among all cores of the CPU. This cache again has higher latency to access, but increased size (typically slightly larger than the total L2 cache size).
40-
If the variable has not been found in any of the CPU's cache, the CPU will look to the computer's RAM. This is an order of magnitude slower to access, with several orders of magnitude greater capacity (tens to hundreds of GB are now standard).
32+
How the CPU accesses data:
33+
When the CPU needs to use a variable, it follows these steps:
41 34

42-
Correspondingly, the earlier the CPU finds the variable the faster it will be to access.
43-
However, to fully understand the cache's it's necessary to explain what happens once a variable has been found.
35+
1) Registers: First, the CPU checks its own small, super-fast storage (registers). But it only has room for about 32 variables, so it usually doesn’t find the data here.
36+
2) L1 Cache: Next, the CPU looks in the L1 cache. It’s small (64 KB per core) and fast, but it only stores data for a single core.
37+
3) L2 Cache: If the variable isn’t in L1, it checks the larger L2 cache, which is shared by several cores. It’s slower than L1 but still faster than RAM.
38+
4) L3 Cache: If the variable isn’t in L2, the CPU checks the L3 cache, which is shared by all cores. It’s slower than L2 but bigger.
39+
5) RAM: If the variable is still not found, the CPU fetches it from the much slower RAM.
40+
The faster the CPU finds the data in the cache, the quicker it can do the job.
41+
This is why understanding how the cache works can help make things run faster.
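The lookup order in the steps above can be sketched as a toy model. This is only an illustration of the search order, not of how hardware is built: the `find_level` function, the level names, and the example addresses are all hypothetical, and real caches work on cache lines rather than Python sets.

```python
def find_level(address, levels):
    """Return the name of the first level that holds `address`.

    `levels` is an ordered list of (name, contents) pairs, fastest first.
    If no level holds the address, the data has to come from RAM.
    """
    for name, contents in levels:
        if address in contents:
            return name
    return "RAM"


# Hypothetical contents: each level is larger than the one before it.
levels = [
    ("registers", {100}),
    ("L1", {100, 104}),
    ("L2", {100, 104, 108}),
    ("L3", {100, 104, 108, 112}),
]

print(find_level(100, levels))  # registers
print(find_level(108, levels))  # L2
print(find_level(500, levels))  # RAM
```

The earlier in the list the address is found, the fewer levels are searched, which mirrors why data found in registers or L1 is the fastest to access.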
44 42

45-
If a variable is not found in the caches, so must be fetched from RAM.
46-
The full 64 byte cache line containing the variable, will be copied first into the CPU's L3, then L2 and then L1.
47-
Most variables are only 4 or 8 bytes, so many neighbouring variables are also pulled into the caches.
48-
Similarly, adding new data to a cache evicts old data.
49-
This means that reading 16 integers contiguously stored in memory, should be faster than 16 scattered integers
43+
Cache Details:
44+
When the CPU pulls data from RAM, it loads not just the variable, but also a full 64-byte chunk of memory called a "cache line."
45+
This chunk often contains nearby variables that might be needed soon. When new data is added to the cache, old data is pushed out.
50 46

51-
Therefore, to **optimally** access variables they should be stored contiguously in memory with related data and worked on whilst they remain in caches.
52-
If you add to a variable, perform large amount of unrelated processing, then add to the variable again it will likely have been evicted from caches and need to be reloaded from slower RAM again.
47+
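Because of this, reading data that sits next to each other in memory (like 16 numbers in a row) is usually faster than reading scattered data: 16 four-byte numbers fit in a single 64-byte cache line, so one fetch from RAM can bring them all into the cache, whereas scattered values may each need their own fetch.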
48+
To make programs run faster, related data should be stored next to each other in memory.
49+
By working with this data while it's still in the cache, the CPU doesn’t have to go all the way to RAM, which is much slower.
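As a rough sketch of the arithmetic behind this, the snippet below counts how many 64-byte cache lines a set of reads would touch. It assumes 4-byte integers in an array that starts on a cache-line boundary; `cache_lines_touched` and the two index patterns are hypothetical names chosen for illustration, and the model ignores prefetching and other hardware details.

```python
CACHE_LINE_BYTES = 64  # cache line size described in the text
ITEM_BYTES = 4         # assumed size of one integer


def cache_lines_touched(indices, item_bytes=ITEM_BYTES, line_bytes=CACHE_LINE_BYTES):
    """Count the distinct cache lines covered by reading the given array indices."""
    return len({(index * item_bytes) // line_bytes for index in indices})


contiguous = range(16)         # 16 neighbouring integers
scattered = range(0, 256, 16)  # 16 integers, each 64 bytes apart

print(cache_lines_touched(contiguous))  # 1
print(cache_lines_touched(scattered))   # 16
```

Under these assumptions the contiguous reads need a single cache line to be loaded, while the scattered reads need sixteen.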
53 50

54 51
<!-- Latency/Throughput typically inversely proportional to capacity -->
55-
It's not necessary to remember this full detail of how memory access work within a computer, but the context perhaps helps understand why memory locality is important.
56-
52+
While you don’t need to know all the details of how memory works, it’s helpful to know that memory locality—keeping related data together and accessing it in chunks—is key to making programs run faster.
57 53
![An abstract diagram showing the path data takes from disk or RAM to be used for computation.](episodes/fig/hardware.png){alt='An abstract representation of a CPU, RAM and Disk, showing their internal caches and the pathways data can pass.'}
58 54

59 55
::::::::::::::::::::::::::::::::::::: callout
