|
16 | 16 | "In this project, we implement **Conway's Game of Life**, a classic cellular automaton, in three ways: a pure Python approach (running on the CPU), a vectorised approach using NumPy (also on the CPU), and a CuPy version (running on the GPU). We'll also visualise the evolution of the Game of Life grid to see the computation in action. \n", |
17 | 17 | "\n", |
18 | 18 | "## What is Conway's Game of Life?\n", |
| 19 | + "\n", |
19 | 20 | "It's a zero-player game devised by John Conway, where you have a grid of cells that live or die based on a few simple rules:\n", |
20 | 21 | "- Each cell can be \"alive\" (1) or \"dead\" (0).\n", |
21 | 22 | "- At each time step (generation), the following rules apply to every cell simultaneously:\n", |
22 | | - "- Any live cell with fewer than 2 live neighbours dies (underpopulation).\n", |
23 | | - "- Any live cell with 2 or 3 live neighbours lives on to the next generation (survival).\n", |
24 | | - "- Any live cell with more than 3 live neighbours dies (overpopulation).\n", |
25 | | - "- Any dead cell with exactly 3 live neighbours becomes a live cell (reproduction).\n", |
| 23 | + " - Any live cell with fewer than 2 live neighbours dies (underpopulation).\n", |
| 24 | + " - Any live cell with 2 or 3 live neighbours lives on to the next generation (survival).\n", |
| 25 | + " - Any live cell with more than 3 live neighbours dies (overpopulation).\n", |
| 26 | + " - Any dead cell with exactly 3 live neighbours becomes a live cell (reproduction).\n", |
26 | 27 | "- Neighbours are the 8 cells touching a given cell horizontally, vertically, or diagonally.\n", |
27 | 28 | "- From these simple rules emerges a lot of interesting behaviour – stable patterns, oscillators, spaceships (patterns that move), etc. It's a good example of a grid-based simulation that can benefit from parallel computation because the state of each cell for the next generation can be computed independently (based on the current generation).\n", |
28 | 29 | "\n", |
29 | 30 | "## Visualisation of Game of Life\n", |
| 31 | + "\n", |
30 | 32 | "To make this project more visually engaging, below is an **animated GIF** showing an example of a Game of Life simulation starting from a random initial configuration. White pixels represent live cells, and black pixels represent dead cells. You can see patterns forming, moving, and changing over time:\n", |
31 | 33 | "An example evolution of Conway's Game of Life over a few generations (white = alive, black = dead).\n", |
32 | 34 | "The animation demonstrates how random initial clusters of cells can evolve into interesting patterns. Notice some cells blink on and off or form moving patterns.\n", |
|
37 | 39 | "\n", |
38 | 40 | "\n", |
39 | 41 | "## Implementations\n", |
| 42 | + "\n", |
40 | 43 | "All three implementations (pure Python, NumPy and CuPy) are contained in the `.py` file located at `content/game_of_life.py`. \n", |
41 | 44 | "\n", |
42 | 45 | "To run the different versions of the code, you can use:\n", |
43 | 46 | "\n", |
44 | 47 | "**Naïve Python Version**\n", |
45 | 48 | "\n", |
46 | | - "`python game_of_life.py run_life_naive --size 100 --timesteps 50`\n", |
| 49 | + "```bash\n", |
| 50 | + "python game_of_life.py run_life_naive --size 100 --timesteps 50\n", |
| 51 | + "```\n", |
| 52 | + "\n", |
47 | 53 | "which will produce a file called `game_of_life_naive.gif`.\n", |
48 | 54 | "\n", |
49 | 55 | "**CPU-Vectorized Version**\n", |
50 | | - "`python game_of_life.py run_life_numpy --size 100 --timesteps 50`\n", |
| 56 | + "\n", |
| 57 | + "```bash\n", |
| 58 | + "python game_of_life.py run_life_numpy --size 100 --timesteps 50\n", |
| 59 | + "```\n", |
| 60 | + "\n", |
51 | 61 | "which will produce a file called `game_of_life_cpu.gif`.\n", |
52 | 62 | "\n", |
53 | 63 | "**GPU-Accelerated Version**\n", |
54 | | - "`python game_of_life.py run_life_cupy --size 100 --timesteps 50`\n", |
| 64 | + "\n", |
| 65 | + "```bash\n", |
| 66 | + "python game_of_life.py run_life_cupy --size 100 --timesteps 50\n", |
| 67 | + "```\n", |
| 68 | + "\n", |
55 | 69 | "which will produce a file called `game_of_life_gpu.gif`.\n", |
56 | 70 | "\n", |
57 | 71 | "## Naive Implementation\n", |
| 72 | + "\n", |
58 | 73 | "The core computation in the naive implementation is: \n", |
| 74 | + "\n", |
59 | 75 | "```python\n", |
60 | 76 | "def life_step_naive(grid: np.ndarray) -> np.ndarray:\n", |
61 | 77 | " N, M = grid.shape\n", |
|
86 | 102 | "### Explanation \n", |
87 | 103 | "\n", |
88 | 104 | "The naive implementation runs slowly for several reasons, including: \n", |
| 105 | + "\n", |
89 | 106 | "- **Nested Python Loops**: Instead of eight `np.roll` calls and one `np.where`, we run two nested loops over `i, j` (10^4 iterations for a 100×100 grid) and two more loops over `di, dj` (9 checks each), for roughly 9×10^4 Python-level operations per step. \n", |
90 | 107 | "- **Manual edge-wrapping logic**: Branching (`if ni < 0 … elif …`) for each neighbour check, instead of the single fast shift that `np.roll` does in C. \n", |
91 | 108 | "- **Per-cell rule application**: The Game of Life rule is applied with a Python `if/else` for each cell instead of a single vectorised Boolean mask. \n", |
|
122 | 139 | "### Explanation\n", |
123 | 140 | "\n", |
124 | 141 | "#### From Per-Cell Loops to Whole-Array Operations \n", |
| 142 | + "\n", |
125 | 143 | "In the **naive** version, two nested Python loops traversed every one of the NxN cells; then, for each cell, two more loops over the offsets `di` and `dj` counted its eight neighbours by computing `(i + di) % N` and `(j + dj) % M` in pure Python. \n", |
126 | 144 | "**Cost**: ~9·N² Python-level iterations per generation, including branching and modulo arithmetic.\n", |
127 | 145 | "**Drawback**: Thousands of interpreter calls and non-contiguous memory access. \n", |
128 | 146 | "In the **NumPy** version, no Python loops over individual cells occur. Instead, eight calls to `np.roll` shift the entire grid array (up, down, left, right and on diagonals), automatically handling wrap-around in one C-level operation. Summing those eight arrays gives a full neighbour count in a single, optimised pass. \n", |
129 | 147 | "\n", |
130 | 148 | "#### Manual `if/else` vs Vectorised Mask \n", |
| 149 | + "\n", |
131 | 150 | "In the **naive** implementation, after counting neighbours, each cell's fate is determined with a Python `if grid[i,j] == 1: ... else: ...` and assigned via `new[i,j] = ...`. \n", |
132 | 151 | "In the **NumPy** implementation a single expression of `(neighbours == 3) | ((grid == 1) & (neighbours == 2))` produces an NxN Boolean mask of *cells alive next*. Converting that mask to integers with `np.where(mask, 1, 0)` builds the entire next-generation grid in one C-level operation, resulting in no per-element Python overhead. \n", |
133 | 152 | "\n", |
134 | | - "\n", |
135 | 153 | "#### Automatic Wrap-Around vs Manual Modulo Logic\n", |
| 154 | + "\n", |
136 | 155 | "In the **naive** version, every neighbour check does: \n", |
137 | 156 | "\n", |
138 | 157 | "```python \n", |
139 | 158 | "ni = (i + di) % N\n", |
140 | 159 | "nj = (j + dj) % M\n", |
141 | 160 | "```\n", |
142 | 161 | "\n", |
143 | | - "with Python-level branching and modulo arithmetic on each of the 9 checks per cell. The associated **cost** is thousands of `%` operations and branch instructions per generation. \n", |
| 162 | + "with Python-level branching and modulo arithmetic on each of the 9 checks per cell. The associated **cost** is thousands of modulo (`%`) operations and branch instructions per generation. \n", |
144 | 163 | "\n", |
145 | 164 | "In the **NumPy** version, a single call to \n", |
146 | 165 | "\n", |
147 | 166 | "```python\n", |
148 | 167 | "np.roll(grid, shift, axis=axis)\n", |
149 | 168 | "```\n", |
| 169 | + "\n", |
150 | 170 | "automatically wraps the entire array in one C-level operation. The **benefit** is that all per-cell `%` operations and branching are eliminated, being replaced by a single optimised memory shift over the whole grid. \n", |
151 | 171 | "\n", |
152 | 172 | "## GPU-Accelerated Implementation \n", |
|
194 | 214 | "```\n", |
195 | 215 | "\n", |
196 | 216 | "#### Random initialisation \n", |
| 217 | + "\n", |
197 | 218 | "**NumPy**: \n", |
198 | 219 | "```Python \n", |
199 | 220 | "grid = np.random.choice([0,1], size=(N,N), p=[1-p, p])\n", |
|
205 | 226 | "```\n", |
206 | 227 | "\n", |
207 | 228 | "#### Data Transfer\n", |
| 229 | + "\n", |
208 | 230 | "**CuPy**: \n", |
| 231 | + "\n", |
209 | 232 | "```Python \n", |
210 | 233 | "cp.asnumpy(grid_gpu) # bring a CuPy array back to NumPy\n", |
211 | 234 | "```\n", |
212 | 235 | "\n", |
213 | 236 | "### Which to use?\n", |
| 237 | + "\n", |
214 | 238 | "**Large grids (e.g. N ≥ 500) or many timesteps**: GPU's parallel throughput outweighs kernel-launch and transfer overhead.\n", |
215 | 239 | "**Small grids (e.g. 10×10)**: GPU overhead may dominate, so you may want to stick with NumPy.\n", |
216 | 240 | "\n", |
217 | 241 | "### Why is this quicker?\n", |
218 | 242 | "\n", |
219 | 243 | "When a computation can be expressed as the same operation applied independently across many data elements, like counting neighbours on every cell of a large Game of Life grid, GPUs often deliver dramatic speedups compared to CPUs. This advantage stems from several architectural and compiler-related factors that we discussed earlier in the section on theory, including: \n", |
| 244 | + "\n", |
220 | 245 | "- **Massive Data Parallelism**\n", |
221 | 246 | " - **CPU**: A few (4–16) powerful cores optimised for sequential tasks and complex control flow.\n", |
222 | 247 | " - **GPU**: Hundreds to thousands of simpler cores running in lock-step.\n", |
|
239 | 264 | "### How much quicker?\n", |
240 | 265 | "\n", |
241 | 266 | "Each implementation exhibits a different overall runtime, as you have probably noticed when running them from the command line. We can use the built-in UNIX command-line tool `time`, a simple profiler that measures how long a given program takes to run. It reports three primary metrics:\n", |
| 267 | + "\n", |
242 | 268 | "- **real**: The \"wall-clock\" time elapsed from start to finish (i.e. actual elapsed time).\n", |
243 | | - "- **user**: CPU time spent in user-mode *your programs own computations)\n", |
| 269 | + "- **user**: CPU time spent in user mode (your program's own computations).\n", |
244 | 270 | "- **sys**: CPU time spent in kernel mode (system calls on behalf of your program)." |
245 | 271 | ] |
246 | 272 | }, |
|
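A note on the rule list and the Boolean mask quoted in the NumPy explanation: the four rules and the single expression `(neighbours == 3) | ((grid == 1) & (neighbours == 2))` should agree for every combination of cell state and neighbour count. The sketch below is one way to check that exhaustively in plain Python. The function names `next_state_rules` and `next_state_mask` are hypothetical and do not come from `game_of_life.py`.

```python
def next_state_rules(alive: int, neighbours: int) -> int:
    """Apply the four rules as listed in the notebook (hypothetical helper)."""
    if alive:
        if neighbours < 2:
            return 0  # underpopulation
        if neighbours <= 3:
            return 1  # survival with 2 or 3 neighbours
        return 0  # overpopulation
    return 1 if neighbours == 3 else 0  # reproduction


def next_state_mask(alive: int, neighbours: int) -> int:
    """Scalar form of the vectorised Boolean mask (hypothetical helper)."""
    return int((neighbours == 3) or (alive == 1 and neighbours == 2))


# Compare both formulations for every cell state and neighbour count 0..8.
mismatches = [
    (alive, n)
    for alive in (0, 1)
    for n in range(9)
    if next_state_rules(alive, n) != next_state_mask(alive, n)
]
print(mismatches)
```

An empty list means the compact mask encodes the same transition table as the four written rules, which is why the vectorised version needs only one expression.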
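A minimal sketch of the `np.roll` approach that the NumPy explanation describes, assuming a 2D array of 0s and 1s with wrap-around edges. The diff does not show the notebook's own NumPy function, so `life_step_roll` is a hypothetical name and this is not necessarily how `game_of_life.py` implements it.

```python
import numpy as np


def life_step_roll(grid: np.ndarray) -> np.ndarray:
    """One generation on a wrap-around grid (hypothetical sketch)."""
    # Eight shifted copies of the grid, one per neighbour direction;
    # np.roll wraps values around the edges.
    neighbours = sum(
        np.roll(np.roll(grid, di, axis=0), dj, axis=1)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    # Boolean mask of cells that are alive in the next generation.
    alive_next = (neighbours == 3) | ((grid == 1) & (neighbours == 2))
    return np.where(alive_next, 1, 0)


# A "blinker": three live cells in a row, which oscillates with period 2.
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1

after_one = life_step_roll(blinker)
after_two = life_step_roll(after_one)
print(after_one)
```

After one step the horizontal bar should become a vertical bar through the same centre cell, and after two steps the grid should match the starting pattern, which makes the blinker a convenient sanity check for any of the three implementations.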