|
| 1 | +# Boids |
| 2 | + |
| 3 | +> Boids is an artificial life simulation originally developed by Craig Reynolds. |
| 4 | +The aim of the simulation was to replicate the behavior of flocks of birds. |
| 5 | +Instead of controlling the interactions of an entire flock, however, |
| 6 | +the Boids simulation only specifies the behavior of each individual bird. |
| 7 | + |
| 8 | +```console |
| 9 | +npm ci |
| 10 | +npm run boids-gl |
| 11 | +npm run boids-cl |
| 12 | +``` |
| 13 | + |
| 14 | +This example uses the package "opencl-raub", so a separate `npm ci` needs to be run here. |
| 15 | + |
| 16 | + |
| 17 | + |
| 18 | + |
| 19 | +* The original example was taken from Three.js examples |
| 20 | +[GPGPU Birds](https://github.com/mrdoob/three.js/blob/master/examples/webgl_gpgpu_birds.html). |
| 21 | +* The interop and some other notes for the OpenCL implementation were taken from this |
| 22 | +[presentation](http://web.engr.oregonstate.edu/~mjb/cs575/Handouts/opencl.opengl.vbo.1pp.pdf). |
| 23 | +* Some optimization ideas for the OpenCL demo were taken from this |
| 24 | +[guide](https://developer.download.nvidia.com/compute/DevZone/docs/html/OpenCL/doc/OpenCL_Best_Practices_Guide.pdf) |
| 25 | + |
| 26 | +The OpenCL implementation is algorithmically similar to the GLSL one, i.e. the same N-to-N |
| 27 | +interaction is performed. This is a basic example, not a grid-based N-body simulation. It only |
| 28 | +exists to illustrate how GLSL RTT compute can be swapped for OpenCL compute with Node3D. |
| 29 | + |
| 30 | +## Minor changes |
| 31 | + |
| 32 | +Compared to the original Three.js example there are several edits: |
| 33 | +* Added OrbitControls - you may look around and zoom. The mouse-predator is only correct for the |
| 34 | +initial view position. |
| 35 | +* The birds are colored according to their flight direction. The background is black. |
| 36 | +* Some GLSL changes, like removing the unused variables and improving readability. |
| 37 | +* Extracted some functions and primitives into separate modules. Extracted the inline shaders. |
| 38 | +* Removed the unused attributes. Renamed some of the variables. |
| 39 | +* Bumped the number of birds from `32*32` to `128*128` (16k+). |
| 40 | + |
| 41 | + |
| 42 | +## OpenCL notes |
| 43 | + |
| 44 | +The positions memory contains `phase`, as in the GLSL implementation. The `velocity.w` |
| 45 | +component is unused, so a 3-component velocity would work just as well. |
| 46 | + |
| 47 | +It is possible to use OpenGL textures in OpenCL, which would map the GLSL compute implementation directly. |
| 48 | +However, the more straightforward approach of sharing VBOs was chosen. To implement that, the |
| 49 | +bird mesh was adjusted for instancing. The changes mostly concern how the |
| 50 | +bird geometry is configured (see [GL](gl/bird-geometry.ts) vs [CL](cl/bird-geometry-cl.ts)). |
| 51 | + |
| 52 | +The call to `cl.enqueueAcquireGLObjects` is performed only once. That works for the combination |
| 53 | +of **Windows + NVIDIA**. On other platforms it may turn out that this has to be called |
| 54 | +every frame. |
| 55 | + |
| 56 | +The naive implementation of the N-body interaction performs about the same as GLSL, |
| 57 | +because both do basically the same work: the same number of GPU memory reads and writes, |
| 58 | +and about the same math. But using shared/local GPU memory reduces the number of |
| 59 | +global memory reads. With that optimization, overall performance is around 2x better. |
| 60 | + |
| 61 | +See [boids.cl](cl/boids.cl), where the first loop starts. The N-body interaction is split |
| 62 | +into chunks of 256 items, and that is matched by the **work group** size when launching the |
| 63 | +kernel. |
| 64 | +1. For each iteration of the outer loop, the workgroup threads synchronize and |
| 65 | +copy 256 entries into local memory (1 entry per thread). |
| 66 | +1. Threads synchronize again, and each thread does 256 iterations, |
| 67 | +but only reading from shared (and not global) memory. |
| 68 | +1. Hence each work group performs N global reads instead of 256\*N, and the total drops from N\*N to N\*N/256. If N is **16384**, that is **1,048,576** global reads instead of **268,435,456**. |
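
The chunking steps above can be sketched on the CPU to see where the savings in global reads come from. This is only an illustration in TypeScript, not the kernel from `cl/boids.cl`: the names `TILE`, `naiveSums`, and `tiledSums` are hypothetical, the "interaction" is a placeholder (sum of absolute differences), and work-group threads are simulated with plain loops instead of barriers. It assumes N is a multiple of the chunk size.

```typescript
// CPU-side sketch of the chunked N-to-N loop. Hypothetical names throughout;
// the real kernel runs the "t" loops in parallel with barriers in between.

const TILE = 256; // chunk size, assumed equal to the work-group size

interface Result {
  sums: Float64Array; // per-item accumulated placeholder interaction
  globalReads: number; // reads of *other* items from "global" memory
}

// Naive variant: every item reads every item directly from global memory.
// The one-time read of an item's own value is not counted in either variant.
function naiveSums(globalMem: Float64Array): Result {
  const n = globalMem.length;
  const sums = new Float64Array(n);
  let globalReads = 0;
  for (let i = 0; i < n; i++) {
    const self = globalMem[i];
    for (let j = 0; j < n; j++) {
      sums[i] += Math.abs(self - globalMem[j]);
      globalReads++;
    }
  }
  return { sums, globalReads };
}

// Tiled variant: each simulated work group copies one chunk at a time into
// local memory (one entry per thread), then iterates over the local copy only.
function tiledSums(globalMem: Float64Array): Result {
  const n = globalMem.length; // assumed to be a multiple of TILE
  const sums = new Float64Array(n);
  const localMem = new Float64Array(TILE);
  let globalReads = 0;
  for (let group = 0; group < n; group += TILE) {
    for (let chunk = 0; chunk < n; chunk += TILE) {
      // Step 1: each thread copies a single entry into local memory.
      for (let t = 0; t < TILE; t++) {
        localMem[t] = globalMem[chunk + t];
        globalReads++;
      }
      // Step 2: each thread does TILE iterations, reading local memory only.
      for (let t = 0; t < TILE; t++) {
        const self = globalMem[group + t]; // own value, not counted
        for (let k = 0; k < TILE; k++) {
          sums[group + t] += Math.abs(self - localMem[k]);
        }
      }
    }
  }
  return { sums, globalReads };
}
```

Both functions visit the items in the same order, so they should produce the same sums. Only the read counters differ: `n * n` for the naive variant versus `n * n / TILE` for the tiled one.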