150 changes: 142 additions & 8 deletions README.md
@@ -1,11 +1,145 @@
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**
# University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

* (TODO) YOUR NAME HERE
* (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Di Lu
* [LinkedIn](https://www.linkedin.com/in/di-lu-0503251a2/)
* [personal website](https://www.dluisnothere.com/)
* Tested on: Windows 11, i7-12700H @ 2.30GHz 32GB, NVIDIA GeForce RTX 3050 Ti

### (TODO: Your README)
## Introduction

Include screenshots, analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
In this project, I simulate flocking behavior for boids scattered in a 200 x 200 x 200 cube, using CUDA kernel functions
to update each boid's position and velocity at every time step dT. The simulation is based on Craig Reynolds's Boids artificial life program,
described in a 1987 SIGGRAPH paper, and implements the following three behaviors:

1. cohesion - boids move towards the perceived center of mass of their neighbors
2. separation - boids avoid getting too close to their neighbors
3. alignment - boids generally try to move with the same direction and speed as
their neighbors

In the simulation results, the color of each particle is a representation of its velocity.

![Coherent Grid Flocking with 50,000 boids](images/headerResized.gif)

_Coherent Grid Flocking with 50,000 boids_

## Implementation and Results
To measure the performance of my code, I ran my program in Release mode with VSync disabled. There are
three implementations: the first is a naive neighbor search, and each subsequent one
adds further optimizations.

#### Part 1. Naive Boids Simulation

The first simulation is a naive neighbor search, where each boid searches every other boid in existence and checks
whether they are within distance for cohesion, separation, or alignment. If a non-self boid is within any such distance,
then its position and velocity will be taken into account for the respective rule.
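
As a rough illustration of the naive search described above, here is a CPU reference sketch in plain C++. It is not the project's CUDA kernel: the `Vec3` and `Rules` types, the function name `velocityChangeNaive`, and the distance and scale constants are all hypothetical, and the exact way each rule is weighted is an assumption. Each rule guards against dividing by zero when no neighbor is in range.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal 3-float vector used only for this sketch.
struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// Hypothetical tuning constants: one distance and one scale per rule.
struct Rules {
    float cohesionDist, separationDist, alignmentDist;
    float cohesionScale, separationScale, alignmentScale;
};

// Velocity change for boid `self`, found by scanning every other boid.
Vec3 velocityChangeNaive(std::size_t self,
                         const std::vector<Vec3>& pos,
                         const std::vector<Vec3>& vel,
                         const Rules& r) {
    assert(pos.size() == vel.size() && self < pos.size());
    Vec3 center{0.0f, 0.0f, 0.0f};   // sum of neighbor positions (cohesion)
    Vec3 away{0.0f, 0.0f, 0.0f};     // sum of offsets pointing away from neighbors (separation)
    Vec3 avgVel{0.0f, 0.0f, 0.0f};   // sum of neighbor velocities (alignment)
    int nCohesion = 0, nAlignment = 0;

    for (std::size_t j = 0; j < pos.size(); ++j) {
        if (j == self) continue;     // skip the boid itself
        const Vec3 offset = pos[j] - pos[self];
        const float d = length(offset);
        if (d < r.cohesionDist)   { center = center + pos[j]; ++nCohesion; }
        if (d < r.separationDist) { away = away - offset; }
        if (d < r.alignmentDist)  { avgVel = avgVel + vel[j]; ++nAlignment; }
    }

    Vec3 dv{0.0f, 0.0f, 0.0f};
    // Only average when at least one neighbor contributed (avoids division by zero).
    if (nCohesion > 0)  dv = dv + (center * (1.0f / nCohesion) - pos[self]) * r.cohesionScale;
    dv = dv + away * r.separationScale;
    if (nAlignment > 0) dv = dv + avgVel * (1.0f / nAlignment) * r.alignmentScale;
    return dv;
}
```

Because the loop visits all boids for every boid, the cost per frame grows with the square of the boid count, which is what the grid-based versions try to avoid.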

![Naive Boids Simulation](images/naive.gif)

_Naive Flocking with 5,000 boids_

#### Part 2. Uniform Grid Boids

The second simulation is a neighbor search that takes into account the largest neighborhood distance among the 3 rules.
The simulation space is divided into grid cubes. Using these cubes, each boid only needs to check the cubes that overlap
with its spherical neighborhood.

Each boid calculates the extremities of its reach by using its own radius and position. With these extremities, I can calculate
the minimum and maximum indices of the cells I need to scan. Hence, the number of useless boid scans is reduced, resulting in a much
faster simulation!
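
The cell-range calculation can be sketched roughly as follows, again as a CPU-side C++ illustration rather than the actual kernel code. The names `CellRange`, `cellIndex1D`, and `cellsOverlappingNeighborhood`, and the parameters `gridMin`, `cellWidth`, and `gridResolution`, are assumptions made for this example. The sketch assumes a cubic grid whose lower corner sits at `gridMin` on every axis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Inclusive range of cell coordinates to scan on each axis.
struct CellRange { int minX, minY, minZ, maxX, maxY, maxZ; };

// Flatten a 3D cell coordinate into a 1D index (x varies fastest).
inline int cellIndex1D(int x, int y, int z, int gridResolution) {
    return x + y * gridResolution + z * gridResolution * gridResolution;
}

// Cells overlapped by the axis-aligned box around a boid's spherical neighborhood.
CellRange cellsOverlappingNeighborhood(float px, float py, float pz, float radius,
                                       float gridMin, float cellWidth, int gridResolution) {
    assert(cellWidth > 0.0f && gridResolution > 0);
    // Convert a world coordinate to a cell coordinate, clamped to the grid bounds.
    auto toCell = [&](float coord) {
        const int c = static_cast<int>(std::floor((coord - gridMin) / cellWidth));
        return std::max(0, std::min(c, gridResolution - 1));
    };
    return {toCell(px - radius), toCell(py - radius), toCell(pz - radius),
            toCell(px + radius), toCell(py + radius), toCell(pz + radius)};
}
```

A caller would loop over `z`, `y`, and `x` within the returned range and use `cellIndex1D` to look up each cell's boids. Clamping keeps boids near the boundary from indexing outside the grid.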

![Uniform Boids Simulation](images/uniform.gif)

_Uniform Grid Flocking with 5,000 boids_

#### Part 3. Coherent Grid Boids

The third simulation builds on the second simulation. This time, we also rearrange the position and velocity information such that
boids that are in a cell together are also contiguous in memory.
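
One way to picture the rearrangement is the CPU sketch below. It is only an illustration under assumptions: `Float3`, `CoherentData`, and `makeCoherent` are hypothetical names, and a GPU version would typically use a parallel sort and a scatter kernel instead of the sequential loop shown here. The sketch sorts boid indices by cell, copies positions and velocities into cell order, and records where each cell's run of boids starts and ends.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Minimal 3-float vector used only for this sketch.
struct Float3 { float x, y, z; };

struct CoherentData {
    std::vector<Float3> pos, vel;   // boid data reordered so each cell is contiguous
    std::vector<int> cellStart;     // first index of each cell's run, -1 if the cell is empty
    std::vector<int> cellEnd;       // one past the last index of the run, -1 if empty
};

CoherentData makeCoherent(const std::vector<Float3>& pos,
                          const std::vector<Float3>& vel,
                          const std::vector<int>& cellOfBoid,
                          int cellCount) {
    assert(pos.size() == vel.size() && pos.size() == cellOfBoid.size());
    const std::size_t n = pos.size();

    // Sort boid indices by the cell each boid falls into.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return cellOfBoid[a] < cellOfBoid[b]; });

    CoherentData out;
    out.pos.resize(n);
    out.vel.resize(n);
    out.cellStart.assign(cellCount, -1);
    out.cellEnd.assign(cellCount, -1);

    for (std::size_t i = 0; i < n; ++i) {
        // Gather position and velocity into cell order.
        out.pos[i] = pos[order[i]];
        out.vel[i] = vel[order[i]];
        const int cell = cellOfBoid[order[i]];
        assert(cell >= 0 && cell < cellCount);
        // A run starts wherever the cell differs from the previous sorted entry.
        if (i == 0 || cell != cellOfBoid[order[i - 1]]) out.cellStart[cell] = static_cast<int>(i);
        out.cellEnd[cell] = static_cast<int>(i) + 1;
    }
    return out;
}
```

With this layout, scanning a cell reads `pos[cellStart[c]]` through `pos[cellEnd[c] - 1]` directly, with no extra index lookup per boid.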

![Coherent Boids Simulation](images/coherent.gif)

_Coherent Grid Flocking with 5,000 boids_

## Part 3. Overall Performance Analysis

* Number of Boids vs. Performance

![Graph 1](images/graph1.png)

* BlockSize vs. Performance (N = 100,000)

![Graph 2](images/graph2.png)

**For each implementation, how does changing the number of boids affect
performance? Why do you think this is?**

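Across all three implementations, increasing the number of boids decreases FPS.
In the naive implementation, each boid checks every other boid, so the work per frame grows
quadratically with the number of boids. In the grid implementations, if the scene scale stays the same,
each cell becomes denser as the boid count increases, which means each boid must run more sequential
operations to check the larger number of candidate neighbors.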

**For each implementation, how does changing the block count and block size
affect performance? Why do you think this is?**

As seen in the graph, a smaller block size usually results in poorer performance until a certain
blockSize is reached. For example, there is a big difference between blockSize == 4 and blockSize == 16, but
FPS is relatively similar for any blockSize above 32. This behavior is not easily seen in simulations
with fewer boids (N = 5,000). This is likely because when blockSize is small, fewer threads can run
in parallel. When blockSize is large, more threads can run the same program at the same time. However,
once the number of threads exceeds what the hardware can run in parallel, a larger blockSize no longer
gives much of a performance enhancement.
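
One plausible contributor to the plateau around blockSize 32 is that NVIDIA GPUs schedule threads in warps of 32, so a block smaller than 32 leaves some lanes of its warp idle. The small C++ sketch below works through that arithmetic. The names `LaunchShape` and `launchShape` are hypothetical, and the estimate ignores the partially filled last block and every other occupancy limit, so it is a back-of-the-envelope aid rather than a performance model.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // threads per warp on current NVIDIA GPUs

struct LaunchShape {
    int blocks;              // number of blocks needed to cover all boids
    int warpsPerBlock;       // warps each block occupies
    double laneUtilization;  // fraction of warp lanes that hold a real thread
};

LaunchShape launchShape(int numBoids, int blockSize) {
    assert(numBoids > 0 && blockSize > 0);
    // Round up so every boid gets a thread.
    const int blocks = (numBoids + blockSize - 1) / blockSize;
    // A block always occupies a whole number of warps.
    const int warpsPerBlock = (blockSize + kWarpSize - 1) / kWarpSize;
    const double laneUtilization =
        static_cast<double>(blockSize) / (warpsPerBlock * kWarpSize);
    return {blocks, warpsPerBlock, laneUtilization};
}
```

For N = 100,000, `launchShape(100000, 4)` gives 25,000 blocks with only 12.5% of lanes in use, while `launchShape(100000, 128)` gives 782 blocks with every lane in use.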

**For the coherent uniform grid: did you experience any performance improvements
with the more coherent uniform grid? Was this the outcome you expected?
Why or why not?**

I experienced a significant performance improvement with the coherent uniform grid when the number of boids
is very high. For example, with 500,000 boids, the coherent grid comfortably gives me ~130 FPS, while
the uniform grid drops to ~6 FPS. For fewer boids, the improvement is less obvious (for example,
FPS is fairly similar between the coherent and uniform grids with 5,000 boids). This outcome is expected, because
when the number of boids is higher, each cell also contains many more boids. If their data is not contiguous in
memory, the cost of reading many boids whose data is scattered far apart becomes more noticeable.

**Did changing cell width and checking 27 vs 8 neighboring cells affect performance?
Why or why not? Be careful: it is insufficient (and possibly incorrect) to say
that 27-cell is slower simply because there are more cells to check!**

My understanding is that, given scene scale 100 and 100,000 boids, decreasing the cell width
(and thus checking more cells) will increase performance. This is because, for the same density
of boids, smaller cells let us check more cells in parallel and run fewer sequential operations
for each cell. However, if the scene scale were larger with the same number of boids, this could
decrease efficiency, because we would be checking more cells that could be "useless".

To test my thinking, I used the following two scenarios:

_Constants:_

* _Coherent grid_
* _Number of boids: 100,000_

1. Scene Scale: 100

I observed an **increase** of around 100 FPS when I used cell width == neighborhood size.

2. Scene Scale: 200

I observed a **decrease** of nearly 200 FPS when I used cell width == neighborhood size.
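
A rough way to reason about the cell-width trade-off is to estimate how many candidate boids fall inside the scanned block of cells. The C++ sketch below does this under strong assumptions: boids are spread uniformly (flocking clusters them, so real counts will differ), the scene is a cube with side `2 * sceneScale`, and the neighborhood radius of 5 used in the example is a made-up value. The function name `expectedCandidates` is hypothetical.

```cpp
#include <cassert>

// Expected number of boids inside a cube of cellsPerAxis^3 cells,
// assuming a uniform distribution over a scene of side 2 * sceneScale.
double expectedCandidates(int numBoids, double sceneScale,
                          double cellWidth, int cellsPerAxis) {
    assert(numBoids > 0 && sceneScale > 0.0 && cellWidth > 0.0 && cellsPerAxis > 0);
    const double side = 2.0 * sceneScale;
    const double density = numBoids / (side * side * side);  // boids per unit volume
    const double span = cellsPerAxis * cellWidth;            // edge length of the scanned block
    return density * span * span * span;
}
```

With a radius of 5, scanning 8 cells of width 10 covers a 20 x 20 x 20 block, while scanning 27 cells of width 5 covers a 15 x 15 x 15 block, so the 27-cell search examines only 27/64 of the candidates. At scene scale 100 with 100,000 boids that is about 42 candidates instead of 100. At scene scale 200 both counts are already small (about 5 versus 12.5), so the fixed cost of visiting 27 cells instead of 8 may weigh more heavily. This estimate counts candidate boids only and does not model that per-cell cost.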



## Part 4: More Images and Results!
![](images/naive50k.png)

_Naive Flocking with 50,000 boids_

![](images/big2.gif)

_Coherent Flocking with 100,000 boids_

![](images/big.png)

_Coherent Flocking with 500,000 boids (I could not get a gif of this onto GitHub)_

![](images/sceneScale200with100kBoids.png)

_100,000 boids on 200 scene scale instead of 100_