CIS565-Fall-2022 · yuchiashen1009 · Sep 12, 2022 · Sep 12, 2022
diff --git a/README.md b/README.md
@@ -1,11 +1,55 @@
+Project 1 Boids Flocking
+====================
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture,
 Project 1 - Flocking**
 
-* (TODO) YOUR NAME HERE
-  * (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Yu-Chia Shen
+  * [LinkedIn](https://www.linkedin.com/in/ycshen0831/)
+* Tested on: Windows 10, i5-11400F @ 4.3GHz 16GB, RTX 3060 12GB (personal)
 
-### (TODO: Your README)
+# Result
+![](./images/result.gif)
 
-Include screenshots, analysis, etc. (Remember, this is public, so don't put
-anything here that you don't want to share with the world.)
+![](./images/result_12.gif)
+
+![](./images/1M.png)
+
+![](./images/1M2.png)
+
+# Overview
+The objective of this project is to simulate boid flocking. In the images above, every colored dot is a boid, and each color represents a group of boids. A boid is a bird-like particle that moves around the simulation space according to three rules:
+
+  * cohesion - boids move towards the perceived center of mass of their neighbors
+  * separation - boids avoid getting too close to their neighbors
+  * alignment - boids generally try to move with the same direction and speed as their neighbors
+
+These three rules specify a boid's velocity change in a timestep. At every timestep, a boid has to look at each of its neighboring boids and compute the velocity change contribution from each of the three rules. A bare-bones boids implementation therefore has each boid check every other boid in the simulation.
+
+# Performance Analysis
+* FPS vs Number of Boids
+![](./images/Vis.png)
+
+* FPS vs Block Size
+![](./images/BlockSize.png)
+
+* FPS with 8 vs 27 Neighboring Cells
+![](./images/8vs27.png)
+
+# Questions
+* For each implementation, how does changing the number of boids affect performance? Why do you think this is?
+
+  _As the number of boids increases, performance decreases for every implementation. The boid count determines the size of the problem: with more boids, there are more neighbors to check when updating each single boid, so the computation per timestep grows and the frame rate drops. The naive implementation suffers most because every boid checks every other boid._
+
+* For each implementation, how does changing the block count and block size affect performance? Why do you think this is?
+
+  _According to the block size plot above, there is no significant difference in performance between block sizes. There is, however, a slight performance drop at block size 32. I think this is because each SM can hold only a limited number of resident blocks. With small blocks that limit is reached before the SM is fully occupied, which wastes occupancy and limits performance. Large block sizes can also reduce performance because of per-block resource limits (e.g. registers), but this does not happen in this case._
+
+  [Reference](https://forums.developer.nvidia.com/t/how-to-choose-how-many-threads-blocks-to-have/55529)
+
+* For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?
+
+  _Yes, the coherent uniform grid performs better, as I expected. With the coherent structure, there are fewer accesses to global memory. Without it, we have to go through an extra memory lookup for every neighbor of every boid at every timestep. Hence, although we have to rearrange the arrays at every timestep, we still access global memory fewer times overall._
+
+* Did changing cell width and checking 27 vs 8 neighboring cells affect performance? Why or why not? Be careful: it is insufficient (and possibly incorrect) to say that 27-cell is slower simply because there are more cells to check!
+
+  _Yes. According to the 8 vs 27 plot above, the 27-cell version is slightly faster than the 8-cell version as the number of boids increases. I think this is because the total volume of 27 small cells is less than the total volume of 8 large cells, so fewer boids need to be checked. At low boid counts, however, the 27-cell version is slightly slower, since it has to visit more grid cells (27 > 8)._
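Note on the 8 vs 27 answer above: the volume argument can be made concrete with a short calculation. This is a minimal sketch, not the author's code. It assumes the common setup for this project in which the 8-cell search uses a cell width of twice the neighborhood radius and the 27-cell search uses a cell width equal to the radius; the README does not state the cell widths. The names `searched_volume` and `radius` are hypothetical.

```python
def searched_volume(cells_per_axis: int, cell_width: float) -> float:
    """Volume of the cube of grid cells scanned for a single boid."""
    return (cells_per_axis * cell_width) ** 3


radius = 1.0  # hypothetical neighborhood radius, arbitrary units

# Assumption: 8-cell search = 2 x 2 x 2 cells, each of width 2 * radius.
volume_8 = searched_volume(2, 2.0 * radius)

# Assumption: 27-cell search = 3 x 3 x 3 cells, each of width 1 * radius.
volume_27 = searched_volume(3, 1.0 * radius)

# With boids spread uniformly, the number of candidate boids scanned
# is proportional to the searched volume.
ratio = volume_27 / volume_8

print(f"8-cell volume:  {volume_8}")
print(f"27-cell volume: {volume_27}")
print(f"27-cell / 8-cell: {ratio:.3f}")
```

Under these assumptions the 27-cell search scans 27/64 of the volume covered by the 8-cell search, so at high boid counts it inspects fewer candidates even though it visits more cells. At low boid counts most cells are nearly empty, and the fixed cost of visiting 27 cells instead of 8 can dominate.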
diff --git a/images/1M.png b/images/1M.png
diff --git a/images/1M2.png b/images/1M2.png
diff --git a/images/8vs27.png b/images/8vs27.png
diff --git a/images/BlockSize.png b/images/BlockSize.png
diff --git a/images/Vis.png b/images/Vis.png
diff --git a/images/generate_image.py b/images/generate_image.py
@@ -0,0 +1,62 @@
+# Plots the FPS measurements shown in README.md.
+import matplotlib
+import matplotlib.pyplot as plt
+import numpy as np
+
+if __name__ == "__main__":
+    x = [25000, 50000, 100000, 250000, 500000, 1000000]
+
+    nativeVis = [102.97, 29.3, 7.8, 1.3, 0, 0]
+    nativeNonvis = [112.98, 30.15, 7.83, 1.3, 0, 0]
+
+    scatteredVis = [1122.4, 665.6, 503.43, 69.44, 22.39, 7.82]
+    scatteredNonvis = [1686.73, 1000.49, 654.64, 74.09, 23.95, 7.89]
+
+    coherentVis = [1260.61, 897.67, 859.73, 479.44, 310.48, 190.69]
+    coherentNonvis = [1897.59, 1236.93, 1247.28, 741.93, 440.62, 242.66]
+
+    fig = plt.figure()
+    plt.plot(x, coherentVis, "g", label="Coherent With Visualization")
+    plt.plot(x, scatteredVis, "b", label="Scattered With Visualization")
+    plt.plot(x, nativeVis, "r", label="Naive With Visualization")
+
+    plt.plot(x, coherentNonvis, "g--", label="Coherent")
+    plt.plot(x, scatteredNonvis, "b--", label="Scattered")
+    plt.plot(x, nativeNonvis, "r--", label="Naive")
+
+    plt.legend()
+
+    plt.title("FPS vs Number of Boids")
+    plt.ylabel('FPS')
+    plt.xlabel("Number of Boids")
+
+    x = [5, 6, 7, 8, 9, 10]
+    y1 = [166.83, 242.10, 259.3, 272.09, 258.91, 258.86]
+    y2 = [1171.7, 1430.62, 1486.67, 1365.3, 1457.6, 1426.61]
+
+    fig = plt.figure()
+    plt.plot(x, y1, label="1M Boids Without Visualization")
+    plt.plot(x, y2, label="100K Boids Without Visualization")
+
+    plt.legend()
+
+    plt.title("FPS vs Block Size")
+    plt.ylabel('FPS')
+    plt.xlabel("Block Size 2^(x)")
+
+    x = [10000, 25000, 50000, 100000, 250000, 500000, 1000000]
+    y = [2069.2, 1983.6, 1898.64, 1615.14, 929.8, 601.04, 354.63]
+    coherentNonvis = [2120, 2026.9, 1345.29, 1512.05, 795, 480.94, 268.7]
+
+
+    fig = plt.figure()
+    plt.plot(x, coherentNonvis, label="8 neighboring cells")
+    plt.plot(x, y, label="27 neighboring cells")
+
+    plt.legend()
+
+    plt.title("FPS vs Number of Boids (8 vs 27 Neighboring Cells)")
+    plt.ylabel('FPS')
+    plt.xlabel("Number of Boids")
+
+    plt.show()
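Note on the measurements in generate_image.py: the gap between the scattered and coherent grids is easier to read as a speedup ratio than as raw FPS. The sketch below reuses the without-visualization FPS values from the script above. The helper `speedups` and the variable names are hypothetical and not part of the PR. The naive series is left out because its FPS is recorded as 0 at the two largest boid counts, which would make a ratio meaningless.

```python
# FPS without visualization, copied from generate_image.py.
boid_counts = [25000, 50000, 100000, 250000, 500000, 1000000]
scattered_fps = [1686.73, 1000.49, 654.64, 74.09, 23.95, 7.89]
coherent_fps = [1897.59, 1236.93, 1247.28, 741.93, 440.62, 242.66]


def speedups(baseline, improved):
    """Element-wise ratio improved / baseline for two equal-length FPS lists."""
    if len(baseline) != len(improved):
        raise ValueError("FPS lists must have the same length")
    return [new / old for old, new in zip(baseline, improved)]


coherent_vs_scattered = speedups(scattered_fps, coherent_fps)

for count, ratio in zip(boid_counts, coherent_vs_scattered):
    print(f"{count:>9,} boids: coherent is {ratio:.1f}x scattered")
```

On these numbers the coherent grid is only slightly ahead at 25,000 boids but roughly 30 times faster at 1,000,000, which fits the README's point that the benefit of the coherent layout grows with the amount of neighbor data each boid has to read.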
diff --git a/images/result.gif b/images/result.gif
diff --git a/images/result_12.gif b/images/result_12.gif