README.md: 71 additions & 10 deletions
@@ -37,6 +37,76 @@ For all GPU Scan algorithms, I choose to implement inclusive Scan first, and the
## Performance Analysis
+When the array size is under 20,000, CPU Scan outperforms the other algorithms. As the array size increases, GPU Naive Scan becomes the fastest. The Thrust implementation has more stable performance than the other algorithms.
@@ -56,7 +126,7 @@ I want to choose a block configuration that would result in the largest number o
- You need 1536/512 = 3 blocks to fully occupy the SM. Fortunately, the SM allows up to 16 blocks, so all 3 blocks can be resident at once. Thus, the actual number of threads that can run on this SM is 3 * 512 = 1536, which occupies 1536/1536 = 100% of the SM.
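
The occupancy arithmetic above can be sketched as a small host-side helper. This is only an illustration: `Occupancy` and `estimateOccupancy` are hypothetical names, not part of this repository, and the sketch assumes that only the per-SM thread limit and block limit matter (register and shared-memory limits are ignored).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from this repo): estimates how many blocks and
// threads can be resident on one SM, using only the thread and block limits.
// Register and shared-memory pressure are deliberately ignored.
struct Occupancy {
    int residentBlocks;   // blocks that fit on the SM at once
    int residentThreads;  // residentBlocks * blockSize
    double fraction;      // residentThreads / maxThreadsPerSM
};

Occupancy estimateOccupancy(int blockSize, int maxThreadsPerSM, int maxBlocksPerSM) {
    assert(blockSize > 0 && maxThreadsPerSM > 0 && maxBlocksPerSM > 0);
    // Blocks allowed by the thread limit (integer division drops partial blocks).
    int blocksByThreads = maxThreadsPerSM / blockSize;
    // The SM also caps the number of resident blocks.
    int residentBlocks = std::min(blocksByThreads, maxBlocksPerSM);
    int residentThreads = residentBlocks * blockSize;
    return {residentBlocks, residentThreads,
            static_cast<double>(residentThreads) / maxThreadsPerSM};
}
```

With the numbers used above, `estimateOccupancy(512, 1536, 16)` gives 3 resident blocks and 1536 resident threads, i.e. full occupancy. Under the same assumptions, a 64-thread block would hit the 16-block cap first and leave the SM with only 1024 resident threads.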
-## Naive Scan Analysis
+## Naive Scan
- Implemented ```NaiveGPUScan``` using shared memory.
- Each thread is assigned to evolve the contents of one element in the input array.
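
To make the one-thread-per-element idea concrete, here is a sequential C++ emulation of a naive (Hillis-Steele style) inclusive scan. It is a sketch under stated assumptions, not the `NaiveGPUScan` kernel itself: `naiveInclusiveScan` is a hypothetical name, the inner loop index stands in for a GPU thread, and the two vectors stand in for double-buffered shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential emulation of a naive inclusive scan (assumed Hillis-Steele form).
// Each inner-loop iteration i plays the role of the thread that owns element i.
std::vector<int> naiveInclusiveScan(const std::vector<int>& input) {
    std::vector<int> in = input;          // buffer read during a step
    std::vector<int> out(input.size());   // buffer written during a step
    for (std::size_t stride = 1; stride < in.size(); stride *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            // "Thread" i adds the element stride positions to its left, if any.
            out[i] = (i >= stride) ? in[i] + in[i - stride] : in[i];
        }
        // On the GPU a barrier would separate steps; here we just swap buffers.
        std::swap(in, out);
    }
    return in;
}
```

For the input `{3, 1, 7, 0, 4, 1, 6, 3}` this returns `{3, 4, 11, 11, 15, 16, 22, 25}` after three steps (strides 1, 2, 4). Reading from one buffer and writing to the other keeps a thread from consuming a value that a neighbor has already updated in the same step.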
@@ -83,15 +153,6 @@ Understand thread to data mapping: