Commit 0491465
[GPU] 70% off APC tracegen overhead (#3436)
For GPU trace gen, we previously parallelized over airs at the block
level, over `Subst`s at the warp level, and over rows at the thread
level (one block per air, one warp per `Subst`, one thread per row).
This PR explores an alternative that parallelizes only over rows, one
thread per row, regardless of air or `Subst`. Surprisingly, this shaves
another ~70% off the current APC trace gen overhead.
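The two mappings can be sketched as a sequential C++ model in which the CUDA block, warp, lane and thread indices become ordinary loop variables. This is a hypothetical simplification, not the PR's kernel. The `Subst` struct here (a source/destination column pair), `Air`, `old_mapping`, `new_mapping`, the row-major layout, a warp size of 32, and the assumption that every air contributes exactly one row per APC row are all illustrative guesses. Both functions write the same APC trace. They differ only in which worker owns each (row, `Subst`) copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical substitution: copy one column of an original air's trace
// into one column of the APC trace.
struct Subst {
    std::size_t src_col;  // column in the original air's trace
    std::size_t dst_col;  // column in the APC trace
};

// Hypothetical original air: assumed to have exactly `height` rows,
// one per APC row, stored row-major.
struct Air {
    std::size_t width = 0;
    std::vector<unsigned> trace;  // height * width values
    std::vector<Subst> substs;    // substitutions into the APC trace
};

// Old mapping: one block per air, warps stride over that air's Substs,
// lanes stride over rows. `warps_per_block` is a model parameter.
void old_mapping(const std::vector<Air>& airs, std::size_t height,
                 std::size_t apc_width, std::vector<unsigned>& apc,
                 std::size_t warps_per_block) {
    constexpr std::size_t kWarpSize = 32;
    assert(apc.size() >= height * apc_width);
    for (std::size_t block = 0; block < airs.size(); ++block) {
        const Air& air = airs[block];
        for (std::size_t warp = 0; warp < warps_per_block; ++warp) {
            for (std::size_t s = warp; s < air.substs.size(); s += warps_per_block) {
                for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
                    for (std::size_t row = lane; row < height; row += kWarpSize) {
                        apc[row * apc_width + air.substs[s].dst_col] =
                            air.trace[row * air.width + air.substs[s].src_col];
                    }
                }
            }
        }
    }
}

// New mapping: one thread per APC row; each thread walks every Subst of
// every air for its own row, regardless of which air the Subst belongs to.
void new_mapping(const std::vector<Air>& airs, std::size_t height,
                 std::size_t apc_width, std::vector<unsigned>& apc) {
    assert(apc.size() >= height * apc_width);
    for (std::size_t tid = 0; tid < height; ++tid) {  // tid == APC row
        for (const Air& air : airs) {
            for (const Subst& s : air.substs) {
                apc[tid * apc_width + s.dst_col] =
                    air.trace[tid * air.width + s.src_col];
            }
        }
    }
}
```

On the GPU, the outer loop of `new_mapping` would become `tid = blockIdx.x * blockDim.x + threadIdx.x`, so the number of busy threads depends only on the APC height, not on how work is spread across airs.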
The three scenarios compared are:
1. Current main (APC=100, threads=256): 7874 ms in tracegen.
2. This PR (APC=100, threads=256): 7370 ms in tracegen.
3. Baseline with no APCs, used to benchmark the dummy tracegen time
(APC=0, threads=256): 7171 ms in tracegen.
Therefore, this PR shaves another `(7874 - 7370) / (7874 - 7171) = 72%`
off the APC tracegen overhead.
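The 72% figure is the share of the APC-attributable tracegen overhead (APC=100 time minus the APC=0 dummy baseline) that this PR removes. Written out with the scenario numbers, where $T$ is tracegen time in ms (notation introduced here for illustration):

```math
\frac{T_{\text{main}} - T_{\text{PR}}}{T_{\text{main}} - T_{\text{APC}=0}}
= \frac{7874 - 7370}{7874 - 7171}
= \frac{504}{703}
\approx 0.717
\approx 72\%
```

The remaining APC overhead after this PR is therefore 7370 - 7171 = 199 ms, down from 703 ms on main.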
```
filename num_segments app_proof_cells app_proof_cols total_proof_time_ms app_proof_time_ms app_execute_preflight_time_ms app_execute_metered_time_ms app_trace_gen_time_ms leaf_proof_time_ms inner_recursion_proof_time_ms normal_instruction_ratio openvm_precompile_ratio powdr_ratio powdr_rows
/home/steve/openvm-reth-benchmark/apc_100_app_256.json 19 13856523983 354152 31156 31156 7886 703 7874 0 0 0.307127 0.540265 0.152608 14033237
/home/steve/openvm-reth-benchmark/apc_100_new.json 19 13856523983 354152 31097 31097 7851 708 7370 0 0 0.307127 0.540265 0.152608 14033237
../openvm-reth-benchmark/metrics_apc0.json 26 20019740816 216005 42660 42660 4622 749 7171 0 0 0.612871 0.387129 0.000000 0
```
I have some rough theories about where the difference comes from:
1. In our prior strategy, because each original air is assigned to a
block, there can be lopsided cases where a few original airs are "called"
many times while other airs are not. These cases should be quite common:
for example, instructions from the ALU chip are called far more often
than those from other chips.
2. Lopsided cases mean that some blocks can sit idle when they
could have been redirected to airs that are still being processed.
3. This method does have the disadvantage of localizing memory
accesses less well (which our prior strategy optimizes for), but its main
benefit is almost 100% utilization of all allocated threads,
because each thread is assigned to one APC row.
File tree (3 files changed, +43 / -55 lines):
- openvm
- cuda/src
- src
- powdr_extension/trace_generator/cuda