|
| 1 | +<!DOCTYPE html> |
| 2 | + |
| 3 | +<html> |
| 4 | + <head> |
| 5 | + <meta charset="utf-8" /> |
| 6 | + <title> |
| 7 | + Standalone WebGPU Scan/Reduce/Sort Primitive Test, with Configuration Pane |
| 8 | + </title> |
| 9 | + <link |
| 10 | + rel="stylesheet" |
| 11 | + href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/default.min.css" |
| 12 | + /> |
| 13 | + </head> |
| 14 | + |
| 15 | + <body> |
| 16 | + <p> |
| 17 | + This example is a self-contained use of the <code>scan</code> and |
| 18 | + <code>sort</code> primitives, meant to plot performance. This builds on |
| 19 | + the simpler |
| 20 | + <a href="scan_sort_pane_example.html">functionality example</a>. Set your |
| 21 | + parameters in the pane and click "Start" to run and plot performance data |
| 22 | + for a WebGPU scan/reduce/sort. The <code>inputCount</code> input specifies |
| 23 | + how many different input lengths to run; these lengths are spaced evenly |
| 24 | + on a logarithmic scale between the specified start and end |
| 25 | + lengths. Otherwise, the parameters are the same as in the |
| 26 | + <a href="scan_sort_pane_example.html">functionality example</a>. This |
| 27 | + example explains |
| 28 | + <a href="/gridwise/docs/gridwise/timing-strategy.html" |
| 29 | + >how to time a Gridwise primitive</a |
| 30 | + >. |
| 31 | + <a |
| 32 | + href="https://github.com/gridwise-webgpu/gridwise/blob/main/examples/scan_sort_perf.mjs" |
| 33 | + >The entire JS source file is on GitHub.</a |
| 34 | + > |
| 35 | + </p> |
| 36 | + <p> |
| 37 | + To measure CPU and/or GPU timing, include a timing directive in the call |
| 38 | + to <code>primitive.execute</code>. Typically we call the primitive once |
| 39 | + without any timing information to absorb warmup effects (e.g., compiling |
| 40 | + the kernel), then call it again with timing enabled and a |
| 41 | + <code>trials</code> count. Dividing the total runtime of that second call |
| 42 | + by the number of trials gives the average runtime per trial. |
| 43 | + </p> |
| 44 | + <pre><code class="language-javascript">/* call the primitive once to warm up */ |
| 45 | +await primitive.execute({ |
| 46 | + inputBuffer: memsrcBuffer, |
| 47 | + outputBuffer: memdestBuffer, |
| 48 | +}); |
| 49 | +/* call params.trials times */ |
| 50 | +await primitive.execute({ |
| 51 | + inputBuffer: memsrcBuffer, |
| 52 | + outputBuffer: memdestBuffer, |
| 53 | + trials: params.trials, /* integer */ |
| 54 | + enableGPUTiming: true, |
| 55 | + enableCPUTiming: true, |
| 56 | +});</code></pre> |
| 57 | + <p> |
| 58 | + We can get timing information back from the primitive with a |
| 59 | + <code>getTimingResult</code> call. The GPU time might be an array of |
| 60 | + timings if the GPU call has multiple kernels within it. In the example |
| 61 | + below, we collapse that array into a single total by summing its entries. |
| 62 | + </p> |
| 63 | + <pre><code class="language-javascript">let { gpuTotalTimeNS, cpuTotalTimeNS } = await primitive.getTimingResult(); |
| 64 | +if (Array.isArray(gpuTotalTimeNS)) { |
| 65 | + // gpuTotalTimeNS may be an array of per-kernel timings; sum them into one total |
| 66 | + gpuTotalTimeNS = gpuTotalTimeNS.reduce((sum, t) => sum + t, 0); |
| 67 | +} |
| 68 | +const averageGpuTotalTimeNS = gpuTotalTimeNS / params.trials; |
| 69 | +const averageCpuTotalTimeNS = cpuTotalTimeNS / params.trials;</code></pre> |
| 70 | + <p> |
| 71 | + Timing the <code>sort</code> primitive is frustratingly complicated |
| 72 | + because sort overwrites its input with its output, so every pass after |
| 73 | + the first sorts already-sorted data. The most meaningful timing results |
| 74 | + therefore require resetting sort's input before each pass so that every |
| 75 | + pass has the same workload. For simplicity, we are not doing that here. |
| 76 | + </p> |
| 77 | + <hr /> |
| 78 | + <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script> |
| 79 | + <script> |
| 80 | + hljs.highlightAll(); |
| 81 | + </script> |
| 82 | + <script src="scan_sort_perf.mjs" type="module"></script> |
| 83 | + <div id="webgpu-results"></div> |
| 84 | + <div id="plot"></div> |
| 85 | + </body> |
| 86 | +</html> |
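
The page text notes that meaningful sort timings need the input reset before each pass, and that this example does not do so. Below is a minimal sketch of one way that could look. It assumes a hypothetical helper, `timeWithReset`, which is not part of Gridwise: the caller supplies `resetInput` and `runOnce` callbacks, and only the time spent inside `runOnce` is accumulated. It measures CPU wall-clock time only; GPU timestamps would still have to come from the primitive's own timing support.

```javascript
// Hypothetical helper (not part of Gridwise): runs `runOnce` for `trials`
// passes, calling `resetInput` before each pass so that every pass starts
// from the same unsorted data. Only the time inside `runOnce` is counted.
async function timeWithReset({
  trials,
  resetInput,
  runOnce,
  now = () => performance.now(), // injectable clock, in milliseconds
}) {
  if (!Number.isInteger(trials) || trials <= 0) {
    throw new RangeError("trials must be a positive integer");
  }
  let totalMs = 0;
  for (let i = 0; i < trials; i++) {
    await resetInput(i); // untimed: restore the original input
    const start = now();
    await runOnce(i); // timed: one pass, awaited to completion
    totalMs += now() - start;
  }
  return { totalMs, averageMs: totalMs / trials };
}

// Possible wiring, left commented out because it needs a WebGPU device.
// Assumptions: `device` is a GPUDevice, `unsortedData` is a typed array
// holding the original keys, and the buffer names match the page's snippets.
//
// const { averageMs } = await timeWithReset({
//   trials: params.trials,
//   resetInput: async () => {
//     device.queue.writeBuffer(memsrcBuffer, 0, unsortedData);
//     await device.queue.onSubmittedWorkDone();
//   },
//   runOnce: async () => {
//     await primitive.execute({
//       inputBuffer: memsrcBuffer,
//       outputBuffer: memdestBuffer,
//     });
//     await device.queue.onSubmittedWorkDone();
//   },
// });
```

Keeping the reset outside the timed region means the upload cost does not inflate the average, at the price of one CPU/GPU synchronization per pass. Whether that trade-off suits a given benchmark is a judgment call.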