Commit dfdf89e

functionally complete but not getting good results yet
1 parent b161ca7 commit dfdf89e

2 files changed

+542
-0
lines changed

examples/scan_sort_perf.html

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
<!DOCTYPE html>

<html>
  <head>
    <meta charset="utf-8" />
    <title>
      Standalone WebGPU Scan/Reduce/Sort Primitive Test, with Configuration Pane
    </title>
    <link
      rel="stylesheet"
      href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/default.min.css"
    />
  </head>

  <body>
    <p>
      This example is a self-contained use of the <code>scan</code> and
      <code>sort</code> primitives, meant to plot performance. It builds on
      the simpler
      <a href="scan_sort_pane_example.html">functionality example</a>. Set your
      parameters in the pane and click "Start" to run and plot performance data
      for a WebGPU scan/reduce/sort. The <code>inputCount</code> input specifies
      how many different input lengths to run; the lengths are spaced evenly
      on a logarithmic scale between the specified start and end lengths.
      Otherwise, the parameters are the same as in the
      <a href="scan_sort_pane_example.html">functionality example</a>. This
      example explains
      <a href="/gridwise/docs/gridwise/timing-strategy.html"
        >how to time a Gridwise primitive</a
      >.
      <a
        href="https://github.com/gridwise-webgpu/gridwise/blob/main/examples/scan_sort_perf.mjs"
        >The entire JS source file is on GitHub.</a
      >
    </p>
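    <p>
      The logarithmic spacing of input lengths can be sketched as follows.
      This is a minimal illustration under stated assumptions, not the code in
      <code>scan_sort_perf.mjs</code>: the function name
      <code>logSpacedLengths</code> and its parameters are hypothetical, and
      the real example may round or align lengths differently.
    </p>

```javascript
// Hypothetical helper (not part of Gridwise): return `count` input lengths
// spaced evenly on a logarithmic scale between `start` and `end`, inclusive.
function logSpacedLengths(start, end, count) {
  if (count === 1) {
    return [start];
  }
  // A constant ratio between neighbors gives even spacing on a log axis.
  const ratio = Math.pow(end / start, 1 / (count - 1));
  const lengths = [];
  for (let i = 0; i < count; i++) {
    // Round because buffer lengths must be whole numbers of elements.
    lengths.push(Math.round(start * Math.pow(ratio, i)));
  }
  return lengths;
}

// Three lengths from 2^10 to 2^20 land on 2^10, 2^15, and 2^20.
const exampleLengths = logSpacedLengths(1024, 1048576, 3);
console.log(exampleLengths);
```

    <p>
      A constant ratio between consecutive lengths, rather than a constant
      difference, keeps the sample points evenly spread when the plot uses a
      logarithmic x-axis.
    </p>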
    <p>
      To measure CPU and/or GPU timing, include a timing directive in the call
      to <code>primitive.execute</code>. Typically we call the primitive once
      without any timing information to absorb warmup effects (e.g., compiling
      the kernel), then call it many times with timing enabled. The reported
      time is the total runtime of that second set of calls divided by the
      number of trials.
    </p>
    <pre><code class="language-javascript">/* call the primitive once to warm up */
await primitive.execute({
  inputBuffer: memsrcBuffer,
  outputBuffer: memdestBuffer,
});
/* call params.trials times */
await primitive.execute({
  inputBuffer: memsrcBuffer,
  outputBuffer: memdestBuffer,
  trials: params.trials, /* integer */
  enableGPUTiming: true,
  enableCPUTiming: true,
});</code></pre>
    <p>
      We can get timing information back from the primitive with a
      <code>getTimingResult</code> call. The GPU time might be an array of
      timings if the GPU call has multiple kernels within it. In the example
      below, we simply collapse that array into a single total time by summing
      its entries.
    </p>
    <pre><code class="language-javascript">let { gpuTotalTimeNS, cpuTotalTimeNS } = await primitive.getTimingResult();
if (Array.isArray(gpuTotalTimeNS)) {
  // gpuTotalTimeNS might be a list, in which case just sum it up
  gpuTotalTimeNS = gpuTotalTimeNS.reduce((sum, t) => sum + t, 0);
}
// average over the number of timed trials
const averageGpuTotalTimeNS = gpuTotalTimeNS / params.trials;
const averageCpuTotalTimeNS = cpuTotalTimeNS / params.trials;</code></pre>
    <p>
      Timing the <code>sort</code> primitive is frustratingly complicated
      because sort overwrites its input with its output, so every pass after
      the first starts from already-sorted data. The most meaningful timing
      results therefore require resetting sort's input before each pass so
      that every pass has the same workload. For simplicity, we are not doing
      that here.
    </p>
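    <p>
      For readers who do want a fair sort measurement, the reset-per-pass idea
      can be sketched as below. This is only an illustration under stated
      assumptions: <code>timeWithInputReset</code> is a hypothetical helper,
      not a Gridwise API, and plain arrays with CPU wall-clock timing stand in
      for GPU buffers and GPU timing.
    </p>

```javascript
// Hypothetical helper (not part of Gridwise): average the wall-clock time of
// an in-place operation, restoring its input before every trial so that each
// trial sees the same workload. Only run() is timed; reset() is not.
async function timeWithInputReset({ reset, run, trials }) {
  let totalMs = 0;
  for (let i = 0; i < trials; i++) {
    await reset(); // restore the unsorted input (excluded from the timing)
    const start = performance.now();
    await run(); // the in-place operation under test
    totalMs += performance.now() - start;
  }
  return totalMs / trials;
}

// Stand-ins for GPU buffers: a pristine copy of the input, and a working
// copy that the in-place sort overwrites on every trial.
const pristine = [5, 3, 9, 1, 7];
const working = new Array(pristine.length);

const averageMsPromise = timeWithInputReset({
  reset: () => {
    for (let i = 0; i < pristine.length; i++) {
      working[i] = pristine[i];
    }
  },
  run: () => {
    working.sort((a, b) => a - b);
  },
  trials: 10,
});

averageMsPromise.then((averageMs) => {
  console.log(`average sort time: ${averageMs} ms over 10 trials`);
});
```

    <p>
      On the GPU, the reset step would presumably copy a pristine buffer into
      the sort's input buffer (for instance with
      <code>copyBufferToBuffer</code>) before each pass, and the timing would
      need to exclude that copy.
    </p>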
    <hr />
    <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script>
    <script>
      hljs.highlightAll();
    </script>
    <script src="scan_sort_perf.mjs" type="module"></script>
    <div id="webgpu-results"></div>
    <div id="plot"></div>
  </body>
</html>
