@@ -295,12 +295,12 @@ to manually add `-DIMEX_ENABLE_BENCHMARK=ON` option when building the IMEX. The
295295script for running them will be generated under the ` build/benchmarks ` folder.
296296
297297Currently, IMEX provides benchmarks for the following 4 categories of operations:
298- | Operation | CPU | GPU |
299- | :---: | :---: | :---: |
300- | elementwise (relu and silu) | Yes | Yes |
301- | reduction (softmax) | Yes | Yes |
302- | transpose (transpose) | Yes | Yes |
303- | fusion (kInputFusion and kLoopFusion) | No | Yes |
298+ | Operation | CPU | GPU |
299+ | :-----------------------------------: | :---: | :---: |
300+ | elementwise (relu and silu) | Yes | Yes |
301+ | reduction (softmax) | Yes | Yes |
302+ | transpose (transpose) | Yes | Yes |
303+ | fusion (kInputFusion and kLoopFusion) | No | Yes |
304304
305305These test cases are mainly implemented using linalg dialect, and the spirv test cases for
306306relu are also provided. Each testcase is named following the pattern of ` opname_shape_dtype.mlir `
@@ -341,11 +341,24 @@ fp32, fp16, int32 etc.
341341
342342
343343## Profiling kernel execute time
344- ### sycl event
344+ ### level-zero (l0) event
345345``` sh
346346export IMEX_ENABLE_PROFILING=ON
347347# then run the test
348348```
349+ The profiling is based on warmup runs and profiling runs for stable results.
350+ By default, the number of warmup runs and the number of profiling runs are both 100.
351+ However, one can control them using environment variables:
352+
353+ ``` sh
354+ # 10 warm-up runs
355+ export IMEX_PROFILING_WARMUPS=10
356+ # 10 profiling runs
357+ export IMEX_PROFILING_RUNS=10
358+ ```
359+
360+ The profiling result provides the min, max, avg, median and std dev of the execution time (in ms).
361+
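To make the warmup/profiling split and the reported statistics concrete, here is a minimal Python sketch of the general technique. It is not IMEX's implementation: the helper names `env_int`, `summarize`, and `profile` are hypothetical, wall-clock timing stands in for level-zero events, and the use of the population standard deviation is an assumption. Only the two environment variable names and the default of 100 come from the text above.

```python
import os
import statistics
import time


def env_int(name, default):
    """Read an integer setting from the environment, falling back to a default."""
    return int(os.environ.get(name, default))


def summarize(samples_ms):
    """Reduce per-run timings (in ms) to min, max, avg, median and std dev."""
    return {
        "min": min(samples_ms),
        "max": max(samples_ms),
        "avg": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
        # Assumption: population std dev; the real tool may use the sample form.
        "std dev": statistics.pstdev(samples_ms),
    }


def profile(fn, warmups=None, runs=None):
    """Call fn() for untimed warm-up runs, then time the profiling runs."""
    if warmups is None:
        warmups = env_int("IMEX_PROFILING_WARMUPS", 100)
    if runs is None:
        runs = env_int("IMEX_PROFILING_RUNS", 100)

    # Warm-up runs are executed but their timings are discarded.
    for _ in range(warmups):
        fn()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return summarize(samples_ms)


if __name__ == "__main__":
    # Hypothetical workload standing in for a kernel launch.
    print(profile(lambda: sum(range(10_000)), warmups=10, runs=10))
```

Discarding the warm-up timings keeps one-time costs such as compilation or caching out of the reported statistics, which is why the two run counts are configured separately.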
349362### trace tools
350363``` sh
351364python {your_path}/imex_runner.py xxx -o test.mlir