Commit 6a6223e

[README] Update the instruction for profiling a kernel (#1165)
[README] Update the instruction for profiling a kernel.
1 parent bf0a91d commit 6a6223e

1 file changed, with 20 additions and 7 deletions

README.md

Lines changed: 20 additions & 7 deletions
@@ -295,12 +295,12 @@ to manually add `-DIMEX_ENABLE_BENCHMARK=ON` option when building the IMEX. The
 script for running them will be generated under the `build/benchmarks` folder.

 Currently, IMEX provides benchmarks for the following 4 categories of operations:
-| Operation | CPU | GPU |
-| :---: | :---: | :---: |
-| elementwise (relu and silu) | Yes | Yes |
-| reduction (softmax) | Yes | Yes |
-| transpose (transpose) | Yes | Yes |
-| fusion (kInputFusion and kLoopFusion) | No | Yes |
+| Operation | CPU | GPU |
+| :-----------------------------------: | :---: | :---: |
+| elementwise (relu and silu) | Yes | Yes |
+| reduction (softmax) | Yes | Yes |
+| transpose (transpose) | Yes | Yes |
+| fusion (kInputFusion and kLoopFusion) | No | Yes |

 These test cases are mainly implemented using the linalg dialect, and SPIR-V test cases for
 relu are also provided. Each test case is named following the pattern of `opname_shape_dtype.mlir`
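
As an aside to the diff, the naming pattern `opname_shape_dtype.mlir` can be split back into its fields with a few lines of Python. This is only a sketch: the helper `parse_benchmark_name`, the example file names, and the assumption that shape dimensions are joined with `x` are all hypothetical and are not taken from the IMEX sources.

```python
from pathlib import Path


def parse_benchmark_name(filename):
    """Split a name of the form `opname_shape_dtype.mlir` into three fields."""
    stem = Path(filename).stem  # drop the ".mlir" suffix
    # Split from the right so an op name containing underscores stays intact.
    opname, shape, dtype = stem.rsplit("_", 2)
    # Assumption: dimensions are joined with "x", for example "1x16x32".
    dims = tuple(int(d) for d in shape.split("x"))
    return opname, dims, dtype


if __name__ == "__main__":
    # Hypothetical file name, used only to illustrate the pattern.
    print(parse_benchmark_name("relu_1x16x32_fp32.mlir"))
```

Splitting from the right keeps the sketch robust if an op name itself contains an underscore, at the cost of assuming that neither the shape nor the dtype field does.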
@@ -341,11 +341,24 @@ fp32, fp16, int32 etc.


 ## Profiling kernel execute time
-### sycl event
+### level-zero (l0) event
 ```sh
 export IMEX_ENABLE_PROFILING=ON
 # then run the test
 ```
+Profiling performs warm-up runs followed by profiling runs to produce stable results.
+By default, 100 warm-up runs and 100 profiling runs are performed.
+These counts can be changed with environment variables:
+
+```sh
+# 10 warm-up runs
+export IMEX_PROFILING_WARMUPS=10
+# 10 profiling runs
+export IMEX_PROFILING_RUNS=10
+```
+
+The profiling result reports the min, max, average, median and standard deviation of the execution time (in ms).
+
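
As an aside to the diff, the warm-up-then-measure scheme described in the added lines can be sketched in Python. This is an illustration under stated assumptions, not IMEX's implementation: the helpers `run_counts`, `summarize` and `profile` are hypothetical, host-side wall-clock timing stands in for level-zero event timestamps, and the use of the population standard deviation is a guess. Only the two environment variable names and the default of 100 come from the README text.

```python
import os
import statistics
import time


def run_counts(env=None):
    """Read the warm-up and profiling run counts, defaulting to 100 each."""
    env = os.environ if env is None else env
    warmups = int(env.get("IMEX_PROFILING_WARMUPS", "100"))
    runs = int(env.get("IMEX_PROFILING_RUNS", "100"))
    return warmups, runs


def summarize(times_ms):
    """Return min, max, avg, median and std dev of timings given in ms."""
    return {
        "min": min(times_ms),
        "max": max(times_ms),
        "avg": statistics.mean(times_ms),
        "median": statistics.median(times_ms),
        # Assumption: population std dev; the real tool may use the sample form.
        "stddev": statistics.pstdev(times_ms),
    }


def profile(kernel, warmups, runs):
    """Call `kernel` untimed `warmups` times, then time `runs` further calls."""
    for _ in range(warmups):
        kernel()
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        kernel()
        times_ms.append((time.perf_counter() - start) * 1e3)
    return summarize(times_ms)


if __name__ == "__main__":
    warmups, runs = run_counts()
    # Stand-in workload; a real run would launch the GPU kernel instead.
    print(profile(lambda: sum(range(10_000)), warmups, runs))
```

Discarding the warm-up calls keeps one-time costs such as compilation or cache population out of the reported statistics, which is presumably why the README describes the results as stable.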
 ### trace tools
 ```sh
 python {your_path}/imex_runner.py xxx -o test.mlir
