tools/unitrace/README.md
Lines changed: 34 additions & 34 deletions
Original file line number
Diff line number
Diff line change
@@ -2,7 +2,7 @@
2
2
3
3
## Introduction
4
4
5
-
This a performance tool for Intel(R) oneAPI applications. It traces and profiles host/device activities, interactions and hardware utilizations for
5
+
This is a performance tool for Intel(R) oneAPI applications. It traces and profiles host/device activities, interactions, and hardware utilization for
6
6
Intel(R) GPU applications.
7
7
8
8
## Supported Platforms
@@ -98,7 +98,7 @@ cd test
98
98
python test_unitrace.py
99
99
```
100
100
101
-
By default, command **python test_unitrace.py** builds and runs all the tests. If the tests are already built and rebuilding the tests is not needed, you can use **--run** to skip buidling the tests:
101
+
By default, command **python test_unitrace.py** builds and runs all the tests. If the tests are already built and rebuilding the tests is not needed, you can use **--run** to skip building the tests:
102
102
103
103
```sh
104
104
cd test
@@ -147,7 +147,7 @@ The options can be one or more of the following:
147
147
```
148
148
--call-logging [-c] Trace host API calls
149
149
--host-timing [-h] Report host API execution time
150
-
--device-timing [-d] Report kernels execution time
150
+
--device-timing [-d] Report kernel execution time
151
151
--ccl-summary-report [-r] Report CCL execution time summary
152
152
--kernel-submission [-s] Report append (queued), submit and execute intervals for kernels
153
153
--device-timeline [-t] Report device timeline
@@ -164,18 +164,18 @@ The options can be one or more of the following:
164
164
Device activities are traced per thread if this option is not present
165
165
--chrome-no-engine-on-device Trace device activities without per-Level-Zero-engine-or-OpenCL-queue info.
166
166
Device activities are traced per Level-Zero engine or OpenCL queue if this option is not present
167
-
--chrome-event-buffer-size <number-of-events> Size of event buffer on host per host thread(default is -1 or unlimited)
167
+
--chrome-event-buffer-size <number-of-events> Size of event buffer on host per host thread(default is -1 or unlimited)
168
168
--verbose [-v] Enable verbose mode to show kernel shapes
169
169
Kernel shapes are always enabled in timelines for Level Zero backend
170
170
--demangle Demangle kernel names. For OpenCL backend only. Kernel names are always demangled for Level Zero backend
171
171
--separate-tiles Trace each tile separately in case of implicit scaling
172
172
--tid Output TID in host API trace
173
173
--pid Output PID in host API and device activity trace
174
174
--output [-o] <filename> Output profiling result to file
175
-
--conditional-collection Enable conditional collection. This options is deprecated. Use --start-paused instead
175
+
--conditional-collection Enable conditional collection. This option is deprecated. Use --start-paused instead
176
176
--start-paused Start the tool with tracing and profiling paused
177
177
--output-dir-path <path> Output directory path for result files
178
-
--metric-query [-q] Query hardware metrics for each kernel instance is enabled for level-zero
178
+
--metric-query [-q] Query hardware metrics for each kernel instance (Level Zero only)
179
179
--metric-sampling [-k] Sample hardware performance metrics for each kernel instance in time-based mode
180
180
--group [-g] <metric-group> Hardware metric group (ComputeBasic by default)
181
181
--sampling-interval [-i] <interval> Hardware performance metric sampling interval in us (default is 50 us) in time-based mode
@@ -193,12 +193,12 @@ The options can be one or more of the following:
193
193
--pause <session> Pause session <session>. The argument <session> must be the same session named with --session option
194
194
--resume <session> Resume session <session>. The argument <session> must be the same session named with --session option
195
195
--stop <session> Stop session <session>. The argument <session> must be the same session named with --session option
196
-
--chrome-kmd-logging <script> Trace OS/KMD activitives. The argument <script> file defines the OS kernel or device driver activies to trace
197
-
--include-kernels <kernel-filters> Include kernels with name containing any of kernel filter strings. The argument <kernel-filters> is a comma-separated list of strings.
198
-
--include-kernels-file <kernel-filter-file> Include kernels with name containing any of kernel filter strings in the <kernel-filter-file>.
199
-
--exclude-kernels<kernel-filters> Exclude kernels with name containing any of kernel filter strings. The argument <kernel-filters> is a comma-separated list of strings.
200
-
--exclude-kernels-file <kernel-filter-file> Exclude kernels with name containing any of kernel filter strings in the <kernel-filter-file>.
201
-
--chrome-kmd-logging <script> Trace OS/KMD activitives. The argument <script> file defines the OS kernel or device driver activies to trace
196
+
--chrome-kmd-logging <script> Trace OS/KMD activities. The argument <script> file defines the OS kernel or device driver activities to trace
197
+
--include-kernels <kernel-filters> Include kernels with names containing any of the kernel filter strings. The argument <kernel-filters> is a comma-separated list of strings.
198
+
--include-kernels-file <kernel-filter-file> Include kernels with names containing any of the kernel filter strings in the <kernel-filter-file>.
199
+
--exclude-kernels <kernel-filters> Exclude kernels with names containing any of the kernel filter strings. The argument <kernel-filters> is a comma-separated list of strings.
200
+
--exclude-kernels-file <kernel-filter-file> Exclude kernels with names containing any of the kernel filter strings in the <kernel-filter-file>.
201
+
--chrome-kmd-logging <script> Trace OS/KMD activities. The argument <script> file defines the OS kernel or device driver activities to trace
202
202
--version Print version
203
203
--help Show this help message and exit. Please refer to the README.md file for further details.
204
204
```
@@ -302,7 +302,7 @@ The **--call-logging [-c]** option traces Level Zero and/or OpenCL calls on the
302
302
The **--host-timing [-h]** option outputs a Level Zero and/or OpenCL host call timing summary:
In case both **--chrome-kernel-logging** and **--chrome-device-logging** are present, **--chrome-kernel-logging** takes precedence.
363
-
### Include and Exlcude Kernels
363
+
### Include and Exclude Kernels
364
364
365
365
If you care about the performance of just a subset of kernels in an application, for example, kernels you are currently developing or optimizing, you can use the kernel inclusion and/or exclusion options **--include-kernels**, **--exclude-kernels**, **--include-kernels-file** and **--exclude-kernels-file** to instruct unitrace to profile and trace only the kernels of interest, reducing performance overhead and improving analysis efficiency.
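As a rough illustration of the substring matching described in the option list, the following Python sketch assembles a unitrace command line from a list of filter strings and models which kernel names an inclusion filter would select. The helper names, the kernel names, and the assumption that matching is a case-sensitive substring test are illustrative only; they are not taken from the unitrace sources.

```python
def build_filter_args(include=(), exclude=()):
    """Return unitrace arguments for kernel filters (hypothetical helper)."""
    args = []
    if include:
        # --include-kernels takes a comma-separated list of strings
        args += ["--include-kernels", ",".join(include)]
    if exclude:
        # --exclude-kernels takes a comma-separated list of strings
        args += ["--exclude-kernels", ",".join(exclude)]
    return args


def is_included(kernel_name, include_filters):
    """Model of 'name contains any of the filter strings'.

    Assumption: the match is a plain, case-sensitive substring test.
    """
    return any(f in kernel_name for f in include_filters)


# Hypothetical filter strings and kernel names, used only for illustration
filters = ["gemm", "softmax"]
command = ["unitrace", "-d", *build_filter_args(include=filters), "./myapp"]
print(" ".join(command))
print(is_included("my_gemm_kernel", filters))
print(is_included("reduce_sum", filters))
```

Under these assumptions, a kernel is selected as soon as any one of the filter strings appears anywhere in its name.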
366
366
@@ -443,11 +443,11 @@ The **--chrome-itt-logging** traces activities in applications instrumented usin
443
443
The **--ccl-summary-report [-r]** option outputs CCL call timing summary:
If the application is a PyTorch workload, one or more options from **--chrome-mpi-logging**, **--chrome-ccl-logging** and **--chrome-dnn-logging** also enables PyTorch profiling(see [Profile PyTorch](#profile-pytorch) for more information).
446
+
If the application is a PyTorch workload, one or more options from **--chrome-mpi-logging**, **--chrome-ccl-logging** and **--chrome-dnn-logging** also enable PyTorch profiling (see [Profile PyTorch](#profile-pytorch) for more information).
447
447
448
448
### Trace Operating System Kernel and/or Device Driver Activities (Linux)
449
449
450
-
To trace operating system kernel and/or device driver activities, yon must have root access and a [bpftrace](https://bpftrace.org) script as the argument to option **--chrome-kmd-logging**. The [script](/tools/unitrace/examples/kmdprobes/probes.bt) is a simple exmaple.
450
+
To trace operating system kernel and/or device driver activities, you must have root access and a [bpftrace](https://bpftrace.org) script as the argument to option **--chrome-kmd-logging**. The [script](/tools/unitrace/examples/kmdprobes/probes.bt) is a simple example.
451
451
452
452
The trace data for each operating system and/or GPU device driver event or function collected using bpftrace should be in the format of
453
453
@@ -459,7 +459,7 @@ The **data** is optional. If it is present, it will be treated as a string argum
459
459
460
460
The trace is stored in file **oskmd.0.json**.
461
461
462
-
The **--chrome-kmd-logging** can be used together with other options, for example, **--chrome-kernel-logging**, to trace user space and kernel space event at the same time, for example:
462
+
The **--chrome-kmd-logging** can be used together with other options, for example, **--chrome-kernel-logging**, to trace user space and kernel space events at the same time, for example:
By default, counters in **ComputeBasic** metric group are profiled. You can use the **--group [-g]** option to specify a different group. All available metric groups can be listed by **--metric-list** option.
512
512
513
513
#### Sample Metrics in Time-based Mode
514
514
515
-
Different from **--metric-query [-q]** option, the **--metric-sampling [-k]** option profile hardware metrics in time-based sampling mode.
515
+
Different from the **--metric-query [-q]** option, the **--metric-sampling [-k]** option profiles hardware metrics in time-based sampling mode.
516
516
517
517
```sh
518
518
unitrace -k -o perfmetrics.csv myapp
519
519
```
520
-
Performance metrics data are stored in **perfmetrics.<pid>.csv** file.
520
+
Performance metrics data are stored in the **perfmetrics.<pid>.csv** file.
To kernels that take short time, you may find that the default sampling rate is not high enough and the sampling rate or the sampling interval needs to be adjusted using **--sampling-interval [-i]** option, for example:
524
+
For kernels that take a short time, you may find that the default sampling rate is not high enough and the sampling rate or the sampling interval needs to be adjusted using **--sampling-interval [-i]** option, for example:
525
525
526
526
```sh
527
527
unitrace -k -i 20 -o perfmetrics.csv myapp
528
528
```
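To get a feel for why short kernels need a smaller interval, the sketch below estimates how many time-based samples fall inside one kernel execution. It is a back-of-the-envelope model (kernel duration divided by sampling interval), not a description of how unitrace schedules samples, and the kernel duration is a made-up number.

```python
def estimated_samples(kernel_duration_us, sampling_interval_us=50):
    """Rough count of samples landing inside one kernel execution.

    The default of 50 us mirrors the documented default sampling interval.
    """
    if sampling_interval_us <= 0:
        raise ValueError("sampling interval must be positive")
    return kernel_duration_us // sampling_interval_us


kernel_us = 120  # hypothetical kernel duration in microseconds
for interval_us in (50, 20):
    print(interval_us, estimated_samples(kernel_us, interval_us))
```

With this simple model, a kernel shorter than the interval may receive no samples at all, which is the situation a smaller **--sampling-interval [-i]** value is meant to address.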
529
529
530
530
By default, counters in **ComputeBasic** metric group are profiled. You can use the **--group [-g]** option to specify a different group. All available metric groups can be listed by **--metric-list** option.
531
531
532
-
The **--metric-sampling [-k]** option alone samples all devices. but it can be used together with the **--devices-to-sample** option to sample only specific devices. The devices are given in a comma-separated list of integer identifiers as reported by **--device-list**. Those identifiers that do not match actual devices will be ignored. In the event that no valid or existent device is specified, no sampling will be performed at all.
532
+
The **--metric-sampling [-k]** option alone samples all devices, but it can be used together with the **--devices-to-sample** option to sample only specific devices. The devices are given in a comma-separated list of integer identifiers as reported by **--device-list**. Those identifiers that do not match actual devices will be ignored. In the event that no valid or existent device is specified, no sampling will be performed at all.
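The selection rule described above can be modeled with a short Python sketch. This is only a model of the documented behavior, not unitrace's implementation; the function name and the handling of non-integer tokens (skipped here) are assumptions.

```python
def devices_to_sample(requested, available):
    """Return the device ids that would be sampled.

    requested: comma-separated string of integer ids, e.g. "0,2"
    available: iterable of ids as reported by --device-list
    """
    available = set(available)
    selected = []
    for token in requested.split(","):
        token = token.strip()
        if not token.isdecimal():
            continue  # assumption: tokens that are not integers are skipped
        device = int(token)
        # Identifiers that do not match an actual device are ignored
        if device in available and device not in selected:
            selected.append(device)
    return selected


print(devices_to_sample("0,2,7", [0, 1, 2]))
print(devices_to_sample("5,9", [0, 1]))
```

An empty result corresponds to the case where no valid device is specified, in which no sampling is performed.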
533
533
534
534
#### Sample Stalls at Instruction Level
535
535
536
-
The **--stall-sampling** works on Intel(R) Data Center GPU Max Series and later products.
536
+
The **--stall-sampling** option works on Intel(R) Data Center GPU Max Series and later products.
To kernels that take short time, you may find that the default sampling rate is not high enough and the sampling rate or the sampling interval needs to be adjusted using **--sampling-interval [-i]** option.
540
+
For kernels that take a short time, you may find that the default sampling rate is not high enough and the sampling rate or the sampling interval needs to be adjusted using **--sampling-interval [-i]** option.
The **Device** is the device on which the metrics are sampled. In this example output, the decice is 0. If multiple devices are used and sampled, multiple sections of **Device** will be present.
613
+
The **Device** is the device on which the metrics are sampled. In this example output, the device is 0. If multiple devices are used and sampled, multiple sections of **Device** will be present.
614
614
615
-
The **Metric** section shows the metrics collected on the device and the **Kernel, Number of Instances**shows the kernels and number of instances for each kernel are profiled. An instance is one kernel execution sampled on the device. For example, The kernel "main::{lambda(auto:1)#4}[SIMD32 {4096; 1; 1} {256; 1; 1}]" having 5 instances means the 5 exeuctions of the kernel are sampled. Please note that the number of instances of a kernel here may be less than the total number of exeuctions or submissions of the kernel in the application, especially when the kernel is short and/or sampling interval is large.
615
+
The **Metric** section shows the metrics collected on the device, and the **Kernel, Number of Instances** section lists each kernel with its number of sampled instances. An instance is one kernel execution sampled on the device. For example, the kernel "main::{lambda(auto:1)#4}[SIMD32 {4096; 1; 1} {256; 1; 1}]" having 5 instances means the 5 executions of the kernel are sampled. Please note that the number of instances of a kernel here may be less than the total number of executions or submissions of the kernel in the application, especially when the kernel is short and/or sampling interval is large.
616
616
617
617
The number of instances is not applicable to stall sampling metric data:
618
618
@@ -656,7 +656,7 @@ This command plots a chart of XVE stall and function unit utilizations for the *
The metrics shown in the browser are the metrics passed to the **-m** option when you start **analyzeperfmetrics.py**. If you stop and restart **analyzeperfmetrics.py** with a different set of metrics passed to **-m** option, for example:
806
806
@@ -810,7 +810,7 @@ The metrics shown in the browser are the metrics passed to the **-m** option whe
810
810
811
811
Refreshing the same link will show the new metrics:
@@ -843,15 +843,15 @@ If both temporal or out-of-application control and spatial or in-application con
843
843
844
844
### Temporal or Out-of-Application Control (Linux Only)
845
845
846
-
The temporal or out-of-application control runs control commands in a sperate process to pause/resume/stop tracing/profiling. It does not require any application code change.
846
+
The temporal or out-of-application control runs control commands in a separate process to pause/resume/stop tracing/profiling. It does not require any application code change.
847
847
848
848
By default, a unitrace session is unnamed. To use temporal or out-of-application control, you have to name the unitrace session using the **--session** option. The name must be an alphanumeric string.
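If the control commands are issued from a script, a small helper can check the session name before building the command. The sketch below is a hypothetical wrapper, not part of unitrace; it assumes "alphanumeric" means ASCII letters and digits only, and it uses only the **--pause**, **--resume** and **--stop** options listed in the option table.

```python
import re

CONTROL_ACTIONS = ("pause", "resume", "stop")


def control_command(action, session):
    """Build the argument list for a unitrace session control command."""
    if action not in CONTROL_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    # Assumption: an alphanumeric session name is ASCII letters and digits
    if not re.fullmatch(r"[A-Za-z0-9]+", session):
        raise ValueError(f"session name must be alphanumeric: {session!r}")
    return ["unitrace", f"--{action}", session]


print(" ".join(control_command("resume", "mysession1")))
```

The returned list can be passed to `subprocess.run` from a separate process while the traced application is running.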
The optional **--start-paused** flag paues tracing/profiling of the application when it starts. Later, when it is the time to trace/profile the execution, you can run the following commnad in a different terminal:
854
+
The optional **--start-paused** flag pauses tracing/profiling of the application when it starts. Later, when it is time to trace/profile the execution, you can run the following command in a different terminal:
855
855
856
856
```sh
857
857
unitrace --resume mysession1
@@ -867,7 +867,7 @@ unitrace --pause mysession1
867
867
868
868
to pause tracing/profiling.
869
869
870
-
You can pause and resume multiple time. When all the executions of interest are traced/profiled, you can run command
870
+
You can pause and resume multiple times. When all the executions of interest are traced/profiled, you can run command