Commit 4d3aa3f

authored
[UNITRACE] Corrected ReadMe file text. (#791)
Signed-off-by: Sarbojit Sarkar <[email protected]>
1 parent ee149e6 commit 4d3aa3f

1 file changed

+34
-34
lines changed

tools/unitrace/README.md

Lines changed: 34 additions & 34 deletions
Original file line number, diff line number, diff line change
@@ -2,7 +2,7 @@
22

33
## Introduction
44

5-
This a performance tool for Intel(R) oneAPI applications. It traces and profiles host/device activities, interactions and hardware utilizations for
5+
This is a performance tool for Intel(R) oneAPI applications. It traces and profiles host/device activities, interactions, and hardware utilization for
66
Intel(R) GPU applications.
77

88
## Supported Platforms
@@ -98,7 +98,7 @@ cd test
9898
python test_unitrace.py
9999
```
100100

101-
By default, command **python test_unitrace.py** builds and runs all the tests. If the tests are already built and rebuilding the tests is not needed, you can use **--run** to skip buidling the tests:
101+
By default, command **python test_unitrace.py** builds and runs all the tests. If the tests are already built and rebuilding the tests is not needed, you can use **--run** to skip building the tests:
102102

103103
```sh
104104
cd test
@@ -147,7 +147,7 @@ The options can be one or more of the following:
147147
```
148148
--call-logging [-c] Trace host API calls
149149
--host-timing [-h] Report host API execution time
150-
--device-timing [-d] Report kernels execution time
150+
--device-timing [-d] Report kernel execution time
151151
--ccl-summary-report [-r] Report CCL execution time summary
152152
--kernel-submission [-s] Report append (queued), submit and execute intervals for kernels
153153
--device-timeline [-t] Report device timeline
@@ -164,18 +164,18 @@ The options can be one or more of the following:
164164
Device activities are traced per thread if this option is not present
165165
--chrome-no-engine-on-device Trace device activities without per-Level-Zero-engine-or-OpenCL-queue info.
166166
Device activities are traced per Level-Zero engine or OpenCL queue if this option is not present
167-
--chrome-event-buffer-size <number-of-events> Size of event buffer on host per host thread(default is -1 or unlimited)
167+
--chrome-event-buffer-size <number-of-events> Size of event buffer on host per host thread (default is -1 or unlimited)
168168
--verbose [-v] Enable verbose mode to show kernel shapes
169169
Kernel shapes are always enabled in timelines for Level Zero backend
170170
--demangle Demangle kernel names. For OpenCL backend only. Kernel names are always demangled for Level Zero backend
171171
--separate-tiles Trace each tile separately in case of implicit scaling
172172
--tid Output TID in host API trace
173173
--pid Output PID in host API and device activity trace
174174
--output [-o] <filename> Output profiling result to file
175-
--conditional-collection Enable conditional collection. This options is deprecated. Use --start-paused instead
175+
--conditional-collection Enable conditional collection. This option is deprecated. Use --start-paused instead
176176
--start-paused Start the tool with tracing and profiling paused
177177
--output-dir-path <path> Output directory path for result files
178-
--metric-query [-q] Query hardware metrics for each kernel instance is enabled for level-zero
178+
--metric-query [-q] Query hardware metrics for each kernel instance (Level Zero only)
179179
--metric-sampling [-k] Sample hardware performance metrics for each kernel instance in time-based mode
180180
--group [-g] <metric-group> Hardware metric group (ComputeBasic by default)
181181
--sampling-interval [-i] <interval> Hardware performance metric sampling interval in us (default is 50 us) in time-based mode
@@ -193,12 +193,12 @@ The options can be one or more of the following:
193193
--pause <session> Pause session <session>. The argument <session> must be the same session named with --session option
194194
--resume <session> Resume session <session>. The argument <session> must be the same session named with --session option
195195
--stop <session> Stop session <session>. The argument <session> must be the same session named with --session option
196-
--chrome-kmd-logging <script> Trace OS/KMD activitives. The argument <script> file defines the OS kernel or device driver activies to trace
197-
--include-kernels <kernel-filters> Include kernels with name containing any of kernel filter strings. The argument <kernel-filters> is a comma-separated list of strings.
198-
--include-kernels-file <kernel-filter-file> Include kernels with name containing any of kernel filter strings in the <kernel-filter-file>.
199-
--exclude-kernels<kernel-filters> Exclude kernels with name containing any of kernel filter strings. The argument <kernel-filters> is a comma-separated list of strings.
200-
--exclude-kernels-file <kernel-filter-file> Exclude kernels with name containing any of kernel filter strings in the <kernel-filter-file>.
201-
--chrome-kmd-logging <script> Trace OS/KMD activitives. The argument <script> file defines the OS kernel or device driver activies to trace
196+
--chrome-kmd-logging <script> Trace OS/KMD activities. The argument <script> file defines the OS kernel or device driver activities to trace
197+
--include-kernels <kernel-filters> Include kernels with names containing any of the kernel filter strings. The argument <kernel-filters> is a comma-separated list of strings.
198+
--include-kernels-file <kernel-filter-file> Include kernels with names containing any of the kernel filter strings in the <kernel-filter-file>.
199+
--exclude-kernels <kernel-filters> Exclude kernels with names containing any of the kernel filter strings. The argument <kernel-filters> is a comma-separated list of strings.
200+
--exclude-kernels-file <kernel-filter-file> Exclude kernels with names containing any of the kernel filter strings in the <kernel-filter-file>.
201+
--chrome-kmd-logging <script> Trace OS/KMD activities. The argument <script> file defines the OS kernel or device driver activities to trace
202202
--version Print version
203203
--help Show this help message and exit. Please refer to the README.md file for further details.
204204
```
@@ -302,7 +302,7 @@ The **--call-logging [-c]** option traces Level Zero and/or OpenCL calls on the
302302
The **--host-timing [-h]** option outputs a Level Zero and/or OpenCL host call timing summary:
303303
![Host Call Timing!](/tools/unitrace/doc/images/host-timing.png)
304304

305-
The **--chrome-call-logging** option generates a Level Zero and/or OpenCL host .json event trace that can be viewd in **https://ui.perfetto.dev/**:
305+
The **--chrome-call-logging** option generates a Level Zero and/or OpenCL host .json event trace that can be viewed in **https://ui.perfetto.dev/**:
306306
![Host Event Trace!](/tools/unitrace/doc/images/call-logging.png)
307307

308308

@@ -360,7 +360,7 @@ Device Logging:
360360
![Device Logging!](/tools/unitrace/doc/images/device-logging.png)
361361

362362
In case both **--chrome-kernel-logging** and **--chrome-device-logging** are present, **--chrome-kernel-logging** takes precedence.
363-
### Include and Exlcude Kernels
363+
### Include and Exclude Kernels
364364

365365
If you care about the performance of just a subset of kernels in an application, for example, kernels you are currently developing or optimizing, you can use the kernel inclusion and/or exclusion options **--include-kernels**, **--exclude-kernels**, **--include-kernels-file** and **--exclude-kernels-file** to instruct unitrace to profile and trace only the kernels of interest, reducing performance overhead and improving analysis efficiency.
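
The matching rule stated in the option help ("names containing any of the kernel filter strings", given as a comma-separated list) can be illustrated with a short Python sketch. This is not unitrace's implementation: the helper names `parse_filters` and `matches_any` are hypothetical, the kernel names are made up, and case-sensitive substring matching is an assumption.

```python
def parse_filters(arg: str) -> list[str]:
    """Split a comma-separated filter argument, dropping empty entries."""
    return [f for f in arg.split(",") if f]


def matches_any(kernel_name: str, filters: list[str]) -> bool:
    """Return True if the kernel name contains at least one filter string."""
    return any(f in kernel_name for f in filters)


# Hypothetical kernel names, used only for illustration.
kernels = ["gemm_fp16", "softmax_fwd", "layernorm_bwd"]
include = parse_filters("gemm,softmax")

included = [k for k in kernels if matches_any(k, include)]
print(included)
```

Under these assumptions, only the kernels whose names contain `gemm` or `softmax` would be kept when the list is passed to an inclusion option.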
366366

@@ -443,11 +443,11 @@ The **--chrome-itt-logging** traces activities in applications instrumented usin
443443
The **--ccl-summary-report [-r]** option outputs CCL call timing summary:
444444
![CCL Call Timing!](/tools/unitrace/doc/images/ccl_summary_report.png)
445445

446-
If the application is a PyTorch workload, one or more options from **--chrome-mpi-logging**, **--chrome-ccl-logging** and **--chrome-dnn-logging** also enables PyTorch profiling(see [Profile PyTorch](#profile-pytorch) for more information).
446+
If the application is a PyTorch workload, one or more options from **--chrome-mpi-logging**, **--chrome-ccl-logging** and **--chrome-dnn-logging** also enable PyTorch profiling (see [Profile PyTorch](#profile-pytorch) for more information).
447447

448448
### Trace Operating System Kernel and/or Device Driver Activities (Linux)
449449

450-
To trace operating system kernel and/or device driver activities, yon must have root access and a [bpftrace](https://bpftrace.org) script as the argument to option **--chrome-kmd-logging**. The [script](/tools/unitrace/examples/kmdprobes/probes.bt) is a simple exmaple.
450+
To trace operating system kernel and/or device driver activities, you must have root access and a [bpftrace](https://bpftrace.org) script as the argument to option **--chrome-kmd-logging**. The [script](/tools/unitrace/examples/kmdprobes/probes.bt) is a simple example.
451451

452452
The trace data for each operating system and/or GPU device driver event or function collected using bpftrace should be in the format of
453453

@@ -459,7 +459,7 @@ The **data** is optional. If it is present, it will be treated as a string argum
459459

460460
The trace is stored in file **oskmd.0.json**.
461461

462-
The **--chrome-kmd-logging** can be used together with other options, for example, **--chrome-kernel-logging**, to trace user space and kernel space event at the same time, for example:
462+
The **--chrome-kmd-logging** can be used together with other options, for example, **--chrome-kernel-logging**, to trace user space and kernel space events at the same time, for example:
463463

464464
```sh
465465
$ unitrace --chrome-kmd-logging probes.bt --chrome-kernel-logging ./testapp
@@ -487,7 +487,7 @@ unitrace --chrome-kernel-logging --output-dir-path /tmp/unitrace-result myapp
487487

488488
The output profile data are written to files in **/tmp/unitrace-result**.
489489

490-
This option is especially useful when the application is distributed workload.
490+
This option is especially useful when the application is a distributed workload.
491491

492492
### Hardware Performance Metrics
493493

@@ -504,40 +504,40 @@ The **--metric-query [-q]** option enables metric query for each kernel instance
504504
```sh
505505
unitrace -q -o perfquery.csv myapp
506506
```
507-
Performance metrics data are stored in **perfquery.<pid>.csv** file.
507+
Performance metrics data are stored in **perfquery.pid.csv** file.
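
When post-processing results in a script, it can help to compute the per-process file name in advance. The sketch below assumes, based only on the example above, that the process ID is inserted before the file extension given to **-o**; the helper name `per_process_output` is hypothetical and not part of unitrace.

```python
import os


def per_process_output(filename: str, pid: int) -> str:
    """Insert the process ID before the extension of the -o file name."""
    root, ext = os.path.splitext(filename)
    return f"{root}.{pid}{ext}"


# 12345 stands in for the process ID of the profiled application.
print(per_process_output("perfquery.csv", 12345))
```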
508508

509509
![Metric Query!](/tools/unitrace/doc/images/metric-query.png)
510510

511511
By default, counters in **ComputeBasic** metric group are profiled. You can use the **--group [-g]** option to specify a different group. All available metric groups can be listed by **--metric-list** option.
512512

513513
#### Sample Metrics in Time-based Mode
514514

515-
Different from **--metric-query [-q]** option, the **--metric-sampling [-k]** option profile hardware metrics in time-based sampling mode.
515+
Different from the **--metric-query [-q]** option, the **--metric-sampling [-k]** option profiles hardware metrics in time-based sampling mode.
516516

517517
```sh
518518
unitrace -k -o perfmetrics.csv myapp
519519
```
520-
Performance metrics data are stored in **perfmetrics.<pid>.csv** file.
520+
Performance metrics data are stored in **perfmetrics.pid.csv** file.
521521

522522
![Metric Sampling!](/tools/unitrace/doc/images/metric-sampling.png)
523523

524-
To kernels that take short time, you may find that the default sampling rate is not high enough and the sampling rate or the sampling interval needs to be adjusted using **--sampling-interval [-i]** option, for example:
524+
For kernels that take a short time, you may find that the default sampling rate is not high enough and the sampling rate or the sampling interval needs to be adjusted using **--sampling-interval [-i]** option, for example:
525525

526526
```sh
527527
unitrace -k -i 20 -o perfmetrics.csv myapp
528528
```
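
As a rough guide to choosing the interval, the number of samples that can fall inside one kernel execution is about the kernel duration divided by the sampling interval. The Python sketch below is only back-of-the-envelope arithmetic under that assumption; the function name `approx_samples` and the 120 us kernel duration are hypothetical, and real counts depend on timing alignment and hardware behavior.

```python
def approx_samples(kernel_duration_us: float, interval_us: float = 50.0) -> int:
    """Estimate how many sampling ticks fit in one kernel execution."""
    if interval_us <= 0:
        raise ValueError("sampling interval must be positive")
    return int(kernel_duration_us // interval_us)


# Compare the default 50 us interval with the 20 us interval used above
# for a hypothetical kernel that runs for 120 us.
for interval in (50, 20):
    print(interval, approx_samples(120, interval))
```

A kernel shorter than the interval yields an estimate of zero, which is the situation in which a smaller **--sampling-interval** value is worth trying.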
529529

530530
By default, counters in **ComputeBasic** metric group are profiled. You can use the **--group [-g]** option to specify a different group. All available metric groups can be listed by **--metric-list** option.
531531

532-
The **--metric-sampling [-k]** option alone samples all devices. but it can be used together with the **--devices-to-sample** option to sample only specific devices. The devices are given in a comma-separated list of integer identifiers as reported by **--device-list**. Those identifiers that do not match actual devices will be ignored. In the event that no valid or existent device is specified, no sampling will be performed at all.
532+
The **--metric-sampling [-k]** option alone samples all devices, but it can be used together with the **--devices-to-sample** option to sample only specific devices. The devices are given in a comma-separated list of integer identifiers as reported by **--device-list**. Those identifiers that do not match actual devices will be ignored. In the event that no valid or existent device is specified, no sampling will be performed at all.
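
The selection rule described above can be sketched in Python as follows. This only mirrors the documented behavior (unknown identifiers are ignored, and an empty result means no sampling); it is not unitrace's code, and the function name `select_devices` and the device sets are hypothetical.

```python
def select_devices(requested: str, available: set[int]) -> list[int]:
    """Keep the requested device identifiers that match actual devices."""
    selected: list[int] = []
    for token in requested.split(","):
        token = token.strip()
        # Skip anything that is not a plain non-negative integer.
        if not (token.isascii() and token.isdigit()):
            continue
        device = int(token)
        if device in available and device not in selected:
            selected.append(device)
    return selected


# Hypothetical system with four devices, as --device-list might report.
print(select_devices("0,2,7", {0, 1, 2, 3}))  # 7 does not exist and is ignored
print(select_devices("8,9", {0, 1}))          # nothing valid: no sampling
```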
533533

534534
#### Sample Stalls at Instruction Level
535535

536-
The **--stall-sampling** works on Intel(R) Data Center GPU Max Series and later products.
536+
The **--stall-sampling** option works on Intel(R) Data Center GPU Max Series and later products.
537537

538538
![Metric Query!](/tools/unitrace/doc/images/stall-sampling.png)
539539

540-
To kernels that take short time, you may find that the default sampling rate is not high enough and the sampling rate or the sampling interval needs to be adjusted using **--sampling-interval [-i]** option.
540+
For kernels that take a short time, you may find that the default sampling rate is not high enough and the sampling rate or the sampling interval needs to be adjusted using **--sampling-interval [-i]** option.
541541

542542
#### Sample Metrics of MPI Ranks
543543

@@ -610,9 +610,9 @@ Device 0
610610
"main::{lambda(auto:1)#7}[SIMD32 {2048; 1; 1} {512; 1; 1}]", 5
611611
```
612612

613-
The **Device** is the device on which the metrics are sampled. In this example output, the decice is 0. If multiple devices are used and sampled, multiple sections of **Device** will be present.
613+
The **Device** is the device on which the metrics are sampled. In this example output, the device is 0. If multiple devices are used and sampled, multiple sections of **Device** will be present.
614614

615-
The **Metric** section shows the metrics collected on the device and the **Kernel, Number of Instances** shows the kernels and number of instances for each kernel are profiled. An instance is one kernel execution sampled on the device. For example, The kernel "main::{lambda(auto:1)#4}[SIMD32 {4096; 1; 1} {256; 1; 1}]" having 5 instances means the 5 exeuctions of the kernel are sampled. Please note that the number of instances of a kernel here may be less than the total number of exeuctions or submissions of the kernel in the application, especially when the kernel is short and/or sampling interval is large.
615+
The **Metric** section shows the metrics collected on the device, and the **Kernel, Number of Instances** section lists each kernel with its number of sampled instances. An instance is one kernel execution sampled on the device. For example, the kernel "main::{lambda(auto:1)#4}[SIMD32 {4096; 1; 1} {256; 1; 1}]" having 5 instances means the 5 executions of the kernel are sampled. Please note that the number of instances of a kernel here may be less than the total number of executions or submissions of the kernel in the application, especially when the kernel is short and/or sampling interval is large.
616616

617617
The number of instances is not applicable to stall sampling metric data:
618618

@@ -656,7 +656,7 @@ This command plots a chart of XVE stall and function unit utilizations for the *
656656

657657
![Analyze Kernel Performance Metrics!](/tools/unitrace/doc/images/perfchart.png)
658658

659-
If instance is 0, all 5 instances of the kernel **"main::{lambda(auto:1)#4}[SIMD32 {4096; 1; 1} {256; 1; 1}]"** are analyzed.
659+
If the instance is 0, all 5 instances of the kernel **"main::{lambda(auto:1)#4}[SIMD32 {4096; 1; 1} {256; 1; 1}]"** are analyzed.
660660

661661
```sh
662662
python analyzeperfmetrics.py -d 0 -k "main::{lambda(auto:1)#4}[SIMD32 {4096; 1; 1} {256; 1; 1}]" -i 0 -m "XVE_STALL[%],XVE_INST_EXECUTED_ALU0_ALL_UTILIZATION[%],XVE_INST_EXECUTED_ALU1_ALL_UTILIZATION[%],XVE_INST_EXECUTED_SEND_ALL_UTILIZATION[%],XVE_INST_EXECUTED_CONTROL_ALL_UTILIZATION[%],XVE_INST_EXECUTED_XMX_ALL_UTILIZATION[%]" -y "Utilization and Stall (%)" -t "Utilization and Stall" -o perfchart.pdf perfmetrics.12345.csv
@@ -800,7 +800,7 @@ Now load the event trace .json file into https://ui.perfetto.dev:
800800

801801
Once you click the link next to **metrics:** in the **"Arguments"**, another browser window is opened:
802802

803-
![Performance Metrics Browswe Window!](/tools/unitrace/doc/images/perfmetricsbrowser.png)
803+
![Performance Metrics Browser Window!](/tools/unitrace/doc/images/perfmetricsbrowser.png)
804804

805805
The metrics shown in the browser are the metrics passed to the **-m** option when you start **analyzeperfmetrics.py**. If you stop and restart **analyzeperfmetrics.py** with a different set of metrics passed to **-m** option, for example:
806806

@@ -810,7 +810,7 @@ The metrics shown in the browser are the metrics passed to the **-m** option whe
810810

811811
Refreshing the same link will show the new metrics:
812812

813-
![Performance Metrics Browswe Window #2!](/tools/unitrace/doc/images/perfmetricsbrowser2.png)
813+
![Performance Metrics Browser Window #2!](/tools/unitrace/doc/images/perfmetricsbrowser2.png)
814814

815815
In case of stall sampling, for example:
816816

@@ -824,7 +824,7 @@ The **-m** option is not required for **analyzeperfmetrics.py**:
824824
python analyzeperfmetrics.py -s ./dump.1 -p ./perfstall.metrics.564289.csv -t "XVE Stall Statistics and Report"
825825
```
826826

827-
Rereshing the same link will show stall statistics by type and instruction address:
827+
Refreshing the same link will show stall statistics by type and instruction address:
828828

829829
![Stall Statistics!](/tools/unitrace/doc/images/stallstatistics.png)
830830

@@ -843,15 +843,15 @@ If both temporal or out-of-application control and spatial or in-application con
843843

844844
### Temporal or Out-of-Application Control (Linux Only)
845845

846-
The temporal or out-of-application control runs control commands in a sperate process to pause/resume/stop tracing/profiling. It does not require any application code change.
846+
The temporal or out-of-application control runs control commands in a separate process to pause/resume/stop tracing/profiling. It does not require any application code change.
847847

848848
By default, a unitrace session is unnamed. To use temporal or out-of-application control, you have to name the unitrace session using the **--session** option. The name must be an alphanumeric string.
849849

850850
```sh
851851
unitrace --chrome-call-logging --chrome-kernel-logging --session mysession1 --start-paused <application> [args]
852852
```
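
For automation, the session commands can be assembled in a small script. The Python sketch below only builds the command lines from the options listed in this README (**--session**, **--start-paused**, **--resume**, **--pause**, **--stop**) and does not run them. The helper name `session_commands` is hypothetical, and treating "alphanumeric" as ASCII letters and digits is an assumption.

```python
import shlex


def session_commands(name: str, app: list[str]) -> dict[str, list[str]]:
    """Build the unitrace command lines for one named session."""
    # Assumption: "alphanumeric" means ASCII letters and digits only.
    if not (name.isascii() and name.isalnum()):
        raise ValueError("session name must be an alphanumeric string")
    return {
        "start": ["unitrace", "--chrome-kernel-logging",
                  "--session", name, "--start-paused", *app],
        "resume": ["unitrace", "--resume", name],
        "pause": ["unitrace", "--pause", name],
        "stop": ["unitrace", "--stop", name],
    }


commands = session_commands("mysession1", ["./testapp"])
for step, argv in commands.items():
    print(f"{step}: {shlex.join(argv)}")
```

Each list can be passed to `subprocess.run` from a separate controlling process once the application has been started with the `start` command.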
853853

854-
The optional **--start-paused** flag paues tracing/profiling of the application when it starts. Later, when it is the time to trace/profile the execution, you can run the following commnad in a different terminal:
854+
The optional **--start-paused** flag pauses tracing/profiling of the application when it starts. Later, when it is time to trace/profile the execution, you can run the following command in a different terminal:
855855

856856
```sh
857857
unitrace --resume mysession1
@@ -867,7 +867,7 @@ unitrace --pause mysession1
867867

868868
to pause tracing/profiling.
869869

870-
You can pause and resume multiple time. When all the executions of interest are traced/profiled, you can run command
870+
You can pause and resume multiple times. When all the executions of interest are traced/profiled, you can run command
871871

872872
```sh
873873
unitrace --stop mysession1
