CHANGELOG.md (15 additions, 0 deletions)
@@ -2,6 +2,21 @@
 
 Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/).
 
+## ROCm Compute Profiler 3.2.0 for ROCm 6.4.2
+
+### Added
+
+* Add FP8 metrics' support for MI300
+* Add additional datatype for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on gpu architecture)
+* Add datatype selection option for roofline profiling: --roofline-data-type / -R option (Default is FP32)
+* Change dependency from rocm-smi to amd-smi
+
+### Changed
+
+
+### Resolved issues
+* Fixed a crash related to Agent ID caused by the new format of the rocprofv3 output CSV file
docs/how-to/profile/mode.rst (6 additions, 6 deletions)
@@ -398,6 +398,9 @@ Roofline options
 Allows you to specify a device ID to collect performance data from when
 running a roofline benchmark on your system.
 
+``--roofline-data-type <datatype>``
+Allows you to specify the data types that you want plotted in the roofline PDF output(s). Selecting more than one data type will overlay the results onto the same plot. Default data type: FP32
+
 To distinguish different kernels in your ``.pdf`` roofline plot use
 ``--kernel-names``. This will give each kernel a unique marker identifiable from
 the plot's key.
@@ -431,8 +434,7 @@ successfully.
 
 $ ls workloads/vcopy/MI200/
 total 48
--rw-r--r-- 1 auser agroup 13331 Mar 1 16:05 empirRoof_gpu-0_fp32_fp64.pdf
--rw-r--r-- 1 auser agroup 13136 Mar 1 16:05 empirRoof_gpu-0_int8_fp16.pdf
+-rw-r--r-- 1 auser agroup 13331 Mar 1 16:05 empirRoof_gpu-0_FP32.pdf
 drwxr-xr-x 1 auser agroup 0 Mar 1 16:03 perfmon
 -rw-r--r-- 1 auser agroup 1101 Mar 1 16:03 pmc_perf.csv
 -rw-r--r-- 1 auser agroup 1715 Mar 1 16:05 roofline.csv
@@ -441,11 +443,9 @@ successfully.
 
 .. note::
 
-ROCm Compute Profiler generates two roofline outputs to organize results and reduce
-clutter. One chart plots FP32/FP64 performance while the other plots I8/FP16
-performance.
+ROCm Compute Profiler currently captures roofline profiling for all data types, and you can reduce the clutter in the PDF outputs by filtering the data type(s). Selecting multiple data types will overlay the results into the same PDF. To generate results in separate PDFs for each data type from the same workload run, you can re-run the profiling command with each data type as long as the ``roofline.csv`` file still exists in the workload folder.
 
-The following image is a sample ``empirRoof_gpu-0_int8_fp16.pdf`` roofline
+The following image is a sample ``empirRoof_gpu-0_FP32.pdf`` roofline
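
Reviewer sketch (not part of the diff): the updated note says that separate PDFs per data type can be produced by re-running the profiling command once per data type. The Python below is a minimal illustration of that workflow and only builds the command lines; it does not run them. The helper names `roofline_commands` and `expected_pdf` are hypothetical. The invocation shape (`rocprof-compute profile --name ... --roof-only ... -- <app>`) is an assumption to be checked against `rocprof-compute profile --help`, and the output file name pattern is inferred solely from the single-type sample listing `empirRoof_gpu-0_FP32.pdf`.

```python
# Data types listed in the 3.2.0 changelog entry. Actual availability
# depends on the GPU architecture, so treat this as an upper bound.
ROOFLINE_DATA_TYPES = ("FP8", "FP16", "BF16", "FP32", "FP64", "I8", "I32", "I64")


def roofline_commands(workload, data_types, app_cmd):
    """Build one profiling command line per data type (nothing is executed)."""
    commands = []
    for dtype in data_types:
        if dtype not in ROOFLINE_DATA_TYPES:
            raise ValueError(f"unknown roofline data type: {dtype}")
        # Assumed invocation shape; only --roofline-data-type comes from this diff.
        commands.append(
            ["rocprof-compute", "profile", "--name", workload, "--roof-only",
             "--roofline-data-type", dtype, "--", *app_cmd]
        )
    return commands


def expected_pdf(dtype, device_id=0):
    """File name pattern inferred from the sample listing (single data type only)."""
    return f"empirRoof_gpu-{device_id}_{dtype}.pdf"
```

For example, `roofline_commands("vcopy", ["FP32", "FP16"], ["./vcopy"])` yields two argument lists, one per data type, which could be passed to `subprocess.run` one after another. How the tool names a PDF that overlays several data types is not shown in this diff, so `expected_pdf` deliberately covers only the single-type case.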