Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/).
4
4
5
-
## Unreleased
5
+
## ROCm Compute Profiler 3.2.1 for ROCm 7.0.0
6
6
7
7
### Added
8
8
9
-
* Support Roofline plot on CLI (single run)
10
-
11
-
* Stochastic (hardware-based) PC sampling has been enabled for AMD Instinct MI300X series and later accelerators.
12
-
13
-
* Sorting of PC sampling by type: offset or count.
14
-
15
-
* Add rocprof-compute Text User Interface (TUI) support for analyze mode (beta version)
16
-
* A command line based user interface to support interactive single-run analysis
17
-
* launch with `--tui` option in analyze mode. i.e., `rocprof-compute analyze --tui`
18
-
19
-
* Add support to be able to acquire from rocprofv3 every single channel on each XCD of TCC counters
20
-
21
-
* Add Docker files to package the application and dependencies into a single portable and executable standalone binary file
22
-
23
-
* Analysis report based filtering
24
-
* -b option in profile mode now additionally accepts metric id(s) for analysis report based filtering
25
-
* -b option in profile mode also accept hardware IP block for filtering, however, this support will be deprecated soon
26
-
* --list-metrics option added in profile mode to list possible metric id(s), similar to analyze mode
27
-
28
-
* Datatype selection option for roofline profiling
29
-
* --roofline-data-type / -R option added to specify which datatypes the user wants to capture in the roofline PDF plot outputs
30
-
* Default is FP32, but user can specify as many types as desired to overlay on the same plot output
31
-
32
-
* Additional datatypes for roofline profiling
33
-
* Now supports FP4, FP6, FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on gpu architecture)
34
-
35
-
* Support host-trap PC Sampling on CLI (beta version)
9
+
#### CDNA4 (AMD Instinct MI350/MI355) support
36
10
37
11
* Support for AMD Instinct MI350 series GPUs with the addition of the following counters:
38
12
* VALU co-issue (Two VALUs are issued instructions) efficiency
@@ -53,63 +27,129 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
53
27
* L2 to EA stalls
54
28
* L2 to EA stalls per channel
55
29
56
-
* Roofline support for RHEL 10
30
+
* Roofline support for AMD Instinct MI350 series architecture.
31
+
32
+
#### Textual User Interface (TUI) (beta version)
33
+
34
+
* Text User Interface (TUI) support for analyze mode
35
+
* A command line based user interface to support interactive single-run analysis
36
+
* To launch, use `--tui` option in analyze mode. For example, ``rocprof-compute analyze --tui``.
37
+
38
+
#### PC Sampling (beta version)
39
+
40
+
* Stochastic (hardware-based) PC sampling has been enabled for AMD Instinct MI300X series and later accelerators.
41
+
42
+
* Host-trap PC Sampling has been enabled for AMD Instinct MI200 series and later accelerators.
43
+
44
+
* Support for sorting of PC sampling by type: offset or count.
45
+
46
+
* PC Sampling Support on CLI and TUI analysis.
47
+
48
+
#### Roofline
49
+
50
+
* Support for Roofline plot on CLI (single run) analysis.
51
+
52
+
* Roofline support for RHEL 10 OS.
57
53
58
-
* Roofline support for MI350 series architecture
54
+
* FP4 and FP6 data types have been added for roofline profiling on AMD Instinct MI350 series.
59
55
60
-
* Interface to rocprofiler-sdk
61
-
* Setting ROCPROF=rocprofiler-sdk environment variable will use rocprofiler-sdk C++ library instead of rocprofv3 python script
56
+
#### rocprofv3 support
57
+
58
+
* ``rocprofv3`` is supported as the default backend for profiling.
59
+
* Support to obtain performance information for all channels for TCC counters.
60
+
* Support for profiling on AMD Instinct MI100 using ``rocprofv3``.
61
+
* Deprecation warning for ``rocprofv3`` interface in favor of the ROCprofiler-SDK interface, which directly accesses ``rocprofv3`` C++ tool.
62
+
63
+
#### Others
64
+
65
+
* Docker files to package the application and dependencies into a single portable and executable standalone binary file.
66
+
67
+
* Analysis report based filtering
68
+
* ``-b`` option in profile mode now also accepts metric id(s) for analysis report based filtering.
69
+
* ``-b`` option in profile mode also accepts hardware IP block for filtering; however, this filter support will be deprecated soon.
70
+
* ``--list-metrics`` option added in profile mode to list possible metric id(s), similar to analyze mode.
71
+
72
+
* Interface to ROCprofiler-SDK.
73
+
* Setting the environment variable ``ROCPROF=rocprofiler-sdk`` will use ROCprofiler-SDK C++ library instead of ``rocprofv3`` python script.
62
74
* Add --rocprofiler-sdk-library-path runtime option to choose the path to rocprofiler-sdk library to be used
63
75
* Using rocprof v1 / v2 / v3 interfaces will trigger a deprecation warning to use rocprofiler-sdk interface
64
76
65
77
* Support MEM chart on CLI (single run)
66
78
67
-
* Add deprecation warning for database update mode.
79
+
* Deprecation warning for MongoDB database update mode.
80
+
81
+
* Deprecation warning for ``rocm-smi``
82
+
83
+
* ``--specs-correction`` option to provide missing system specifications for analysis.
68
84
69
85
### Changed
70
86
71
-
* Change the default rocprof version to rocprofv3, this is used when environment variable "ROCPROF" is not set
72
-
* Change the rocprof version for unit tests to rocprofv3 on all SoCs except MI100
73
-
* Change normal_unit default to per_kernel
74
-
* Change dependency from rocm-smi to amd-smi
75
-
* Decrease profiling time by not collecting counters not used in post analysis
76
-
* Update definition of following metrics for MI 350:
77
-
* VGPR Writes
78
-
* Total FLOPs (consider fp6 and fp4 ops)
79
-
* Update Dash to >=3.0.0 (for web UI)
80
-
* Change when Roofline PDFs are generated- during general profiling and --roof-only profiling (skip only when --no-roof option is present)
81
-
* Update Roofline binaries
87
+
* Changed the default ``rocprof`` version to ``rocprofv3``. This is used when environment variable ``ROCPROF`` is not set.
88
+
* Changed ``normal_unit`` default to ``per_kernel``.
89
+
* Decreased profiling time by not collecting counters that are unused in post-analysis.
90
+
* Updated Dash to >=3.0.0 (for web UI).
91
+
* Changed when Roofline PDFs are generated: they are now produced during both general profiling and ``--roof-only`` profiling, and are skipped only when the ``--no-roof`` option is present.
92
+
* Updated Roofline binaries:
82
93
* Rebuild using latest ROCm stack
83
-
* OS distribution support minimum for roofline feature is now Ubuntu22.04, RHEL9, and SLES15SP6
94
+
* Minimum OS distribution support for the roofline feature is now Ubuntu 22.04, RHEL 9, and SLES 15 SP6.
95
+
96
+
### Optimized
97
+
98
+
* ROCm Compute Profiler CLI has been improved to better display the GPU architecture analytics
84
99
85
100
### Resolved issues
86
101
87
-
* Fixed MI 100 counters not being collected when rocprofv3 is used
88
-
* Fixed option specs-correction
89
-
* Fixed kernel name and kernel dispatch filtering when using rocprof v3
90
-
* Fixed not collecting TCC channel counters in rocprof v3
91
-
* Fixed peak FLOPS of F8 I8 F16 and BF16 on MI300
102
+
* Fixed kernel name and kernel dispatch filtering when using ``rocprofv3``.
103
+
* Fixed an issue of TCC channel counters collection in ``rocprofv3``.
104
+
* Fixed peak FLOPS of F8, I8, F16, and BF16 on AMD Instinct MI300.
92
105
93
106
### Known issues
94
107
95
-
* On MI 100, accumulation counters will not be collected and the following metrics will not show up in analysis: Instruction Fetch Latency, Wavefront Occupancy, LDS Latency
96
-
* As a workaround, use ROCPROF=rocprof environment variable, to use rocprofv1 for profiling on MI 100
108
+
* On AMD Instinct MI100, accumulation counters are not collected, resulting in the following metrics failing to show up in the analysis: Instruction Fetch Latency, Wavefront Occupancy, LDS Latency
109
+
* As a workaround, set the environment variable ``ROCPROF=rocprof`` to use ``rocprof v1`` for profiling on AMD Instinct MI100.
97
110
98
-
* GPU id filtering is not supported when using rocprof v3
111
+
* GPU id filtering is not supported when using ``rocprofv3``.
99
112
100
-
* Analysis of previously collected workload data will not work due to sysinfo.csv schema change
101
-
* As a workaround, run the profiling operation again for the workload and interrupt the process after ten seconds.
102
-
Followed by copying the `sysinfo.csv` file from the new data folder to the old one.
103
-
This assumes your system specification hasn't changed since the creation of the previous workload data.
113
+
* Analysis of previously collected workload data will not work due to sysinfo.csv schema change.
114
+
* As a workaround, re-run the profiling operation for the workload and interrupt the process after 10 seconds.
115
+
Then copy the ``sysinfo.csv`` file from the new data folder to the old one.
116
+
This assumes your system specification hasn't changed since the creation of the previous workload data.
104
117
105
118
* Analysis of new workloads might require providing shader/memory clock speed using
106
-
--specs-correction operation if `amd-smi` or `rocminfo` does not provide clock speeds.
119
+
``--specs-correction`` option if ``amd-smi`` or ``rocminfo`` does not provide clock speeds.
107
120
108
-
* Memory chart on CLI might look corrupted if CLI width is too narrow
121
+
* Memory chart on ROCm Compute Profiler CLI might look corrupted if the CLI width is too narrow.
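
The ``sysinfo.csv`` workaround listed in the known issues above can be scripted. The following is a minimal Python sketch and not part of ROCm Compute Profiler: the function name `copy_sysinfo` and both directory arguments are hypothetical, and it assumes the system specification has not changed since the old workload data was collected.

```python
import shutil
from pathlib import Path


def copy_sysinfo(new_workload_dir, old_workload_dir):
    """Copy sysinfo.csv from a freshly profiled workload folder into an older one.

    Both arguments are hypothetical placeholders for your own workload
    data folders. An existing old file is kept as sysinfo.csv.bak.
    """
    src = Path(new_workload_dir) / "sysinfo.csv"
    dst = Path(old_workload_dir) / "sysinfo.csv"
    if not src.is_file():
        raise FileNotFoundError(f"no sysinfo.csv in {new_workload_dir}")
    if dst.exists():
        # Keep a backup of the old-schema file before overwriting it.
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
    shutil.copy2(src, dst)
    return dst
```

Keeping a backup of the old file makes the step reversible if the copied specification turns out not to match the original system.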
109
122
110
123
### Removed
111
124
112
125
* Roofline support for Ubuntu 20.04 and SLES below 15.6
126
+
* Removed support for AMD Instinct MI50 and MI60.
127
+
128
+
### Upcoming changes
129
+
130
+
* ``rocprof v1/v2/v3`` interfaces will be removed in favor of the ROCprofiler-SDK interface, which directly accesses ``rocprofv3`` C++ tool.
131
+
* To use ROCprofiler-SDK interface, set environment variable `ROCPROF=rocprofiler-sdk` and optionally provide profile mode option ``--rocprofiler-sdk-library-path /path/to/librocprofiler-sdk.so``
132
+
* Hardware IP block based filtering using ``-b`` option in profile mode will be removed in favor of analysis report block based filtering using ``-b`` option in profile mode.
133
+
* Using rocprof v1 / v2 / v3 interfaces will trigger a deprecation warning to use rocprofiler-sdk interface
134
+
* MongoDB database support will be removed.
135
+
* Usage of ``rocm-smi`` will be removed in favor of ``amd-smi``.
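
As an illustration of the ROCprofiler-SDK switch described in the list above, here is a minimal Python sketch that assembles the environment and command line for a profile run. The environment variable ``ROCPROF=rocprofiler-sdk`` and the ``--rocprofiler-sdk-library-path`` option come from this changelog. The helper name `sdk_profile_command`, the workload name, and the application arguments are hypothetical, and the ``-n <name> -- <application>`` layout is an assumption about the general profile-mode invocation that this changelog does not state.

```python
import os


def sdk_profile_command(workload_name, app_argv, sdk_library_path=None):
    """Return (argv, env) for a profile run through the ROCprofiler-SDK interface.

    Nothing is executed here; the caller decides how to launch the command.
    """
    env = dict(os.environ)
    # Select the ROCprofiler-SDK interface, as described in the changelog.
    env["ROCPROF"] = "rocprofiler-sdk"

    # Assumed layout: rocprof-compute profile -n <name> [options] -- <app> <args>
    argv = ["rocprof-compute", "profile", "-n", workload_name]
    if sdk_library_path is not None:
        # Optional explicit path to the ROCprofiler-SDK library.
        argv += ["--rocprofiler-sdk-library-path", sdk_library_path]
    argv += ["--"] + list(app_argv)
    return argv, env
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` on a system where `rocprof-compute` is installed.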
136
+
137
+
## ROCm Compute Profiler 3.1.1 for ROCm 6.4.2
138
+
139
+
### Added
140
+
141
+
* 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
142
+
* Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
143
+
* Data type selection option ``--roofline-data-type / -R`` for roofline profiling. The default data type is FP32.
144
+
145
+
### Changed
146
+
147
+
* Change dependency from `rocm-smi` to `amd-smi`.
148
+
149
+
### Resolved issues
150
+
151
+
* Fixed a crash related to Agent ID caused by the new format of the `rocprofv3` output CSV file.