Commit b934f19

vedithal-amd, xuchen-amd, prbasyal-amd, feizheng10, vstojilj authored
ROCm release 7.0 RC3 (#815)
* Generalize config path. (#802)
* Generalize config path.
* Fix format.
* Fix typo.
* Fix roofline and TUI bugs (#803)
* Fix roofline rocm version bug
* Fix utils bug
* Remove unnecessary tests
* Do not check textual-fspicker package in cmake build
* Use rocprofv3 to test MI 100 and fix tests
* tui user experience improvement (#805)
* roofline footnote updated (#808)
* Update PC sampling doc (#798)
* Update cli doc description (#804)
* Update peak flops for MI350 (#810)
* Update TUI docs. (#796)
* Architecture data support and diagrams added (#814)
* Architeture data support and diagrams added
* Architecture image added
* CDNA4 Image updated
* Review feedback incorporated
* CDNA 4 partition mode added
* Fei review feedback incorporated
* Update VERSION and CHANGELOG
* Update CHANGELOG for better readability
* Update CHANGELOG for better readability
* Update CHANGELOG for better readability
* Update CHANGELOG for better readability
* Update CHANGELOG for better readability
* Update CHANGELOG for better readability
* Update CHANGELOG for better readability
* rocm-smi deprecation warning (#806)
* Minor editorial changes data type selection feature (#816)
* Add missing <cassert> include (#800)
Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>
* Remove MI50/MI60 gfx906 support per documentation (#819)
* Address documentation review of CHANGELOG
* Update standalone roofline intro (#830)
* Address doc review comments
* Address doc review comments
* Fix roofline block print during CLI output (#811)
Fix roofline panel in CLI analyze stage when using --block option. Improve roofline CLI output checks and logs.
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Correct CHANGELOG
* Fix CHANGELOG
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: xuchen-amd <xuchen@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
Co-authored-by: vstojilj <vstojilj@amd.com>
Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>
Co-authored-by: cfallows-amd <Carrie.Fallows@amd.com>
1 parent c51b9d1 commit b934f19

58 files changed: 777 additions and 4130 deletions

CHANGELOG.md

Lines changed: 100 additions & 60 deletions
@@ -2,45 +2,19 @@

 Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/).

-## Unreleased
+## ROCm Compute Profiler 3.2.1 for ROCm 7.0.0

 ### Added

-* Support Roofline plot on CLI (single run)
-
-* Stochastic (hardware-based) PC sampling has been enabled for AMD Instinct MI300X series and later accelerators.
-
-* Sorting of PC sampling by type: offset or count.
-
-* Add rocprof-compute Text User Interface (TUI) support for analyze mode (beta version)
-* A command line based user interface to support interactive single-run analysis
-* launch with `--tui` option in analyze mode. i.e., `rocprof-compute analyze --tui`
-
-* Add support to be able to acquire from rocprofv3 every single channle on each XCD of TCC counters
-
-* Add Docker files to package the application and dependencies into a single portable and executable standalone binary file
-
-* Analysis report based filtering
-* -b option in profile mode now additionally accepts metric id(s) for analysis report based filtering
-* -b option in profile mode also accept hardware IP block for filtering, however, this support will be deprecated soon
-* --list-metrics option added in profile mode to list possible metric id(s), similar to analyze mode
-
-* Datatype selection option for roofline profiling
-* --roofline-data-type / -R option added to specify which datatypes the user wants to capture in the roofline PDF plot outputs
-* Default is FP32, but user can specify as many types as desired to overlay on the same plot output
-
-* Additional datatypes for roofline profiling
-* Now supports FP4, FP6, FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on gpu architecture)
-
-* Support host-trap PC Sampling on CLI (beta version)
+#### CDNA4 (AMD Instinct MI350/MI355) support

 * Support for AMD Instinct MI350 series GPUs with the addition of the following counters:
 * VALU co-issue (Two VALUs are issued instructions) efficiency
 * Stream Processor Instruction (SPI) Wave Occupancy
 * Scheduler-Pipe Wave Utilization
 * Scheduler FIFO Full Rate
 * CPC ADC Utilization
-* F6F4 datatype metrics
+* F6F4 data type metrics
 * Update formula for total FLOPs while taking into account F6F4 ops
 * LDS STORE, LDS LOAD, LDS ATOMIC instruction count metrics
 * LDS STORE, LDS LOAD, LDS ATOMIC bandwidth metrics
@@ -53,63 +27,129 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 * L2 to EA stalls
 * L2 to EA stalls per channel

-* Roofline support for RHEL 10
+* Roofline support for AMD Instinct MI350 series architecture.
+
+#### Textual User Interface (TUI) (beta version)
+
+* Text User Interface (TUI) support for analyze mode
+* A command line based user interface to support interactive single-run analysis
+* To launch, use `--tui` option in analyze mode. For example, ``rocprof-compute analyze --tui``.
+
+#### PC Sampling (beta version)
+
+* Stochastic (hardware-based) PC sampling has been enabled for AMD Instinct MI300X series and later accelerators.
+
+* Host-trap PC Sampling has been enabled for AMD Instinct MI200 series and later accelerators.
+
+* Support for sorting of PC sampling by type: offset or count.
+
+* PC Sampling Support on CLI and TUI analysis.
+
+#### Roofline
+
+* Support for Roofline plot on CLI (single run) analysis.
+
+* Roofline support for RHEL 10 OS.

-* Roofline support for MI350 series architecture
+* FP4 and FP6 data types have been added for roofline profiling on AMD Instinct MI350 series.

-* Interface to rocprofiler-sdk
-* Setting ROCPROF=rocprofiler-sdk environment variable will use rocprofiler-sdk C++ library instead of rocprofv3 python script
+#### rocprofv3 support
+
+* ``rocprofv3`` is supported as the default backend for profiling.
+* Support to obtain performance information for all channels for TCC counters.
+* Support for profiling on AMD Instinct MI 100 using ``rocprofv3``.
+* Deprecation warning for ``rocprofv3`` interface in favor of the ROCprofiler-SDK interface, which directly accesses ``rocprofv3`` C++ tool.
+
+#### Others
+
+* Docker files to package the application and dependencies into a single portable and executable standalone binary file.
+
+* Analysis report based filtering
+* ``-b`` option in profile mode now also accepts metric id(s) for analysis report based filtering.
+* ``-b`` option in profile mode also accepts hardware IP block for filtering; however, this filter support will be deprecated soon.
+* ``--list-metrics`` option added in profile mode to list possible metric id(s), similar to analyze mode.
+
+* Interface to ROCprofiler-SDK.
+* Setting the environment variable ``ROCPROF=rocprofiler-sdk`` will use ROCprofiler-SDK C++ library instead of ``rocprofv3`` python script.
 * Add --rocprofiler-sdk-library-path runtime option to choose the path to rocprofiler-sdk library to be used
 * Using rocprof v1 / v2 / v3 interfaces will trigger a deprecation warning to use rocprofiler-sdk interface

 * Support MEM chart on CLI (single run)

-* Add deprecation warning for database update mode.
+* Deprecation warning for MongoDB database update mode.
+
+* Deprecation warning for ``rocm-smi``
+
+* ``--specs-correction`` option to provide missing system specifications for analysis.

 ### Changed

-* Change the default rocprof version to rocprofv3, this is used when environment variable "ROCPROF" is not set
-* Change the rocprof version for unit tests to rocprofv3 on all SoCs except MI100
-* Change normal_unit default to per_kernel
-* Change dependency from rocm-smi to amd-smi
-* Decrease profiling time by not collecting counters not used in post analysis
-* Update definition of following metrics for MI 350:
-* VGPR Writes
-* Total FLOPs (consider fp6 and fp4 ops)
-* Update Dash to >=3.0.0 (for web UI)
-* Change when Roofline PDFs are generated- during general profiling and --roof-only profiling (skip only when --no-roof option is present)
-* Update Roofline binaries
+* Changed the default ``rocprof`` version to ``rocprofv3``. This is used when environment variable ``ROCPROF`` is not set.
+* Changed ``normal_unit`` default to ``per_kernel``.
+* Decreased profiling time by not collecting unused counters in post-analysis.
+* Updated Dash to >=3.0.0 (for web UI).
+* Changed the condition when Roofline PDFs are generated during general profiling and ``--roof-only`` profiling (skip only when ``--no-roof`` option is present).
+* Updated Roofline binaries:
 * Rebuild using latest ROCm stack
-* OS distribution support minimum for roofline feature is now Ubuntu22.04, RHEL9, and SLES15SP6
+* Minimum OS distribution support minimum for roofline feature is now Ubuntu 22.04, RHEL 9, and SLES15 SP6.
+
+### Optimized
+
+* ROCm Compute Profiler CLI has been improved to better display the GPU architecture analytics

 ### Resolved issues

-* Fixed MI 100 counters not being collected when rocprofv3 is used
-* Fixed option specs-correction
-* Fixed kernel name and kernel dispatch filtering when using rocprof v3
-* Fixed not collecting TCC channel counters in rocprof v3
-* Fixed peak FLOPS of F8 I8 F16 and BF16 on MI300
+* Fixed kernel name and kernel dispatch filtering when using ``rocprofv3``.
+* Fixed an issue of TCC channel counters collection in ``rocprofv3``.
+* Fixed peak FLOPS of F8, I8, F16, and BF16 on AMD Instinct MI 300.

 ### Known issues

-* On MI 100, accumulation counters will not be collected and the following metrics will not show up in analysis: Instruction Fetch Latency, Wavefront Occupancy, LDS Latency
-* As a workaround, use ROCPROF=rocprof environement variable, to use rocprofv1 for profiling on MI 100
+* On AMD Instinct MI100, accumulation counters are not collected, resulting in the following metrics failing to show up in the analysis: Instruction Fetch Latency, Wavefront Occupancy, LDS Latency
+* As a workaround, use the environment variable ``ROCPROF=rocprof``, to use ``rocprof v1`` for profiling on AMD Instinct MI100.

-* GPU id filtering is not supported when using rocprof v3
+* GPU id filtering is not supported when using ``rocprofv3``.

-* Analysis of previously collected workload data will not work due to sysinfo.csv schema change
-* As a workaround, run the profiling operation again for the workload and interrupt the process after ten seconds.
-Followed by copying the `sysinfo.csv` file from the new data folder to the old one.
-This assumes your system specification hasn't changed since the creation of the previous workload data.
+* Analysis of previously collected workload data will not work due to sysinfo.csv schema change.
+* As a workaround, re-run the profiling operation for the workload and interrupt the process after 10 seconds.
+Followed by copying the ``sysinfo.csv`` file from the new data folder to the old one.
+This assumes your system specification hasn't changed since the creation of the previous workload data.

 * Analysis of new workloads might require providing shader/memory clock speed using
---specs-correction operation if `amd-smi` or `rocminfo` does not provide clock speeds.
+``--specs-correction`` operation if amd-smi or rocminfo does not provide clock speeds.

-* Memory chart on CLI might look corrupted if CLI width is too narrow
+* Memory chart on ROCm Compute Profiler CLI might look corrupted if the CLI width is too narrow.

 ### Removed

 * Roofline support for Ubuntu 20.04 and SLES below 15.6
+* Removed support for AMD Instinct MI50 and MI60.
+
+### Upcoming changes
+
+* ``rocprof v1/v2/v3`` interfaces will be removed in favor of the ROCprofiler-SDK interface, which directly accesses ``rocprofv3`` C++ tool.
+* To use ROCprofiler-SDK interface, set environment variable `ROCPROF=rocprofiler-sdk` and optionally provide profile mode option ``--rocprofiler-sdk-library-path /path/to/librocprofiler-sdk.so``
+* Hardware IP block based filtering using ``-b`` option in profile mode will be removed in favor of analysis report block based filtering using ``-b`` option in profile mode.
+* Using rocprof v1 / v2 / v3 interfaces will trigger a deprecation warning to use rocprofiler-sdk interface
+* MongoDB database support will be removed.
+* Usage of ``rocm-smi`` will be removed in favor of ``amd-smi``.
+
+## ROCm Compute Profiler 3.1.1 for ROCm 6.4.2
+
+### Added
+
+* 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
+* Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
+* Data type selection option ``--roofline-data-type / -R`` for roofline profiling. The default data type is FP32.
+
+### Changed
+
+* Change dependency from `rocm-smi` to `amd-smi`.
+
+### Resolved issues
+
+* Fixed a crash related to Agent ID caused by the new format of the `rocprofv3` output CSV file.
+

 ## ROCm Compute Profiler 3.1.0 for ROCm 6.4.0
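
The `sysinfo.csv` workaround listed under Known issues in the CHANGELOG can be sketched in Python. This is a minimal illustration, not part of the commit: the function name `copy_sysinfo`, its parameters, and the `.bak` backup step are assumptions made here, and it presumes `sysinfo.csv` sits directly inside each workload data folder.

```python
import shutil
from pathlib import Path


def copy_sysinfo(new_workload_dir: str, old_workload_dir: str) -> Path:
    """Copy sysinfo.csv from a fresh profiling run into an older workload folder.

    Hypothetical helper: assumes sysinfo.csv lives at the top level of each
    workload folder and that the system specification has not changed.
    """
    source = Path(new_workload_dir) / "sysinfo.csv"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; re-run profiling first")

    target_dir = Path(old_workload_dir)
    if not target_dir.is_dir():
        raise NotADirectoryError(f"{target_dir} is not a workload folder")

    target = target_dir / "sysinfo.csv"
    backup = target_dir / "sysinfo.csv.bak"
    # Keep a copy of the old-schema file once, so the step can be undone.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)

    shutil.copy2(source, target)
    return target
```

Calling `copy_sysinfo("workloads/app_new/MI300", "workloads/app_old/MI300")` (both paths are placeholders) would overwrite the old file after backing it up; the backup is a precaution added in this sketch and is not described in the CHANGELOG.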

CMakeLists.txt

Lines changed: 5 additions & 0 deletions
@@ -92,6 +92,11 @@ if(CHECK_PYTHON_DEPS)
 if(${ARGV0} STREQUAL "pyyaml")
     set(PACKAGE "yaml")
 endif()
+# Skip check for textual-fspicker
+if(${package} STREQUAL "textual-fspicker")
+    message(STATUS "Skipping check for textual-fspicker")
+    return()
+endif()
 execute_process(
 COMMAND ${Python3_EXECUTABLE} -c "import ${PACKAGE}"
 OUTPUT_QUIET ERROR_QUIET
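
The check in this hunk (map a requirement name to its import name, skip selected packages, then test whether the module imports) can be sketched in Python for readers who want to try the logic outside CMake. The names `IMPORT_NAME_OVERRIDES`, `SKIPPED_PACKAGES`, and `check_python_package` are hypothetical, and the sketch uses `importlib.util.find_spec` rather than spawning `python -c "import ..."` as the CMake code does, so it is an approximation of the build-time behavior, not a copy of it.

```python
import importlib.util

# Hypothetical mapping from requirement names to importable module names,
# mirroring the pyyaml -> yaml special case in the CMake hunk.
IMPORT_NAME_OVERRIDES = {"pyyaml": "yaml"}

# Requirements whose import check is skipped, mirroring textual-fspicker.
SKIPPED_PACKAGES = {"textual-fspicker"}


def check_python_package(package: str) -> str:
    """Return "skipped", "found", or "missing" for a requirement name."""
    if package in SKIPPED_PACKAGES:
        return "skipped"
    module = IMPORT_NAME_OVERRIDES.get(package, package)
    try:
        # find_spec locates the module without executing it.
        spec = importlib.util.find_spec(module)
    except (ImportError, ValueError):
        spec = None
    return "found" if spec is not None else "missing"
```

Because `find_spec` only locates a module, a package that is installed but fails at import time would still be reported as found here, whereas the subprocess import in the CMake check would catch it.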

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.2.0
+3.2.1
