Full documentation for ROCm Compute Profiler is available at https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/.
- Add `rocpd` choice for `--format-rocprof-output` option in profile mode
- Add `--retain-rocpd-output` option in profile mode to save large raw rocpd databases in workload directory
- Show description of metrics during analysis
  - Use `--include-cols Description` to show the Description column, which is excluded by default from the ROCm Compute Profiler CLI output.
- Add notice for change in default output format to `rocpd` in a future release
  - This is displayed when `--format-rocprof-output rocpd` is not used in profile mode
- When `--format-rocprof-output rocpd` is used, only pmc_perf.csv will be written to workload directory instead of multiple csv files.
- Improve analysis block based filtering to accept metric id level filtering
  - This can be used to collect individual metrics from various sections of analysis config
- CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show Metric ID
  - Remove metrics from analysis configuration files which are explicitly marked as empty or None
- Change the basic view of TUI from aggregated analysis data to individual kernel analysis data
- Fixed not detecting memory clock issue when using amd-smi
- Fixed standalone GUI crashing
- Fixed L2 read/write/atomic bandwidths on MI350
- Update metric names for better alignment between analysis configuration and documentation
- Improved `--time-unit` option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats.
- Removed usage of rocm-smi
- Hardware IP block based filtering has been removed in favor of analysis report block based filtering
- Remove aggregated analysis view from TUI mode
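
As a rough illustration of how the profile and analyze options listed above might be combined, the following Python sketch only assembles command lines and does not run them. The workload name `vcopy`, the application `./vcopy`, and the workload path are hypothetical. The `--name` and `--path` options and the `--` separator are assumed from the profiler's general CLI usage and are not stated in this changelog.

```python
import shlex


def profile_cmd(name, app_argv, retain_rocpd=False):
    """Build (but do not run) a profile-mode command that requests rocpd output."""
    cmd = ["rocprof-compute", "profile", "--name", name,
           "--format-rocprof-output", "rocpd"]
    if retain_rocpd:
        # Keep the raw rocpd databases in the workload directory.
        cmd.append("--retain-rocpd-output")
    # Everything after "--" is the application to profile (assumed convention).
    return cmd + ["--"] + list(app_argv)


def analyze_cmd(workload_path, show_description=False):
    """Build (but do not run) an analyze-mode command for a workload directory."""
    cmd = ["rocprof-compute", "analyze", "--path", workload_path]
    if show_description:
        # The Description column is excluded by default.
        cmd += ["--include-cols", "Description"]
    return cmd


if __name__ == "__main__":
    # Hypothetical workload name, application, and path.
    print(shlex.join(profile_cmd("vcopy", ["./vcopy"], retain_rocpd=True)))
    print(shlex.join(analyze_cmd("workloads/vcopy/MI300", show_description=True)))
```

The returned lists can be passed to `subprocess.run` on a system where the profiler is installed.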
- Support for AMD Instinct MI350 series GPUs with the addition of the following counters:
  - VALU co-issue (Two VALUs are issued instructions) efficiency
  - Stream Processor Instruction (SPI) Wave Occupancy
  - Scheduler-Pipe Wave Utilization
  - Scheduler FIFO Full Rate
  - CPC ADC Utilization
  - F6F4 data type metrics
  - Update formula for total FLOPs while taking into account F6F4 ops
  - LDS STORE, LDS LOAD, LDS ATOMIC instruction count metrics
  - LDS STORE, LDS LOAD, LDS ATOMIC bandwidth metrics
  - LDS FIFO full rate
  - Sequencer -> TA ADDR Stall rates
  - Sequencer -> TA CMD Stall rates
  - Sequencer -> TA DATA Stall rates
  - L1 latencies
  - L2 latencies
  - L2 to EA stalls
  - L2 to EA stalls per channel
- Roofline support for AMD Instinct MI350 series architecture.
- Text User Interface (TUI) support for analyze mode
  - A command line based user interface to support interactive single-run analysis
  - To launch, use `--tui` option in analyze mode. For example, `rocprof-compute analyze --tui`.
- Stochastic (hardware-based) PC sampling has been enabled for AMD Instinct MI300X series and later accelerators.
- Host-trap PC Sampling has been enabled for AMD Instinct MI200 series and later accelerators.
- Support for sorting of PC sampling by type: offset or count.
- PC Sampling Support on CLI and TUI analysis.
- Support for Roofline plot on CLI (single run) analysis.
- Roofline support for RHEL 10 OS.
- `FP4` and `FP6` data types have been added for roofline profiling on AMD Instinct MI350 series.
- `rocprofv3` is supported as the default backend for profiling.
- Support to obtain performance information for all channels for TCC counters.
- Support for profiling on AMD Instinct MI100 using `rocprofv3`.
- Deprecation warning for `rocprofv3` interface in favor of the ROCprofiler-SDK interface, which directly accesses `rocprofv3` C++ tool.
- Docker files to package the application and dependencies into a single portable and executable standalone binary file.
- Analysis report based filtering
  - `-b` option in profile mode now also accepts metric id(s) for analysis report based filtering.
  - `-b` option in profile mode also accepts hardware IP block for filtering; however, this filter support will be deprecated soon.
  - `--list-metrics` option added in profile mode to list possible metric id(s), similar to analyze mode.
- Support MEM chart on CLI (single run)
- `--specs-correction` option to provide missing system specifications for analysis.
- Changed the default `rocprof` version to `rocprofv3`. This is used when environment variable `ROCPROF` is not set.
- Changed `normal_unit` default to `per_kernel`.
- Decreased profiling time by not collecting unused counters in post-analysis.
- Updated Dash to >=3.0.0 (for web UI).
- Changed the condition when Roofline PDFs are generated during general profiling and `--roof-only` profiling (skip only when `--no-roof` option is present).
- Updated Roofline binaries:
  - Rebuild using latest ROCm stack
  - Minimum OS distribution support for roofline feature is now Ubuntu 22.04, RHEL 8, and SLES15 SP6.
- Removed roofline support for Ubuntu 20.04 and SLES below 15.6.
- Removed support for AMD Instinct MI50 and MI60.
- ROCm Compute Profiler CLI has been improved to better display the GPU architecture analytics
- Fixed kernel name and kernel dispatch filtering when using `rocprofv3`.
- Fixed an issue of TCC channel counters collection in `rocprofv3`.
- Fixed peak FLOPS of `F8`, `I8`, `F16`, and `BF16` on AMD Instinct MI300.
- Fixed not detecting memory clock issue when using amd-smi
- Fixed standalone GUI crashing
- Fixed L2 read/write/atomic bandwidths on AMD Instinct MI350 series.
- On AMD Instinct MI100, accumulation counters are not collected, resulting in the following metrics failing to show up in the analysis: Instruction Fetch Latency, Wavefront Occupancy, LDS Latency
  - As a workaround, use the environment variable `ROCPROF=rocprof` to use `rocprof v1` for profiling on AMD Instinct MI100.
- GPU id filtering is not supported when using `rocprofv3`.
- Analysis of previously collected workload data will not work due to sysinfo.csv schema change.
  - As a workaround, re-run the profiling operation for the workload and interrupt the process after 10 seconds. Then copy the `sysinfo.csv` file from the new data folder to the old one. This assumes your system specification hasn't changed since the creation of the previous workload data.
- Analysis of new workloads might require providing shader/memory clock speed using `--specs-correction` option if amd-smi or rocminfo does not provide clock speeds.
- Memory chart on ROCm Compute Profiler CLI might look corrupted if the CLI width is too narrow.
- Roofline feature is currently not functional on Azure Linux 3.0 and Debian 12.
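
The copy step of the `sysinfo.csv` workaround above could be scripted along the following lines. This is a minimal sketch, not part of the profiler: the function name `copy_sysinfo` and both directory arguments are hypothetical, and keeping a `.bak` copy of the old file is an extra precaution that the workaround itself does not mention.

```python
import shutil
from pathlib import Path


def copy_sysinfo(new_workload_dir, old_workload_dir):
    """Copy sysinfo.csv from a freshly profiled workload into an older one.

    Assumes the system specification has not changed since the old
    workload data was collected.
    """
    src = Path(new_workload_dir) / "sysinfo.csv"
    dst = Path(old_workload_dir) / "sysinfo.csv"
    if not src.is_file():
        raise FileNotFoundError(f"no sysinfo.csv found in {new_workload_dir}")
    if dst.exists():
        # Extra precaution (an assumption, not part of the documented workaround):
        # keep the old-schema file next to the new one.
        shutil.copy2(dst, dst.with_name("sysinfo.csv.bak"))
    shutil.copy2(src, dst)
    return dst
```

Calling `copy_sysinfo(new_dir, old_dir)` after the interrupted re-run leaves the old workload directory with the new-schema `sysinfo.csv` and a backup of the previous file.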
- `rocprof v1/v2/v3` interfaces will be removed in favor of the ROCprofiler-SDK interface, which directly accesses `rocprofv3` C++ tool. Using `rocprof v1/v2/v3` interfaces will trigger a deprecation warning.
  - To use ROCprofiler-SDK interface, set environment variable `ROCPROF=rocprofiler-sdk` and optionally provide profile mode option `--rocprofiler-sdk-library-path /path/to/librocprofiler-sdk.so`. The `--rocprofiler-sdk-library-path` runtime option selects the path to the ROCprofiler-SDK library to be used.
- Hardware IP block based filtering using `-b` option in profile mode will be removed in favor of analysis report block based filtering using `-b` option in profile mode.
- MongoDB database support will be removed, and a deprecation warning has been added to the application interface.
- Usage of `rocm-smi` is deprecated in favor of `amd-smi`, and a deprecation warning has been added to the application interface.
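
To make the ROCprofiler-SDK opt-in described above concrete, the sketch below prepares an environment and argument list for a profile run without executing anything. The helper name `sdk_profile_invocation`, the workload name `vcopy`, and the application `./vcopy` are hypothetical. The `--name` option and the `--` separator are assumed from the profiler's general CLI usage rather than taken from this changelog.

```python
import os
import shlex


def sdk_profile_invocation(name, app_argv, sdk_library_path=None, base_env=None):
    """Return (env, argv) for a profile run through the ROCprofiler-SDK interface."""
    # Start from the caller's environment unless an explicit one is given.
    env = dict(os.environ if base_env is None else base_env)
    env["ROCPROF"] = "rocprofiler-sdk"
    argv = ["rocprof-compute", "profile", "--name", name]
    if sdk_library_path is not None:
        # Optional: choose which ROCprofiler-SDK library is used.
        argv += ["--rocprofiler-sdk-library-path", sdk_library_path]
    argv += ["--"] + list(app_argv)
    return env, argv


if __name__ == "__main__":
    env, argv = sdk_profile_invocation(
        "vcopy", ["./vcopy"], "/path/to/librocprofiler-sdk.so"
    )
    print("ROCPROF=" + env["ROCPROF"], shlex.join(argv))
    # To actually launch it: subprocess.run(argv, env=env, check=True)
```

Returning the environment separately from the argument list keeps the `ROCPROF` setting scoped to the one invocation instead of changing it for the whole shell session.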
- 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
- Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
- Data type selection option `--roofline-data-type / -R` for roofline profiling. The default data type is FP32.
- Change dependency from `rocm-smi` to `amd-smi`.
- Fixed a crash related to Agent ID caused by the new format of the `rocprofv3` output CSV file.
- Roofline support for Ubuntu 24.04
- Experimental support for rocprofv3 (not enabled by default)
- Fixed PoP of VALU Active Threads
- Workaround broken mclk for old version of rocm-smi
- Renamed Omniperf to ROCm Compute Profiler (#475)
- enable rocprofv1 for MI300 hardware (#391)
- refactoring and updating documentation (#362, #394, #398, #414, #420)
- branch renaming and workflow updates (#389, #404, #409)
- bug fix for analysis output
- add dependency checks on application launch (#393)
- patch for profiling multi-process/multi-GPU applications (#376, #396)
- packaging updates (#386)
- rename CHANGES to CHANGELOG.md (#410)
- rollback Grafana version in Dockerfile for Angular plugin compatibility (#416)
- enable CI triggers for Azure CI (#426)
- add GPU model distinction for MI300 systems (#423)
- new MAINTAINERS.md guide for omniperf publishing procedures (#402)
- reduced running time of Omniperf when profiling (#384)
- console logging improvements
- new option to force hardware target via `OMNIPERF_ARCH_OVERRIDE` global (#370)
- CI/CD support for MI300 hardware (#373)
- support for MI308X hardware (#375)
- cmake build improvements (#374)
- improved logging that spans all modes (#177) (#317) (#335) (#341)
- overhauled CI/CD that spans all modes (#179)
- extensible SoC classes to better support adding new hardware configs (#180)
- --kernel-verbose no longer overwrites kernel names (#193)
- general cleanup and improved organization of source code (#200) (#210)
- separate requirement files for docs and testing dependencies (#205) (#262) (#358)
- add support for MI300 hardware (#231)
- upgrade Grafana assets and build script to latest release (#235)
- update minimum ROCm and Python requirements (#277)
- sort rocprofiler input files prior to profiling (#304)
- new --quiet option will suppress verbose output and show a progress bar (#308)
- roofline support for Ubuntu 22.04 (#319)
- standardize headers to use 'avg' instead of 'mean'
- add color code thresholds to standalone gui to match grafana
- modify kernel name shortener to use cpp_filt (#168)
- enable stochastic kernel dispatch selection (#183)
- patch grafana plugin module to address a known issue in the latest version (#186)
- enhanced communication between analyze mode kernel flags (#187)
- critical patch for detection of llvm in rocm installs on SLURM systems
- add units to L2 per-channel panel (#133)
- new quickstart guide for Grafana setup in docs (#135)
- more detail on kernel and dispatch filtering in docs (#136, #137)
- patch manual join utility for ROCm >5.2.x (#139)
- add % of peak values to low level speed-of-light panels (#140)
- patch critical bug in Grafana by removing a deprecated plugin (#141)
- enhancements to KernelName demangler (#142)
- general metric updates and enhancements (#144, #155, #159)
- add min/max/avg breakdown to instruction mix panel (#154)
- add `--kernel-names` option to toggle kernelName overlay in standalone roofline plot (#93)
- remove unused python modules (#96)
- fix empirical roofline calculation for single dispatch workloads (#97)
- match color of arithmetic intensity points to corresponding bw lines
- ux improvements in standalone GUI (#101)
- enhanced readability for filtering dropdowns in standalone GUI (#102)
- new logfile to capture rocprofiler output (#106)
- roofline support for sles15 sp4 and future service packs (#109)
- adding dockerfiles for all supported Linux distros
- new examples for `--roof-only` and `--kernel` options added to documentation
- enable cli analysis in Windows (#110)
- optional random port number in standalone GUI (#111)
- limit length of visible kernelName in `--kernel-names` option (#115)
- adjust metric definitions (#117, #130)
- manually merge rocprof runs, overriding default rocprofiler implementation (#125)
- fixed compatibility issues with Python 3.11 (#131)
- ux improvements in standalone GUI (#101)
- enhanced readability for filtering dropdowns in standalone GUI (#102)
- new logfile to capture rocprofiler output (#106)
- roofline support for sles15 sp4 and future service packs (#109)
- adding dockerfiles for all supported Linux distros
- new examples for `--roof-only` and `--kernel` options added to documentation
- add `--kernel-names` option to toggle kernelName overlay in standalone roofline plot (#93)
- remove unused python modules (#96)
- fix empirical roofline calculation for single dispatch workloads (#97)
- match color of arithmetic intensity points to corresponding bw lines
- update documentation (#52, #64)
- improved detection of invalid command line arguments (#58, #76)
- enhancements to standalone roofline (#61)
- enable Omniperf on systems with X-server (#62)
- raise minimum version requirement for rocm (#64)
- enable baseline comparison in CLI analysis (#65)
- add multi-normalization to new metrics (#68, #81)
- support alternative profilers (#70)
- add MI100 configs to override rocprofiler's incomplete default (#75)
- improve error message when no GPU(s) detected (#85)
- separate CI tests by Linux distro and add status badges
- CI update: documentation now published via github action (#22)
- better error detection for incomplete ROCm installs (#56)
- store application command-line parameters in profiling output (#27)
- enable additional normalizations in CLI mode (#30)
- add missing ubuntu 20.04 roofline binary to packaging (#34)
- update L1 bandwidth metric calculations (#36)
- add L1 <-> L2 bandwidth calculation (#37)
- documentation updates (#38, #41)
- enhanced subprocess logging to identify critical errors in rocprofiler (#50)
- maintain git sha in production installs from tarball (#53)
- update python requirements.txt with minimum versions for numpy and pandas
- addition of progress bar indicator in web-based GUI (#8)
- reduced default content for web-based GUI to reduce load times (#9)
- minor packaging and CI updates
- variety of documentation updates
- added an optional argument to vcopy.cpp workload example to specify device id
- initial Omniperf release