Commit e54f819

vedithal-amd, fxmarty-amd, feizheng10, and prbasyal-amd authored
ROCm 7.0 RC3 cherry pick (#846)
* Fix memory clock detection with amd-smi (#824)
* bugfix to make amd-smi usage backward compatible (#836)
* Update soc_base.py
Fixes #835
Signed-off-by: fxmarty-amd <felmarty@amd.com>
* address comments
---------
Signed-off-by: fxmarty-amd <felmarty@amd.com>
* Standalone GUI bugfix (#825)
* Fix barchart elements table ids
* Add HBM bandwidth section to L2 cache report for gfx950
* bugfix for standlone GUI
Co-authored-by: Felix Marty <Felix.Marty@amd.com>
* Fix tests and formatting (#826)
* Fix L2 read/write/atomic bandwidths on MI350 (#831)
* Fix rocprofv3 supported counters not being detected (#832)
* Fix rocprofv3 supported counters not being detected
* Fix rocprof interface deprecation warning appearing twice
* Update VERSION due to cherry pick
* quick fix how to call v3 with pc sampling
* Fix pc sampling unit test (#847)
* 7.0.0 Changelog review feedback added
---------
Signed-off-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Felix Marty <Felix.Marty@amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
Co-authored-by: prbasyal <prbasyal@amd.com>
1 parent b934f19 commit e54f819

12 files changed: +71 -70 lines changed

CHANGELOG.md

Lines changed: 8 additions & 15 deletions
@@ -2,7 +2,7 @@

 Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/).

-## ROCm Compute Profiler 3.2.1 for ROCm 7.0.0
+## ROCm Compute Profiler 3.2.2 for ROCm 7.0.0

 ### Added

@@ -69,17 +69,8 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 * ``-b`` option in profile mode also accepts hardware IP block for filtering; however, this filter support will be deprecated soon.
 * ``--list-metrics`` option added in profile mode to list possible metric id(s), similar to analyze mode.

-* Interface to ROCprofiler-SDK.
-* Setting the environment variable ``ROCPROF=rocprofiler-sdk`` will use ROCprofiler-SDK C++ library instead of ``rocprofv3`` python script.
-* Add --rocprofiler-sdk-library-path runtime option to choose the path to rocprofiler-sdk library to be used
-* Using rocprof v1 / v2 / v3 interfaces will trigger a deprecation warning to use rocprofiler-sdk interface
-
 * Support MEM chart on CLI (single run)

-* Deprecation warning for MongoDB database update mode.
-
-* Deprecation warning for ``rocm-smi``
-
 * ``--specs-correction`` option to provide missing system specifications for analysis.

 ### Changed
@@ -102,6 +93,9 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 * Fixed kernel name and kernel dispatch filtering when using ``rocprofv3``.
 * Fixed an issue of TCC channel counters collection in ``rocprofv3``.
 * Fixed peak FLOPS of F8, I8, F16, and BF16 on AMD Instinct MI 300.
+* Fixed not detecting memory clock issue when using amd-smi
+* Fixed standalone GUI crashing
+* Fixed L2 read/write/atomic bandwidths on MI350

 ### Known issues

@@ -127,12 +121,11 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.

 ### Upcoming changes

-* ``rocprof v1/v2/v3`` interfaces will be removed in favor of the ROCprofiler-SDK interface, which directly accesses ``rocprofv3`` C++ tool.
-* To use ROCprofiler-SDK interface, set environment variable `ROCPROF=rocprofiler-sdk` and optionally provide profile mode option ``--rocprofiler-sdk-library-path /path/to/librocprofiler-sdk.so``
+* ``rocprof v1/v2/v3`` interfaces will be removed in favor of the ROCprofiler-SDK interface, which directly accesses ``rocprofv3`` C++ tool. Using ``rocprof v1/v2/v3`` interfaces will trigger a deprecation warning.
+* To use ROCprofiler-SDK interface, set environment variable `ROCPROF=rocprofiler-sdk` and optionally provide profile mode option ``--rocprofiler-sdk-library-path /path/to/librocprofiler-sdk.so``. Add ``--rocprofiler-sdk-library-path`` runtime option to choose the path to ROCprofiler-SDK library to be used.
 * Hardware IP block based filtering using ``-b`` option in profile mode will be removed in favor of analysis report block based filtering using ``-b`` option in profile mode.
-* Using rocprof v1 / v2 / v3 interfaces will trigger a deprecation warning to use rocprofiler-sdk interface
-* MongoDB database support will be removed.
-* Usage of ``rocm-smi`` will be removed in favor of ``amd-smi``.
+* MongoDB database support will be removed, and a deprecation warning has been added to the application interface.
+* Usage of ``rocm-smi`` is deprecated in favor of ``amd-smi``, and a deprecation warning has been added to the application interface.

 ## ROCm Compute Profiler 3.1.1 for ROCm 6.4.2

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.2.1
+3.2.2

src/rocprof_compute_analyze/analysis_webui.py

Lines changed: 9 additions & 3 deletions
@@ -36,7 +36,7 @@
 from rocprof_compute_analyze.analysis_base import OmniAnalyze_Base
 from utils import file_io, parser
 from utils.gui import build_bar_chart, build_table_chart
-from utils.logger import console_debug, console_error, demarcate
+from utils.logger import console_debug, console_error, console_warning, demarcate


 class webui_analysis(OmniAnalyze_Base):
@@ -53,7 +53,9 @@ def __init__(self, args, supported_archs):
         # define different types of bar charts
         self.__barchart_elements = {
             "instr_mix": [1001, 1002],
-            "multi_bar": [1604, 1704],
+            # 1604: L1D - L2 Transactions
+            # 1705: L2 - Fabric Interface Stalls
+            "multi_bar": [1604, 1705],
             "sol": [1101, 1201, 1301, 1401, 1601, 1701],
             # "l2_cache_per_chan": [1802, 1803]
         }
@@ -371,7 +373,11 @@ def determine_chart_type(

     # Determine chart type:
     # a) Barchart
-    if table_config["id"] in [x for i in barchart_elements.values() for x in i]:
+    if original_df.empty:
+        console_warning(
+            f"The dataframe with id={table_config['id']} is empty! Not displaying it."
+        )
+    elif table_config["id"] in [x for i in barchart_elements.values() for x in i]:
         d_figs = build_bar_chart(display_df, table_config, barchart_elements, norm_filt)
         # Smaller formatting if barchart yeilds several graphs
         if (

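The `determine_chart_type` hunk above puts an empty-dataframe check ahead of the bar-chart membership test. The sketch below reproduces only that ordering, without pandas. The helper name `chart_kind`, the `is_empty` flag, and the `"table"` fallback are hypothetical stand-ins for illustration; the id lists are copied from the diff.

```python
# Table ids copied from the diff; everything else here is illustrative.
barchart_elements = {
    "instr_mix": [1001, 1002],
    "multi_bar": [1604, 1705],
    "sol": [1101, 1201, 1301, 1401, 1601, 1701],
}


def chart_kind(table_id: int, is_empty: bool) -> str:
    """Return which kind of chart would be built for a table (hypothetical helper)."""
    # The emptiness check comes first, so an empty table is skipped
    # even when its id belongs to one of the bar-chart groups.
    if is_empty:
        return "skip"
    bar_ids = {x for ids in barchart_elements.values() for x in ids}
    if table_id in bar_ids:
        return "bar"
    # Assumption: anything else falls through to a plain table.
    return "table"


print(chart_kind(1705, is_empty=False))
print(chart_kind(1705, is_empty=True))
print(chart_kind(1702, is_empty=False))
```

Because the guard is the first branch, an empty table with a bar-chart id such as 1705 is skipped rather than passed to the bar-chart builder.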
src/rocprof_compute_profile/profiler_base.py

Lines changed: 4 additions & 1 deletion
@@ -26,6 +26,7 @@
 import logging
 import os
 import re
+import shlex
 import shutil
 import time
 from abc import ABC, abstractmethod
@@ -453,7 +454,9 @@ def run_profiling(self, version: str, prog: str):
                 method=self.get_args().pc_sampling_method,
                 interval=self.get_args().pc_sampling_interval,
                 workload_dir=self.get_args().path,
-                appcmd=self.get_args().remaining,
+                appcmd=shlex.split(
+                    self.get_args().remaining
+                ),  # FIXME: the right solution is applying it when argparsing once!
                 rocprofiler_sdk_library_path=self.get_args().rocprofiler_sdk_library_path,
             )
             end_run_prof = time.time()

src/rocprof_compute_soc/analysis_configs/gfx950/1700_L2_cache.yaml

Lines changed: 13 additions & 9 deletions
@@ -40,6 +40,10 @@ Panel Config:
               * 32)) / (End_Timestamp - Start_Timestamp)))
             unit: GB/s
             tips:
+          HBM Bandwidth:
+            value: $hbmBandwidth
+            unit: GB/s
+            tips:

     - metric_table:
         id: 1702
@@ -179,21 +183,21 @@ Panel Config:
             unit: (Bytes + $normUnit)
             tips:
           Read Bandwidth:
-            avg: AVG(TCC_READ_SECTORS_sum / $denom)
-            min: MIN(TCC_READ_SECTORS_sum / $denom)
-            max: MAX(TCC_READ_SECTORS_sum / $denom)
+            avg: AVG(TCC_READ_SECTORS_sum * 32 / $denom)
+            min: MIN(TCC_READ_SECTORS_sum * 32 / $denom)
+            max: MAX(TCC_READ_SECTORS_sum * 32 / $denom)
             unit: (Bytes + $normUnit)
             tips:
           Write Bandwidth:
-            avg: AVG(TCC_WRITE_SECTORS_sum / $denom)
-            min: MIN(TCC_WRITE_SECTORS_sum / $denom)
-            max: MAX(TCC_WRITE_SECTORS_sum / $denom)
+            avg: AVG(TCC_WRITE_SECTORS_sum * 32 / $denom)
+            min: MIN(TCC_WRITE_SECTORS_sum * 32 / $denom)
+            max: MAX(TCC_WRITE_SECTORS_sum * 32 / $denom)
             unit: (Bytes + $normUnit)
             tips:
           Atomic Bandwidth:
-            avg: AVG(TCC_ATOMIC_SECTORS_sum / $denom)
-            min: MIN(TCC_ATOMIC_SECTORS_sum / $denom)
-            max: MAX(TCC_ATOMIC_SECTORS_sum / $denom)
+            avg: AVG(TCC_ATOMIC_SECTORS_sum * 32 / $denom)
+            min: MIN(TCC_ATOMIC_SECTORS_sum * 32 / $denom)
+            max: MAX(TCC_ATOMIC_SECTORS_sum * 32 / $denom)
             unit: (Bytes + $normUnit)
             tips:
           Req:

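The gfx950 change above multiplies each `TCC_*_SECTORS_sum` counter by 32 before dividing by `$denom`, which makes the value consistent with the `Bytes` unit the metric already declares. The sketch below works through that arithmetic. The 32-byte sector size is inferred from the `* 32` factor in the diff, and the counter values and the helper name `sectors_to_bytes` are made up for illustration.

```python
# Assumption: one TCC sector is 32 bytes, as the "* 32" factor in the diff suggests.
SECTOR_SIZE_BYTES = 32


def sectors_to_bytes(sector_count: int, denom: float = 1.0) -> float:
    """Convert a sector count to bytes, normalized by denom (hypothetical helper)."""
    return sector_count * SECTOR_SIZE_BYTES / denom


# Made-up per-kernel read sector counts, standing in for TCC_READ_SECTORS_sum.
read_sectors = [1000, 2000, 4000]

per_kernel_bytes = [sectors_to_bytes(s) for s in read_sectors]
avg_bytes = sum(per_kernel_bytes) / len(per_kernel_bytes)

print(per_kernel_bytes)
print(avg_bytes, min(per_kernel_bytes), max(per_kernel_bytes))
```

Without the factor, the same rows would report a sector count under a `Bytes` label, understating the value by 32x under the assumption above.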
src/rocprof_compute_soc/soc_base.py

Lines changed: 27 additions & 8 deletions
@@ -24,6 +24,7 @@

 import ctypes
 import glob
+import json
 import math
 import os
 import re
@@ -166,13 +167,21 @@ def populate_mspec(self):
             )
         )

-        # we get the max mclk from rocm-smi --showmclkrange
-        # Regular expression to extract the max memory clock (third frequency level in MEM)
-        memory_clock_pattern = (
-            r"MEM:\s*[^:]*FREQUENCY_LEVELS:\s*(?:\d+: \d+ MHz\s*){2}(\d+)\s*MHz"
-        )
-        amd_smi_mclk = run(["amd-smi", "static"], exit_on_error=True)
-        self._mspec.max_mclk = search(memory_clock_pattern, amd_smi_mclk)
+        # Parse json from amd-smi static --clock
+        amd_smi_mclk = run(["amd-smi", "static", "--clock", "--json"], exit_on_error=True)
+        amd_smi_mclk = json.loads(amd_smi_mclk)
+
+        if isinstance(amd_smi_mclk, dict):
+            # The output of `amd-smi static --clock --json` is a dict with amd-smi>=26.0.0.
+            amd_smi_mclk = amd_smi_mclk["gpu_data"][0]["clock"]["mem"]["frequency_levels"]
+        else:
+            # For backward compatibility: the output of `amd-smi static --clock --json` used to be a list for amd-smi<26.0.0.
+            amd_smi_mclk = amd_smi_mclk[0]["clock"]["mem"]["frequency_levels"]
+
+        # Choose the highest level of memory clock frequency
+        amd_smi_mclk = amd_smi_mclk[sorted(amd_smi_mclk.keys())[-1]]
+        # 100 Mhz -> 100
+        self._mspec.max_mclk = amd_smi_mclk.split(" ")[0]

         console_debug("max mem clock is {}".format(self._mspec.max_mclk))

@@ -440,6 +449,16 @@ def parse_counters_text(self, text):

     def get_rocprof_supported_counters(self):
         rocprof_cmd = detect_rocprof(self.get_args())
+
+        if rocprof_cmd != "rocprofiler-sdk":
+            console_warning(
+                "rocprof v1 / v2 / v3 interfaces will be removed in favor of "
+                "rocprofiler-sdk interface in a future release. To use rocprofiler-sdk "
+                "interface, please set the environment variable ROCPROF to 'rocprofiler-sdk' "
+                "and optionally provide the path to librocprofiler-sdk.so library via the "
+                "--rocprofiler-sdk-library-path option."
+            )
+
         rocprof_counters = set()

         if str(rocprof_cmd).endswith("rocprof"):
@@ -489,7 +508,7 @@ def get_rocprof_supported_counters(self):
                     f"Failed to list rocprof supported counters using command: {command}"
                 )
             for line in output.splitlines():
-                if "Name:" in line:
+                if "counter_name" in line:
                     counters, _ = self.parse_counters_text(line.split(":")[1].strip())
                     rocprof_counters.update(counters)
             # Custom counter support for mi100 for rocprofv3

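The `populate_mspec` hunk above accepts two shapes of `amd-smi static --clock --json` output: a dict with a `gpu_data` list, and a bare list. The standalone sketch below handles both. The payloads are hand-written and hypothetical, including the `"Level N"` key names and the clock values; the nesting (`clock` → `mem` → `frequency_levels`) is taken from the diff. Unlike the diff, which takes the last key in sorted order, this sketch picks the numerically largest frequency. That is a swapped-in choice made here because string sorting is lexicographic and would place a key like `"Level 10"` before `"Level 2"`.

```python
import json


def max_mem_clock_mhz(raw_json: str) -> str:
    """Return the highest memory clock as a string such as "1300" (hypothetical helper)."""
    data = json.loads(raw_json)
    if isinstance(data, dict):
        # Dict-shaped output: GPUs are listed under "gpu_data".
        levels = data["gpu_data"][0]["clock"]["mem"]["frequency_levels"]
    else:
        # List-shaped output: the top level is the list of GPUs.
        levels = data[0]["clock"]["mem"]["frequency_levels"]
    # Compare by numeric value: "1300 MHz" -> 1300.0
    highest = max(levels.values(), key=lambda v: float(v.split(" ")[0]))
    return highest.split(" ")[0]


# Hypothetical payloads; real amd-smi output contains more fields.
mem_levels = {"Level 0": "900 MHz", "Level 1": "1000 MHz", "Level 2": "1300 MHz"}
gpu_entry = {"gpu": 0, "clock": {"mem": {"frequency_levels": mem_levels}}}

dict_shaped = json.dumps({"gpu_data": [gpu_entry]})
list_shaped = json.dumps([gpu_entry])

print(max_mem_clock_mhz(dict_shaped))
print(max_mem_clock_mhz(list_shaped))
```

Both shapes resolve to the same `frequency_levels` mapping, so the version check in the diff reduces to a single `isinstance` test on the parsed JSON.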
src/utils/gui.py

Lines changed: 2 additions & 4 deletions
@@ -52,7 +52,7 @@ def multi_bar_chart(table_id, display_df):
         nested_bar = {"NC": {}, "UC": {}, "RW": {}, "CC": {}}
         for index, row in display_df.iterrows():
             nested_bar[row["Coherency"]][row["Xfer"]] = row["Avg"]
-    if table_id == 1704:
+    if table_id == 1705:  # L2 - Fabric Interface Stalls
         nested_bar = {"Read": {}, "Write": {}}
         for index, row in display_df.iterrows():
             nested_bar[row["Transaction"]][row["Type"]] = row["Avg"]
@@ -197,9 +197,7 @@ def build_bar_chart(display_df, table_config, barchart_elements, norm_filt):

     # Speed-of-light bar chart
     elif table_config["id"] in barchart_elements["sol"]:
-        display_df["Avg"] = [
-            x.astype(float) if x != "" else float(0) for x in display_df["Avg"]
-        ]
+        display_df["Avg"] = [float(x) if x != "" else float(0) for x in display_df["Avg"]]
         if table_config["id"] == 1701:
             # special layout for L2 Cache SOL
             d_figs.append(

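The speed-of-light hunk above replaces `x.astype(float)` with `float(x)`. `astype` is a method of NumPy scalars and arrays, so it would presumably fail for column values that arrive as plain Python strings or numbers, whereas the built-in `float()` accepts all of these. The sketch below shows the new comprehension on a plain list; the helper name `to_float_or_zero` and the sample values are hypothetical.

```python
def to_float_or_zero(values):
    """Convert each entry to float, mapping empty strings to 0.0 (hypothetical helper)."""
    return [float(x) if x != "" else float(0) for x in values]


# Made-up "Avg" column mixing strings, an empty cell, an int, and a float.
avg_column = ["1.5", "", 2, 3.25]

print(to_float_or_zero(avg_column))

# Plain Python values have no astype method, unlike NumPy scalars.
print(hasattr("1.5", "astype"), hasattr(2, "astype"))
```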
src/utils/utils.py

Lines changed: 2 additions & 9 deletions
@@ -245,14 +245,6 @@ def detect_rocprof(args):
         )
         return rocprof_cmd

-    console_warning(
-        "rocprof v1 / v2 / v3 interfaces will be deprecated in favor of "
-        "rocprofiler-sdk interface in a future release. To use rocprofiler-sdk "
-        "interface, please set the environment variable ROCPROF to 'rocprofiler-sdk' "
-        "and optionally provide the path to librocprofiler-sdk.so library via the "
-        "--rocprofiler-sdk-library-path option."
-    )
-
     # detect rocprof
     if not "ROCPROF" in os.environ.keys():
         # default rocprof
@@ -998,8 +990,9 @@ def pc_sampling_prof(
         "-o",
         "ps_file",  # todo: sync up with the name from source in 2100_.yaml
         "--",
-        appcmd,
     ]
+    options.extend(appcmd)
+
     success, output = capture_subprocess_output(
         [rocprof_cmd] + options, new_env=os.environ.copy(), profileMode=True
     )

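Two hunks in this commit work together for PC sampling: `profiler_base.py` splits the application command with `shlex.split`, and `pc_sampling_prof` in `utils.py` appends the resulting tokens with `options.extend(appcmd)` instead of nesting `appcmd` as a single list element. The sketch below shows that combination in isolation. The application command line and the helper name `build_options` are hypothetical, and the option list is trimmed to the three entries visible in the diff.

```python
import shlex


def build_options(appcmd):
    """Build a flat argument list ending with the application command (hypothetical helper)."""
    # Only the trailing options visible in the diff; the real list has more entries.
    options = ["-o", "ps_file", "--"]
    # extend() adds each token separately, keeping the argv list flat.
    options.extend(appcmd)
    return options


# Made-up application command line, as one string.
remaining = './vcopy -n 1048576 -b 256 --label "my run"'

# shlex.split honors shell quoting, so "my run" stays a single token.
appcmd = shlex.split(remaining)
options = build_options(appcmd)

print(appcmd)
print(options)
```

Every element of the resulting list is a string, which is what a subprocess argument vector requires. Placing `appcmd` inside the list literal would instead leave a nested list as one element.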
tests/test_TCP_counters.py

Lines changed: 1 addition & 1 deletion
@@ -144,7 +144,7 @@ def test_L1_cache_counters(

     # set up two apps: sequential and random access
     app_names = ["vseq", "vrand"]
-    options = ["-b", "TCP"]
+    options = ["-b", "16"]

     result = {}
     metrics = ["Read Req", "Write Req", "Cache Hit Rate"]

tests/test_analyze_commands.py

Lines changed: 0 additions & 15 deletions
@@ -79,21 +79,6 @@ def test_list_metrics_gfx90a(binary_handler_analyze_rocprof_compute):
         test_utils.clean_output_dir(config["cleanup"], workload_dir)


-@pytest.mark.list_metrics
-def test_list_metrics_gfx906(binary_handler_analyze_rocprof_compute):
-    code = binary_handler_analyze_rocprof_compute(["analyze", "--list-metrics", "gfx906"])
-    assert code == 1
-
-    for dir in indirs:
-        workload_dir = test_utils.setup_workload_dir(dir)
-        code = binary_handler_analyze_rocprof_compute(
-            ["analyze", "--path", workload_dir, "--list-metrics", "gfx906"]
-        )
-        assert code == 0
-
-        test_utils.clean_output_dir(config["cleanup"], workload_dir)
-
-
 @pytest.mark.list_metrics
 def test_list_metrics_gfx908(binary_handler_analyze_rocprof_compute):
     code = binary_handler_analyze_rocprof_compute(["analyze", "--list-metrics", "gfx908"])
