Commit 837fe1e

Backport of 6.4.2 for cherry-pick list (#714)
Auto-submit by Jenkins
1 parent 7b25d95 commit 837fe1e

42 files changed, 650 additions and 208 deletions

.github/workflows/formatting.yml

Lines changed: 3 additions & 3 deletions
@@ -13,7 +13,7 @@ concurrency:

 jobs:
   python:
-    runs-on: ubuntu-20.04
+    runs-on: ubuntu-22.04

     steps:
       - name: Checkout
@@ -35,7 +35,7 @@ jobs:
         uses: isort/isort-action@master

   cmake:
-    runs-on: ubuntu-20.04
+    runs-on: ubuntu-22.04

     steps:
       - uses: actions/checkout@v4
@@ -58,7 +58,7 @@ jobs:
           fi

   python-bytecode:
-    runs-on: ubuntu-20.04
+    runs-on: ubuntu-22.04

     steps:
       - uses: actions/checkout@v4

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
@@ -2,6 +2,21 @@

 Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/).

+## ROCm Compute Profiler 3.2.0 for ROCm 6.4.2
+
+### Added
+
+* Add FP8 metrics' support for MI300
+* Add additional datatype for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on gpu architecture)
+* Add datatype selection option for roofline profiling: --roofline-data-type / -R option (Default is FP32)
+* Change dependency from rocm-smi to amd-smi
+
+### Changed
+
+
+### Resolved issues
+* Fixed a crash related to Agent ID caused by the new format of the rocprofv3 output CSV file
+
 ## ROCm Compute Profiler 3.1.0 for ROCm 6.4.0

 ### Added

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.1.0
+3.2.0

docs/how-to/analyze/standalone-gui.rst

Lines changed: 5 additions & 0 deletions
@@ -75,6 +75,11 @@ application's profiling data:

 #. Memory Chart Analysis
 #. Empirical Roofline Analysis
+
+Use ``--roofline-data-type`` option to specify which data type(s) you would like plotted on the roofline PDFs in the standalone analysis GUI.
+Datatypes can be stacked- for example, "--roofline-data-type FP32 FP64 I32" would display one PDF with FP32 and FP64 stacked, and one PDF with INT32.
+Default roofline datatype plotted is FP32.
+
 #. Top Stats (Top Kernel Statistics)
 #. System Info
 #. System Speed-of-Light

docs/how-to/profile/mode.rst

Lines changed: 6 additions & 6 deletions
@@ -398,6 +398,9 @@ Roofline options
 Allows you to specify a device ID to collect performance data from when
 running a roofline benchmark on your system.

+``--roofline-data-type <datatype>``
+Allows you to specify the data types that you want plotted in the roofline PDF output(s). Selecting more than one data type will overlay the results onto the same plot. Default data type: FP32
+
 To distinguish different kernels in your ``.pdf`` roofline plot use
 ``--kernel-names``. This will give each kernel a unique marker identifiable from
 the plot's key.
@@ -431,8 +434,7 @@ successfully.

 $ ls workloads/vcopy/MI200/
 total 48
--rw-r--r-- 1 auser agroup 13331 Mar 1 16:05 empirRoof_gpu-0_fp32_fp64.pdf
--rw-r--r-- 1 auser agroup 13136 Mar 1 16:05 empirRoof_gpu-0_int8_fp16.pdf
+-rw-r--r-- 1 auser agroup 13331 Mar 1 16:05 empirRoof_gpu-0_FP32.pdf
 drwxr-xr-x 1 auser agroup 0 Mar 1 16:03 perfmon
 -rw-r--r-- 1 auser agroup 1101 Mar 1 16:03 pmc_perf.csv
 -rw-r--r-- 1 auser agroup 1715 Mar 1 16:05 roofline.csv
@@ -441,11 +443,9 @@ successfully.

 .. note::

-ROCm Compute Profiler generates two roofline outputs to organize results and reduce
-clutter. One chart plots FP32/FP64 performance while the other plots I8/FP16
-performance.
+ROCm Compute Profiler currently captures roofline profiling for all data types, and you can reduce the clutter in the PDF outputs by filtering the data type(s). Selecting multiple data types will overlay the results into the same PDF. To generate results in separate PDFs for each data type from the same workload run, you can re-run the profiling command with each data type as long as the ``roofline.csv`` file still exists in the workload folder.

-The following image is a sample ``empirRoof_gpu-0_int8_fp16.pdf`` roofline
+The following image is a sample ``empirRoof_gpu-0_FP32.pdf`` roofline
 plot.

 .. image:: ../../data/profile/sample-roof-plot.jpg

docs/how-to/use.rst

Lines changed: 1 addition & 1 deletion
@@ -231,7 +231,7 @@ The following table lists ROCm Compute Profiler's basic operations, their

 * - :ref:`Standalone roofline analysis <standalone-roofline>`
   - ``profile``
-  - ``--name``, ``--roof-only``, ``-- <profile_cmd>``
+  - ``--name``, ``--roof-only``, ``--roofline-data-type <data_type>``, ``-- <profile_cmd>``

 * - :ref:`Import a workload to database <grafana-gui-import>`
   - ``database``

src/argparser.py

Lines changed: 27 additions & 1 deletion
@@ -154,7 +154,7 @@ def omniarg_parser(
         default=False,
         action="store_true",
         help=argparse.SUPPRESS,
-        #help="\t\t\tKokkos trace, traces Kokkos API calls.",
+        # help="\t\t\tKokkos trace, traces Kokkos API calls.",
     )
     profile_group.add_argument(
         "-k",
@@ -316,6 +316,19 @@ def omniarg_parser(
         action="store_true",
         help="\t\t\tInclude kernel names in roofline plot.",
     )
+
+    roofline_group.add_argument(
+        "-R",
+        "--roofline-data-type",
+        required=False,
+        choices=["FP8", "FP16", "BF16", "FP32", "FP64", "I8", "I32", "I64"],
+        metavar="",
+        nargs="+",
+        type=str,
+        default=["FP32"],
+        help="\t\t\tChoose datatypes to view roofline PDFs for: (DEFAULT: FP32)\n\t\t\t FP8\n\t\t\t FP16\n\t\t\t BF16\n\t\t\t FP32\n\t\t\t FP64\n\t\t\t I8\n\t\t\t I32\n\t\t\t I64\n\t\t\t ",
+    )
+
     # roofline_group.add_argument('-w', '--workgroups', required=False, default=-1, type=int, help="\t\t\tNumber of kernel workgroups (DEFAULT: 1024)")
     # roofline_group.add_argument('--wsize', required=False, default=-1, type=int, help="\t\t\tWorkgroup size (DEFAULT: 256)")
     # roofline_group.add_argument('--dataset', required=False, default = -1, type=int, help="\t\t\tDataset size (DEFAULT: 536M)")
@@ -510,6 +523,19 @@ def omniarg_parser(
         const=8050,
         help="\t\tActivate a GUI to interate with rocprofiler-compute metrics.\n\t\tOptionally, specify port to launch application (DEFAULT: 8050)",
     )
+
+    analyze_group.add_argument(
+        "-R",
+        "--roofline-data-type",
+        required=False,
+        choices=["FP8", "FP16", "BF16", "FP32", "FP64", "I8", "I32", "I64"],
+        metavar="",
+        nargs="+",
+        type=str,
+        default=["FP32"],
+        help="\t\t\tChoose datatypes to view roofline PDFs for: (DEFAULT: FP32)\n\t\t\t FP8\n\t\t\t FP16\n\t\t\t BF16\n\t\t\t FP32\n\t\t\t FP64\n\t\t\t I8\n\t\t\t I32\n\t\t\t I64\n\t\t\t ",
+    )
+
     analyze_advanced_group.add_argument(
         "--random-port",
         action="store_true",
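
The two `-R` / `--roofline-data-type` definitions in this file rely on standard `argparse` behaviour: `nargs="+"` collects one or more values into a list, each value is checked against `choices`, and the list default is used only when the flag is absent. The following is a minimal, self-contained sketch of that behaviour, not the project's parser. The names `ROOFLINE_DATA_TYPES`, `build_parser`, `parse_data_types` and the program name `roofline-demo` are illustrative assumptions, and the sketch leaves out `metavar=""` and the tab-formatted help text.

```python
import argparse

# Same choices as in the diff; the constant name is an assumption of this sketch.
ROOFLINE_DATA_TYPES = ["FP8", "FP16", "BF16", "FP32", "FP64", "I8", "I32", "I64"]


def build_parser():
    """Build a stand-alone parser that exposes only the roofline data type flag."""
    parser = argparse.ArgumentParser(prog="roofline-demo")
    parser.add_argument(
        "-R",
        "--roofline-data-type",
        required=False,
        choices=ROOFLINE_DATA_TYPES,  # every supplied value must be in this list
        nargs="+",                    # one or more values, stored as a list
        type=str,
        default=["FP32"],             # used only when the flag is not given
        help="Data types to plot on the roofline (default: FP32)",
    )
    return parser


def parse_data_types(argv):
    """Return the selected data types for an argument list."""
    return build_parser().parse_args(argv).roofline_data_type


if __name__ == "__main__":
    print(parse_data_types([]))                             # flag absent
    print(parse_data_types(["-R", "FP32", "FP64", "I32"]))  # several values
```

Supplying the flag replaces the default rather than extending it, and `choices` matching is case-sensitive, so a lowercase value such as `fp32` would be rejected with a usage error.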

src/rocprof_compute_analyze/analysis_webui.py

Lines changed: 3 additions & 0 deletions
@@ -61,6 +61,8 @@ def __init__(self, args, supported_archs):
         # define any elements which will have full width
         self.__full_width_elements = {1801}

+        self.__roofline_data_type = args.roofline_data_type
+
     @demarcate
     def build_layout(self, input_filters, arch_configs):
         """
@@ -180,6 +182,7 @@ def generate_from_filter(
                     "mem_level": "ALL",
                     "include_kernel_names": False,
                     "is_standalone": False,
+                    "roofline_data_type": self.__roofline_data_type,
                 }
             )
             roof_obj = self.get_socs()[self.arch].roofline_obj
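
The change above stores the parsed option on the instance and later forwards it in the dictionary of roofline settings. The sketch below shows that hand-off in isolation, under stated assumptions: the class `WebUiSketch` and the method `roofline_settings` are hypothetical stand-ins, `SimpleNamespace` plays the role of the parsed arguments, and only the dictionary keys visible in the hunk are reproduced.

```python
from types import SimpleNamespace


class WebUiSketch:
    """Hypothetical stand-in for the web UI analysis class touched by the diff."""

    def __init__(self, args):
        # A double-underscore attribute is name-mangled by Python to
        # _WebUiSketch__roofline_data_type, which keeps it private to this class.
        self.__roofline_data_type = args.roofline_data_type

    def roofline_settings(self):
        """Return the settings dictionary, including the selected data types."""
        return {
            "mem_level": "ALL",
            "include_kernel_names": False,
            "is_standalone": False,
            "roofline_data_type": self.__roofline_data_type,
        }


if __name__ == "__main__":
    ui = WebUiSketch(SimpleNamespace(roofline_data_type=["FP32", "FP64"]))
    print(ui.roofline_settings()["roofline_data_type"])
```

Because the attribute is read once in `__init__`, the selection is fixed for the lifetime of the object in this sketch.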

src/rocprof_compute_soc/analysis_configs/gfx906/0200_system-speed-of-light.yaml

Lines changed: 6 additions & 0 deletions
@@ -32,6 +32,12 @@ Panel Config:
   peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
   pop: None # No perf counter
   tips:
+MFMA FLOPs (F8):
+  value: None # No HW module
+  unit: GFLOP
+  peak: None # No HW module
+  pop: None # No HW module
+  tips:
 MFMA FLOPs (BF16):
   value: None # No perf counter
   unit: GFLOPs
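
The `peak` expression in the first context line of this hunk is plain arithmetic over two system variables. As a worked example, the sketch below evaluates it in Python. It assumes that `$max_sclk` is a clock in MHz and `$cu_per_gpu` a compute-unit count, neither of which is stated in the hunk; the function name `peak_from_expression` and the input values (1700 MHz, 60 compute units) are hypothetical.

```python
def peak_from_expression(max_sclk, cu_per_gpu):
    """Evaluate (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000) from the YAML.

    The constants 64 and 2 are copied from the expression as-is; the units of
    the inputs are an assumption of this sketch.
    """
    return (((max_sclk * cu_per_gpu) * 64) * 2) / 1000


if __name__ == "__main__":
    # Hypothetical inputs: 1700 (assumed MHz) and 60 compute units.
    print(peak_from_expression(1700, 60))  # 13056.0
```

Entries whose fields are `None`, such as the new `MFMA FLOPs (F8)` block for this architecture, have no expression to evaluate.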
