Commit 8d9f4c8

Review comments
* Clear CHANGELOG indicating we resolved roofline peaks for MI350
* Fix typo in pre-processor guard preventing roofline from running on MI300
* Ruff formatting
1 parent d5de7d5 commit 8d9f4c8

2 files changed: +5 -4 lines changed

projects/rocprofiler-compute/CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -14,6 +14,8 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 
 ### Resolved issues
 
+* Fixed roofline benchmark MFMA FP16/BF16/INT8 peaks for MI 350
+
 ### Upcoming changes
 
 ## ROCm Compute Profiler 3.5.0 for ROCm 7.12.0
@@ -53,8 +55,6 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 * Processes attempting to benchmark on the same GPU will wait with user-visible feedback and execute sequentially.
 * Lock applies specifically to the roofline.csv file generated during benchmarking, not other files generated in profile mode.
 
-* Added proper support for `gfx950` in roofline benchmark.
-
 * Missing metric descriptions for gfx950 and gfx942 architecture.
 
 * Added `--membw-analysis` under experimental features to allow memory bandwidth specific profiling and analysis with metric block 30.

projects/rocprofiler-compute/src/utils/benchmark.py

Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,7 @@
 ##############################################################################
 # MIT License
 #
-# Copyright (c) 2026 Advanced Micro Devices, Inc. All Rights Reserved.
+# Copyright (c) 2025 - 2026 Advanced Micro Devices, Inc. All Rights Reserved.
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
@@ -753,7 +753,8 @@ def flops_bench(device: int, type: str, unit: str, rate: int) -> PerfMetrics:
 extern "C" __global__ void mfma_f16(int iter, float *dummy)
 {
 vec16<float> result = {0};
-#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942___)
+#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx940__) || \
+defined(__gfx941__) || defined(__gfx942__)
 vec4<__fp16> a;
 a[1] = a[0] = threadIdx.x;
 for(int i = 0; i < iter; ++i)
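
Note on the guard fix: the old line tested `__gfx942___` (three trailing underscores). The C preprocessor evaluates `defined(X)` to 0 for any identifier that is not defined and emits no diagnostic, so a misspelled architecture macro silently compiles the guarded branch out. The commit message attributes the MI300 failure to this typo. The following is a minimal sketch of a check that would flag such names. It assumes architecture macros always have the form `__gfx<hex id>__`, and `find_malformed_arch_macros` is a hypothetical helper for illustration, not part of rocprofiler-compute.

```python
import re

# Names that appear inside defined(...) in a preprocessor guard.
_DEFINED_RE = re.compile(r"defined\s*\(\s*(\w+)\s*\)")
# Assumed shape of a well-formed architecture macro: __gfx<hex id>__
_ARCH_MACRO_RE = re.compile(r"__gfx[0-9a-f]+__")


def find_malformed_arch_macros(source: str) -> list[str]:
    """Return gfx-like names used in defined(...) that are not exactly __gfx<id>__."""
    malformed = []
    for name in _DEFINED_RE.findall(source):
        if "gfx" in name and not _ARCH_MACRO_RE.fullmatch(name):
            malformed.append(name)
    return malformed


# Guard before the commit (typo in the last macro).
OLD_GUARD = (
    "#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx940__) || "
    "defined(__gfx941__) || defined(__gfx942___)"
)
# Guard after the commit (typo fixed, line wrapped with a continuation).
NEW_GUARD = (
    "#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx940__) || \\\n"
    "defined(__gfx941__) || defined(__gfx942__)"
)

print(find_malformed_arch_macros(OLD_GUARD))  # ['__gfx942___']
print(find_malformed_arch_macros(NEW_GUARD))  # []
```

A check like this could run over the kernel source strings in a unit test. It only validates the spelling pattern, not whether a given architecture id actually exists.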
