Commit ed700d7

Review comments
* Clear CHANGELOG indicating we resolved roofline peaks for MI350
* Fix typo in pre-processor guard preventing roofline from running on MI300
* Ruff formatting
1 parent 493bfd7 commit ed700d7

2 files changed, +5 -4 lines changed

projects/rocprofiler-compute/CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -16,6 +16,8 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 
 ### Resolved issues
 
+* Fixed roofline benchmark MFMA FP16/BF16/INT8 peaks for MI 350
+
 ### Upcoming changes
 
 ## ROCm Compute Profiler 3.5.0 for ROCm 7.12.0
@@ -55,8 +57,6 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 * Processes attempting to benchmark on the same GPU will wait with user-visible feedback and execute sequentially.
 * Lock applies specifically to the roofline.csv file generated during benchmarking, not other files generated in profile mode.
 
-* Added proper support for `gfx950` in roofline benchmark.
-
 * Missing metric descriptions for gfx950 and gfx942 architecture.
 
 * Added `--membw-analysis` under experimental features to allow memory bandwidth specific profiling and analysis with metric block 30.

projects/rocprofiler-compute/src/utils/benchmark.py

Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,7 @@
 ##############################################################################
 # MIT License
 #
-# Copyright (c) 2026 Advanced Micro Devices, Inc. All Rights Reserved.
+# Copyright (c) 2025 - 2026 Advanced Micro Devices, Inc. All Rights Reserved.
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
@@ -753,7 +753,8 @@ def flops_bench(device: int, type: str, unit: str, rate: int) -> PerfMetrics:
 extern "C" __global__ void mfma_f16(int iter, float *dummy)
 {
 vec16<float> result = {0};
-#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942___)
+#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx940__) || \
+defined(__gfx941__) || defined(__gfx942__)
 vec4<__fp16> a;
 a[1] = a[0] = threadIdx.x;
 for(int i = 0; i < iter; ++i)
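
Note on the guard fix: the C preprocessor evaluates `defined(X)` to 0 for any name that is not defined and reports nothing, so the misspelled `__gfx942___` (three trailing underscores) compiled cleanly while never matching on gfx942 (MI300). A check along the following lines could flag this class of typo in embedded kernel source. This is a minimal sketch and not part of the commit: the function name `find_malformed_gfx_macros` and both regular expressions are hypothetical, and the "well-formed" shape (two underscores, `gfx`, lowercase alphanumerics, two underscores) is an assumption drawn from the macro names visible in this hunk.

```python
import re

# Any macro name inside defined(...) that looks like a gfx target macro.
DEFINED_GFX = re.compile(r"defined\(\s*(_*gfx\w*)\s*\)")

# Assumed well-formed shape, e.g. __gfx90a__ or __gfx942__ (hypothetical rule).
WELL_FORMED = re.compile(r"__gfx[0-9a-z]+__")


def find_malformed_gfx_macros(source: str) -> list[str]:
    """Return gfx macro names used in defined(...) that break the assumed shape."""
    return [
        name
        for name in DEFINED_GFX.findall(source)
        if not WELL_FORMED.fullmatch(name)
    ]


# The guard before and after this commit, taken from the hunk above.
old_guard = (
    "#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx940__) || "
    "defined(__gfx941__) || defined(__gfx942___)"
)
new_guard = (
    "#if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx940__) || \\\n"
    "    defined(__gfx941__) || defined(__gfx942__)"
)

print(find_malformed_gfx_macros(old_guard))
print(find_malformed_gfx_macros(new_guard))
```

Run against the kernel strings in `benchmark.py`, a check like this would only catch misspellings that break the naming shape; a well-formed but wrong target name (for example a nonexistent architecture number) would still pass.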
