
[rocprofiler-compute] Roofline: Use gfx950 builtins for MFMA#3886

Merged
vedithal-amd merged 9 commits into users/vedithal/rocprofiler-compute-temp-develop from benrichard-amd/roofline-mfma-gfx950
Mar 9, 2026


@vedithal-amd
Contributor

Motivation

Resolves #3506

The roofline MFMA benchmarks for FP16, BF16, and I8 measured far below the spec peak on gfx950.

This is a copy of #3837, recreated so that unintended rccl changes are not merged.
Original author @benrichard-amd

Technical Details

  • gfx950 introduces new MFMA instructions with higher throughput for FP16, BF16, and I8. The roofline benchmark now uses the corresponding builtins when targeting gfx950.
| Type | Peak       | Achieved   |
|------|------------|------------|
| FP16 | 2.3 PFLOPS | 1.8 PFLOPS |
| BF16 | 2.3 PFLOPS | 2.0 PFLOPS |
| I8   | 4.6 POPS   | 4.2 POPS   |
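The table also tells you how close each benchmark gets to its theoretical peak. Here is a minimal C++ sketch of that arithmetic, using only the numbers above. The function name is illustrative.

```cpp
#include <cassert>

// Percentage of the theoretical peak that a benchmark reached.
// `peak` and `achieved` must use the same unit (PFLOPS or POPS); the unit cancels.
double percent_of_peak(double peak, double achieved) {
    assert(peak > 0.0);
    return 100.0 * achieved / peak;
}
```

With the table values, percent_of_peak(2.3, 1.8) is about 78.3 (FP16), percent_of_peak(2.3, 2.0) is about 87.0 (BF16), and percent_of_peak(4.6, 4.2) is about 91.3 (I8).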

JIRA ID

Test Plan

  • Run roofline on MI350 and verify the expected performance improvement.
  • Verify the MI350 TFLOPS reported by rocprof-compute.
  • Run roofline on MI100/MI250/MI300 and verify it still works as expected.

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings March 9, 2026 19:48
@vedithal-amd vedithal-amd requested review from a team and prbasyal-amd as code owners March 9, 2026 19:48
@vedithal-amd vedithal-amd changed the base branch from develop to users/vedithal/rocprofiler-compute-temp-develop March 9, 2026 19:48
@vedithal-amd vedithal-amd requested a review from a team as a code owner March 9, 2026 19:48
@vedithal-amd vedithal-amd force-pushed the benrichard-amd/roofline-mfma-gfx950 branch from fc5d17f to 8d9f4c8
Contributor

Copilot AI left a comment


Pull request overview

Updates profiling/benchmarking infrastructure to improve MFMA roofline accuracy on gfx950 and expands ROCprofiler-SDK/rocprofv3 functionality (pause/resume, topology overrides, firmware restrictions), along with several build/test/config maintenance tweaks across the repo.

Changes:

  • Update rocprofiler-compute MFMA microbench kernels to use gfx950-specific MFMA builtins and adjust peak-rate tables.
  • Add rocprofiler-sdk rocprofv3 roctx pause/resume integration tests and implement pause/resume behavior using context start/stop.
  • Introduce firmware restriction parsing/checking (YAML) and new dynamic-library path helpers; update metrics file naming/lookup and related docs/build scripts.
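The new test binary exercises nested pause/resume, and the tool implements pause/resume by stopping and starting contexts. One way to reconcile nested calls is a depth counter, so that only the outermost pair actually stops and restarts collection. The sketch below shows that idea. It is hypothetical: `PauseResumeTracker` and its members are illustrative names, and this is not the rocprofv3 implementation.

```cpp
#include <cassert>

// Hypothetical sketch of nested pause/resume with a reference count.
// stop_contexts()/start_contexts() stand in for stopping/starting the
// profiler contexts; here they only count how often they were called.
class PauseResumeTracker {
public:
    // Only the outermost pause stops collection.
    void pause() {
        if (depth_++ == 0) stop_contexts();
    }

    // Only the resume that balances the outermost pause restarts collection.
    // An unmatched resume (depth already 0) is ignored.
    void resume() {
        if (depth_ == 0) return;
        if (--depth_ == 0) start_contexts();
    }

    bool collecting() const { return depth_ == 0; }
    int stops() const { return stops_; }
    int starts() const { return starts_; }

private:
    void stop_contexts() { ++stops_; }
    void start_contexts() { ++starts_; }

    int depth_ = 0;
    int stops_ = 0;
    int starts_ = 0;
};
```

Calling pause() twice and then resume() once leaves collection stopped. Only the second resume() restarts it, and it does so exactly once.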

Reviewed changes

Copilot reviewed 158 out of 162 changed files in this pull request and generated 3 comments.

File Description
projects/rocshmem/CMakeLists.txt Disable examples when building tests-only.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/tracing/validate.py New pytest validator for tracing output.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/tracing/pytest.ini New pytest configuration for tracing validator.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/tracing/conftest.py Fixture for loading JSON output for tracing tests.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/tracing/CMakeLists.txt New CTest integration tests for tracing (execute + validate).
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/pc-sampling/validate.py New pytest validator for PC sampling output.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/pc-sampling/pytest.ini New pytest configuration for PC sampling validator.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/pc-sampling/conftest.py Fixture for loading JSON output; skips if unavailable.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/pc-sampling/CMakeLists.txt New CTest integration tests for PC sampling (execute + validate).
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/nested-pause-resume/validate.py New pytest validator for nested pause/resume behavior.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/nested-pause-resume/pytest.ini New pytest configuration for nested pause/resume validator.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/nested-pause-resume/conftest.py Fixture for loading JSON output for nested pause/resume tests.
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/nested-pause-resume/CMakeLists.txt New CTest integration tests for nested pause/resume (execute + validate).
projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/CMakeLists.txt Add roctx pause/resume test subdirectories.
projects/rocprofiler-sdk/tests/rocprofv3/CMakeLists.txt Register roctx-pause-resume tests in rocprofv3 suite.
projects/rocprofiler-sdk/tests/bin/roctx-pause-resume/roctx-pause-resume.cpp Add HIP+ROCTX test binary exercising pause/resume + nested behavior.
projects/rocprofiler-sdk/tests/bin/roctx-pause-resume/CMakeLists.txt Build/test binary configuration for roctx-pause-resume.
projects/rocprofiler-sdk/tests/bin/CMakeLists.txt Add roctx-pause-resume test binary to build.
projects/rocprofiler-sdk/source/share/rocprofiler-sdk/counter_defs.yaml Add firmware restriction metadata to counters YAML.
projects/rocprofiler-sdk/source/share/rocprofiler-sdk/CMakeLists.txt Change installed share YAML from counter_defs.yaml to config.yaml.
projects/rocprofiler-sdk/source/lib/tests/common/dl.cpp Add unit tests for new dl path helpers.
projects/rocprofiler-sdk/source/lib/tests/common/CMakeLists.txt Build/link new dl.cpp test.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/name Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/mem_banks/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/io_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/io_links/1/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/io_links/2/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/gpu_id Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/name Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/mem_banks/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/mem_banks/0/used_memory Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/io_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/p2p_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/p2p_links/1/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/gpu_id Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/name Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/mem_banks/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/mem_banks/0/used_memory Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/io_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/p2p_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/p2p_links/1/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/gpu_id Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/name Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/mem_banks/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/mem_banks/0/used_memory Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/io_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/p2p_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/p2p_links/1/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/gpu_id Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/name Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/mem_banks/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/mem_banks/0/used_memory Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/1/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/2/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/3/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/4/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/5/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/6/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/7/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/p2p_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/gpu_id Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/name Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/mem_banks/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/mem_banks/0/used_memory Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/io_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/gpu_id Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/name Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/mem_banks/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/mem_banks/0/used_memory Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/io_links/0/properties Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/gpu_id Add local topology fixture data for agent tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/agent.cpp Add local-topology verification and improve agent type/name handling.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/CMakeLists.txt Copy topology fixtures into unit test runtime dir + trigger reconfigure on updates.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/registration.cpp Set ROCPROFILER_REGISTER_LIBRARY to current SDK path using new dl helpers.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hip/hip.def.cpp Remove hipModuleGetLoadingMode instrumentation entry.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hip/abi.cpp Update ABI enforcement to remove hipModuleGetLoadingMode.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/tests/firmware_restrictions.cpp Add unit tests for firmware restriction YAML parsing/checking.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt Build firmware restriction tests.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/metrics.cpp Switch metrics YAML lookup from counter_defs.yaml to config.yaml.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/firmware_restrictions.hpp New firmware restriction API.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/firmware_restrictions.cpp Implement YAML parsing + installed-file check.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/controller.cpp Run firmware restriction check during controller init.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/CMakeLists.txt Build/link firmware restriction implementation.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/agent.cpp Add topology path overrides and default agent naming fallback.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk-tool/tool.cpp Implement pause/resume by starting/stopping relevant contexts.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk-tool/config.hpp Add selected-regions reference counting config flag.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk-rocattach/symbol_lookup.cpp Use new dl helpers to locate rocattach library path.
projects/rocprofiler-sdk/source/lib/common/logging.hpp Include fmt helpers used by logging.
projects/rocprofiler-sdk/source/lib/common/environment.hpp Extend get_env template to floating-point.
projects/rocprofiler-sdk/source/lib/common/environment.cpp Implement floating-point get_env specialization + expand set_env instantiations.
projects/rocprofiler-sdk/source/lib/common/dl.hpp New dl helper declarations.
projects/rocprofiler-sdk/source/lib/common/dl.cpp New dl helper implementations (iterate loaded libs, dladdr helpers).
projects/rocprofiler-sdk/source/lib/common/CMakeLists.txt Build/install new dl helpers.
projects/rocprofiler-sdk/source/include/rocprofiler-sdk/hip/runtime_api_id.h Remove hipModuleGetLoadingMode enum ID.
projects/rocprofiler-sdk/source/include/rocprofiler-sdk/hip/api_args.h Remove hipModuleGetLoadingMode args.
projects/rocprofiler-sdk/source/include/rocprofiler-sdk/cxx/enum_string.hpp Remove hipModuleGetLoadingMode label + adjust asserts.
projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3.rst Update docs to refer to config.yaml.
projects/rocprofiler-sdk/source/docs/api-reference/counter_collection_services.rst Update docs to refer to config.yaml and document firmware restrictions.
projects/rocprofiler-sdk/source/bin/rocprofv3.py Add CLI flag/env plumb for selected-regions ref counting.
projects/rocprofiler-sdk/cmake/rocprofiler_utilities.cmake Extend unit-test helper to install/copy data and configure file triggers.
projects/rocprofiler-sdk/cmake/rocprofiler_config_packaging.cmake Include CPackComponent for packaging config.
projects/rocprofiler-sdk/CHANGELOG.md Add/update changelog entries for multiple features/fixes.
projects/rocprofiler-sdk/.cmake-format.yaml Teach cmake-format about new macro args (DATA/CONFIGURE_FILES).
projects/rocprofiler-register/cmake/rocprofiler_register_config_packaging.cmake Include CPackComponent for packaging config.
projects/rocprofiler-compute/src/utils/benchmark.py Use gfx950 MFMA builtins and adjust peak tables for FP16/BF16/I8.
projects/rocprofiler-compute/CHANGELOG.md Note MFMA roofline fix for MI350.
projects/rocprof-trace-decoder/test/CMakeLists.txt Add dependency ordering between execute and check tests.
projects/rocminfo/.github/dependabot.yml Remove per-project dependabot config (now centralized).
projects/rocm-smi-lib/.github/dependabot.yml Remove per-project dependabot config (now centralized).
projects/rdc/.github/dependabot.yml Remove per-project dependabot config (now centralized).
projects/rccl/tools/topo_expl/stubs.cc Add AINIC-related stub symbols for topo_expl.
projects/rccl/src/transport/net.cc Add NIC detection + set primary NIC details + expose API.
projects/rccl/src/plugin/net.cc Select ROCm internal IB plugin depending on detected NIC.
projects/rccl/src/misc/ionicdvwrap.cc Gate ionicdv symbol wrapping on detected AINIC usage.
projects/rccl/src/include/plugin/tuner/tuner_v5.h Add bwRatio field to tuner constants struct.
projects/rccl/src/include/net.h Add rcclPrimaryNic/rcclUseAinic API + NIC type/info structs.
projects/rccl/src/graph/tuning.cc Add tuning model, wire bwRatio/hwLatencies into tuner constants, apply NIC-based overrides.
projects/rccl/src/graph/topo.h Declare rcclApplyTuningOverrides.
projects/rccl/src/graph/rome_models.cc Centralize gfx950 tuning override logic.
projects/rccl/ext-tuner/example/plugin.c Update example tuner to use bwRatio and hwLatencies.
projects/rccl/ext-tuner/example/nccl/tuner.h Update example tuner header struct layout.
projects/rccl/ext-src/rocm_netib.patch Update patch to use rcclUseAinic instead of param directly.
projects/hipother/hipnv/include/hip/nvidia_detail/nvidia_hip_runtime_api.h Remove hipModuleGetLoadingMode wrapper (NV path).
projects/hip/include/hip/hip_runtime_api.h Remove hipModuleGetLoadingMode API + associated enum type.
projects/hip/.github/dependabot.yml Remove per-project dependabot config (now centralized).
projects/hip-tests/catch/unit/module/hipModuleGetLoadingMode.cc Remove hipModuleGetLoadingMode test coverage (API removed).
projects/hip-tests/catch/unit/module/CMakeLists.txt Stop building removed hipModuleGetLoadingMode test.
projects/clr/rocclr/utils/flags.hpp Remove HIP_MODULE_LOADING flag definition.
projects/clr/rocclr/device/devprogram.cpp Refactor symbol enumeration; always load vars/funcs for DynCO.
projects/clr/hipamd/src/hip_table_interface.cpp Remove hipModuleGetLoadingMode dispatch wrapper.
projects/clr/hipamd/src/hip_platform.hpp Remove PlatformState::GetLoadingMode declaration.
projects/clr/hipamd/src/hip_platform.cpp Remove module loading mode handling in platform state.
projects/clr/hipamd/src/hip_module.cpp Remove hipModuleGetLoadingMode API implementation.
projects/clr/hipamd/src/hip_hcc.map.in Stop exporting hipModuleGetLoadingMode symbol.
projects/clr/hipamd/src/hip_graph_internal.hpp Update leaf node sync detection logic.
projects/clr/hipamd/src/hip_code_object.hpp Remove lazy-loading state flags; adjust managed-var accessor.
projects/clr/hipamd/src/hip_code_object.cpp Always populate globals at load; remove lazy flags.
projects/clr/hipamd/src/hip_api_trace.cpp Remove hipModuleGetLoadingMode tracing entry + drop step version to 25.
projects/clr/hipamd/src/amdhip.def Stop exporting hipModuleGetLoadingMode from def file.
projects/clr/hipamd/include/hip/amd_detail/hip_prof_str.h Remove hipModuleGetLoadingMode from profiler strings/args.
projects/clr/hipamd/include/hip/amd_detail/hip_api_trace.hpp Drop runtime API step version to 25 and remove loading-mode pointer.
projects/clr/CHANGELOG.md Remove changelog mention of hipModuleGetLoadingMode being added.
projects/amdsmi/tests/amd_smi_test/CMakeLists.txt Update test link target to use selected AMD SMI library target.
projects/amdsmi/src/nic/ai-nic/amdsmi_unified/CMakeLists.txt Improve include interface for NIC static lib for build/install.
projects/amdsmi/src/CMakeLists.txt Build both shared/static, restrict exported symbols via version script, and update install/export logic.
projects/amdsmi/example/CMakeLists.txt Link examples against static library when available.
projects/amdsmi/amdsmi_cli/amdsmi_init.py Adjust init error handling; prefer driver-reload guidance.
projects/amdsmi/amdsmi_cli/amdsmi_helpers.py Update partition-mode warning text and remove reload-driver prompt.
projects/amdsmi/amdsmi_cli/amdsmi_commands.py Delay default info collection until amdgpu initialized; update reload message.
projects/amdsmi/CMakeLists.txt Add options to auto-build static libs and build both libs.
projects/amdsmi/CHANGELOG.md Document removal of reload-driver CLI option(s).
projects/amdsmi/.github/dependabot.yml Remove per-project dependabot config (now centralized).
.github/dependabot.yml Centralize pip dependabot configs and reduce PR rate via grouping.
Comments suppressed due to low confidence (6)

projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/metrics.cpp:1

  • loadMetrics now hardcodes config.yaml, but this diff does not add a config.yaml file (and still modifies counter_defs.yaml). This will reliably fail at runtime with the ROCP_FATAL_IF when config.yaml is not present. Either add/rename the YAML file to config.yaml (and ensure it’s installed to the searched location), or keep the filename as counter_defs.yaml consistently across lookup, install rules, and documentation.
    projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/firmware_restrictions.cpp:1
  • Both helpers return std::string but currently return path expressions (path / ...). Unless common::filesystem::path provides an implicit conversion to std::string (unusual), this is a compile error. Convert the resulting path explicitly (e.g., .string()) before returning, and consider normalizing to absolute/canonical if that’s what downstream expects.
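Note that `std::filesystem::path` itself does have an implicit `operator string_type()`, and `string_type` is `std::string` on POSIX. Whether the original compiles therefore depends on what `common::filesystem` aliases. Calling `.string()` explicitly removes the question. Here is a minimal sketch of the suggested fix, with hypothetical helper names:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper shaped like the reviewed code: join a directory and a
// file name, then convert explicitly instead of relying on an implicit
// path -> std::string conversion.
std::string join_to_string(const fs::path& dir, const std::string& file) {
    return (dir / file).string();
}

// Variant that also folds "." and ".." components lexically
// (no filesystem access, unlike fs::canonical, which requires the path to exist).
std::string join_normalized(const fs::path& dir, const std::string& file) {
    return (dir / file).lexically_normal().string();
}
```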
    projects/rocprofiler-sdk/source/docs/api-reference/counter_collection_services.rst:1
  • The documented key fw_restriction_schema_version uses underscores, but the implementation in parse_firmware_restrictions requires fw-restriction-schema-version (hyphens). This mismatch will cause users to author YAML that the parser rejects. Update the docs to match the implemented key (or update the parser to accept both for backward/forward compatibility).
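If the parser should accept both spellings, one simple approach is to canonicalize keys before comparing them. This is a sketch only. `canonical_key` and `is_schema_version_key` are hypothetical names, not part of the rocprofiler-sdk API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: treat '_' and '-' as equivalent in YAML keys so that
// both fw_restriction_schema_version and fw-restriction-schema-version match.
std::string canonical_key(std::string key) {
    std::replace(key.begin(), key.end(), '_', '-');
    return key;
}

bool is_schema_version_key(const std::string& key) {
    return canonical_key(key) == "fw-restriction-schema-version";
}
```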
    projects/rocprofiler-sdk/source/lib/common/dl.cpp:1
  • In the ROCPROFILER_SYMBOL_PATH_USE_DLOPEN branch, the function never returns a resolved path even if dlsym succeeds (it always falls through to return std::nullopt). If this code path is meant to be usable for debugging, it should translate the found symbol address (_fn) into a library path (e.g., via dladdr) and return it, or else the branch is misleading and ineffective.
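Here is a minimal sketch of the dladdr translation the comment suggests. It resolves a symbol with `dlsym`, then maps the address back to the shared object that contains it. The helper name is hypothetical, and the sketch assumes glibc, where `RTLD_DEFAULT` and `Dl_info` require `_GNU_SOURCE` (g++ defines it by default). Older glibc versions (before 2.34) also need `-ldl` at link time.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <optional>
#include <string>

// Hypothetical helper: return the path of the shared object that defines
// `symbol`, or std::nullopt if the symbol cannot be resolved.
std::optional<std::string> library_path_for_symbol(const char* symbol) {
    // RTLD_DEFAULT searches the global symbol scope of the process.
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == nullptr) return std::nullopt;

    Dl_info info{};
    // dladdr() returns 0 on failure; dli_fname is the path of the containing object.
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) return std::nullopt;
    return std::string(info.dli_fname);
}
```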


Comment on lines +2003 to +2008
amd_comgr_status_t getSymbolFromModule(amd_comgr_symbol_t symbol, void* userData) {
size_t nlen = 0;
size_t* userDataInfo = nullptr;
amd_comgr_status_t status;
amd_comgr_symbol_type_t type;
std::vector<std::string>* var_names = nullptr;

Copilot AI Mar 9, 2026


This introduces (1) unused locals (userDataInfo, var_names) which can break builds under -Werror, and (2) a leak: if symbol_get_info(...NAME...) fails, the function returns without delete[] name. Prefer removing the unused variables and using RAII (e.g., std::string/std::vector<char>/std::unique_ptr<char[]>) so early returns don’t leak.

Copilot uses AI. Check for mistakes.
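Here is a minimal sketch of the RAII pattern the comment recommends. The name buffer is owned by a `std::vector<char>`, so an early return on a failed query cannot leak, and no unused locals are declared. To compile without comgr, it uses hypothetical stand-ins: `Status` replaces `amd_comgr_status_t`, and `fake_get_name` replaces the `amd::Comgr::symbol_get_info(..., AMD_COMGR_SYMBOL_INFO_NAME, ...)` call from the hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for amd_comgr_status_t.
enum class Status { Success, Error };

// Hypothetical stand-in for the NAME query: writes a NUL-terminated copy of
// `src` into `out` (which must hold src.size() + 1 bytes), or fails on request.
Status fake_get_name(const std::string& src, char* out, bool fail) {
    if (fail) return Status::Error;
    std::memcpy(out, src.c_str(), src.size() + 1);  // includes the trailing NUL
    return Status::Success;
}

// RAII version of the name retrieval: nothing to delete[] on any path.
Status collect_name(const std::string& src, bool fail, std::vector<std::string>& names) {
    const std::size_t nlen = src.size();     // stands in for the length query
    std::vector<char> name(nlen + 1, '\0');  // +1 for the terminating NUL
    const Status status = fake_get_name(src, name.data(), fail);
    if (status != Status::Success) {
        return status;                       // `name` is released automatically
    }
    names.emplace_back(name.data(), nlen);
    return Status::Success;
}
```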
Comment on lines +2023 to +2028
/* Retrieve the symbol name */
char* name = new char[nlen + 1];
status = amd::Comgr::symbol_get_info(symbol, AMD_COMGR_SYMBOL_INFO_NAME, name);
if (status != AMD_COMGR_STATUS_SUCCESS) {
return status;
}

Copilot AI Mar 9, 2026


This introduces (1) unused locals (userDataInfo, var_names) which can break builds under -Werror, and (2) a leak: if symbol_get_info(...NAME...) fails, the function returns without delete[] name. Prefer removing the unused variables and using RAII (e.g., std::string/std::vector<char>/std::unique_ptr<char[]>) so early returns don’t leak.

sym_info->var_names->push_back(std::string(name));
}

delete[] name;

Copilot AI Mar 9, 2026


This introduces (1) unused locals (userDataInfo, var_names) which can break builds under -Werror, and (2) a leak: if symbol_get_info(...NAME...) fails, the function returns without delete[] name. Prefer removing the unused variables and using RAII (e.g., std::string/std::vector<char>/std::unique_ptr<char[]>) so early returns don’t leak.

@vedithal-amd vedithal-amd force-pushed the benrichard-amd/roofline-mfma-gfx950 branch from 8d9f4c8 to ed700d7
@vedithal-amd vedithal-amd force-pushed the benrichard-amd/roofline-mfma-gfx950 branch from ed700d7 to 1716c94
benrichard-amd and others added 8 commits March 9, 2026 19:59
* Clear CHANGELOG indicating we resolved roofline peaks for MI350

* Fix typo in pre-processor guard preventing roofline from running on
  MI300

* Ruff formatting
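The commit list mentions a typo in a preprocessor guard that kept roofline from running on MI300. The guarded code is not shown here. The sketch below only illustrates why such a typo is easy to miss: a misspelled name inside `defined(...)` is not a compile error, it simply evaluates to false. It also shows the per-instruction FLOP count (2·M·N·K) that a roofline MFMA kernel typically multiplies by its instruction count. The function names are hypothetical. The K values (16x16x32 FP16 on gfx950 versus 16x16x16 on gfx942) are assumptions for illustration, so check the ISA guide for the real shapes.

```cpp
#include <cassert>
#include <cstdint>

// FLOPs done by one MFMA instruction computing an M x N x K
// multiply-accumulate: one multiply plus one add per (m, n, k) term.
constexpr std::uint64_t mfma_flops(std::uint64_t m, std::uint64_t n, std::uint64_t k) {
    return 2 * m * n * k;
}

// Hypothetical compile-time choice of the FP16 MFMA K dimension per target.
// Clang defines __gfx950__ / __gfx942__ only when compiling device code for
// that architecture; a typo in either name silently selects the fallback.
constexpr std::uint64_t fp16_mfma_k() {
#if defined(__gfx950__)
    return 32;  // assumed 16x16x32 shape
#elif defined(__gfx942__)
    return 16;  // assumed 16x16x16 shape
#else
    return 0;   // host compile or unsupported target: no MFMA path
#endif
}
```

On a host-only build, fp16_mfma_k() returns 0. mfma_flops(16, 16, 32) is 16384, which is twice mfma_flops(16, 16, 16) = 8192.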
@vedithal-amd vedithal-amd merged commit 9e338f9 into users/vedithal/rocprofiler-compute-temp-develop Mar 9, 2026
14 of 16 checks passed
@vedithal-amd vedithal-amd deleted the benrichard-amd/roofline-mfma-gfx950 branch March 9, 2026 20:08
vedithal-amd added a commit that referenced this pull request Mar 10, 2026
* Use gfx950 builtin for MFMA FP16

* Use gfx950 builtin for MFMA BF16

* Use gfx950 builtin for MFMA I8

* Fix comments

* Update copyright

* Update CHANGELOG

* Fix formatting

* Review comments

* Clear CHANGELOG indicating we resolved roofline peaks for MI350

* Fix typo in pre-processor guard preventing roofline from running on
  MI300

* Ruff formatting

* Fix uninitialized variables

---------

Co-authored-by: benrichard-amd <ben.richard@amd.com>
vedithal-amd added a commit that referenced this pull request Mar 11, 2026
* Use gfx950 builtin for MFMA FP16

* Use gfx950 builtin for MFMA BF16

* Use gfx950 builtin for MFMA I8

* Fix comments

* Update copyright

* Update CHANGELOG

* Fix formatting

* Review comments

* Clear CHANGELOG indicating we resolved roofline peaks for MI350

* Fix typo in pre-processor guard preventing roofline from running on
  MI300

* Ruff formatting

* Fix uninitialized variables

---------

Co-authored-by: benrichard-amd <ben.richard@amd.com>

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[rocprof-compute] Roofline: Use gfx950 MFMA builtins

4 participants