[rocprofiler-compute] Roofline: Use gfx950 builtins for MFMA #3886
fc5d17f to 8d9f4c8
Pull request overview
Updates profiling/benchmarking infrastructure to improve MFMA roofline accuracy on gfx950 and expands ROCprofiler-SDK/rocprofv3 functionality (pause/resume, topology overrides, firmware restrictions), along with several build/test/config maintenance tweaks across the repo.
Changes:
- Update rocprofiler-compute MFMA microbench kernels to use gfx950-specific MFMA builtins and adjust peak-rate tables.
- Add rocprofiler-sdk rocprofv3 roctx pause/resume integration tests and implement pause/resume behavior using context start/stop.
- Introduce firmware restriction parsing/checking (YAML) and new dynamic-library path helpers; update metrics file naming/lookup and related docs/build scripts.
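The pause/resume bullet above mentions context start/stop, and the file summary lists nested pause/resume tests and a reference-counting flag. A minimal sketch of how nested pauses could be reference counted is shown here. The class name `PauseController`, its callbacks, and the policy of ignoring unmatched resumes are all assumptions made for illustration; none of them are taken from `tool.cpp`.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for the set of profiling contexts a tool manages.
// start/stop are callbacks so the sketch stays independent of any SDK API.
class PauseController {
public:
    PauseController(std::function<void()> start, std::function<void()> stop)
        : start_(std::move(start)), stop_(std::move(stop)) {}

    // The first pause stops the contexts; nested pauses only bump the count.
    void pause() {
        if (depth_++ == 0) stop_();
    }

    // Only the resume that matches the outermost pause restarts the contexts.
    // An unmatched resume is ignored (an assumed policy for this sketch).
    void resume() {
        if (depth_ == 0) return;
        if (--depth_ == 0) start_();
    }

    int depth() const { return depth_; }

private:
    std::function<void()> start_;
    std::function<void()> stop_;
    int depth_ = 0;
};
```

With this policy, `pause(); pause(); resume();` leaves collection stopped, and only the second `resume()` restarts it.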
Reviewed changes
Copilot reviewed 158 out of 162 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| projects/rocshmem/CMakeLists.txt | Disable examples when building tests-only. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/tracing/validate.py | New pytest validator for tracing output. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/tracing/pytest.ini | New pytest configuration for tracing validator. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/tracing/conftest.py | Fixture for loading JSON output for tracing tests. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/tracing/CMakeLists.txt | New CTest integration tests for tracing (execute + validate). |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/pc-sampling/validate.py | New pytest validator for PC sampling output. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/pc-sampling/pytest.ini | New pytest configuration for PC sampling validator. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/pc-sampling/conftest.py | Fixture for loading JSON output; skips if unavailable. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/pc-sampling/CMakeLists.txt | New CTest integration tests for PC sampling (execute + validate). |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/nested-pause-resume/validate.py | New pytest validator for nested pause/resume behavior. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/nested-pause-resume/pytest.ini | New pytest configuration for nested pause/resume validator. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/nested-pause-resume/conftest.py | Fixture for loading JSON output for nested pause/resume tests. |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/nested-pause-resume/CMakeLists.txt | New CTest integration tests for nested pause/resume (execute + validate). |
| projects/rocprofiler-sdk/tests/rocprofv3/roctx-pause-resume/CMakeLists.txt | Add roctx pause/resume test subdirectories. |
| projects/rocprofiler-sdk/tests/rocprofv3/CMakeLists.txt | Register roctx-pause-resume tests in rocprofv3 suite. |
| projects/rocprofiler-sdk/tests/bin/roctx-pause-resume/roctx-pause-resume.cpp | Add HIP+ROCTX test binary exercising pause/resume + nested behavior. |
| projects/rocprofiler-sdk/tests/bin/roctx-pause-resume/CMakeLists.txt | Build/test binary configuration for roctx-pause-resume. |
| projects/rocprofiler-sdk/tests/bin/CMakeLists.txt | Add roctx-pause-resume test binary to build. |
| projects/rocprofiler-sdk/source/share/rocprofiler-sdk/counter_defs.yaml | Add firmware restriction metadata to counters YAML. |
| projects/rocprofiler-sdk/source/share/rocprofiler-sdk/CMakeLists.txt | Change installed share YAML from counter_defs.yaml to config.yaml. |
| projects/rocprofiler-sdk/source/lib/tests/common/dl.cpp | Add unit tests for new dl path helpers. |
| projects/rocprofiler-sdk/source/lib/tests/common/CMakeLists.txt | Build/link new dl.cpp test. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/name | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/mem_banks/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/io_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/io_links/1/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/io_links/2/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/0/gpu_id | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/name | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/mem_banks/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/mem_banks/0/used_memory | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/io_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/p2p_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/p2p_links/1/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/1/gpu_id | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/name | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/mem_banks/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/mem_banks/0/used_memory | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/io_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/p2p_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/p2p_links/1/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/2/gpu_id | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/name | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/mem_banks/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/mem_banks/0/used_memory | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/io_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/p2p_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/p2p_links/1/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/3/gpu_id | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/name | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/mem_banks/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/mem_banks/0/used_memory | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/1/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/2/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/3/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/4/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/5/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/6/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/io_links/7/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/p2p_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/4/gpu_id | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/name | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/mem_banks/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/mem_banks/0/used_memory | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/io_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/5/gpu_id | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/name | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/mem_banks/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/mem_banks/0/used_memory | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/io_links/0/properties | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/data/topology/nodes/6/gpu_id | Add local topology fixture data for agent tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/agent.cpp | Add local-topology verification and improve agent type/name handling. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tests/CMakeLists.txt | Copy topology fixtures into unit test runtime dir + trigger reconfigure on updates. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/registration.cpp | Set ROCPROFILER_REGISTER_LIBRARY to current SDK path using new dl helpers. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hip/hip.def.cpp | Remove hipModuleGetLoadingMode instrumentation entry. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hip/abi.cpp | Update ABI enforcement to remove hipModuleGetLoadingMode. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/tests/firmware_restrictions.cpp | Add unit tests for firmware restriction YAML parsing/checking. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt | Build firmware restriction tests. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/metrics.cpp | Switch metrics YAML lookup from counter_defs.yaml to config.yaml. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/firmware_restrictions.hpp | New firmware restriction API. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/firmware_restrictions.cpp | Implement YAML parsing + installed-file check. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/controller.cpp | Run firmware restriction check during controller init. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/CMakeLists.txt | Build/link firmware restriction implementation. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/agent.cpp | Add topology path overrides and default agent naming fallback. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk-tool/tool.cpp | Implement pause/resume by starting/stopping relevant contexts. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk-tool/config.hpp | Add selected-regions reference counting config flag. |
| projects/rocprofiler-sdk/source/lib/rocprofiler-sdk-rocattach/symbol_lookup.cpp | Use new dl helpers to locate rocattach library path. |
| projects/rocprofiler-sdk/source/lib/common/logging.hpp | Include fmt helpers used by logging. |
| projects/rocprofiler-sdk/source/lib/common/environment.hpp | Extend get_env template to floating-point. |
| projects/rocprofiler-sdk/source/lib/common/environment.cpp | Implement floating-point get_env specialization + expand set_env instantiations. |
| projects/rocprofiler-sdk/source/lib/common/dl.hpp | New dl helper declarations. |
| projects/rocprofiler-sdk/source/lib/common/dl.cpp | New dl helper implementations (iterate loaded libs, dladdr helpers). |
| projects/rocprofiler-sdk/source/lib/common/CMakeLists.txt | Build/install new dl helpers. |
| projects/rocprofiler-sdk/source/include/rocprofiler-sdk/hip/runtime_api_id.h | Remove hipModuleGetLoadingMode enum ID. |
| projects/rocprofiler-sdk/source/include/rocprofiler-sdk/hip/api_args.h | Remove hipModuleGetLoadingMode args. |
| projects/rocprofiler-sdk/source/include/rocprofiler-sdk/cxx/enum_string.hpp | Remove hipModuleGetLoadingMode label + adjust asserts. |
| projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3.rst | Update docs to refer to config.yaml. |
| projects/rocprofiler-sdk/source/docs/api-reference/counter_collection_services.rst | Update docs to refer to config.yaml and document firmware restrictions. |
| projects/rocprofiler-sdk/source/bin/rocprofv3.py | Add CLI flag/env plumb for selected-regions ref counting. |
| projects/rocprofiler-sdk/cmake/rocprofiler_utilities.cmake | Extend unit-test helper to install/copy data and configure file triggers. |
| projects/rocprofiler-sdk/cmake/rocprofiler_config_packaging.cmake | Include CPackComponent for packaging config. |
| projects/rocprofiler-sdk/CHANGELOG.md | Add/update changelog entries for multiple features/fixes. |
| projects/rocprofiler-sdk/.cmake-format.yaml | Teach cmake-format about new macro args (DATA/CONFIGURE_FILES). |
| projects/rocprofiler-register/cmake/rocprofiler_register_config_packaging.cmake | Include CPackComponent for packaging config. |
| projects/rocprofiler-compute/src/utils/benchmark.py | Use gfx950 MFMA builtins and adjust peak tables for FP16/BF16/I8. |
| projects/rocprofiler-compute/CHANGELOG.md | Note MFMA roofline fix for MI350. |
| projects/rocprof-trace-decoder/test/CMakeLists.txt | Add dependency ordering between execute and check tests. |
| projects/rocminfo/.github/dependabot.yml | Remove per-project dependabot config (now centralized). |
| projects/rocm-smi-lib/.github/dependabot.yml | Remove per-project dependabot config (now centralized). |
| projects/rdc/.github/dependabot.yml | Remove per-project dependabot config (now centralized). |
| projects/rccl/tools/topo_expl/stubs.cc | Add AINIC-related stub symbols for topo_expl. |
| projects/rccl/src/transport/net.cc | Add NIC detection + set primary NIC details + expose API. |
| projects/rccl/src/plugin/net.cc | Select ROCm internal IB plugin depending on detected NIC. |
| projects/rccl/src/misc/ionicdvwrap.cc | Gate ionicdv symbol wrapping on detected AINIC usage. |
| projects/rccl/src/include/plugin/tuner/tuner_v5.h | Add bwRatio field to tuner constants struct. |
| projects/rccl/src/include/net.h | Add rcclPrimaryNic/rcclUseAinic API + NIC type/info structs. |
| projects/rccl/src/graph/tuning.cc | Add tuning model, wire bwRatio/hwLatencies into tuner constants, apply NIC-based overrides. |
| projects/rccl/src/graph/topo.h | Declare rcclApplyTuningOverrides. |
| projects/rccl/src/graph/rome_models.cc | Centralize gfx950 tuning override logic. |
| projects/rccl/ext-tuner/example/plugin.c | Update example tuner to use bwRatio and hwLatencies. |
| projects/rccl/ext-tuner/example/nccl/tuner.h | Update example tuner header struct layout. |
| projects/rccl/ext-src/rocm_netib.patch | Update patch to use rcclUseAinic instead of param directly. |
| projects/hipother/hipnv/include/hip/nvidia_detail/nvidia_hip_runtime_api.h | Remove hipModuleGetLoadingMode wrapper (NV path). |
| projects/hip/include/hip/hip_runtime_api.h | Remove hipModuleGetLoadingMode API + associated enum type. |
| projects/hip/.github/dependabot.yml | Remove per-project dependabot config (now centralized). |
| projects/hip-tests/catch/unit/module/hipModuleGetLoadingMode.cc | Remove hipModuleGetLoadingMode test coverage (API removed). |
| projects/hip-tests/catch/unit/module/CMakeLists.txt | Stop building removed hipModuleGetLoadingMode test. |
| projects/clr/rocclr/utils/flags.hpp | Remove HIP_MODULE_LOADING flag definition. |
| projects/clr/rocclr/device/devprogram.cpp | Refactor symbol enumeration; always load vars/funcs for DynCO. |
| projects/clr/hipamd/src/hip_table_interface.cpp | Remove hipModuleGetLoadingMode dispatch wrapper. |
| projects/clr/hipamd/src/hip_platform.hpp | Remove PlatformState::GetLoadingMode declaration. |
| projects/clr/hipamd/src/hip_platform.cpp | Remove module loading mode handling in platform state. |
| projects/clr/hipamd/src/hip_module.cpp | Remove hipModuleGetLoadingMode API implementation. |
| projects/clr/hipamd/src/hip_hcc.map.in | Stop exporting hipModuleGetLoadingMode symbol. |
| projects/clr/hipamd/src/hip_graph_internal.hpp | Update leaf node sync detection logic. |
| projects/clr/hipamd/src/hip_code_object.hpp | Remove lazy-loading state flags; adjust managed-var accessor. |
| projects/clr/hipamd/src/hip_code_object.cpp | Always populate globals at load; remove lazy flags. |
| projects/clr/hipamd/src/hip_api_trace.cpp | Remove hipModuleGetLoadingMode tracing entry + drop step version to 25. |
| projects/clr/hipamd/src/amdhip.def | Stop exporting hipModuleGetLoadingMode from def file. |
| projects/clr/hipamd/include/hip/amd_detail/hip_prof_str.h | Remove hipModuleGetLoadingMode from profiler strings/args. |
| projects/clr/hipamd/include/hip/amd_detail/hip_api_trace.hpp | Drop runtime API step version to 25 and remove loading-mode pointer. |
| projects/clr/CHANGELOG.md | Remove changelog mention of hipModuleGetLoadingMode being added. |
| projects/amdsmi/tests/amd_smi_test/CMakeLists.txt | Update test link target to use selected AMD SMI library target. |
| projects/amdsmi/src/nic/ai-nic/amdsmi_unified/CMakeLists.txt | Improve include interface for NIC static lib for build/install. |
| projects/amdsmi/src/CMakeLists.txt | Build both shared/static, restrict exported symbols via version script, and update install/export logic. |
| projects/amdsmi/example/CMakeLists.txt | Link examples against static library when available. |
| projects/amdsmi/amdsmi_cli/amdsmi_init.py | Adjust init error handling; prefer driver-reload guidance. |
| projects/amdsmi/amdsmi_cli/amdsmi_helpers.py | Update partition-mode warning text and remove reload-driver prompt. |
| projects/amdsmi/amdsmi_cli/amdsmi_commands.py | Delay default info collection until amdgpu initialized; update reload message. |
| projects/amdsmi/CMakeLists.txt | Add options to auto-build static libs and build both libs. |
| projects/amdsmi/CHANGELOG.md | Document removal of reload-driver CLI option(s). |
| projects/amdsmi/.github/dependabot.yml | Remove per-project dependabot config (now centralized). |
| .github/dependabot.yml | Centralize pip dependabot configs and reduce PR rate via grouping. |
Comments suppressed due to low confidence (6)
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/metrics.cpp:1
`loadMetrics` now hardcodes `config.yaml`, but this diff does not add a `config.yaml` file (and still modifies `counter_defs.yaml`). This will reliably fail at runtime with the `ROCP_FATAL_IF` when `config.yaml` is not present. Either add/rename the YAML file to `config.yaml` (and ensure it’s installed to the searched location), or keep the filename as `counter_defs.yaml` consistently across lookup, install rules, and documentation.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/firmware_restrictions.cpp:1
Both helpers return `std::string` but currently return `path` expressions (`path / ...`). Unless `common::filesystem::path` provides an implicit conversion to `std::string` (unusual), this is a compile error. Convert the resulting path explicitly (e.g., `.string()`) before returning, and consider normalizing to absolute/canonical if that’s what downstream expects.
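The explicit conversion suggested in this comment could look like the sketch below. It assumes `common::filesystem::path` behaves like `std::filesystem::path` (the alias is not shown here), and the helper name `join_as_string` is hypothetical. Note that `std::filesystem::path` does define an implicit conversion to its native `string_type`, which is `std::string` on POSIX systems, so the explicit `.string()` call mainly states the intent and keeps the code portable to platforms where the native type differs.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: join a directory and a file name, then convert the
// resulting path to std::string explicitly instead of relying on an
// implicit conversion at the return statement.
inline std::string join_as_string(const fs::path& dir, const std::string& file) {
    return (dir / file).string();
}
```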
projects/rocprofiler-sdk/source/docs/api-reference/counter_collection_services.rst:1
The documented key `fw_restriction_schema_version` uses underscores, but the implementation in `parse_firmware_restrictions` requires `fw-restriction-schema-version` (hyphens). This mismatch will cause users to author YAML that the parser rejects. Update the docs to match the implemented key (or update the parser to accept both for backward/forward compatibility).
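If the parser were changed to accept both spellings, one option is to normalize the key before comparing it. The sketch below shows that idea only; `normalize_key` and `is_schema_version_key` are hypothetical names and are not part of `parse_firmware_restrictions`.

```cpp
#include <algorithm>
#include <string>

// Map underscores to hyphens so both spellings compare equal.
inline std::string normalize_key(std::string key) {
    std::replace(key.begin(), key.end(), '_', '-');
    return key;
}

// True for "fw-restriction-schema-version" and for the underscore
// spelling "fw_restriction_schema_version".
inline bool is_schema_version_key(const std::string& key) {
    return normalize_key(key) == "fw-restriction-schema-version";
}
```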
projects/rocprofiler-sdk/source/lib/common/dl.cpp:1
In the `ROCPROFILER_SYMBOL_PATH_USE_DLOPEN` branch, the function never returns a resolved path even if `dlsym` succeeds (it always falls through to `return std::nullopt`). If this code path is meant to be usable for debugging, it should translate the found symbol address (`_fn`) into a library path (e.g., via `dladdr`) and return it, or else the branch is misleading and ineffective.
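The `dlsym` followed by `dladdr` translation described in this comment can be sketched as follows. This is not the code in `dl.cpp`: the function name `library_path_for_symbol` is hypothetical, and the lookup uses `RTLD_DEFAULT` rather than a `dlopen` handle to keep the example self-contained. It assumes a glibc-style Linux system where `dladdr` and `RTLD_DEFAULT` are available (older toolchains may need `-ldl` at link time).

```cpp
#include <dlfcn.h>

#include <optional>
#include <string>

// Look up a symbol in the already-loaded images, then ask the dynamic
// loader which shared object contains that address.
inline std::optional<std::string> library_path_for_symbol(const char* symbol_name) {
    void* addr = dlsym(RTLD_DEFAULT, symbol_name);
    if (addr == nullptr) return std::nullopt;

    Dl_info info{};
    // dladdr returns 0 when the address cannot be matched to an object.
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) return std::nullopt;

    return std::string{info.dli_fname};
}
```

Returning the path from the success branch is the step the review comment says is missing; the fall-through to `std::nullopt` then only covers real failures.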
```cpp
amd_comgr_status_t getSymbolFromModule(amd_comgr_symbol_t symbol, void* userData) {
  size_t nlen = 0;
  size_t* userDataInfo = nullptr;
  amd_comgr_status_t status;
  amd_comgr_symbol_type_t type;
  std::vector<std::string>* var_names = nullptr;
```
This introduces (1) unused locals (userDataInfo, var_names) which can break builds under -Werror, and (2) a leak: if symbol_get_info(...NAME...) fails, the function returns without delete[] name. Prefer removing the unused variables and using RAII (e.g., std::string/std::vector<char>/std::unique_ptr<char[]>) so early returns don’t leak.
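The RAII suggestion in this comment can be illustrated with a small standalone sketch. `Status`, `query_name`, and `collect_name` are hypothetical stand-ins written for this example; they are not the comgr API and do not reproduce `getSymbolFromModule`. The point is only that a `std::vector<char>` buffer is released on every return path, including the early error return.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

enum class Status { Success, Error };

// Hypothetical stand-in for a C-style query that fills a caller buffer.
inline Status query_name(bool fail, char* out, std::size_t capacity) {
    if (fail) return Status::Error;
    std::snprintf(out, capacity, "%s", "my_symbol");
    return Status::Success;
}

// The buffer is owned by a std::vector, so there is no delete[] to forget.
inline Status collect_name(bool fail, std::size_t nlen, std::vector<std::string>& names) {
    std::vector<char> name(nlen + 1, '\0');
    Status status = query_name(fail, name.data(), name.size());
    if (status != Status::Success) {
        return status;  // early return: the vector frees its storage here
    }
    names.emplace_back(name.data());
    return Status::Success;
}
```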
```cpp
  /* Retrieve the symbol name */
  char* name = new char[nlen + 1];
  status = amd::Comgr::symbol_get_info(symbol, AMD_COMGR_SYMBOL_INFO_NAME, name);
  if (status != AMD_COMGR_STATUS_SUCCESS) {
    return status;
  }
```
```cpp
    sym_info->var_names->push_back(std::string(name));
  }

  delete[] name;
```
8d9f4c8 to ed700d7
ed700d7 to 1716c94
* Clear CHANGELOG indicating we resolved roofline peaks for MI350
* Fix typo in pre-processor guard preventing roofline from running on MI300
* Ruff formatting
9e338f9
into users/vedithal/rocprofiler-compute-temp-develop
* Use gfx950 builtin for MFMA FP16
* Use gfx950 builtin for MFMA BF16
* Use gfx950 builtin for MFMA I8
* Fix comments
* Update copyright
* Update CHANGELOG
* Fix formatting
* Review comments
* Clear CHANGELOG indicating we resolved roofline peaks for MI350
* Fix typo in pre-processor guard preventing roofline from running on MI300
* Ruff formatting
* Fix uninitialized variables

Co-authored-by: benrichard-amd <ben.richard@amd.com>
Motivation
Resolves #3506
The MFMA roofline benchmarks for F16, BF16, and I8 measured peak rates far below spec on gfx950.
This is a copy of #3837, recreated so that unintended rccl changes are not merged.
Original author @benrichard-amd
Technical Details
`gfx950` introduced new MFMA instructions with higher throughput for F16, BF16 and I8. Use the builtins for these on `gfx950`.
JIRA ID
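To relate the Technical Details note to roofline peaks, the sketch below counts the arithmetic done by one MFMA instruction from its tile shape. It assumes the usual convention of two operations (multiply and add) per multiply-accumulate. The two shapes are illustrative assumptions, not values read from `benchmark.py`, and the sketch deliberately says nothing about how many cycles each instruction takes.

```cpp
#include <cassert>
#include <cstdint>

// One MFMA multiplies an M x K tile by a K x N tile and accumulates the
// result into an M x N tile.
struct MfmaShape {
    std::uint32_t m;
    std::uint32_t n;
    std::uint32_t k;
};

// Each of the M*N outputs receives K multiply-adds, counted as 2*K operations.
constexpr std::uint64_t ops_per_instruction(MfmaShape s) {
    return 2ull * s.m * s.n * s.k;
}

// Illustrative F16 shapes (assumed for this sketch): a narrower-K shape and
// a shape with twice the K depth.
constexpr MfmaShape kF16NarrowK{32, 32, 8};
constexpr MfmaShape kF16WideK{32, 32, 16};

static_assert(ops_per_instruction(kF16NarrowK) == 16384, "2 * 32 * 32 * 8");
static_assert(ops_per_instruction(kF16WideK) == 32768, "2 * 32 * 32 * 16");

// Achieved rate for a timed loop that issued `instructions` MFMA operations.
inline double achieved_ops_per_second(MfmaShape s, std::uint64_t instructions, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(ops_per_instruction(s)) * static_cast<double>(instructions) / seconds;
}
```

Doubling K doubles the work per instruction, so a microbenchmark that keeps issuing the narrower shape can report a peak well below what the wider instruction is able to reach.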
Test Plan
Test Result
Submission Checklist