[rocprofiler-systems] AMD SMI collector using policy-based design#3703
Merged
adjordje-amd merged 26 commits intodevelopfrom Mar 30, 2026
Merged
[rocprofiler-systems] AMD SMI collector using policy-based design#3703adjordje-amd merged 26 commits intodevelopfrom
adjordje-amd merged 26 commits intodevelopfrom
Conversation
ddfae4b to
13392da
Compare
...ts/rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/nic/perfetto_policy.hpp
Outdated
Show resolved
Hide resolved
...ts/rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/nic/perfetto_policy.hpp
Outdated
Show resolved
Hide resolved
projects/rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/nic/device.hpp
Outdated
Show resolved
Hide resolved
...ects/rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/base/traits_check.hpp
Outdated
Show resolved
Hide resolved
Rewrite AMD SMI and AINIC collectors using a policy-based design pattern that enables code reuse and extensibility. Key changes: - Add base collector template with configurable policies for caching, Perfetto output, and device handling - Implement GPU collector with SDMA utilization metrics - Implement NIC collector for AINIC RDMA metrics - Reorganize collector files into unified hierarchy under pmc/ - Update trace cache processors for new sample types - Add comprehensive unit tests with mock drivers
- Replace std::stringstream with fmt::format in types.hpp - Use std::vector instead of std::unordered_map for SDMA states - Template sample() callback to avoid std::function overhead - Fix CMake include dirs and link GPU/NIC tests to unit test target
13392da to
3710324
Compare
Remove debug printf statement left in perfetto_policy.hpp that was printing JPEG activity status to stdout in production. Also apply clang-format-18 to fix alignment issues in: - sample_processor.hpp (type alias alignment) - base/collector.hpp (comment wrapping, blank line) - gpu/collector.hpp (LOG_ERROR formatting, blank line)
Address PR review feedback and modernize PMC collectors: - Replace strcmp chain with hash map for O(1) lookup (NIC device) - Move static storage into policy classes for encapsulation - Convert all namespaces to C++17 nested syntax (18 files) - Replace timemory::join with fmt::format for type safety Net reduction: 163 lines. Build verified.
Address PR review feedback and modernize PMC collectors: - Replace strcmp chain with hash map for O(1) lookup (NIC device) - Move static storage into policy classes for encapsulation - Convert all namespaces to C++17 nested syntax (18 files) - Replace timemory::join with fmt::format for type safety Net reduction: 163 lines. Build verified.
e946934 to
6dff09c
Compare
dgaliffiAMD
reviewed
Mar 12, 2026
Contributor
dgaliffiAMD
left a comment
There was a problem hiding this comment.
Still reviewing the new files, but here are some initial thoughts.
It looks like the AI-NIC addition is breaking the back-compat. configurations. It probably just needs that preprocessor version check.
Please add and update to CHANGELOGS.
I'm seeing some unit test failures locally but haven't had a change to root cause them yet.
1: [==========] 443 tests from 24 test suites ran. (2660 ms total)
1: [ PASSED ] 428 tests.
1: [ FAILED ] 15 tests, listed below:
1: [ FAILED ] DeviceTest.device_construction_no_support
1: [ FAILED ] DeviceTest.device_construction_partial_support
1: [ FAILED ] DeviceTest.edge_temperature_collection
1: [ FAILED ] DeviceTest.jpeg_activity_collection_all_xcps
1: [ FAILED ] DeviceTest.xcp_metrics_not_collected_when_unsupported
1: [ FAILED ] DeviceTest.mixed_vcn_jpeg_support
1: [ FAILED ] DeviceTest.all_metrics_supported_detection
1: [ FAILED ] DeviceTest.vcn_activity_support_detection_any_xcp
1: [ FAILED ] DeviceTest.vcn_activity_unsupported_all_sentinels
1: [ FAILED ] DeviceTest.jpeg_activity_support_detection_any_xcp
1: [ FAILED ] DeviceTest.vcn_activity_top_level_field_only
1: [ FAILED ] DeviceTest.vcn_activity_in_both_fields
1: [ FAILED ] DeviceTest.vcn_activity_detection_should_check_both_sources
1: [ FAILED ] DeviceTest.vcn_activity_xcp_disabled_top_level_valid
1: [ FAILED ] DeviceTest.full_lifecycle_with_realistic_data
1:
1: 15 FAILED TESTS
projects/rocprofiler-systems/source/lib/core/trace_cache/perfetto_processor.cpp
Outdated
Show resolved
Hide resolved
projects/rocprofiler-systems/source/lib/core/trace_cache/perfetto_processor.cpp
Outdated
Show resolved
Hide resolved
projects/rocprofiler-systems/source/lib/core/trace_cache/perfetto_processor.cpp
Outdated
Show resolved
Hide resolved
projects/rocprofiler-systems/source/lib/core/trace_cache/rocpd_processor.cpp
Outdated
Show resolved
Hide resolved
c8a3965 to
27dd089
Compare
27dd089 to
b815fa2
Compare
The NIC collector was a standalone 264-line implementation that duplicated much of the base::collector pattern. This refactoring aligns it with the GPU collector approach for consistency and maintainability. Key changes: - Create nic_traits.hpp with NIC-specific behavior (name-based filtering, device context caching, agent registration) - Simplify collector.hpp to a type alias (264 → 24 lines) - Make perfetto_policy::post_process_device() public for traits access - Use uint64_t consistently for timestamps - Add noexcept to simple getter methods The traits class bridges NIC-specific requirements to base::collector: - Name-based device filtering (vs GPU's index-based) - Device context storage for APIs needing device_name/product_name - Agent registration during enumeration
Add SDMA (System DMA) usage as a new GPU metric for PMC collection, enabling monitoring of DMA engine utilization. Key changes: - Add sdma_usage metric (ID 14) with accessor, setter, and mask - Add SDMA to user metric aliases and Perfetto track info - Update metric validation pattern to accept "sdma_usage" - Add pytest marker for SDMA tests Additional fixes: - Change get_use_rocpd/get_caching_perfetto to return bool by value - Fix store_sample parameter order (enabled before supported) - Use core/amd_smi.hpp wrapper instead of direct amdsmi.h include - Change sample.device_id from size_t to uint32_t - Add debug logging for perfetto processor creation
Save point before implementing device_view type erasure. Fix DeviceTest unit tests failing due to missing mock expectation for get_gpu_asic_info(), which is now called during device initialization to populate vendor/product names.
013b786 to
cfa3744
Compare
Fix issues identified during PR review: Log level corrections: - Revert LOG_INFO to LOG_TRACE in agent_manager (routine operation) - Remove debug LOG_INFO messages from cache_manager - Change LOG_INFO to LOG_TRACE in rocpd_processor (hot path) - Change LOG_INFO to LOG_DEBUG in perfetto_policy post-processing Dead code removal: - Remove unused device_desc variable in nic/cache_policy - Remove uninitialized sample_metrics method from device_slice - Remove commented-out code and unused typeinfo include Move semantics fix: - Add noexcept to provider destructor - Implement explicit move constructor/assignment to prevent double-shutdown of AMD SMI driver in moved-from objects
cfa3744 to
ff2044c
Compare
The NIC/AINIC AMD SMI APIs (amdsmi_nic_asic_info_t, amdsmi_get_nic_*, AMDSMI_PROCESSOR_TYPE_AMD_NIC, etc.) are only available in AMD SMI >= 26.3 (ROCm 7.3+). CI tests against older ROCm versions (6.3-7.2) were failing because these APIs were used unconditionally. This commit adds #if defined(ROCPROFSYS_BUILD_AINIC) guards around: - NIC API wrapper functions in driver.hpp - AMDSMI_PROCESSOR_TYPE_AMD_NIC usage in provider.hpp This matches the existing pattern used for SDMA support with AMD_SMI_SDMA_SUPPORTED.
cbaf852 to
5c708a2
Compare
5c708a2 to
5879cac
Compare
projects/rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/nic/device.hpp
Outdated
Show resolved
Hide resolved
projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocm.cpp
Outdated
Show resolved
Hide resolved
mradosav-amd
approved these changes
Mar 24, 2026
dgaliffiAMD
reviewed
Mar 24, 2026
Contributor
dgaliffiAMD
left a comment
There was a problem hiding this comment.
I've tested on my "Navi" system. The output generated from the transpose and decode tests look good.
I haven't yet tested it on an MI system. I suspect those CI failures are MI-specific. Is it due to the XCP metrics?
projects/rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/base/collector.hpp
Outdated
Show resolved
Hide resolved
marantic-amd
approved these changes
Mar 26, 2026
Contributor
marantic-amd
left a comment
There was a problem hiding this comment.
Some minor nit-pics and suggestions. Looks good.
projects/rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/gpu/types.hpp
Outdated
Show resolved
Hide resolved
projects/rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/gpu/types.hpp
Outdated
Show resolved
Hide resolved
...rocprofiler-systems/source/lib/rocprof-sys/library/pmc/device_providers/amd_smi/provider.hpp
Show resolved
Hide resolved
...rocprofiler-systems/source/lib/rocprof-sys/library/pmc/device_providers/amd_smi/provider.hpp
Show resolved
Hide resolved
projects/rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/gpu/types.hpp
Show resolved
Hide resolved
dgaliffiAMD
reviewed
Mar 26, 2026
.../rocprofiler-systems/source/lib/rocprof-sys/library/pmc/collectors/gpu/tests/test_device.cpp
Show resolved
Hide resolved
Track metadata registration was called inside the per-device loop, causing redundant re-initialization for each device. Move it before the loop alongside category metadata initialization, which is the correct initialization order.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
The current AMD SMI implementation faces several challenges related to its maintainability and architecture, which this refactor aims to address:
Monolithic Design: The original amd_smi.cpp file, comprising 1354 lines, merges sampling, caching, and output logic. This tight coupling makes it challenging to modify or extend individual components.
Lack of Testability: Due to the integration with AMD SMI hardware APIs, the implementation is difficult to unit test and necessitates real hardware for verification.
Limited Flexibility: Introducing support for new output formats, such as RocPD alongside Perfetto, involves substantial code duplication.
Accumulated Bugs: The implementation contains several persistent bugs
The refactoring process aims to tackle these issues by replacing the monolithic setup with a contemporary, policy-based design. This new approach emphasizes modularity, testability, and enhanced performance
Technical Details
New architecture

JIRA ID
Test Plan
Test Result
Submission Checklist