Full documentation for ROCprofiler-SDK is available at rocm.docs.amd.com/projects/rocprofiler-sdk
- HSA API tracing
- Kernel dispatch tracing
- Kernel dispatch counter collection
  - Instances reported as single dimension
  - No serialization
- HIP API tracing
- ROCTx tracing
- Tracing ROCProf Tool V3
- Documentation packaging
- ROCTx control (start and stop)
- Memory copy tracing
- Kernel dispatch counter collection. This includes serialization and multidimensional instances.
- Kernel serialization.
- Serialization control (on and off).
- ROCprof tool plugin interface V3 for counters and dimensions.
- Support to list metrics.
- Correlation-Id retirement
- HIP and HSA trace distinction:
  - `--hip-runtime-trace`: For collecting HIP Runtime API traces
  - `--hip-compiler-trace`: For collecting HIP compiler-generated code traces
  - `--hsa-core-trace`: For collecting HSA API traces (core API)
  - `--hsa-amd-trace`: For collecting HSA API traces (AMD-extension API)
  - `--hsa-image-trace`: For collecting HSA API traces (image-extension API)
  - `--hsa-finalizer-trace`: For collecting HSA API traces (finalizer-extension API)
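As a rough illustration of how the split trace options above can be combined, the sketch below assembles a `rocprofv3` command line that collects only HIP runtime and HSA core API traces. The application path `./my_app` is a placeholder, and the `--` separator follows the convention that later releases require. The command is printed, and it is executed only if both `rocprofv3` and the application are present.

```shell
# Placeholder application path (an assumption, not taken from the changelog).
APP="${APP:-./my_app}"

# Combine two of the split trace options: HIP runtime API and HSA core API.
set -- rocprofv3 --hip-runtime-trace --hsa-core-trace -- "$APP"
TRACE_CMD="$*"
echo "$TRACE_CMD"

# Execute only when the tool is installed and the application exists.
if command -v rocprofv3 >/dev/null 2>&1 && [ -x "$APP" ]; then
    "$@"
fi
```

Requesting only the API families of interest keeps the trace output smaller than enabling every HIP and HSA domain at once.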
API:
- Page migration reporting
- Scratch memory reporting
- Kernel dispatch callback tracing
- External correlation Id request service
- Buffered counter collection record headers
- Option to remove HSA dependency from counter collection
Tool:
- `rocprofv3` multi-GPU support in a single process
API:
- Agent or device counter collection
- PC sampling (beta)
Tool:
- Single JSON output format support
- Perfetto output format support (.pftrace)
- Input YAML support for counter collection
- Input JSON support for counter collection
- Application replay in counter collection
- `rocprofv3` multi-GPU support:
  - Multiprocess (multiple files)
- `rocprofv3` tool now requires `--` before the application. For detailed use, see Using rocprofv3
- Fixed `SQ_ACCUM_PREV` and `SQ_ACCUM_PREV_HIRE` overwriting issue
- OTF2 tool support
- Kernel and range filtering
- Counter collection definitions in YAML
- Documentation updates (SQ block, counter collection, tracing, tool usage)
- `rocprofv3` option `--kernel-rename`
- `rocprofv3` options for Perfetto settings (buffer size and so on)
- CSV columns for kernel trace
  - `Thread_Id`
  - `Dispatch_Id`
- CSV column for counter collection
- Start and end timestamp columns to the counter collection csv output
- Check to force tools to initialize context id with zero
- Support to specify hardware counters for collection using `rocprofv3` as `rocprofv3 --pmc [COUNTER [COUNTER ...]]`
- Memory Allocation Tracing
- PC sampling tool support with CSV and JSON output formats
- List supported PC Sampling Configurations
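The `--pmc` form noted above takes a space-separated list of counter names. A minimal sketch, assuming the counters `SQ_WAVES` and `GRBM_COUNT` are available on the target GPU (availability varies by architecture) and using `./my_app` as a placeholder application:

```shell
# Placeholder application path (an assumption, not taken from the changelog).
APP="${APP:-./my_app}"

# Example counter names; which counters exist depends on the GPU architecture.
COUNTERS="SQ_WAVES GRBM_COUNT"

# Word splitting of $COUNTERS is intentional: each name becomes one argument.
set -- rocprofv3 --pmc $COUNTERS -- "$APP"
PMC_CMD="$*"
echo "$PMC_CMD"

# Execute only when the tool is installed and the application exists.
if command -v rocprofv3 >/dev/null 2>&1 && [ -x "$APP" ]; then
    "$@"
fi
```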
- `--marker-trace` option for `rocprofv3` now supports the legacy ROCTx library `libroctx64.so` when the application is linked against the new library `librocprofiler-sdk-roctx.so`.
- Replaced deprecated `hipHostMalloc` and `hipHostFree` functions with `hipExtHostAlloc` and `hipFreeHost` for ROCm versions starting 6.3.
- Updated `rocprofv3` `--help` options.
- Changed naming of "agent profiling" to a more descriptive "device counting service". To convert existing tool or user code to the new name, use the following sed:
  `find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +`
- Changed naming of "dispatch profiling service" to a more descriptive "dispatch counting service". To convert existing tool or user code to the new names, use the following sed:
  `find . -type f -exec sed -i -e 's/dispatch_profile_counting_service/dispatch_counting_service/g' -e 's/dispatch_profile.h/dispatch_counting_service.h/g' -e 's/rocprofiler_profile_counting_dispatch_callback_t/rocprofiler_dispatch_counting_service_callback_t/g' -e 's/rocprofiler_profile_counting_dispatch_data_t/rocprofiler_dispatch_counting_service_data_t/g' -e 's/rocprofiler_profile_counting_dispatch_record_t/rocprofiler_dispatch_counting_service_record_t/g' {} +`
- `FETCH_SIZE` metric on gfx94x now uses `TCC_BUBBLE` for 128B reads.
- PMC dispatch-based counter collection serialization is now per-device instead of being global across all devices.
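Because the rename commands above rewrite files in place, it may be worth previewing them on a scratch copy first. The sketch below applies the "agent profiling" substitutions to a throwaway file. The file name `tool.cpp`, its contents, and the include path are hypothetical and exist only for illustration. GNU `sed` is assumed for `-i`.

```shell
# Work in a throwaway directory so no real sources are modified.
WORKDIR="$(mktemp -d)"

# Hypothetical source file that still uses the old "agent profile" names.
cat > "$WORKDIR/tool.cpp" <<'EOF'
#include <rocprofiler-sdk/agent_profile.h>
rocprofiler_agent_profile_callback_t cb;
// calls rocprofiler_configure_agent_profile_counting_service
EOF

# Same substitutions as the changelog's command, restricted to WORKDIR.
find "$WORKDIR" -type f -exec sed -i \
    -e 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g' \
    -e 's/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g' \
    -e 's/agent_profile.h/device_counting_service.h/g' \
    -e 's/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

# Show the rewritten file for inspection.
cat "$WORKDIR/tool.cpp"
```

Once the result looks right, the same command can be pointed at the real source tree, ideally one that is under version control so the change can be reviewed and reverted.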
- Added output return functionality to rocprofiler_sample_device_counting_service
- Added rocprofiler_load_counter_definition.
- Create subdirectory when `rocprofv3 --output-file` includes a folder path
- Fixed misaligned stores (undefined behavior) for buffer records
- Fixed crash when only scratch reporting is enabled
- Fixed `MeanOccupancy` metrics
- Fixed aborted-application validation test to properly check for `hipExtHostAlloc` command
- Fixed implicit reduction of SQ and GRBM metrics
- Fixed support for derived counters in reduce operation
- Fixed bug in max-in-reduce operation
- Introduced fix to handle a range of values for `select()` dimension in expressions parser
- `aql::set_profiler_active_on_queue` is now conditional, used only when counter collection is registered (resolves Navi3 kernel tracing issues)
- Removed gfx8 metric definitions
- Removed `rocprofv3` installation to sbin directory
- Support for `select()` operation in counter expression.
- `reduce()` operation for counter expression with respect to dimension.
- `--collection-period` feature in `rocprofv3` to enable filtering using time.
- `--collection-period-unit` feature in `rocprofv3` to control time units used in collection period option.
- Deprecation notice for ROCProfiler and ROCProfilerV2.
- Support for rocDecode API Tracing
- Usage documentation for ROCTx
- Usage documentation for MPI applications
- SDK: `rocprofiler_agent_v0_t` support for agent UUIDs
- SDK: `rocprofiler_agent_v0_t` support for agent visibility based on GPU isolation environment variables such as `ROCR_VISIBLE_DEVICES` and so on.
- Accumulation VGPR support for `rocprofv3`.
- Host-trap based PC sampling support for rocprofv3.
- Support for OpenMP tool.
- Added support for rocJPEG API Tracing
- Added MI350X/MI355X support
- Added rocprofiler_create_counter to allow for adding custom derived counters at runtime.
- Added support for iteration based counter multiplexing to rocprofv3 (see documentation)
- Added perfetto support for counter collection.
- Added support for negating rocprofv3 tracing options when using aggregate options, e.g. `--sys-trace --hsa-trace=no`
- Added `--agent-index` option in rocprofv3 to specify the agent naming convention in the output
  - absolute == node_id
  - relative == logical_node_id
  - type-relative == logical_node_type_id
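The negation syntax and the `--agent-index` option described above can be used together. A small sketch, with `./my_app` as a placeholder application: it enables the aggregate `--sys-trace`, switches HSA tracing back off, and asks for agents to be named by their relative (logical node) index.

```shell
# Placeholder application path (an assumption, not taken from the changelog).
APP="${APP:-./my_app}"

# One of: absolute, relative, type-relative (values listed in the changelog).
AGENT_INDEX="relative"

# Aggregate option plus a negated sub-option, as in the changelog's example.
set -- rocprofv3 --sys-trace --hsa-trace=no --agent-index "$AGENT_INDEX" -- "$APP"
SYS_CMD="$*"
echo "$SYS_CMD"

# Execute only when the tool is installed and the application exists.
if command -v rocprofv3 >/dev/null 2>&1 && [ -x "$APP" ]; then
    "$@"
fi
```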
- Added MI300/MI350 stochastic (hardware-based) PC sampling support in ROCProfiler-SDK and ROCProfV3
- Python bindings for rocprofiler-sdk-roctx
- SQLite3 output support for rocprofv3 (`--output-format rocpd`)
- Added `rocprofiler-sdk-rocpd` package
  - public API in `include/rocprofiler-sdk-rocpd/rocpd.h`
  - library implementation in `librocprofiler-sdk-rocpd.so`
  - support for `find_package(rocprofiler-sdk-rocpd)`
  - `rocprofiler-sdk-rocpd` DEB and RPM packages
- Support `--version` option for `rocprofv3`
- Added `rocpd` Python package
- Added thread trace as experimental API
- Added ROCprof Trace Decoder as experimental API
  - Requires ROCprof Trace Decoder plugin
- Added thread trace option to the rocprofv3 tool under the `--att` parameters
  - See using thread trace with rocprofv3
  - Requires the ROCprof Trace Decoder plugin installed (see above).
- Added `rocpd` output format documentation
  - Requires the ROCprof Trace Decoder plugin installed (see above)
- Added perfetto support for scratch memory.
- SDK no longer creates a background thread when every tool returns a nullptr from `rocprofiler_configure`.
- Updated disassembly.hpp's vaddr-to-file-offset mapping to use the dedicated comgr API.
- `rocprofiler_uuid_t` ABI is changed to hold a 128-bit value.
- rocprofv3 shorthand argument for `--collection-period` is now `-P` (upper-case) as `-p` (lower-case) is reserved for later use
- Default output format for rocprofv3 is now `rocpd` (SQLite3 database)
- rocprofv3 avail tool renamed from `rocprofv3_avail` to `rocprofv3-avail`
- rocprofv3 avail tool has support for command line arguments.
- rocprofv3 tool now allows for Thread Trace + PC Sampling on the same agent
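Since the default output format is now rocpd, scripts that parse CSV output may need to request that format explicitly. The sketch below shows one way this might look. The application path `./my_app`, the output directory `./rocprof_out`, and the `.db` file extension for rocpd databases are all assumptions. The `sqlite3` step only lists tables and assumes nothing about the schema.

```shell
# Placeholder application and output locations (assumptions for illustration).
APP="${APP:-./my_app}"
OUT_DIR="${OUT_DIR:-./rocprof_out}"

# Request CSV explicitly instead of relying on the default output format.
set -- rocprofv3 --kernel-trace --output-format csv --output-directory "$OUT_DIR" -- "$APP"
CSV_CMD="$*"
echo "$CSV_CMD"

# Execute only when the tool is installed and the application exists.
if command -v rocprofv3 >/dev/null 2>&1 && [ -x "$APP" ]; then
    "$@"
fi

# If earlier runs left rocpd databases behind, list their tables.
# Assumption: rocpd databases are written with a .db extension under OUT_DIR.
if command -v sqlite3 >/dev/null 2>&1 && [ -d "$OUT_DIR" ]; then
    find "$OUT_DIR" -name '*.db' -exec sqlite3 {} '.tables' \;
fi
```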
- Fixed inconsistency for what is a "null" handle in `rocprofiler_*_id_t` structs.
  - correct answer is `.handle = 0` but some definitions used `UINT64_MAX`
- Fixed missing callbacks around internal thread creation within counter collection service
- Fixed potential data race in rocprofiler-sdk double buffering scheme
- Fixed usage of `std::regex` in the core rocprofiler-sdk library, which caused segfaults/exceptions when used under dual ABI
- Fixed perfetto counter collection by introducing per dispatch accumulation.
- Fixed code object disassembly missing function inlining information
- Fixed queue preemption error and HSA_STATUS_ERROR_INVALID_PACKET_FORMAT error for stochastic PC-sampling for MI300X, leading to more stable runs.
- Fixed the system hang issue for host-trap PC-sampling on MI300X.
- Fixed a rocpd counter collection issue where, when only counter collection was enabled, the rocpd_kernel_dispatch table was populated with counter data instead of kernel_dispatch data.
- Removed support for gfx940 and gfx941 targets from compilation