[rocprofiler-systems] Selective ROCTX region tracing, unified pause/resume, and marker pipeline refactor #4143
Open
mradosav-amd wants to merge 30 commits into develop
* Add pause/resume for kokkosp component
* Applied suggestions from code review

* Add ROCPROFSYS_TRACE_REGION for roctx-based region filtering
* Encapsulate control logic into control_client class. Expose only the start/stop callback registration APIs
* Separate handling of stoppable and always-on contexts within the same tool and client ID
* Add counter handling for stop/pause
* Address PR findings and suggestions

* Add Python pause/resume integration
* Remove unnecessary comments
…ler. Separate logic for start/pause and for writing marker regions. (#3887)
Motivation
Applications need to limit collection to specific ROCTX-named regions and to pause or resume tracing from the application (including Python) without skewing the rest of the run. This work adds region-based filtering (`ROCPROFSYS_TRACE_REGION`), extends pause/resume across sampling, Kokkos, and gotcha-based components, and refactors marker handling so region filtering and lifecycle control are explicit and testable.
Technical Details
- `get_trace_region()` / `ROCPROFSYS_TRACE_REGION` drives which ROCTX ranges enable tracing; `rocprofsys_external_register_pause_callbacks` wires Python (or other) pause/resume into the runtime.
- Marker handling is built around `marker_writer`, `roctx_client`, and `trace_control` (region filter state, start/stop callbacks, initial pause when a filter is active), with counter and SDK integration updates in `rocprofiler-sdk.cpp`.
- Pause/resume covers sampling, `kokkosp`, and multiple gotcha components (MPI, NUMA, pthread, UCX, VAAPI, etc.).
- `examples/roctx/` demonstrates selective regions and pause/resume; tests add GTest coverage for the new units, a large CMake-driven selective-region suite, and `pytest` for selective-region scenarios.
JIRA ID
AIPROFSYST-230
AIPROFSYST-231
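For reviewers, the region-filter behavior summarized under Technical Details (initial pause when a filter is active, start/stop callbacks fired around matching ROCTX ranges) can be pictured with the sketch below. It is only an illustrative model, not the PR's `trace_control`: the class `TraceControlSketch`, its methods, the exact-name matching rule, and the nesting counter are all assumptions made for the example, and `range_pop` takes a name here purely to keep the sketch simple.

```python
from typing import Callable, List, Optional


class TraceControlSketch:
    """Hypothetical model of region-filtered tracing (not the PR's implementation)."""

    def __init__(self, trace_region: Optional[str] = None) -> None:
        # Assumed to mirror the value of ROCPROFSYS_TRACE_REGION.
        self.trace_region = trace_region
        # With a filter set, tracing starts paused until a matching range opens.
        self.active = trace_region is None
        self._depth = 0  # nesting depth of currently open matching ranges
        self._on_start: List[Callable[[], None]] = []
        self._on_stop: List[Callable[[], None]] = []

    def register_callbacks(
        self, on_start: Callable[[], None], on_stop: Callable[[], None]
    ) -> None:
        """Register components that should follow the tracing state."""
        self._on_start.append(on_start)
        self._on_stop.append(on_stop)

    def range_push(self, name: str) -> None:
        # Ranges that do not match the filter leave the state untouched.
        if self.trace_region is None or name != self.trace_region:
            return
        self._depth += 1
        if self._depth == 1:  # outermost matching range: enable tracing
            self.active = True
            for callback in self._on_start:
                callback()

    def range_pop(self, name: str) -> None:
        if self.trace_region is None or name != self.trace_region:
            return
        if self._depth == 0:  # unbalanced pop: ignore
            return
        self._depth -= 1
        if self._depth == 0:  # last matching range closed: pause again
            self.active = False
            for callback in self._on_stop:
                callback()


# Usage: only the "hot_loop" range toggles tracing.
events: List[str] = []
control = TraceControlSketch(trace_region="hot_loop")
control.register_callbacks(
    lambda: events.append("start"), lambda: events.append("stop")
)
initially_active = control.active
control.range_push("setup")
control.range_pop("setup")
control.range_push("hot_loop")
active_inside_region = control.active
control.range_pop("hot_loop")
```

In this model a run with no filter stays active throughout and never fires the callbacks, which matches the "filter on/off" split exercised in the test plan.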
Test Plan
- `rocprofiler-sdk` unit tests (`test_marker_writer`, `test_roctx_client`).
- `tests/pytest/test_selective_regions.py`.
- `examples/roctx` targets (`selective_region`, `pause_resume`, `selective_region_pause_*`) with tracing enabled and configs covering filter on/off and pause/resume.
Test Result
Submission Checklist