[rocprofiler-systems] Selective ROCTX region tracing, unified pause/resume, and marker pipeline refactor #4143

Open
mradosav-amd wants to merge 30 commits into develop from users/mradosav-amd/rocprofsys-selective-region
@mradosav-amd (Contributor) commented Mar 17, 2026

Motivation

Applications need to limit collection to specific ROCTX-named regions and to pause or resume tracing from the app (including Python) without skewing the rest of the run. This work adds region-based filtering (ROCPROFSYS_TRACE_REGION), extends pause/resume across sampling, Kokkos, and gotcha-based components, and refactors marker handling so region filtering and lifecycle control are explicit and testable.

Technical Details

  • New config/API: get_trace_region() / ROCPROFSYS_TRACE_REGION drives which ROCTX ranges enable tracing; rocprofsys_external_register_pause_callbacks wires Python (or other) pause/resume into the runtime.
  • Core refactor: marker tracing is split into marker_writer, roctx_client, and trace_control (region filter state, start/stop callbacks, initial pause when a filter is active), with counter and SDK integration updates in rocprofiler-sdk.cpp.
  • Components: pause/resume hooks added or completed for sampling, kokkosp, and multiple gotcha components (MPI, NUMA, pthread, UCX, VAAPI, etc.).
  • Examples under examples/roctx/ demonstrate selective regions and pause/resume; tests add GTest coverage for the new units, a large CMake-driven selective-region suite, and pytest for selective-region scenarios.
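The region-filter behavior described above (tracing starts paused when a filter is active and is enabled only inside matching ROCTX ranges) can be modeled with a small state machine. The sketch below is illustrative only: `RegionFilter`, its methods, and the exact-name matching rule are assumptions made for this example and are not the PR's `trace_control` implementation or the real `ROCPROFSYS_TRACE_REGION` matching semantics.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a region filter; NOT the trace_control code from this PR.
class RegionFilter
{
public:
    // An empty target stands for "no filter configured": tracing is always on.
    explicit RegionFilter(std::string target)
    : m_target(std::move(target))
    {}

    // Models a range-push event: record whether the new range matches the filter.
    // Assumption: a range matches only when its name equals the target exactly.
    void push(const std::string& name)
    {
        const bool matched = !m_target.empty() && name == m_target;
        m_stack.push_back(matched);
        if(matched) ++m_active;
    }

    // Models a range-pop event: leave the innermost open range.
    void pop()
    {
        if(m_stack.empty()) return;  // unbalanced pop is ignored
        if(m_stack.back())
        {
            assert(m_active > 0);
            --m_active;
        }
        m_stack.pop_back();
    }

    // Tracing is enabled with no filter, or inside at least one matching range.
    bool tracing_enabled() const { return m_target.empty() || m_active > 0; }

private:
    std::string       m_target;      // region name taken from the filter setting
    std::vector<bool> m_stack;       // one entry per open range: did it match?
    int               m_active = 0;  // number of open matching ranges
};
```

With a filter of `"hot_loop"`, this model reports tracing as disabled until a range named `hot_loop` is pushed, keeps it enabled for ranges nested inside it, and disables it again once that range is popped. Counting open matching ranges, rather than storing a single flag, keeps nested or repeated matching ranges from turning tracing off too early.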

JIRA ID

AIPROFSYST-230
AIPROFSYST-231

Test Plan

  • Build and run rocprofiler-sdk unit tests (test_marker_writer, test_roctx_client).
  • Run the selective-region CMake test target and tests/pytest/test_selective_regions.py.
  • Build and run the new examples/roctx targets (selective_region, pause_resume, selective_region_pause_*) with tracing enabled and configs covering filter on/off and pause/resume.

Test Result

  • With a region filter set, the output should contain only trace data collected inside matching ROCTX ranges, with correct pause/resume behavior at range boundaries; unfiltered mode should match prior behavior apart from intentional pause/resume.
  • No regressions: app runs that do not set region filter or external callbacks should behave as before; Python registration should pause/resume without crashes.
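The no-regression expectation for runs without external callbacks can be illustrated with a minimal callback holder. This is a sketch under stated assumptions: `PauseController` and its methods are hypothetical names invented here, and the actual signature of `rocprofsys_external_register_pause_callbacks` is not shown in this description.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for a runtime that accepts external pause/resume hooks.
// It does NOT reproduce rocprofsys_external_register_pause_callbacks.
class PauseController
{
public:
    using callback_t = std::function<void()>;

    // Store the hooks supplied by an external client (for example, a Python binding).
    void register_callbacks(callback_t on_pause, callback_t on_resume)
    {
        m_on_pause  = std::move(on_pause);
        m_on_resume = std::move(on_resume);
    }

    // Pause once; repeated calls while already paused do nothing.
    void pause()
    {
        if(m_paused) return;
        m_paused = true;
        if(m_on_pause) m_on_pause();  // skipped when no hook is registered
    }

    // Resume once; repeated calls while already running do nothing.
    void resume()
    {
        if(!m_paused) return;
        m_paused = false;
        if(m_on_resume) m_on_resume();  // skipped when no hook is registered
    }

    bool paused() const { return m_paused; }

private:
    callback_t m_on_pause;
    callback_t m_on_resume;
    bool       m_paused = false;
};
```

In this model, a controller with no registered hooks still tracks its paused state and never invokes an empty callback, which corresponds to the expectation that unregistered runs behave as before. Making `pause()` and `resume()` idempotent is a design choice of this sketch, not a documented property of the PR.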

Submission Checklist

@mradosav-amd force-pushed the users/mradosav-amd/rocprofsys-selective-region branch from 3095a92 to 58f7563 on March 19, 2026 10:12
@mradosav-amd changed the title from "[rocprofiler-systems] [don't review] ROCTx selective tracing" to "[rocprofiler-systems] Selective ROCTX region tracing, unified pause/resume, and marker pipeline refactor" on Mar 20, 2026
@mradosav-amd marked this pull request as ready for review on March 23, 2026 10:24
@mradosav-amd requested review from a team and jrmadsen as code owners on March 23, 2026 10:25
mradosav-amd and others added 23 commits March 23, 2026 15:03
* Add pause/resume for kokkosp component

* Applied suggestions from code review
* Add ROCPROFSYS_TRACE_REGION for roctx-based region filtering

* Encapsulate control logic into control_client class. Expose only the start/stop callback registration APIs

* Separate handling of stoppable and always-on contexts within the same tool and client ID

* Add counters handling in case of stop/pause

* Address PR findings and suggestions
* Add python pause/resume integration

* Remove unnecessary comments
…ler. Separate logic for start/pause and for writing marker regions. (#3887)

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
@mradosav-amd force-pushed the users/mradosav-amd/rocprofsys-selective-region branch from 9ede90f to 1abd5f8 on March 23, 2026 14:15