21 changes: 21 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,21 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

version: 2

sphinx:
  configuration: docs/conf.py

# RTD by default builds html only
# Additional formats available for extra build time: htmlzip, pdf, epub
formats: []

python:
  install:
    - requirements: docs/sphinx/requirements.txt

# Defines build environment
build:
  os: ubuntu-22.04
  tools:
    python: "3.10"
61 changes: 61 additions & 0 deletions docs/conf.py
@@ -0,0 +1,61 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

import re

'''
html_theme is usually left unchanged (rocm_docs_theme).
flavor defines the site header display; select the flavor that matches the target portal.
flavor options: rocm, rocm-docs-home, rocm-blogs, rocm-ds, instinct, ai-developer-hub, local, generic
'''
html_theme = "rocm_docs_theme"
html_theme_options = {"flavor": "rocm-docs-home"}


# This section turns on/off article info
setting_all_article_info = True
all_article_info_os = ["linux"]
all_article_info_author = ""

# Dynamically extract component version
with open('../CMakeLists.txt', encoding='utf-8') as f:
    pattern = r'.*\brocm_setup_version\(VERSION\s+([0-9.]+)[^0-9.]+'  # Update according to each component's CMakeLists.txt
    match = re.search(pattern, f.read())
    if not match:
        raise ValueError("VERSION not found!")
    # Pinned value; the match above only verifies that a VERSION entry exists
    version_number = "1.0"

# for PDF output on Read the Docs
project = "AQLprofile"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
version = version_number
release = version_number

external_toc_path = "./sphinx/_toc.yml"  # Path to the table of contents definition

'''
Doxygen Settings
Ensure Doxyfile is located at docs/doxygen.
If the component does not need doxygen, delete this section for optimal build time
'''
#doxygen_root = "doxygen"
#doxysphinx_enabled = False
# doxygen_project = {
# "name": "doxygen",
# "path": "doxygen/xml",
#}

# Add additional packages as needed
extensions = [
    "rocm_docs",
    # "rocm_docs.doxygen",
]

html_title = f"{project} {version_number} documentation"

external_projects_current_project = "AQLprofile"
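
The version-extraction pattern used in ``docs/conf.py`` can be checked in isolation. The following is a minimal sketch, not part of the change itself: ``extract_version`` and the sample CMake text are hypothetical names introduced only for illustration. It shows that the pattern captures an unquoted version and does not match a quoted one.

```python
import re

# Same pattern as in docs/conf.py. It expects an unquoted version,
# for example: rocm_setup_version(VERSION 1.0.0)
PATTERN = r'.*\brocm_setup_version\(VERSION\s+([0-9.]+)[^0-9.]+'


def extract_version(cmake_text):
    """Return the version captured from CMake text, or raise ValueError."""
    match = re.search(PATTERN, cmake_text)
    if not match:
        raise ValueError("VERSION not found!")
    return match.group(1)


# Hypothetical CMakeLists.txt fragment, used only for illustration
sample = "project(example)\nrocm_setup_version(VERSION 1.0.0)\n"
print(extract_version(sample))  # prints 1.0.0
```

If a component writes the version in quotes, the pattern raises ``ValueError`` and needs adjusting, which is what the "Update according to each component's CMakeLists.txt" comment refers to.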
109 changes: 109 additions & 0 deletions docs/examples/pmc-workflow.rst
@@ -0,0 +1,109 @@
.. meta::
   :description: A typical workflow for collecting PMC data
   :keywords: AQLprofile, ROCm, API, how-to, PMC

******************************************
Performance Monitor Control (PMC) workflow
******************************************

This page describes a typical workflow for collecting PMC data using AQLprofile (as integrated in ``rocprofiler-sdk``).
This workflow relies on creating a profile object, generating command packets, and iterating over output buffers:

1. **Intercept kernel dispatch**: The SDK intercepts kernel dispatch packets submitted to the GPU queue.
2. **Create a profile object**: A profile/session object is created, specifying the agent (GPU), events (counters), and output buffers.
3. **Generate command packets**: Start, stop, and read command packets are generated and injected into the queue around the kernel dispatch.
4. **Submit packets and run the kernel**: The kernel and profiling packets are submitted to the GPU queue for execution.
5. **Collect the output buffer**: After execution, the output buffer is read back from the GPU.
6. **Iterate and extract the results**: The SDK iterates over the output buffer to extract and report counter results.

The SDK abstracts queue interception and packet management so tool developers can focus on results.

Key API code snippets
=====================

These API snippets use the legacy interfaces from ``hsa_ven_amd_aqlprofile.h``. They are provided for illustration only.
For new development, refer to the updated APIs in ``aql_profile_v2.h``.

.. note::

   The ``rocprofiler-sdk`` is migrating to these newer interfaces; the old APIs may be deprecated in future releases.

Define the events and profile
-----------------------------

.. code:: cpp

   // Select events (counters) to collect
   hsa_ven_amd_aqlprofile_event_t events[] = {
     { HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SQ, 0, 2 }, // Example: SQ block, instance 0, counter 2
     { HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SQ, 0, 3 }
   };

   // Create profile object
   hsa_ven_amd_aqlprofile_profile_t profile = {
     .agent = agent, // hsa_agent_t
     .type = HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC,
     .events = events,
     .event_count = sizeof(events)/sizeof(events[0]),
     .parameters = nullptr,
     .parameter_count = 0,
     .output_buffer = {output_ptr, output_size},
     .command_buffer = {cmd_ptr, cmd_size}
   };


Validate events
---------------

.. code:: cpp

   bool valid = false;
   hsa_ven_amd_aqlprofile_validate_event(agent, &events[0], &valid);
   if (!valid) {
     // Handle invalid event
   }


Generate command packets
-------------------------

.. code:: cpp

   hsa_ext_amd_aql_pm4_packet_t start_pkt, stop_pkt, read_pkt;
   hsa_ven_amd_aqlprofile_start(&profile, &start_pkt);
   hsa_ven_amd_aqlprofile_stop(&profile, &stop_pkt);
   hsa_ven_amd_aqlprofile_read(&profile, &read_pkt);


Submit packets and run the kernel
---------------------------------

.. code:: cpp

   // Pseudocode: inject packets into the HSA queue in this order.
   // Submit() is a placeholder for your own queue-write helper.
   queue->Submit(&start_pkt);
   queue->Submit(&kernel_pkt);
   queue->Submit(&stop_pkt);
   queue->Submit(&read_pkt);


Iterate and extract results
----------------------------

.. code:: cpp

   hsa_ven_amd_aqlprofile_iterate_data(
     &profile,
     [](hsa_ven_amd_aqlprofile_info_type_t info_type,
        hsa_ven_amd_aqlprofile_info_data_t* info_data,
        void* user_data) -> hsa_status_t {
       if (info_type == HSA_VEN_AMD_AQLPROFILE_INFO_PMC_DATA) {
         // Casts keep the printf format specifiers portable
         printf("Event: block %d, id %u, value: %llu\n",
                static_cast<int>(info_data->pmc_data.event.block_name),
                info_data->pmc_data.event.counter_id,
                static_cast<unsigned long long>(info_data->pmc_data.result));
       }
       return HSA_STATUS_SUCCESS;
     },
     nullptr
   );
93 changes: 93 additions & 0 deletions docs/examples/sqtt-workflow.rst
@@ -0,0 +1,93 @@
.. meta::
   :description: A typical workflow for collecting detailed instruction-level traces
   :keywords: AQLprofile, ROCm, API, how-to, SQTT

*******************************
SQ Thread Trace (SQTT) workflow
*******************************

The SQ Thread Trace workflow focuses on collecting detailed instruction-level traces.
This workflow relies on creating a profile object, generating command packets, and iterating over output buffers:

1. **Intercept the kernel dispatch**: The SDK intercepts the kernel dispatch.
2. **Create a SQTT profile object**: A profile object is created for SQTT, specifying trace parameters and output buffers.
3. **Generate SQTT command packets**: Start, stop, and read packets for SQTT are generated and injected into the queue.
4. **Submit packets and run the kernel**: The kernel and SQTT packets are submitted for execution.
5. **Collect the trace buffer**: The trace output buffer is collected after execution.
6. **Iterate and decode trace data**: The SDK iterates over the trace buffer and decodes the SQTT data for analysis.

The SDK abstracts queue interception and packet management so tool developers can focus on results.

Key API code snippets
=====================

These API snippets use the legacy interfaces from ``hsa_ven_amd_aqlprofile.h``. They are provided for illustration only.
For new development, refer to the updated APIs in ``aql_profile_v2.h``.

In the ``rocprofiler-sdk`` codebase, these APIs are wrapped and orchestrated in the ``aql``, ``hsa``, and ``thread_trace`` folders for queue interception, packet construction, and result iteration.

.. note::

   The ``rocprofiler-sdk`` is migrating to these newer interfaces; the old APIs may be deprecated in future releases.

Define parameters and profile
------------------------------

.. code:: cpp

   hsa_ven_amd_aqlprofile_parameter_t params[] = {
     { HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_ATT_BUFFER_SIZE, 16 } // 16 MB buffer
   };

   hsa_ven_amd_aqlprofile_profile_t profile = {
     .agent = agent,
     .type = HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_TRACE,
     .events = nullptr,
     .event_count = 0,
     .parameters = params,
     .parameter_count = sizeof(params)/sizeof(params[0]),
     .output_buffer = {trace_ptr, trace_size},
     .command_buffer = {cmd_ptr, cmd_size}
   };


Generate SQTT start/stop packets
---------------------------------

.. code:: cpp

   hsa_ext_amd_aql_pm4_packet_t sqtt_start_pkt, sqtt_stop_pkt;
   hsa_ven_amd_aqlprofile_start(&profile, &sqtt_start_pkt);
   hsa_ven_amd_aqlprofile_stop(&profile, &sqtt_stop_pkt);


Submit packets and run the kernel
---------------------------------

.. code:: cpp

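   // Pseudocode: inject packets into the HSA queue in this order.
   // Submit() is a placeholder for your own queue-write helper.
   queue->Submit(&sqtt_start_pkt);
   queue->Submit(&kernel_pkt);
   queue->Submit(&sqtt_stop_pkt);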


Iterate and decode trace data
-----------------------------

.. code:: cpp

   hsa_ven_amd_aqlprofile_iterate_data(
     &profile,
     [](hsa_ven_amd_aqlprofile_info_type_t info_type,
        hsa_ven_amd_aqlprofile_info_data_t* info_data,
        void* user_data) -> hsa_status_t {
       if (info_type == HSA_VEN_AMD_AQLPROFILE_INFO_TRACE_DATA) {
         // decode_trace() stands in for a user-supplied decoder
         decode_trace(info_data->trace_data.ptr, info_data->trace_data.size);
       }
       return HSA_STATUS_SUCCESS;
     },
     nullptr
   );


40 changes: 40 additions & 0 deletions docs/index.rst
@@ -0,0 +1,40 @@
.. meta::
   :description: AQLprofile is an open source library that enables advanced GPU profiling and tracing on AMD platforms.
   :keywords: AQLprofile, ROCm, tool, Instinct, accelerator, AMD

.. _index:

********************************
AQLprofile documentation
********************************

This documentation provides an overview of the AQLprofile library and explains the ideas motivating the design
of the tool and its components.

If you're new to AQLprofile, see :doc:`What is AQLprofile? <what-is-aqlprofile>`.

AQLprofile is open source and hosted at .

.. grid:: 2
   :gutter: 3

   .. grid-item-card:: Install

      * :doc:`Install AQLprofile <install/aqlprofile-install>`

   .. grid-item-card:: Examples

      * :doc:`Performance Monitor Control (PMC) workflow <examples/pmc-workflow>`
      * :doc:`SQ Thread Trace (SQTT) workflow <examples/sqtt-workflow>`

   .. grid-item-card:: Reference

      * :doc:`Terms <reference/terms>`
      * :doc:`APIs <reference/api-list>`


To contribute to the documentation, refer to
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.

You can find licensing information on the
`Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.