61 changes: 61 additions & 0 deletions docs/conf.py
@@ -0,0 +1,61 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

import re

'''
html_theme is usually unchanged (rocm_docs_theme).
flavor defines the site header display; select the flavor for the corresponding portal
flavor options: rocm, rocm-docs-home, rocm-blogs, rocm-ds, instinct, ai-developer-hub, local, generic
'''
html_theme = "rocm_docs_theme"
html_theme_options = {"flavor": "rocm-docs-home"}


# This section turns on/off article info
setting_all_article_info = True
all_article_info_os = ["linux"]
all_article_info_author = ""

# Dynamically extract component version
with open('../CMakeLists.txt', encoding='utf-8') as f:
    pattern = r'.*\brocm_setup_version\(VERSION\s+([0-9.]+)[^0-9.]+' # Update according to each component's CMakeLists.txt
    match = re.search(pattern,
                      f.read())
    if not match:
        raise ValueError("VERSION not found!")
version_number = match[1]

# for PDF output on Read the Docs
project = "AQLprofile"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
version = version_number
release = version_number

external_toc_path = "./sphinx/_toc.yml" # Path to the table of contents (TOC) definition

'''
Doxygen Settings
Ensure Doxyfile is located at docs/doxygen.
If the component does not need doxygen, delete this section for optimal build time
'''
#doxygen_root = "doxygen"
#doxysphinx_enabled = False
# doxygen_project = {
# "name": "doxygen",
# "path": "doxygen/xml",
#}

# Add additional extensions as needed
extensions = [
"rocm_docs",
# "rocm_docs.doxygen",
]

html_title = f"{project} {version_number} documentation"

external_projects_current_project = "AQLprofile"
109 changes: 109 additions & 0 deletions docs/examples/pmc-workflow.rst
@@ -0,0 +1,109 @@
.. meta::
  :description: A typical workflow for collecting PMC data
  :keywords: AQLprofile, ROCm, API, how-to, PMC

**********************************************************
Performance Monitor Control (PMC) workflow with AQLprofile
**********************************************************

This page describes a typical workflow for collecting PMC data using AQLprofile (as integrated in `ROCprofiler-SDK <https://github.com/ROCm/rocprofiler-sdk>`__).
This workflow relies on creating a profile object, generating command packets, and iterating over output buffers:

1. **Intercept kernel dispatch**: The SDK intercepts kernel dispatch packets submitted to the GPU queue.
2. **Create a profile object**: A profile/session object is created, specifying the agent (GPU), events (counters), and output buffers.
3. **Generate command packets**: Start, stop, and read command packets are generated and injected into the queue around the kernel dispatch.
4. **Submit packets and run the kernel**: The kernel and profiling packets are submitted to the GPU queue for execution.
5. **Collect the output buffer**: After execution, the output buffer is read back from the GPU.
6. **Iterate and extract the results**: The SDK iterates over the output buffer to extract and report counter results.

The SDK abstracts queue interception and packet management so tool developers can focus on results.

Key API code snippets
=====================

These API snippets use the legacy interfaces from ``hsa_ven_amd_aqlprofile.h`` and are provided for illustration only.
For new development, refer to the updated APIs in ``aql_profile_v2.h``.

.. note::

   The ROCprofiler-SDK is migrating to these newer interfaces in ``aql_profile_v2.h``. You should use the APIs in ``aql_profile_v2.h`` to stay up-to-date.

Define the events and profile
-----------------------------

.. code:: cpp

   // Select events (counters) to collect
   hsa_ven_amd_aqlprofile_event_t events[] = {
       { HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SQ, 0, 2 },  // Example: SQ block, instance 0, counter 2
       { HSA_VEN_AMD_AQLPROFILE_BLOCK_NAME_SQ, 0, 3 }
   };

   // Create profile object
   hsa_ven_amd_aqlprofile_profile_t profile = {
       .agent = agent,  // hsa_agent_t
       .type = HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC,
       .events = events,
       .event_count = sizeof(events) / sizeof(events[0]),
       .parameters = nullptr,
       .parameter_count = 0,
       .output_buffer = {output_ptr, output_size},
       .command_buffer = {cmd_ptr, cmd_size}
   };


Validate events
---------------

.. code:: cpp

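   // Check that every requested event is supported on this agent
   for (const auto& event : events) {
       bool valid = false;
       hsa_status_t status = hsa_ven_amd_aqlprofile_validate_event(agent, &event, &valid);
       if (status != HSA_STATUS_SUCCESS || !valid) {
           // Handle the invalid or unsupported event
       }
   }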


Generate command packets
-------------------------

.. code:: cpp

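   hsa_ext_amd_aql_pm4_packet_t start_pkt, stop_pkt, read_pkt;
   if (hsa_ven_amd_aqlprofile_start(&profile, &start_pkt) != HSA_STATUS_SUCCESS ||
       hsa_ven_amd_aqlprofile_stop(&profile, &stop_pkt) != HSA_STATUS_SUCCESS ||
       hsa_ven_amd_aqlprofile_read(&profile, &read_pkt) != HSA_STATUS_SUCCESS) {
       // Handle the packet generation error
   }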


Submit packets and run the kernel
---------------------------------

.. code:: cpp

   // Pseudocode: inject packets into HSA queue
   queue->Submit(&start_pkt);
   queue->Submit(&kernel_pkt);
   queue->Submit(&stop_pkt);
   queue->Submit(&read_pkt);
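
In this snippet the ordering is what matters: the start packet directly precedes the kernel, and the stop and read packets follow it. The minimal C++ sketch below models only that ordering. ``MockPacket``, ``MockQueue``, and ``submit_profiled_dispatch`` are hypothetical stand-ins written for illustration, not AQLprofile or HSA APIs; the mock queue simply records the order in which packets arrive.

.. code:: cpp

   #include <cassert>
   #include <string>
   #include <vector>

   // Hypothetical stand-in for an AQL packet; it carries only a label.
   struct MockPacket {
       std::string kind;
   };

   // Hypothetical stand-in for an HSA queue that records submissions in order.
   class MockQueue {
   public:
       void Submit(const MockPacket* pkt) {
           assert(pkt != nullptr);
           submitted_.push_back(pkt->kind);
       }
       const std::vector<std::string>& submitted() const { return submitted_; }

   private:
       std::vector<std::string> submitted_;
   };

   // Brackets a single kernel dispatch with the profiling packets:
   // start counting, run the kernel, stop counting, then read.
   void submit_profiled_dispatch(MockQueue& queue, const MockPacket& start_pkt,
                                 const MockPacket& kernel_pkt,
                                 const MockPacket& stop_pkt,
                                 const MockPacket& read_pkt) {
       queue.Submit(&start_pkt);
       queue.Submit(&kernel_pkt);
       queue.Submit(&stop_pkt);
       queue.Submit(&read_pkt);
   }

Keeping the four submissions in one helper makes it harder to accidentally let another dispatch slip between the start and stop packets.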


Iterate and extract results
----------------------------

.. code:: cpp

   hsa_ven_amd_aqlprofile_iterate_data(
       &profile,
       [](hsa_ven_amd_aqlprofile_info_type_t info_type,
          hsa_ven_amd_aqlprofile_info_data_t* info_data,
          void* user_data) -> hsa_status_t {
           if (info_type == HSA_VEN_AMD_AQLPROFILE_INFO_PMC_DATA) {
               // counter_id is 32-bit unsigned and result is 64-bit unsigned
               printf("Event: block %d, id %u, value: %llu\n",
                      static_cast<int>(info_data->pmc_data.event.block_name),
                      info_data->pmc_data.event.counter_id,
                      static_cast<unsigned long long>(info_data->pmc_data.result));
           }
           return HSA_STATUS_SUCCESS;
       },
       nullptr  // user_data: unused in this example
   );
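
Instead of printing inside the callback, a tool usually collects results through the ``user_data`` pointer and aggregates them afterward. The C++ sketch below shows that pattern with hypothetical stand-in types (``MockEvent``, ``MockPmcSample``) that mirror only the fields read in the snippet, plus a hypothetical ``mock_iterate`` driver in place of ``hsa_ven_amd_aqlprofile_iterate_data``. None of these names are AQLprofile APIs, and summing per counter is just one possible aggregation.

.. code:: cpp

   #include <cassert>
   #include <cstdint>
   #include <vector>

   // Hypothetical stand-ins mirroring only the fields read in the snippet above.
   // They are NOT the real AQLprofile types.
   struct MockEvent {
       int block_name;
       uint32_t block_index;
       uint32_t counter_id;
   };

   struct MockPmcSample {
       MockEvent event;
       uint64_t result;
   };

   // Callback shape: one sample in, user_data passed through unchanged.
   using MockCallback = int (*)(const MockPmcSample*, void*);

   // Hypothetical driver: invokes the callback once per sample and, in this
   // mock only, stops early if the callback returns nonzero.
   void mock_iterate(const std::vector<MockPmcSample>& output_buffer,
                     MockCallback callback, void* user_data) {
       for (const auto& sample : output_buffer) {
           if (callback(&sample, user_data) != 0) break;
       }
   }

   // Copies each sample into the std::vector passed through user_data.
   int collect_pmc_sample(const MockPmcSample* sample, void* user_data) {
       assert(user_data != nullptr);
       static_cast<std::vector<MockPmcSample>*>(user_data)->push_back(*sample);
       return 0;  // stand-in for HSA_STATUS_SUCCESS
   }

   // Sums the values reported for one (block, counter) pair across block instances.
   uint64_t total_for_counter(const std::vector<MockPmcSample>& samples,
                              int block_name, uint32_t counter_id) {
       uint64_t total = 0;
       for (const auto& s : samples) {
           if (s.event.block_name == block_name && s.event.counter_id == counter_id) {
               total += s.result;
           }
       }
       return total;
   }

With the real API, the vector's address would be passed as the last argument of ``hsa_ven_amd_aqlprofile_iterate_data`` in place of ``nullptr``.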
93 changes: 93 additions & 0 deletions docs/examples/sqtt-workflow.rst
@@ -0,0 +1,93 @@
.. meta::
  :description: A typical workflow for collecting detailed instruction-level traces
  :keywords: AQLprofile, ROCm, API, how-to, SQTT

***********************************************
SQ Thread Trace (SQTT) workflow with AQLprofile
***********************************************

The SQ Thread Trace workflow focuses on collecting detailed instruction-level traces.
This workflow relies on creating a profile object, generating command packets, and iterating over output buffers:

1. **Intercept the kernel dispatch**: The SDK intercepts the kernel dispatch.
2. **Create a SQTT profile object**: A profile object is created for SQTT, specifying trace parameters and output buffers.
3. **Generate SQTT command packets**: Start, stop, and read packets for SQTT are generated and injected into the queue.
4. **Submit packets and run the kernel**: The kernel and SQTT packets are submitted for execution.
5. **Collect the trace buffer**: The trace output buffer is collected after execution.
6. **Iterate and decode trace data**: The SDK iterates over the trace buffer and decodes the SQTT data for analysis.

The SDK abstracts queue interception and packet management so tool developers can focus on results.

Key API code snippets
=====================

These API snippets use the legacy interfaces from ``hsa_ven_amd_aqlprofile.h`` and are provided for illustration only.
For new development, refer to the updated APIs in ``aql_profile_v2.h``.

In the `ROCprofiler-SDK <https://github.com/ROCm/rocprofiler-sdk>`__ codebase, these APIs are wrapped and orchestrated in the ``aql``, ``hsa``, and ``thread_trace`` folders for queue interception, packet construction, and result iteration.

.. note::

   The ROCprofiler-SDK is migrating to these newer interfaces in ``aql_profile_v2.h``. You should use the APIs in ``aql_profile_v2.h`` to stay up-to-date.

Define parameters and profile
------------------------------

.. code:: cpp

   hsa_ven_amd_aqlprofile_parameter_t params[] = {
       { HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_ATT_BUFFER_SIZE, 0x1000000 }  // 16 MB buffer
   };

   hsa_ven_amd_aqlprofile_profile_t profile = {
       .agent = agent,
       .type = HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_TRACE,
       .events = nullptr,
       .event_count = 0,
       .parameters = params,
       .parameter_count = sizeof(params) / sizeof(params[0]),
       .output_buffer = {trace_ptr, trace_size},
       .command_buffer = {cmd_ptr, cmd_size}
   };
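
The hexadecimal buffer size is easier to audit when it is derived from a unit. The short C++ sketch below checks at compile time that ``0x1000000`` is 16 MiB (2^24 bytes). ``mib_to_bytes`` is a hypothetical helper, not part of AQLprofile, and the sketch assumes, as the comment in the snippet implies, that the parameter value is given in bytes.

.. code:: cpp

   #include <cstdint>

   // Bytes in one mebibyte (2^20 = 1,048,576).
   constexpr uint32_t kBytesPerMiB = 1u << 20;

   // Converts a size in MiB to bytes. The result is 32-bit, so inputs of
   // 4096 MiB or more would wrap around; keep requests below 4 GiB.
   constexpr uint32_t mib_to_bytes(uint32_t mib) { return mib * kBytesPerMiB; }

   // The value used in the snippet above: 16 MiB.
   static_assert(mib_to_bytes(16) == 0x1000000, "16 MiB is 0x1000000 bytes");

Writing ``mib_to_bytes(16)`` in the parameter array would then make the intended size explicit.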


Generate SQTT start/stop packets
---------------------------------

.. code:: cpp

   hsa_ext_amd_aql_pm4_packet_t sqtt_start_pkt, sqtt_stop_pkt;
   hsa_ven_amd_aqlprofile_start(&profile, &sqtt_start_pkt);
   hsa_ven_amd_aqlprofile_stop(&profile, &sqtt_stop_pkt);


Submit packets and run the kernel
---------------------------------

.. code:: cpp

   // Pseudocode: inject packets into HSA queue
   queue->Submit(&sqtt_start_pkt);
   queue->Submit(&kernel_pkt);
   queue->Submit(&sqtt_stop_pkt);


Iterate and decode trace data
-----------------------------

.. code:: cpp

   hsa_ven_amd_aqlprofile_iterate_data(
       &profile,
       [](hsa_ven_amd_aqlprofile_info_type_t info_type,
          hsa_ven_amd_aqlprofile_info_data_t* info_data,
          void* user_data) -> hsa_status_t {
           if (info_type == HSA_VEN_AMD_AQLPROFILE_INFO_TRACE_DATA) {
               // decode_trace is a placeholder for your own SQTT decoder; the raw trace
               // is info_data->trace_data.size bytes starting at info_data->trace_data.ptr
               decode_trace(info_data->trace_data.ptr, info_data->trace_data.size);
           }
           return HSA_STATUS_SUCCESS;
       },
       nullptr  // user_data: unused in this example
   );
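
The snippet decodes each trace buffer directly inside the callback. An alternative is to copy the raw bytes out through ``user_data`` and decode after iteration finishes, which keeps the callback short. The C++ sketch below assumes only that each callback delivers a pointer and a size, as ``info_data->trace_data.ptr`` and ``info_data->trace_data.size`` do in the snippet; ``MockTraceChunk``, ``TraceChunks``, and ``copy_trace_chunk`` are hypothetical names, not AQLprofile APIs.

.. code:: cpp

   #include <cassert>
   #include <cstdint>
   #include <vector>

   // Hypothetical stand-in mirroring the ptr/size pair read from
   // info_data->trace_data in the snippet above; NOT the real AQLprofile type.
   struct MockTraceChunk {
       const void* ptr;
       uint32_t size;
   };

   // One owned copy of the raw trace bytes per callback invocation.
   using TraceChunks = std::vector<std::vector<uint8_t>>;

   // Copies one chunk of raw trace bytes into the TraceChunks passed through
   // user_data, keeping chunks separate so each can be decoded on its own later.
   int copy_trace_chunk(const MockTraceChunk* chunk, void* user_data) {
       assert(user_data != nullptr);
       assert(chunk->ptr != nullptr || chunk->size == 0);
       const auto* bytes = static_cast<const uint8_t*>(chunk->ptr);
       static_cast<TraceChunks*>(user_data)->emplace_back(bytes, bytes + chunk->size);
       return 0;  // stand-in for HSA_STATUS_SUCCESS
   }

Keeping one vector per callback, rather than concatenating everything, avoids assuming anything about how separate trace buffers relate to each other.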


43 changes: 43 additions & 0 deletions docs/index.rst
@@ -0,0 +1,43 @@
.. meta::
  :description: AQLprofile is an open source library that enables advanced GPU profiling and tracing on AMD platforms.
  :keywords: AQLprofile, ROCm, tool, Instinct, accelerator, AMD

.. _index:

************************
AQLprofile documentation
************************

The Architected Queuing Language profiling library (AQLprofile) is an
open source library that enables advanced GPU profiling and tracing on
AMD platforms.

This documentation provides a comprehensive overview of the AQLprofile library.

If you're new to AQLprofile, see :doc:`What is AQLprofile? <what-is-aqlprofile>`.

AQLprofile is open source and hosted at `AQLprofile on GitHub <https://github.com/ROCm/aqlprofile>`_.

.. grid:: 2
  :gutter: 3

  .. grid-item-card:: Install

    * :doc:`Install AQLprofile <install/aqlprofile-install>`

  .. grid-item-card:: Examples

    * :doc:`Performance Monitor Control (PMC) workflow <examples/pmc-workflow>`
    * :doc:`SQ Thread Trace (SQTT) workflow <examples/sqtt-workflow>`

  .. grid-item-card:: Reference

    * :doc:`Terms <reference/terms>`
    * :doc:`APIs <reference/api-list>`


To contribute to the documentation, refer to
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.

You can find licensing information on the
`Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
77 changes: 77 additions & 0 deletions docs/install/aqlprofile-install.rst
@@ -0,0 +1,77 @@
.. meta::
  :description: AQLprofile installation process
  :keywords: AQLprofile, ROCm, install

******************
Install AQLprofile
******************

Learn how to build AQLprofile with a script or with CMake, then install the library with a command.

Prerequisites
=============

Before you begin, ensure these tools and dependencies are installed:

* ROCm stack
* ``rocm-llvm-dev`` (required to build tests)


Build AQLprofile
================

You can build AQLprofile either with the provided build script (recommended for most users) or by invoking CMake manually for custom builds.


Option 1: Use the build script (Recommended)
--------------------------------------------

This configures and builds the project with the default settings:

.. code:: bash

   ./build.sh


Option 2: Use CMake for custom builds
-------------------------------------

For more control over the build process, you can set the CMake options manually:

.. code:: bash

   # Point CMAKE_PREFIX_PATH to the hsa-runtime include and library directories
   export CMAKE_PREFIX_PATH=<path to hsa-runtime includes>:<path to hsa-runtime library>
   # For example, if ROCm is installed at /opt/rocm:
   # export CMAKE_PREFIX_PATH=/opt/rocm/lib:/opt/rocm/include/hsa

   export CMAKE_BUILD_TYPE=<debug|release>  # release by default

   cd /path/to/aqlprofile
   mkdir -p build
   cd build
   cmake ..
   make -j


Enable debug tracing (Optional)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To enable debug tracing, set this environment variable before running CMake:

.. code:: bash

   export CMAKE_DEBUG_TRACE=1

This enables verbose debug output of the command packets while the library executes.


Install the AQLprofile libraries
================================

Once your build is successful, install the AQLprofile libraries with:

.. code:: bash

   cd build
   sudo make install