# AQLprofile: Architected Queuing Language Profiling Library

AQLprofile is an open-source library that enables advanced GPU profiling and tracing on AMD platforms. It works in conjunction with [rocprofiler-sdk](https://github.com/ROCm/rocprofiler-sdk) to support profiling methods such as performance counters (PMC) and SQ thread trace (SQTT). AQLprofile provides the foundational mechanisms for constructing AQL packets and managing profiling operations across multiple AMD GPU architecture families.

### Background

AQLprofile builds on concepts from the Heterogeneous System Architecture (HSA) and the Architected Queuing Language (AQL), which define the foundations for GPU command processing and profiling on AMD platforms. For further reading:

- [HSA Platform System Architecture Specification](http://hsafoundation.com/wp-content/uploads/2021/02/HSA-SysArch-1.2.pdf)
- [HSA Runtime Programmer’s Reference Specification](http://hsafoundation.com/wp-content/uploads/2021/02/HSA-Runtime-1.2.pdf)

## Overview

AQLprofile is a companion library to [rocprofiler-sdk](https://github.com/ROCm/rocprofiler-sdk). It provides the low-level mechanisms that rocprofiler-sdk requires to enable advanced GPU profiling and tracing on AMD platforms. The development of AQLprofile is closely aligned with the needs of rocprofiler-sdk, ensuring compatibility and feature support for new GPU architectures and profiling requirements.

AQLprofile abstracts the complexity of constructing and managing AQL packets, command buffers, and register programming. These components are essential for orchestrating profiling operations such as performance counter collection and thread tracing. The library supports AMD GPU architecture families including GFX9, GFX10, GFX11, and GFX12, and gives rocprofiler-sdk the infrastructure it needs to interact with hardware-level profiling features.

## Features

- Profiling AQL packets for GPU workloads.
- Performance counters (PMC) and SQ thread traces (SQTT).
- Support for GFX9, GFX10, GFX11, and GFX12 architecture families.
- Verbose tracing and error logging capabilities.
- Thread trace binary data generated by AQLprofile can be decoded using [rocprof-trace-decoder](https://github.com/ROCm/rocprof-trace-decoder/releases).

### Supported Architectures and Counter Blocks

The AQLprofile library supports profiling and tracing GPU workloads across multiple architectures. The table below summarizes the counter blocks supported for each architecture:

| Counter Block Name | GFX9 | GFX908 | GFX90A | GFX942 | GFX10 | GFX11 | GFX12 |
|--------------------|------|--------|--------|--------|-------|-------|-------|
| ATC                | ✅   | ❌     | ❌     | ✅     | ❌    | ❌    | ❌    |
| ATC_L2             | ✅   | ❌     | ❌     | ❌     | ❌    | ❌    | ❌    |
| CHA                | ❌   | ❌     | ❌     | ❌     | ❌    | ❌    | ✅    |
| CHC                | ❌   | ❌     | ❌     | ❌     | ❌    | ❌    | ✅    |
| CPC                | ✅   | ✅     | ✅     | ✅     | ✅    | ✅    | ✅    |
| CPF                | ✅   | ✅     | ✅     | ✅     | ✅    | ✅    | ✅    |
| CPG                | ❌   | ❌     | ❌     | ❌     | ❌    | ❌    | ✅    |
| GCEA               | ✅   | ✅     | ✅     | ✅     | ✅    | ✅    | ✅    |
| GCR                | ❌   | ❌     | ❌     | ❌     | ✅    | ✅    | ✅    |
| GDS                | ✅   | ❌     | ❌     | ❌     | ✅    | ✅    | ❌    |
| GL1A               | ❌   | ❌     | ❌     | ❌     | ✅    | ✅    | ✅    |
| GL1C               | ❌   | ❌     | ❌     | ❌     | ✅    | ✅    | ✅    |
| GL2A               | ❌   | ❌     | ❌     | ❌     | ✅    | ✅    | ✅    |
| GL2C               | ❌   | ❌     | ❌     | ❌     | ✅    | ✅    | ✅    |
| GRBM               | ✅   | ✅     | ✅     | ✅     | ✅    | ✅    | ✅    |
| GRBMH              | ❌   | ❌     | ❌     | ❌     | ❌    | ❌    | ✅    |
| GRBM_SE            | ✅   | ❌     | ❌     | ❌     | ❌    | ❌    | ❌    |
| GUS                | ❌   | ❌     | ❌     | ❌     | ✅    | ✅    | ❌    |
| MC_VM_L2           | ✅   | ❌     | ❌     | ❌     | ❌    | ❌    | ❌    |
| RPB                | ✅   | ✅     | ❌     | ❌     | ❌    | ❌    | ❌    |
| SDMA               | ❌   | ❌     | ❌     | ❌     | ❌    | ❌    | ❌    |
| SPI                | ✅   | ✅     | ✅     | ✅     | ✅    | ✅    | ✅    |
| SQ                 | ✅   | ✅     | ✅     | ✅     | ✅    | ✅    | ✅    |
| SQ_CS              | ✅   | ❌     | ❌     | ❌     | ❌    | ❌    | ❌    |
| SX                 | ✅   | ❌     | ❌     | ❌     | ✅    | ✅    | ❌    |
| TA                 | ✅   | ✅     | ✅     | ✅     | ✅    | ✅    | ✅    |
| TCA                | ✅   | ✅     | ✅     | ✅     | ❌    | ❌    | ❌    |
| TCC                | ✅   | ✅     | ✅     | ✅     | ❌    | ❌    | ❌    |
| TCP                | ✅   | ✅     | ✅     | ✅     | ❌    | ✅    | ✅    |
| TD                 | ✅   | ✅     | ✅     | ✅     | ❌    | ❌    | ✅    |

Legend:

- ✅: Supported
- ❌: Not Supported

## Build and Installation

### Prerequisites

Ensure the following tools and dependencies are installed:

- ROCm stack
- `rocm-llvm-dev` (required to build tests)

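Before building, it can help to confirm that the tools the build relies on are available on `PATH`. The following is a minimal sketch, not part of AQLprofile: `missing_tools` is a hypothetical helper, and the tool names passed to it (`cmake`, `make`) are assumptions drawn from the build steps in this README rather than an official prerequisite list.

```shell
# Hypothetical helper (not shipped with AQLprofile): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command is available.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: report which of the assumed build tools are absent.
missing=$(missing_tools cmake make)
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing
else
  echo "All listed build tools were found."
fi
```

The check only looks for executables; it does not verify that the ROCm stack or `rocm-llvm-dev` packages themselves are installed.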
### Building AQLprofile

You can build AQLprofile either with the provided build script (recommended for most users) or by invoking CMake manually for custom builds.

#### Option 1: Using the Build Script (Recommended)

This configures and builds the project with default settings:

```bash
./build.sh
```

#### Option 2: Custom Build with CMake

For more control over the build process, you can set CMake options manually:

```bash
# Set CMAKE_PREFIX_PATH to the hsa-runtime include path and the hsa-runtime library path
export CMAKE_PREFIX_PATH=<path to hsa-runtime includes>:<path to hsa-runtime library>
# For example, if ROCm is installed at /opt/rocm:
# export CMAKE_PREFIX_PATH=/opt/rocm/lib:/opt/rocm/include/hsa

export CMAKE_BUILD_TYPE=<debug|release>  # release by default

cd /path/to/aqlprofile
mkdir -p build
cd build
cmake ..
make -j
```

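The manual steps above can also be wrapped in a small function so that the environment variables apply only to the CMake invocation. This is a sketch under assumptions: `rocm_prefix_path` and `build_aqlprofile` are hypothetical helper names, `/opt/rocm` is only the example install location mentioned above, and the `-S`, `-B`, and `--build` options are generic CMake features (CMake 3.13 or newer) that have not been verified against this project's build files.

```shell
# Hypothetical helper: compose the colon-separated CMAKE_PREFIX_PATH value
# shown in the example above from a ROCm install root (default: /opt/rocm).
rocm_prefix_path() {
  root="${1:-/opt/rocm}"
  printf '%s:%s\n' "$root/lib" "$root/include/hsa"
}

# Hypothetical helper: configure and build in <source dir>/build.
# Usage: build_aqlprofile <source dir> [debug|release] [ROCm root]
build_aqlprofile() {
  src_dir="$1"
  build_type="${2:-release}"
  rocm_root="${3:-/opt/rocm}"
  # The variables are set only for the configure step, so they do not
  # persist in the calling shell afterwards.
  CMAKE_PREFIX_PATH="$(rocm_prefix_path "$rocm_root")" \
  CMAKE_BUILD_TYPE="$build_type" \
    cmake -S "$src_dir" -B "$src_dir/build" &&
    cmake --build "$src_dir/build" --parallel
}

# Example invocation (commented out so that sourcing this file has no side effects):
# build_aqlprofile /path/to/aqlprofile release /opt/rocm
```

Keeping the path composition in its own function makes it easy to print and inspect the value before starting a build.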
### Debug Trace Mode (optional; for debugging only)

To enable debug tracing, set the following environment variable before running CMake:

```bash
export CMAKE_DEBUG_TRACE=1
```

This enables verbose debug output of the command packets while the library executes.

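Because `export` leaves the variable set for the rest of the shell session, one option is to scope it to a single command instead. The sketch below uses a hypothetical wrapper named `with_debug_trace`; it relies only on the standard `env` utility and the `CMAKE_DEBUG_TRACE` variable named above.

```shell
# Hypothetical wrapper: run one command with CMAKE_DEBUG_TRACE=1 in its
# environment, leaving the calling shell's environment unchanged.
with_debug_trace() {
  env CMAKE_DEBUG_TRACE=1 "$@"
}

# Example (run from the build directory): configure with debug tracing on.
# with_debug_trace cmake ..
```

Later builds started from the same shell then run without debug tracing unless the wrapper is used again.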
### Installation

After building, install the AQLprofile libraries with:

```bash
cd build
sudo make install
```

## Support

For issues or questions, please report them in the GitHub Issues section or contact AMD support at <dl.ROCm-Profiler.support@amd.com>.

## License

AQLprofile is open source and distributed under the MIT License. See the LICENSE file for more details.