Merged
3 changes: 3 additions & 0 deletions .gitmodules
@@ -10,3 +10,6 @@
[submodule "source_third_party/khronos/vulkan-utilities"]
path = source_third_party/khronos/vulkan-utilities
url = https://github.com/KhronosGroup/Vulkan-Utility-Libraries/
[submodule "source_third_party/libGPUCounters"]
path = source_third_party/libGPUCounters
url = https://github.com/ARM-software/libGPUCounters.git
46 changes: 46 additions & 0 deletions layer_gpu_profile/CMakeLists.txt
@@ -0,0 +1,46 @@
# SPDX-License-Identifier: MIT
# -----------------------------------------------------------------------------
# Copyright (c) 2024-2025 Arm Limited
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# -----------------------------------------------------------------------------

cmake_minimum_required(VERSION 3.19)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

project(VkLayerGPUProfile VERSION 1.0.0)

# Common configuration
set(LGL_LOG_TAG "VkLayerGPUProfile")
set(LGL_CONFIG_TRACE 0)
set(LGL_CONFIG_LOG 1)

include(../source_common/compiler_helper.cmake)
include(../cmake/clang-tools.cmake)

# Build steps
add_subdirectory(../source_third_party/libGPUCounters source_third_party/libGPUCounters)

add_subdirectory(../source_common/comms source_common/comms)
add_subdirectory(../source_common/framework source_common/framework)
add_subdirectory(../source_common/trackers source_common/trackers)

add_subdirectory(source)
134 changes: 134 additions & 0 deletions layer_gpu_profile/README_LAYER.md
@@ -0,0 +1,134 @@
# Layer: GPU Profile

This layer is a frame profiler that can capture per-workload performance
counters for selected frames running on an Arm GPU.

## What devices are supported?

This layer requires Vulkan 1.0 and an Arm GPU because it uses an Arm-specific
counter sampling library.

## What data can be collected?

The layer serializes workloads for instrumented frames and injects counter
samples between them, allowing the layer to measure the hardware cost of
render passes, compute dispatches, transfers, etc.

The serialization is very invasive to wall-clock performance, due to removal
of pipeline overlap between workloads and additional GPU idle time waiting for
the layer to perform each performance counter sampling operation. This will
have an impact on the counter data being captured!

Derived counters that show queue and functional unit utilization as a
percentage of the overall "active" time of their parent block will report low
because of time spent refilling and then draining the GPU pipeline between
workloads. The overall _GPU Active Cycles_ counter is known to be unreliable,
because the serialization means that command stream setup and teardown costs
are not hidden in the shadow of surrounding work. We recommend using the
individual queue active cycles counters as the main measure of performance.
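
As a rough illustration of why percentage-style counters drop, the sketch
below computes a busy-to-active percentage for one functional unit. The
`utilization_pct` helper and the cycle counts are hypothetical, chosen only to
show the arithmetic; they are not measurements and are not part of the layer.

```sh
# Hypothetical helper: integer percentage of active cycles spent busy.
utilization_pct() {
    local busy=$1 active=$2
    echo $((100 * busy / active))
}

# The unit does the same 800 cycles of work in both cases (made-up numbers).
utilization_pct 800 1000   # Pipelined run: prints 80
utilization_pct 800 1600   # Serialized run with 600 extra fill/drain cycles: prints 50
```

The amount of work is identical in both cases; only the active time it is
divided by has changed.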

Note that any counters that measure direct work, such as architectural issue
cycles, or workload nouns, such as primitives or threads, should be unaffected
by the loss of pipelining.

Arm GPUs provide a wide range of performance counters covering many different
aspects of hardware performance. The layer will collect a standard set of
counters by default but, with source modification, can collect any of the
hardware counters and derived expressions supported by the
[libGPUCounters][LGC] library that Arm provides on GitHub.

[LGC]: https://github.com/ARM-software/libGPUCounters

### GPU clock frequency impact

The GPU idle time waiting for the CPU to take a counter sample can cause the
system DVFS power governor to decide that the GPU is not busy. On production
devices we commonly see the GPU being down-clocked during the instrumented
frame, which may have an impact on a subset of the available performance
counters.

When running on a pre-production device we recommend pinning CPU, GPU, and bus
clock speeds to avoid the performance instability.

## How do I use the layer?

### Prerequisites

Device setup steps:

* Ensure your Android device is in developer mode, with `adb` support enabled
in developer settings.
* Ensure the Android device is connected to your development workstation, and
visible to `adb` with an authorized debug connection.

Application setup steps:

* Build a debuggable build of your application and install it on the Android
device.

Tooling setup steps:

* Install the Android platform tools and ensure `adb` is on your `PATH`
environment variable.
* Install the Android NDK and set the `ANDROID_NDK_HOME` environment variable
to its installation path.
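
The tooling steps above can be checked from a shell before attempting a build.
This is a minimal sketch, not part of the repository: the `check_prerequisites`
function name is hypothetical, and the toolchain file path is the one used by
the `android_build.sh` script.

```sh
# Hypothetical helper: report each missing tooling prerequisite on stderr.
# Returns 0 if everything was found, 1 otherwise.
check_prerequisites() {
    local status=0

    # adb must be reachable through PATH
    if ! command -v adb >/dev/null 2>&1; then
        echo "adb not found on PATH" >&2
        status=1
    fi

    # ANDROID_NDK_HOME must point at an NDK that ships the CMake toolchain file
    if [ -z "${ANDROID_NDK_HOME:-}" ]; then
        echo "ANDROID_NDK_HOME is not set" >&2
        status=1
    elif [ ! -f "${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" ]; then
        echo "No CMake toolchain file found under ANDROID_NDK_HOME" >&2
        status=1
    fi

    return "$status"
}
```

The function does not check the device itself; running `adb devices` shows
whether the device is connected and authorized.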

### Layer build

Build the Profile layer for Android using the provided build script, or using
equivalent manual commands, from the `layer_gpu_profile` directory. For full
instructions see the _Build an Android layer_ and _Build a Linux layer_
sections in the [Build documentation](../docs/building.md).

### Running using the layer

You can configure a device to run a profile by using the Android helper utility
found in the root directory to configure the layer and manage the application.
You must enable the profile layer, and provide a configuration file to
parameterize it.

```sh
python3 lgl_android_install.py --layer layer_gpu_profile --config <your.json> --profile <out_dir>
```

The [`layer_config.json`](layer_config.json) file in this directory is a
template configuration file you can start from. It defaults to periodic
sampling every 600 frames, but you can modify this to suit your needs.

The `--profile` option specifies an output directory on the host to contain
the CSV files written by the tool. One CSV is written for each frame, each CSV
containing a table with one row per workload profiled in the frame, listed
in API submit order.
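
For a quick sanity check of a capture, the sketch below counts the workload
rows in each CSV in the output directory. It assumes that every CSV starts with
exactly one header row and that no field contains an embedded newline; the
`summarize_profile` function name is hypothetical, and nothing is assumed about
the CSV file names or column names.

```sh
# Hypothetical helper: print "<file>: <N> workloads" for each CSV in a directory.
# Assumes one header row per file, followed by one row per workload.
summarize_profile() {
    local dir=$1 csv rows
    for csv in "$dir"/*.csv; do
        # Skip the unexpanded pattern when the directory has no CSV files
        [ -e "$csv" ] || continue
        # Count every record after the header row
        rows=$(awk 'NR > 1 { n++ } END { print n + 0 }' "$csv")
        echo "$(basename "$csv"): $rows workloads"
    done
}
```

Call it with the directory passed to `--profile`, for example
`summarize_profile <out_dir>`.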

The Android helper utility contains many other options for configuring the
application under test and the capture process. For full instructions see the
[Running on Android documentation](../docs/running_android.md).

## Layer configuration

The current layer supports two `sampling_mode` values:

* `periodic_frame`: Sample every N frames.
* `frame_list`: Sample specific frames.

When the sampling mode is `periodic_frame`, the integer value of the
`periodic_frame` key defines the frame sampling period. The integer value of
the `periodic_min_frame` key defines the first possible frame that could be
profiled, allowing profiles to skip over any loading frames. By default frame 0
is ignored.

When the sampling mode is `frame_list`, the value of the `frame_list` key
defines a list of integers giving the specific frames to capture.
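
One way to reason about which frames a configuration selects is to model the
two modes in a few lines of shell. This is a sketch under stated assumptions,
not the layer's implementation: the `would_sample` function is hypothetical,
and the periodic rule (frame number at or after `periodic_min_frame` and an
exact multiple of `periodic_frame`) is an assumed reading of the description
above.

```sh
# Hypothetical model of the two sampling modes; prints "yes" or "no".
# Usage: would_sample <frame> periodic_frame <period> <min_frame>
#        would_sample <frame> frame_list <frame>...
would_sample() {
    local frame=$1 mode=$2 listed
    shift 2
    case "$mode" in
        periodic_frame)
            # Assumed rule: not before the minimum frame, and on a multiple
            # of the sampling period
            if [ "$frame" -ge "$2" ] && [ $((frame % $1)) -eq 0 ]; then
                echo yes
                return
            fi
            ;;
        frame_list)
            # Sampled only if the frame number appears in the list
            for listed in "$@"; do
                if [ "$listed" -eq "$frame" ]; then
                    echo yes
                    return
                fi
            done
            ;;
    esac
    echo no
}
```

Under these assumptions, a period of 600 with a minimum frame of 1 skips frame
0 and selects frames 600, 1200, and so on.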

## Layer counters

The current layer uses a hard-coded set of performance counters defined in the
`Device` class constructor. If you wish to collect different counters you must
edit the [Device source](./source/device.cpp) and rebuild the layer.

Any counters that are specified but that are not available on the current GPU
will be ignored.

- - -

_Copyright © 2025, Arm Limited and contributors._
83 changes: 83 additions & 0 deletions layer_gpu_profile/android_build.sh
@@ -0,0 +1,83 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: MIT
# ----------------------------------------------------------------------------
# Copyright (c) 2024-2025 Arm Limited
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
# ----------------------------------------------------------------------------

# ----------------------------------------------------------------------------
# Configuration

# Exit immediately if any component command errors
set -e

BUILD_DIR_64=build_arm64
BUILD_DIR_PACK=build_package

# ----------------------------------------------------------------------------
# Process command line options
if [ "$#" -lt 1 ]; then
    BUILD_TYPE=Release
else
    BUILD_TYPE=$1
fi

# Process command line options
if [ "$#" -lt 2 ]; then
    PACKAGE=0
else
    PACKAGE=$2
fi

if [ "${PACKAGE}" -gt "0" ]; then
    echo "Building a ${BUILD_TYPE} build with packaging"
else
    echo "Building a ${BUILD_TYPE} build without packaging"
fi

# ----------------------------------------------------------------------------
# Build the 64-bit layer
mkdir -p ${BUILD_DIR_64}
pushd ${BUILD_DIR_64}

cmake \
    -DCMAKE_SYSTEM_NAME=Android \
    -DANDROID_PLATFORM=29 \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_TOOLCHAIN=clang \
    -DANDROID_STL=c++_static \
    -DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
    -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
    -DCMAKE_WARN_DEPRECATED=OFF \
    ..

make -j16

popd

# ----------------------------------------------------------------------------
# Build the release package
if [ "${PACKAGE}" -gt "0" ]; then
    # Setup the package directories
    mkdir -p ${BUILD_DIR_PACK}/bin/android/arm64

    # Install the 64-bit layer
    cp ${BUILD_DIR_64}/source/*.so ${BUILD_DIR_PACK}/bin/android/arm64
fi