
VTune Guide

This guide gives an overview of the basics of setting up and working with VTune.

Note: If you are using a developer VM and want to do remote profiling from Windows, follow the next section to install VTune. Otherwise, you can skip it and read the VTune documentation for a more standard installation.

Installing VTune on a Developer VM

If you plan to use a VM for running VTune analysis, this section will guide you through the steps for configuring remote sessions from a local Windows box.

Download VTune for Windows

Set Up the Linux Host for User-Mode Sampling

Hardware event-based sampling is not possible when running under a VM because the guest has no access to the hardware performance counters. If you need it, consider setting VTune up on a local (bare-metal) machine instead.
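You can verify this limitation on the VM before installing anything; a minimal check (the sysctl path is standard on Linux, and the "not supported" output is what hypervisors typically report):

```sh
# -1..2: controls who may use perf events; the file existing means
# perf_events support is built into the kernel
cat /proc/sys/kernel/perf_event_paranoid

# Hardware events usually show "<not supported>" under a hypervisor
perf list hw 2>/dev/null || echo "perf not installed yet (installed below)"
```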

```sh
# Temporarily disregard the normal proxy configuration.
# You may need to run these commands as root since sudo doesn't preserve
# environment variables, _or_ you can try `sudo -E`.
export no_proxy=

# Download and install the Intel GPG key
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

# Add the Intel repos to APT
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

# Fetch package lists from the repos
sudo apt update

# Install VTune and related tools to enable user-mode sampling
sudo apt install intel-oneapi-vtune
sudo apt install linux-tools-common linux-tools-generic linux-tools-$(uname -r)

# Check that everything is working (expect failures for hardware-based features)
/opt/intel/oneapi/vtune/latest/bin64/vtune-self-checker.sh
```
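After installation, it can be handy to have the `vtune` CLI available in your shell as well; a sketch assuming the default oneAPI install location (the guard keeps it harmless if your path differs):

```sh
# Load the oneAPI environment so `vtune` lands on PATH
[ -f /opt/intel/oneapi/vtune/latest/env/vars.sh ] && \
  . /opt/intel/oneapi/vtune/latest/env/vars.sh

# Sanity check
command -v vtune >/dev/null && vtune -version || echo "vtune not on PATH"
```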

Standard Installation

Check VTune documentation here

Profiling

Prepare Application for Profiling

The application should be compiled with debug symbols. With CMake, that means the "RelWithDebInfo" build type. You can create a user preset that inherits from vpuxDeveloper and simply changes the build type, or pass the build type manually:

```sh
cmake -DOpenVINODeveloperPackage_DIR=../../openvino/build --preset vpuxProfiler
```

Example CMake User Preset

This preset builds the project in release mode with debug symbols enabled (-g).

```json
{
  "name": "vpuxProfiler",
  "displayName": "vpuxProfiler",
  "description": "Build for use with a profiler",
  "binaryDir": "${sourceDir}/build-x86_64/Release",
  "inherits": [
    "vpuxDeveloper",
    "LinkerOptimization"
  ],
  "cacheVariables": {
    "CMAKE_CXX_FLAGS": "-g",
    "InferenceEngineDeveloperPackage_DIR": {
      "type": "FILEPATH",
      "value": "$env{OPENVINO_HOME}/build-x86_64/Release"
    },
    "CMAKE_BUILD_TYPE": {
      "type": "STRING",
      "value": "Release"
    },
    "ENABLE_DEVELOPER_BUILD": true,
    "ENABLE_CLANG_FORMAT": false,
    "ENABLE_VPUX_DOCS": false,
    "ENABLE_TESTS": true,
    "ENABLE_FUNCTIONAL_TESTS": true,
    "LIT_TESTS_USE_LINKS": true,
    "ENABLE_SPLIT_DWARF": true
  }
}
```
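Once the preset is in your CMakeUserPresets.json, configuring and building might look like this (a sketch; the binary directory comes from the preset's binaryDir):

```sh
# Configure with the profiler preset, then build
cmake --preset vpuxProfiler
cmake --build build-x86_64/Release --parallel
```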

Start Profiling

Note: Some of the following steps may change if you're running locally instead.

From your Windows instance of VTune, choose "Configure Analysis":

image

Choose "Remote Linux (SSH)" and enter the details of your VM:

image

Specify application binary and command-line options:

image

Note: Make sure to use absolute paths for all command-line options!

Most options can be left at their defaults, but take note of the following:

image

Select an appropriate run time for the model you're compiling. You don't have to be exact, but a lower number means a higher sampling frequency, which can noticeably affect the consistency of results for smaller models.

image

Finalisation mode affects how quickly metrics are generated, but the faster modes effectively reduce the number of samples drawn from. Generally you'll want "Full", though again it depends on the size of the model.

Lastly, make sure to use user-mode sampling and "Hotspots" analysis:

image

Hotspots analysis highlights the most time consuming functions in your application.
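The equivalent collection can also be launched from the CLI on the Linux side; a sketch with a placeholder binary path (`-knob sampling-mode=sw` selects user-mode sampling for the Hotspots analysis):

```sh
# User-mode sampling Hotspots collection; results are written to r000hs/
vtune -collect hotspots -knob sampling-mode=sw \
      -result-dir r000hs -- /abs/path/to/app
```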

Interpreting Results

When you've finished collecting samples, you'll have a lot of data thrown at you; the most useful tabs are "Caller/Callee" and "Top-down Tree".

image

Using the Top-down Tree, we can drill down into the call stack and find our passes:

image

To filter the CPU % down to just the passes in the pipeline, select them and choose "Filter In by Selection":

image

You can also filter using more options at the bottom of the window:

image

With DeepLabv3, we can see which passes are taking up the bulk of the runtime:

image

Comparing Results

You can compare the results of two runs by selecting them in the left-hand side panel:

image

Select compare results:

image

You can then take a look at "Caller/Callee" (or another tab like Top-Down Tree that we looked at before):

image

Remembering to filter:

image

You'll notice there are a few columns for comparing results; mostly we're interested in CPU %:

image

Now we can compare the performance characteristics between two profiles:

image
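Comparison is also available from the CLI; a hedged sketch assuming two Hotspots result directories, r000hs and r001hs (names are placeholders, and the two-`-result-dir` form is VTune's comparison reporting mode):

```sh
# Passing two result directories to -report produces a side-by-side comparison
vtune -report hotspots -result-dir r000hs -result-dir r001hs
```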

References