
VTune Guide

This guide gives an overview of the basics of setting up and working with VTune.

Note: If you are using a developer VM and want to do remote profiling from Windows, follow the next section to install VTune. Otherwise, you can skip it and read the VTune documentation for a more standard installation.

Installing VTune on a Developer VM

If you plan to use a VM for running VTune analysis, this section will guide you through the steps for configuring remote sessions from a local Windows box.

Download VTune for Windows

Set Up the Linux Host for User-Mode Sampling

Hardware event-based sampling is not possible when running under a VM because the guest has no access to the hardware performance counters. If you need it, consider setting VTune up on a local (bare-metal) machine instead.
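You can verify this limitation on the VM before installing anything; a minimal check (the sysctl path is standard on Linux, and the "not supported" output is what hypervisors typically report):

```sh
# -1..2: controls who may use perf events; the file existing means
# perf_events support is built into the kernel
cat /proc/sys/kernel/perf_event_paranoid

# Hardware events usually show "<not supported>" under a hypervisor
perf list hw 2>/dev/null || echo "perf not installed yet (installed below)"
```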

```sh
# Temporarily disregard the normal proxy configuration.
# You may need to run these commands as root since sudo doesn't preserve
# environment variables, _or_ you can try `sudo -E`.
export no_proxy=

# Download and install the Intel GPG key
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

# Add the Intel repos to APT
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

# Fetch package lists from the repos
sudo apt update

# Install VTune and related tools to enable user-mode sampling
sudo apt install intel-oneapi-vtune
sudo apt install linux-tools-common linux-tools-generic linux-tools-$(uname -r)

# Check that everything is working (expect failures for hardware-based features)
/opt/intel/oneapi/vtune/latest/bin64/vtune-self-checker.sh
```
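After installation, it can be handy to have the `vtune` CLI available in your shell as well; a sketch assuming the default oneAPI install location (the guard keeps it harmless if your path differs):

```sh
# Load the oneAPI environment so `vtune` lands on PATH
[ -f /opt/intel/oneapi/vtune/latest/env/vars.sh ] && \
  . /opt/intel/oneapi/vtune/latest/env/vars.sh

# Sanity check
command -v vtune >/dev/null && vtune -version || echo "vtune not on PATH"
```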

Standard Installation

Check VTune documentation here

Profiling

Prepare Application for Profiling

The application should be compiled with debug symbols. With CMake, that means the "RelWithDebInfo" build type. You can create a user preset that inherits from vpuxDeveloper and simply changes the build type, or pass the build type manually:

```sh
cmake -DOpenVINODeveloperPackage_DIR=../../openvino/build --preset vpuxProfiler
```

Example CMake User Preset

This preset builds the project in release mode with debug symbols enabled (-g).

```json
{
  "name": "vpuxProfiler",
  "displayName": "vpuxProfiler",
  "description": "Build for use with a profiler",
  "binaryDir": "${sourceDir}/build-x86_64/Release",
  "inherits": [
    "vpuxDeveloper",
    "LinkerOptimization"
  ],
  "cacheVariables": {
    "CMAKE_CXX_FLAGS": "-g",
    "InferenceEngineDeveloperPackage_DIR": {
      "type": "FILEPATH",
      "value": "$env{OPENVINO_HOME}/build-x86_64/Release"
    },
    "CMAKE_BUILD_TYPE": {
      "type": "STRING",
      "value": "Release"
    },
    "ENABLE_DEVELOPER_BUILD": true,
    "ENABLE_CLANG_FORMAT": false,
    "ENABLE_VPUX_DOCS": false,
    "ENABLE_TESTS": true,
    "ENABLE_FUNCTIONAL_TESTS": true,
    "LIT_TESTS_USE_LINKS": true,
    "ENABLE_SPLIT_DWARF": true
  }
}
```
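Once the preset is in your CMakeUserPresets.json, configuring and building might look like this (a sketch; the binary directory comes from the preset's binaryDir):

```sh
# Configure with the profiler preset, then build
cmake --preset vpuxProfiler
cmake --build build-x86_64/Release --parallel
```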

Start Profiling

Note: Some of the following steps may change if you're running locally instead.

From your Windows instance of VTune, choose "Configure Analysis":

image

Choose "Remote Linux (SSH)" and enter the details of your VM:

image

Specify application binary and command-line options:

image

Note: Make sure to use absolute paths for all command-line options!

Most options can be left at their defaults, but take note of the following:

image

Select an appropriate run time for the model you're compiling. You don't have to be exact, but a lower number means a higher sampling frequency, which can noticeably affect the consistency of results for smaller models.

image

Finalisation mode affects how quickly metrics are generated, but the faster modes effectively reduce the number of samples drawn from. Generally you'll want "Full", though again it depends on the size of the model.

Lastly, make sure to use user-mode sampling and "Hotspots" analysis:

image

Hotspots analysis highlights the most time consuming functions in your application.
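The equivalent collection can also be launched from the CLI on the Linux side; a sketch with a placeholder binary path (`-knob sampling-mode=sw` selects user-mode sampling for the Hotspots analysis):

```sh
# User-mode sampling Hotspots collection; results are written to r000hs/
vtune -collect hotspots -knob sampling-mode=sw \
      -result-dir r000hs -- /abs/path/to/app
```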

Interpreting Results

When you've finished collecting samples, you'll have a lot of data thrown at you; the most useful tabs are "Caller/Callee" and "Top-down Tree".

image

Using the Top-down Tree, we can drill down into the call stack and find our passes:

image

To filter the CPU % down to just the passes in the pipeline, select them and choose "Filter In by Selection":

image

You can also filter using more options at the bottom of the window:

image

With DeepLabv3, we can see which passes are taking up the bulk of the runtime:

image

Comparing Results

You can compare the results of two runs by selecting them in the left-hand side panel:

image

Select compare results:

image

You can then take a look at "Caller/Callee" (or another tab like Top-Down Tree that we looked at before):

image

Remembering to filter:

image

You'll notice there are a few columns for comparing results; mostly we're interested in CPU %:

image

Now we can compare the performance characteristics between two profiles:

image
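Comparison is also available from the CLI; a hedged sketch assuming two Hotspots result directories, r000hs and r001hs (names are placeholders, and the two-`-result-dir` form is VTune's comparison reporting mode):

```sh
# Passing two result directories to -report produces a side-by-side comparison
vtune -report hotspots -result-dir r000hs -result-dir r001hs
```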

References