An out-of-tree Execution Provider for ONNXRuntime that uses AMD's hipDNN library for accelerated inference on AMD GPUs.
Work in Progress - This is a prototype implementation.
- Conv2D - via hipDNN graph API
- MatMul/Gemm - via hipDNN graph API
- hipBLAS-LT support - Currently disabled. When re-enabled, provides an alternative MatMul/Gemm backend via hipBLAS-LT.
- Torch-MLIR integration - Experimental IR-based compilation pipeline. Enable with `HIPDNN_EP_ENABLE_TORCH_MLIR=ON`.
| Dependency | Commit |
|---|---|
| TheRock | 9639502b |
| IREE | db9d11e4 |
- CMake 3.20+
- Ninja build system
- HIP SDK (from TheRock)
- hipDNN library (from TheRock)
- hipBLAS-LT (optional, from TheRock) - alternative MatMul/Gemm backend (currently disabled)
- ONNXRuntime (source and built library)
- iree-compile (required by hipDNN backend for code generation)
- Python 3 with the `onnx` package (for test model generation)
```bash
export THEROCK_DIST="/path/to/TheRock/build/dist/rocm"
export ONNXRUNTIME_ROOT="/path/to/onnxruntime"

cd hipDNNEP

# Configure
cmake --preset RelWithDebInfo

# Build
cmake --build --preset RelWithDebInfo
```

Tests require `iree-compile` in `PATH`. The recommended approach is to create local test presets in `CMakeUserPresets.json` (git-ignored) that set up the environment.
Example `CMakeUserPresets.json`:

```json
{
  "version": 4,
  "testPresets": [
    {
      "name": "RelWithDebInfo-local",
      "inherits": "RelWithDebInfo",
      "environment": {
        "PATH": "/path/to/iree/build/tools:$penv{PATH}"
      }
    }
  ]
}
```

Then run tests with the local preset:

```bash
ctest --preset RelWithDebInfo-local
```

Alternatively, set `PATH` manually before running tests:

```bash
export PATH="/path/to/iree/build/tools:$PATH"
ctest --preset RelWithDebInfo
```

For the experimental IR-based compilation pipeline:
```bash
# First build torch-mlir (one-time setup, see CLAUDE.md for details)
# Then:
cmake --preset RelWithDebInfo-MLIR
cmake --build --preset RelWithDebInfo-MLIR
ctest --preset RelWithDebInfo-MLIR-local
```

```cpp
#include <onnxruntime_cxx_api.h>

#include <cstring>
#include <vector>

int main() {
  Ort::InitApi(OrtGetApiBase()->GetApi(ORT_API_VERSION));
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "example");

  // Register the hipDNN EP library
  OrtStatus* status = Ort::GetApi().RegisterExecutionProviderLibrary(
      env, "HipDNN", "/path/to/libhipdnn_ep.so");
  if (status != nullptr) {
    // Handle error
    Ort::GetApi().ReleaseStatus(status);
    return 1;
  }

  // Get available EP devices
  std::vector<Ort::ConstEpDevice> devices = env.GetEpDevices();

  // Find the HipDNN device
  const OrtEpDevice* hipdnn_device = nullptr;
  for (const auto& device : devices) {
    if (std::strcmp(device.EpName(), "HipDNN") == 0) {
      hipdnn_device = device;
      break;
    }
  }
  if (hipdnn_device == nullptr) {
    return 1;  // HipDNN EP not available
  }

  // Create session options and append the EP
  Ort::SessionOptions session_options;
  Ort::GetApi().SessionOptionsAppendExecutionProvider_V2(
      session_options, env, &hipdnn_device, 1, nullptr, nullptr, 0);

  // Create a session that uses the EP
  Ort::Session session(env, "model.onnx", session_options);

  // Run inference
  // ...
  return 0;
}
```

This EP uses the ONNXRuntime Plugin EP V2 system, which allows:
- Building as a separate shared library
- Dynamic loading at runtime
- No modifications to ONNXRuntime source
- EP Factory (`HipDNNEpFactory`): Creates EP instances and manages device discovery
- EP (`HipDNNEp`): Main execution provider, handles graph partitioning and compilation
- Kernel (`Kernel`): Routes to the appropriate backend (hipDNN, hipBLAS-LT, or Torch-MLIR)
- `HipDNNGraph`: Builds the hipDNN graph from ONNX nodes for Conv2D
- `BlasGraph`: Builds hipBLAS-LT operations for MatMul/Gemm (currently disabled)
- `IRBuilder`: Torch-MLIR IR generation (experimental, when enabled)
- `NodeComputeInfo`: ORT callback interface for the kernel lifecycle
- Allocator (`HipDeviceAllocator`): HIP device memory allocation
- Data Transfer (`HipDataTransfer`): CPU <-> GPU data copies
The `Kernel` class automatically selects the appropriate backend:
- Torch-MLIR path (if enabled): Converts ONNX to Torch-MLIR IR for compilation
- hipDNN graph API: Used for Conv2D and MatMul/Gemm operations
- hipBLAS-LT (currently disabled): Alternative backend for MatMul/Gemm
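The selection logic above can be sketched in simplified form. This is a conceptual Python model, not the actual C++ `Kernel` code; the flags and return strings are illustrative.

```python
# Conceptual model of the Kernel class's backend dispatch.
# Backend names follow the document; flags and constants are illustrative.
TORCH_MLIR_ENABLED = False   # compile-time HIPDNN_EP_ENABLE_TORCH_MLIR
HIPBLASLT_ENABLED = False    # hipBLAS-LT backend is currently disabled

def select_backend(op_type: str) -> str:
    """Pick the backend used to execute a supported node."""
    if TORCH_MLIR_ENABLED:
        return "torch-mlir"            # IR-based compilation pipeline
    if op_type in ("Conv", "MatMul", "Gemm"):
        if HIPBLASLT_ENABLED and op_type in ("MatMul", "Gemm"):
            return "hipblaslt"         # alternative MatMul/Gemm backend
        return "hipdnn-graph"          # hipDNN graph API
    raise ValueError(f"unsupported op: {op_type}")
```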
When enabled, the Torch-MLIR path runs a 9-step compilation pipeline (`buildOffloadPipeline` in `passes.cc`):

1. `onnx-to-torch`: Convert `onnx.*` ops to `torch.aten.*` ops
2. `CSE`: Deduplicate constants and identical list constructs
3. `offload`: Outline supported `aten` ops into `hipdnn.graph` regions
4. `canonicalize` + `CSE`: Clean up dead ops, deduplicate cloned constants
5. `graph-to-executable`: Compile `hipdnn.graph` regions via `iree-compile`, replace with `hipdnn.executable` ops
6. `backend-legalize`: Lower torch types to builtin tensors, convert `hipdnn.executable` to DPS `hipdnn.execute`
7. `empty-tensor-elimination`: Fold `tensor.empty` into DPS destinations
8. `one-shot-bufferize`: Convert the tensor program to a memref program
9. `finalize-memrefs`: Promote returned `memref.alloc`s to function arguments (the caller provides output buffers)
The final output is a function with memref-typed arguments for all inputs and outputs, containing `hipdnn.execute` ops that reference pre-compiled graphs.
hipDNN uses a graph-based execution model:
1. Build an operation graph from ONNX nodes (`conv_fprop`, etc.)
2. Validate the graph and create execution plans
3. Execute with a variant pack (tensor uid -> device pointer mapping)
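The three steps above can be modeled in plain Python. This is a conceptual sketch of the build/validate/execute flow only; every class and method name here is illustrative, not the real hipDNN API.

```python
# Conceptual model of hipDNN's graph-based execution: build an op graph,
# validate it, then execute with a variant pack (uid -> buffer mapping).
# All names are illustrative, not the real hipDNN API.
class OpGraph:
    def __init__(self):
        self.ops = []        # e.g. ("conv_fprop", input_uids, output_uid)

    def add_op(self, kind, inputs, output):
        self.ops.append((kind, tuple(inputs), output))

    def validate(self):
        # A real implementation checks shapes/dtypes and builds execution plans.
        return all(len(op) == 3 for op in self.ops)

    def execute(self, variant_pack):
        # variant_pack maps tensor uid -> device buffer (here: a plain list).
        for kind, inputs, output in self.ops:
            if kind == "identity":   # trivial stand-in for a real kernel
                variant_pack[output][:] = variant_pack[inputs[0]]

g = OpGraph()
g.add_op("identity", ["x"], "y")
assert g.validate()
pack = {"x": [1.0, 2.0], "y": [0.0, 0.0]}
g.execute(pack)
```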
MIT License