Releases: athrva98/polyinfer

v0.2.0 IREE Vulkan GPU-Specific Targets

26 Dec 19:50
e7316a8

Release Notes

v0.2.0

IREE Backend Enhancements

Vulkan GPU-Specific Targets

Added comprehensive Vulkan GPU target support for optimized inference across multiple vendors:

import polyinfer as pi

# AMD RX 7000 series
model = pi.load("model.onnx", device="vulkan", vulkan_target="rdna3")

# NVIDIA RTX 40 series
model = pi.load("model.onnx", device="vulkan", vulkan_target="ada")

# Intel Arc
model = pi.load("model.onnx", device="vulkan", vulkan_target="arc")

Supported GPU Targets:

| Vendor   | Targets                             | GPUs                   |
|----------|-------------------------------------|------------------------|
| AMD      | rdna3, rdna2, gfx1100, rx7900xtx    | RX 7000/6000 series    |
| NVIDIA   | ada, ampere, turing, sm_89, rtx4090 | RTX 40/30/20 series    |
| Intel    | arc, arc_a770, arc_a750             | Arc A-series           |
| ARM      | valhall, valhall4, mali_g715        | Mali GPUs              |
| Qualcomm | adreno                              | Snapdragon mobile GPUs |
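Several of the preset names above are aliases for the same underlying architecture (for example, rtx4090 and sm_89 both denote Ada Lovelace). A minimal sketch of how such an alias lookup might behave — the table and function below are illustrative, not polyinfer's actual internals:

```python
# Illustrative alias table: preset name -> (vendor, architecture).
# Pairings mirror the table above; the real VULKAN_TARGETS dict in
# polyinfer.backends.iree may store richer metadata per target.
VULKAN_ALIASES = {
    "rdna3": ("AMD", "gfx1100"),
    "rx7900xtx": ("AMD", "gfx1100"),
    "ada": ("NVIDIA", "sm_89"),
    "rtx4090": ("NVIDIA", "sm_89"),
    "arc_a770": ("Intel", "arc"),
    "mali_g715": ("ARM", "valhall4"),
}

def resolve_target(name: str) -> tuple[str, str]:
    """Map a user-facing target alias to a (vendor, architecture) pair."""
    try:
        return VULKAN_ALIASES[name]
    except KeyError:
        known = ", ".join(sorted(VULKAN_ALIASES))
        raise ValueError(f"Unknown Vulkan target {name!r}; known: {known}") from None

print(resolve_target("rtx4090"))  # ('NVIDIA', 'sm_89')
```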

New Compilation Options

Added IREECompileOptions dataclass for fine-grained control over compilation:

from polyinfer.backends.iree import IREECompileOptions

opts = IREECompileOptions(
    opt_level=3,           # Optimization level (0-3)
    strip_debug=True,      # Strip debug info for smaller binaries
    data_tiling=True,      # Enable cache-friendly tiling
    vulkan_target="rdna3", # GPU-specific optimizations
    opset_version=17,      # Upgrade ONNX opset
    extra_flags=["--custom-flag"],
)

Improved Error Handling

  • IREECompilationError now provides actionable suggestions for common failures
  • Proper exception chaining (from e) for better debugging
  • Clear error messages for missing dependencies and unsupported operators

MLIR Export with Options

Enhanced MLIR export workflow with compilation options:

# Export with opset upgrade
backend = pi.get_backend("iree")
mlir = backend.emit_mlir("model.onnx", "model.mlir", opset_version=17)

# Compile with GPU-specific target
vmfb = backend.compile_mlir(
    "model.mlir",
    device="vulkan",
    vulkan_target="rdna3",
    opt_level=3,
    data_tiling=True,
)

New Exports

The following are now exported from polyinfer.backends.iree:

  • VulkanTarget - Dataclass for Vulkan GPU targets
  • VulkanGPUVendor - Enum for GPU vendors (AMD, NVIDIA, Intel, ARM, Qualcomm)
  • VULKAN_TARGETS - Dict of all predefined GPU targets
  • IREECompileOptions - Compilation options dataclass
  • IREECompilationError - Exception with actionable suggestions

Testing

Added comprehensive test coverage for new IREE features:

  • TestIREECompileOptions - Compile options and flag generation
  • TestVulkanTargets - GPU target presets and resolution
  • TestIREEErrorHandling - Error handling and exception chaining
  • TestMLIRExportWithOptions - MLIR export with compilation options

Bug Fixes

  • Fixed nested if statements in Vulkan target resolution (ruff SIM102)
  • Added proper exception chaining in 4 compilation error handlers (ruff B904)
  • Fixed tempfile handling with context manager (ruff SIM115)
  • Fixed OpenVINO backend name matching in tests

Dependencies

No new dependencies required. Existing IREE installation (iree-base-compiler[onnx], iree-base-runtime) provides all functionality.

v0.1.0 - First ever release

20 Dec 19:57
adb423c

v0.1.0

PolyInfer is a unified ML inference library that automatically selects the fastest backend for your hardware.

Highlights

  • Zero Configuration: Just pip install polyinfer[nvidia] - CUDA, cuDNN, and TensorRT are auto-installed and configured
  • Auto Backend Selection: Automatically picks the fastest backend for your device
  • 450+ FPS on YOLOv8n: TensorRT acceleration with no manual optimization needed
  • Cross-Platform: Windows, Linux, WSL2, macOS, and Google Colab support

Features

Multi-Backend Support

  • ONNX Runtime: CPU, CUDA, TensorRT EP, DirectML, ROCm, CoreML
  • OpenVINO: Intel CPU, integrated GPU, NPU (AI Boost)
  • TensorRT: Native TensorRT for maximum NVIDIA performance
  • IREE: CPU, Vulkan, CUDA with MLIR export capability
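Auto backend selection can be pictured as walking a per-device priority list and taking the first backend that is installed. The priorities and names below are an assumed sketch, not polyinfer's actual selection logic (which may also benchmark candidates):

```python
# Assumed priority order per device (illustrative names).
BACKEND_PRIORITY = {
    "cuda": ["tensorrt", "onnxruntime-cuda", "iree-cuda"],
    "cpu": ["openvino", "onnxruntime-cpu", "iree-cpu"],
    "vulkan": ["iree-vulkan"],
}

def select_backend(device: str, available: set[str]) -> str:
    """Return the first available backend in priority order for `device`."""
    for name in BACKEND_PRIORITY.get(device, []):
        if name in available:
            return name
    raise RuntimeError(f"No backend available for device {device!r}")

# TensorRT missing, so the CUDA execution provider wins:
print(select_backend("cuda", {"onnxruntime-cuda", "iree-cpu"}))
```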

Unified API

import polyinfer as pi

# Auto-select fastest backend
model = pi.load("model.onnx", device="cuda")
output = model(input_data)

# Compare all backends
pi.compare("model.onnx", input_shape=(1, 3, 640, 640))

Quantization Support

  • Dynamic/Static INT8, UINT8, INT4, FP16 quantization
  • ONNX Runtime and OpenVINO (NNCF) backends
  • TensorRT FP16/INT8 via load options

pi.quantize("model.onnx", "model_int8.onnx", method="dynamic")
pi.convert_to_fp16("model.onnx", "model_fp16.onnx")

MLIR Export

Export models to MLIR for custom hardware targets and advanced optimizations:

mlir = pi.export_mlir("model.onnx", "model.mlir")
vmfb = pi.compile_mlir("model.mlir", device="vulkan")

Structured Logging

Configurable verbosity levels for debugging and production use.
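Assuming polyinfer logs through the standard library logging module under a "polyinfer" logger name (an assumption; the release notes do not name the logger), verbosity can be adjusted like this:

```python
import logging

# Assumed logger name "polyinfer"; adjust if the library uses another.
logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")

log = logging.getLogger("polyinfer")
log.setLevel(logging.DEBUG)    # verbose, for debugging
log.setLevel(logging.WARNING)  # quiet, for production
```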

CLI Tools

polyinfer info           # Show system info and backends
polyinfer benchmark model.onnx --device tensorrt
polyinfer run model.onnx --device cuda

Installation

pip install polyinfer[nvidia]   # NVIDIA GPU
pip install polyinfer[intel]    # Intel CPU/GPU/NPU
pip install polyinfer[amd]      # AMD GPU (Windows DirectML)
pip install polyinfer[cpu]      # CPU only
pip install polyinfer[all]      # Everything

Performance

YOLOv8n @ 640x640 (RTX 5060)

| Backend            | Latency | FPS |
|--------------------|---------|-----|
| TensorRT           | 2.2 ms  | 450 |
| CUDA               | 6.6 ms  | 151 |
| OpenVINO (CPU)     | 16.2 ms | 62  |
| ONNX Runtime (CPU) | 22.6 ms | 44  |

Requirements

  • Python 3.10, 3.11, or 3.12
  • numpy >= 1.24
  • onnx >= 1.14

What's Next

  • PyTorch model direct loading
  • More quantization options
  • Additional backend integrations

Full documentation: https://github.com/athrva98/polyinfer