# Releases: athrva98/polyinfer

## v0.2.0: IREE Vulkan GPU-Specific Targets

### IREE Backend Enhancements

#### Vulkan GPU-Specific Targets
Added comprehensive Vulkan GPU target support for optimized inference across multiple vendors:
```python
import polyinfer as pi

# AMD RX 7000 series
model = pi.load("model.onnx", device="vulkan", vulkan_target="rdna3")

# NVIDIA RTX 40 series
model = pi.load("model.onnx", device="vulkan", vulkan_target="ada")

# Intel Arc
model = pi.load("model.onnx", device="vulkan", vulkan_target="arc")
```

**Supported GPU Targets:**
| Vendor | Targets | GPUs |
|---|---|---|
| AMD | `rdna3`, `rdna2`, `gfx1100`, `rx7900xtx` | RX 7000/6000 series |
| NVIDIA | `ada`, `ampere`, `turing`, `sm_89`, `rtx4090` | RTX 40/30/20 series |
| Intel | `arc`, `arc_a770`, `arc_a750` | Arc A-series |
| ARM | `valhall`, `valhall4`, `mali_g715` | Mali GPUs |
| Qualcomm | `adreno` | Snapdragon mobile GPUs |
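The presets above behave like a name-to-target lookup with vendor aliases (e.g. `rx7900xtx` resolving to the RDNA3 architecture). The following is a minimal illustrative sketch of that idea; the dict contents and the `resolve_vulkan_target` helper are hypothetical, not polyinfer's actual implementation:

```python
# Hypothetical sketch of a Vulkan target registry. Names mirror the table
# above, but the structure is illustrative, not polyinfer's real code.
from dataclasses import dataclass

@dataclass(frozen=True)
class VulkanTarget:
    vendor: str
    arch: str  # architecture string ultimately passed to the compiler

# A few presets; marketing names alias to the same architecture.
VULKAN_TARGETS = {
    "rdna3":     VulkanTarget("AMD", "rdna3"),
    "rx7900xtx": VulkanTarget("AMD", "rdna3"),
    "ada":       VulkanTarget("NVIDIA", "ada"),
    "rtx4090":   VulkanTarget("NVIDIA", "ada"),
    "arc":       VulkanTarget("Intel", "arc"),
}

def resolve_vulkan_target(name: str) -> VulkanTarget:
    """Look up a preset, raising a clear error for unknown names."""
    try:
        return VULKAN_TARGETS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(VULKAN_TARGETS))
        raise ValueError(f"Unknown Vulkan target {name!r}; known: {known}")
```

Aliasing keeps the user-facing names familiar (GPU model numbers) while the compiler only ever sees canonical architecture strings.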
#### New Compilation Options

Added the `IREECompileOptions` dataclass for fine-grained control over compilation:
```python
from polyinfer.backends.iree import IREECompileOptions

opts = IREECompileOptions(
    opt_level=3,            # Optimization level (0-3)
    strip_debug=True,       # Strip debug info for smaller binaries
    data_tiling=True,       # Enable cache-friendly tiling
    vulkan_target="rdna3",  # GPU-specific optimizations
    opset_version=17,       # Upgrade ONNX opset
    extra_flags=["--custom-flag"],
)
```

#### Improved Error Handling
- `IREECompilationError` now provides actionable suggestions for common failures
- Proper exception chaining (`from e`) for better debugging
- Clear error messages for missing dependencies and unsupported operators
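Chaining with `raise ... from e` preserves the low-level traceback alongside the high-level error. A minimal, library-agnostic sketch of the pattern; the `IREECompilationError` class and `compile_module` function here are stand-ins, not polyinfer's actual code:

```python
# Stand-in for the library's exception type; illustrative only.
class IREECompilationError(RuntimeError):
    def __init__(self, message, suggestion=None):
        super().__init__(message)
        self.suggestion = suggestion  # actionable hint surfaced to the user

def compile_module(flags):
    """Hypothetical wrapper that translates low-level failures."""
    try:
        if "--bad-flag" in flags:
            raise ValueError("unrecognized flag: --bad-flag")
    except ValueError as e:
        # `from e` chains the original error, so both tracebacks are shown
        # and err.__cause__ stays available for programmatic inspection.
        raise IREECompilationError(
            "IREE compilation failed",
            suggestion="Remove unsupported flags or check your IREE version.",
        ) from e
```

Compared to a bare `raise`, the chained form tells the user both *what* failed (compilation) and *why* (the underlying flag error).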
#### MLIR Export with Options

Enhanced MLIR export workflow with compilation options:
```python
# Export with opset upgrade
backend = pi.get_backend("iree")
mlir = backend.emit_mlir("model.onnx", "model.mlir", opset_version=17)

# Compile with GPU-specific target
vmfb = backend.compile_mlir(
    "model.mlir",
    device="vulkan",
    vulkan_target="rdna3",
    opt_level=3,
    data_tiling=True,
)
```

#### New Exports
The following are now exported from `polyinfer.backends.iree`:

- `VulkanTarget` - Dataclass for Vulkan GPU targets
- `VulkanGPUVendor` - Enum for GPU vendors (AMD, NVIDIA, Intel, ARM, Qualcomm)
- `VULKAN_TARGETS` - Dict of all predefined GPU targets
- `IREECompileOptions` - Compilation options dataclass
- `IREECompilationError` - Exception with actionable suggestions
#### Testing

Added comprehensive test coverage for new IREE features:

- `TestIREECompileOptions` - Compile options and flag generation
- `TestVulkanTargets` - GPU target presets and resolution
- `TestIREEErrorHandling` - Error handling and exception chaining
- `TestMLIRExportWithOptions` - MLIR export with compilation options
#### Bug Fixes
- Fixed nested if statements in Vulkan target resolution (ruff SIM102)
- Added proper exception chaining in 4 compilation error handlers (ruff B904)
- Fixed tempfile handling with context manager (ruff SIM115)
- Fixed OpenVINO backend name matching in tests
#### Dependencies

No new dependencies required. An existing IREE installation (`iree-base-compiler[onnx]`, `iree-base-runtime`) provides all functionality.
## v0.1.0: First ever release
PolyInfer is a unified ML inference library that automatically selects the fastest backend for your hardware.
### Highlights
- Zero Configuration: Just `pip install polyinfer[nvidia]`; CUDA, cuDNN, and TensorRT are auto-installed and configured
- Auto Backend Selection: Automatically picks the fastest backend for your device
- 450+ FPS on YOLOv8n: TensorRT acceleration with no manual optimization needed
- Cross-Platform: Windows, Linux, WSL2, macOS, and Google Colab support
### Features

#### Multi-Backend Support
- ONNX Runtime: CPU, CUDA, TensorRT EP, DirectML, ROCm, CoreML
- OpenVINO: Intel CPU, integrated GPU, NPU (AI Boost)
- TensorRT: Native TensorRT for maximum NVIDIA performance
- IREE: CPU, Vulkan, CUDA with MLIR export capability
#### Unified API
```python
import polyinfer as pi

# Auto-select fastest backend
model = pi.load("model.onnx", device="cuda")
output = model(input_data)

# Compare all backends
pi.compare("model.onnx", input_shape=(1, 3, 640, 640))
```

#### Quantization Support
- Dynamic/Static INT8, UINT8, INT4, FP16 quantization
- ONNX Runtime and OpenVINO (NNCF) backends
- TensorRT FP16/INT8 via load options
```python
pi.quantize("model.onnx", "model_int8.onnx", method="dynamic")
pi.convert_to_fp16("model.onnx", "model_fp16.onnx")
```

#### MLIR Export
Export models to MLIR for custom hardware targets and advanced optimizations:
```python
mlir = pi.export_mlir("model.onnx", "model.mlir")
vmfb = pi.compile_mlir("model.mlir", device="vulkan")
```

#### Structured Logging
Configurable verbosity levels for debugging and production use.
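As an illustration of configurable verbosity using Python's standard `logging` module; the `"polyinfer"` logger name is an assumption about the library's namespace, and polyinfer's own logging API may differ:

```python
import logging

# Hypothetical: tune a library logger's verbosity via the stdlib logging API.
# The "polyinfer" logger name is assumed, not confirmed by the release notes.
logger = logging.getLogger("polyinfer")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)

logger.setLevel(logging.DEBUG)    # verbose while debugging
logger.setLevel(logging.WARNING)  # quiet for production use
```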
#### CLI Tools

```shell
polyinfer info                                    # Show system info and backends
polyinfer benchmark model.onnx --device tensorrt
polyinfer run model.onnx --device cuda
```

### Installation
```shell
pip install polyinfer[nvidia]  # NVIDIA GPU
pip install polyinfer[intel]   # Intel CPU/GPU/NPU
pip install polyinfer[amd]     # AMD GPU (Windows DirectML)
pip install polyinfer[cpu]     # CPU only
pip install polyinfer[all]     # Everything
```

### Performance
YOLOv8n @ 640x640 (RTX 5060)
| Backend | Latency | FPS |
|---|---|---|
| TensorRT | 2.2 ms | 450 |
| CUDA | 6.6 ms | 151 |
| OpenVINO (CPU) | 16.2 ms | 62 |
| ONNX Runtime (CPU) | 22.6 ms | 44 |
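The FPS column is approximately the reciprocal of latency (1000 ms divided by the per-inference latency), as a quick sanity check against the table shows:

```python
# Sanity-check the table above: FPS is roughly 1000 / latency_ms.
latency_ms = {
    "TensorRT": 2.2,
    "CUDA": 6.6,
    "OpenVINO (CPU)": 16.2,
    "ONNX Runtime (CPU)": 22.6,
}
fps = {name: 1000.0 / ms for name, ms in latency_ms.items()}
```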
### Requirements
- Python 3.10, 3.11, or 3.12
- numpy >= 1.24
- onnx >= 1.14
### What's Next
- PyTorch model direct loading
- More quantization options
- Additional backend integrations
Full documentation: https://github.com/athrva98/polyinfer