Release v3.6.0

@sa-faizal sa-faizal released this 21 Jul 21:37
· 148 commits to main since this release
af56a47

IREE Release v3.6.0

Compiler

Runtime

  • Added AMDGPU executable implementation with no-op cache, supporting verified, topology-wide loading and optimized kernel argument management for dispatches. (#21040)
  • Enabled auto torch input conversion triggered by function argument and result types to streamline input handling. (#21067)
  • Added support for rematerializing parallel ops in the vector distribute pipeline to improve elementwise operation fusion. (#21073)
  • Introduced skeleton AMDGPU buffer handle and handle pool with external and transient buffer types supporting async allocations and device pointer resolution. (#21044)
  • Added support for group_any in iree_thread_affinity_t to assign threads to processor groups (e.g., NUMA nodes) instead of specific CPUs, aiding loosely coordinated thread pools. (#21089)
  • Added _base variants for all string view integer parsing functions, aligning with standard C APIs, and cleaned up HIP driver integer parsing code. (#21086)
  • Added iree_hal_amdgpu_system_t to manage shared HSA/topology/pools resources across physical devices in a logical device. (#21043)
  • Added device-side AMDGPU signal and queue utility headers derived from HSA spec and ROCR implementation. (#21042)
  • Implemented host-side and device-side AMDGPU command buffers, supporting recording, execution, and segmented command buffers, with groundwork for conditional branches. (#21123)
  • Added device->host service worker to mimic HSA/AQL queue semantics for hosting device communication, enabling future tooling compatibility. (#21094)
  • Added blit kernels and device-side enqueue support as initial implementations for copy operations, enabling CTS test passes. (#21057)
  • Added device-side tracing macros and ringbuffer trace buffer, laying groundwork for on-device tracing interoperable with host tooling like Tracy. (#21046)
  • Added AMDGPU semaphore allocation and pooling with host-side HAL support; device-side semaphore implementation and external semaphore imports are forthcoming. (#21201)
  • Enhanced loop fission pass (FissionTransferOpsInControlFlow) to support loops containing multiple transfer_read/write pairs, improving IR simplification with additional pattern application. (#21213)
  • Introduced IREE_ENABLE_RUNTIME_COVERAGE CMake mode to enable LLVM coverage for runtime libraries, test binaries, and tools, along with scripts to generate LCOV reports and IDE integration. (#21191)
  • Added iree-hal-drivers-amdgpu-tests target to enable building all AMDGPU HAL tests together easily via IDE actions. (#21389)
  • Implemented AMDGPU logical and physical devices with skeleton queue support, allowing multiple virtual queues per logical device and preparing for host- and device-side queue operations. (#21251)
  • Fixes and Stability Enhancements: (#21056, #21060, #21061, #21153, #21200)
  • Testing, Debuggability and Tooling: (#21046, #21191, #21389, #21094)
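
For readers unfamiliar with the standard C convention that the _base string view parsing variants (#21086) are described as aligning with, here is a minimal sketch of explicit-base parsing using strtol from the C standard library. The helper name parse_int32_base is hypothetical and is not an IREE API; the actual IREE function names and signatures are not listed in these notes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper for illustration only (not part of IREE).
// Parses all of `text` as a signed 32-bit integer in the given base using
// standard C strtol. A base of 0 lets strtol detect a 0x (hex) or 0 (octal)
// prefix; otherwise the base must be in the range 2..36.
static bool parse_int32_base(const char* text, int base, int32_t* out_value) {
  assert(text != NULL && out_value != NULL);
  char* end = NULL;
  errno = 0;
  long value = strtol(text, &end, base);
  // Reject range errors, empty input, and trailing unparsed characters.
  if (errno != 0 || end == text || *end != '\0') return false;
  // long may be wider than 32 bits, so check the target range explicitly.
  if (value < INT32_MIN || value > INT32_MAX) return false;
  *out_value = (int32_t)value;
  return true;
}
```

With this sketch, parsing "ff" with base 16 yields 255, parsing "0x1F" with base 0 yields 31, and parsing "12abc" with base 10 fails because of the trailing characters.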

Change Log

Git History

What's Changed

New Contributors

Full Changelog: v3.5.0...v3.6.0