v0.3.0

vbharadwaj-bk released this 22 Jun 23:42

2dd1684

v0.3.0 (2025-06-22)

This release includes bugfixes and new opaque operations that compose with torch.compile on PyTorch 2.4 through 2.7. The opaque operations will be unnecessary on PyTorch 2.8 and later.

Added:

Opaque variants of major operations, declared via PyTorch custom_op. These functions cannot be traced through and fail under TorchScript (JIT) and AOTInductor (AOTI). They are shims that enable composition with torch.compile before PyTorch 2.8.
torch.save / torch.load support that, when torch.compile is not used, is portable across GPU architectures.
.to() support to move TensorProduct and TensorProductConv between devices or change datatypes.

Fixed:

Gracefully records an error if the C++ extension is not linked against libpython.so.
Resolves Kahan summation and various other bugs for HIP at the O3 compiler-optimization level.
No longer spawns multiple contexts on GPU 0 when multiple devices are used.
Zero-initializes gradient buffers to prevent garbage from accumulating in the backward pass.
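
As background for the Kahan summation fix, here is a minimal pure-Python sketch of the general technique. It is an illustration only, not the library's HIP kernel; `naive_sum` and `kahan_sum` are hypothetical helper names. The compensation step relies on floating-point operations being evaluated exactly as written, which is one reason aggressive compiler optimization can interfere with this kind of code.

```python
def naive_sum(values):
    # Plain left-to-right accumulation; small terms can be rounded away.
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    # Kahan (compensated) summation: carry the rounding error of each
    # addition forward and subtract it from the next term.
    total = 0.0
    compensation = 0.0
    for v in values:
        y = v - compensation            # apply the correction carried so far
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total

if __name__ == "__main__":
    # One large term followed by many terms too small to register alone.
    data = [1.0] + [1e-16] * 1_000_000
    print(naive_sum(data))
    print(kahan_sum(data))
```

With this input the naive loop never moves away from 1.0, while the compensated loop recovers the contribution of the small terms.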
