v0.3.0 (2025-06-22)
This release includes bug fixes and new opaque operations that compose with torch.compile for PT2.4-2.7. The opaque operations will be unnecessary for PT2.8+.
Added:
- Opaque variants of major operations via PyTorch `custom_op` declarations. These functions cannot be traced through and fail for JITScript / AOTI. They are shims that enable composition with `torch.compile` pre-PT2.8.
- `torch.load` / `torch.save` functionality that, without `torch.compile`, is portable across GPU architectures.
- `.to()` support to move `TensorProduct` and `TensorProductConv` between devices or change datatypes.
Fixed:
- Gracefully records an error if `libpython.so` is not linked against C++ extension.
- Resolves Kahan summation / various other bugs for HIP at O3 compiler-optimization level.
- Removes multiple contexts spawning for GPU 0 when multiple devices are used.
- Zero-initializes gradient buffers to prevent garbage accumulation in the backward pass.
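
For readers unfamiliar with the Kahan summation mentioned in the fixes above, here is a minimal pure-Python sketch of the technique. It is a generic illustration, not the HIP kernel code from this release; the function names are hypothetical. The compensation step relies on floating-point operations being evaluated exactly as written, which is why aggressive compiler optimization can interfere with it.

```python
import math


def naive_sum(values):
    # Plain left-to-right accumulation; small terms can be rounded away.
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost bits
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# One large term followed by many terms too small to register individually.
values = [1.0] + [1e-16] * 100_000
```

With this input, `naive_sum(values)` stays at exactly `1.0` because each small term is rounded away, while `kahan_sum(values)` tracks the correctly rounded result from `math.fsum(values)` closely.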