v1.0.0
Release 1.0.0 marks a major update for MatX. 1.0.0 is the first version to require C++20 support for both the CUDA and host compilers. As a result, CUDA versions lower than 12.2.1 are not supported.
Among the major release highlights are:
- JIT Support
CUDA JIT support via a new CUDAJitExecutor. When used, this executor makes a second compilation pass and caches the resulting kernel for future use. JIT allows MatX to convert many runtime parameters into compile-time parameters, reducing the computation needed in the kernel. It also optionally enables kernel fusion with the NVIDIA MathDx libraries. When enabled, MatX can potentially fuse FFT and GEMM operations into other arithmetic expressions if certain criteria are met. Currently only FFT and BLAS fusion are supported, but other MathDx libraries will be added in the future. For more information, see the docs.
- Logging
Logging to stdout or to a file is now supported. Logging is useful for seeing which code path MatX is taking and for dumping verbose information about each function. Note that logging requires the header, which is not available in all C++20 compilers.
- Documentation
Added a tag showing the MatX version in which each operator was introduced.
- Compile-time Properties
Specify compile-time properties on an operator for finer-grained control of its operation. For example, change the accumulation type of an operator.
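The idea behind converting runtime parameters into compile-time parameters can be illustrated without MatX or CUDA. The following is a host-only C++ sketch, not MatX's implementation: the names `sum_strided_runtime`, `sum_strided_static`, and `sum_strided` are hypothetical and exist only for this example. It assumes a strided sum where the stride is either a runtime argument or a template parameter that the compiler can fold into the index arithmetic.

```cpp
#include <cstddef>

// Generic version: the stride is only known at runtime, so the compiler
// must emit a general multiply for every index computation.
inline float sum_strided_runtime(const float* data, std::size_t count,
                                 std::size_t stride) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        acc += data[i * stride];
    }
    return acc;
}

// Specialized version: the stride is a compile-time constant, so the
// compiler can simplify the indexing (for Stride == 1 it disappears).
template <std::size_t Stride>
float sum_strided_static(const float* data, std::size_t count) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        acc += data[i * Stride];
    }
    return acc;
}

// Dispatcher: picks a specialized instantiation when one exists and
// falls back to the generic version otherwise. This fixed switch only
// mimics the effect; it is an assumption made to keep the sketch
// compilable ahead of time.
inline float sum_strided(const float* data, std::size_t count,
                         std::size_t stride) {
    switch (stride) {
        case 1: return sum_strided_static<1>(data, count);
        case 2: return sum_strided_static<2>(data, count);
        default: return sum_strided_runtime(data, count, stride);
    }
}
```

An ahead-of-time build has to choose its specializations in advance, as the `switch` does here. A JIT approach can instead generate the specialized code once the actual runtime values are known and cache the result, which is the behavior the release notes describe for the new executor.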
Full Changelog
- CUDA JIT Support by @cliffburdick in #1071
- Add unsafe aliased memory checking system by @cliffburdick in #1079
- Add comprehensive logging system and exception disabling support by @cliffburdick in #1080
- Use new python dlpack interface, fixing warnings by @simonbyrne in #1082
- Replaced all uses of SFINAE with concepts for better error messages by @cliffburdick in #1081
- Added JIT capabilities into all operators except transform operators. by @cliffburdick in #1085
- Clang/nvc++ fixes by @cliffburdick in #1083
- Add -lineinfo/--extended-lambda to the MatX interface target by @tbensonatl in #1087
- Remove extraneous lambda capture by @cliffburdick in #1086
- Use native nvcc flag when architectures aren't specified by @cliffburdick in #1091
- Remove extra `this` pointer from frexp lambda by @cliffburdick in #1090
- Use cudaMemcpyAsync rather than kernel when possible by @cliffburdick in #1088
- Add helper functions to clear MatX caches and allocations by @tbensonatl in #1092
- Use the underlying memory pointer to determine where memory resides in ToDlPack by @dylan-eustice in #1093
- Handle argmin/argmax tuple accumulators in CUB by @Aminsed in #1096
- Most non-transform operators working with JIT by @cliffburdick in #1094
- Add include for cinttypes in print.h by @cliffburdick in #1099
- Add MATX_EN_NVTIFF option by @tmartin-gh in #1101
- Changed cudaExecutor to be const& by @cliffburdick in #1104
- Use cuda::std::accumulate in tensor.h by @cliffburdick in #1102
- Disable if compiler doesn't support it by @cliffburdick in #1109
- Use `cuda::std::tuple` instead of `thrust::tuple` by @miscco in #1110
- Add SAR backprojection transform by @tbensonatl in #1108
- Change to Rank() on Type by @cliffburdick in #1112
- Add helpers for compile-time operator properties by @tbensonatl in #1114
- Added version each operator was added to docs by @cliffburdick in #1116
- Fixes for 32-bit builds. Tested w/ gcc 11.4 and CTK 12.9 by @tbensonatl in #1120
- Add fltflt division and fltflt operator overloads by @tbensonatl in #1121
- Export pybind11 and remove visibility flag by @cliffburdick in #1111
- Add FMA function for the fltflt data type by @tbensonatl in #1123
- Tylera/gtc 2025 tutorials by @cliffburdick in #900
- cuBLASDx support by @cliffburdick in #1122
- Avoid warning about unused variables by @miscco in #1125
- Add nvbench-based benchmarks for the fltflt data types by @tbensonatl in #1124
- add link to ust blog post by @aartbik in #1126
- Allow host-pinned pointers in SetVals() by @cliffburdick in #1128
- Fix flags for aarch64 containers using FFTW by @cliffburdick in #1127
- Add fltflt rounding and fmod functions by @tbensonatl in #1129
- gcc16 warning patch on pybind by @cliffburdick in #1131
- Update CCCL and deprecate old CTK by @cliffburdick in #1130
- Added unwrap operator by @cliffburdick in #1133
New Contributors
Full Changelog: v0.9.4...v1.0.0