v1.0.0

Release 1.0.0 is a major update for MatX. It is the first version to require C++20 support from both the CUDA and host compilers; as a result, CUDA versions lower than 12.2.1 are not supported.

Among the major release highlights are:

JIT Support
CUDA JIT support is available through a new CUDAJitExecutor. When used, this executor makes a second compilation pass and caches the resulting kernel for future use. JIT allows MatX to convert many runtime parameters into compile-time parameters, reducing the computation needed in the kernel. It also optionally enables kernel fusion through the NVIDIA MathDx libraries: when enabled, MatX can fuse FFT and GEMM operations into other arithmetic expressions if certain criteria are met. Only FFT and BLAS fusion are supported for now; other MathDx libraries will be added in the future. For more information, see the docs.
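As a rough illustration of why turning runtime parameters into compile-time parameters can reduce work in a kernel, here is a minimal host-only C++ sketch. It does not use MatX or the CUDAJitExecutor; the function names `sum_strided_runtime` and `sum_strided_static` are hypothetical and exist only for this example. The idea is that once a value such as a stride is known at compile time, the compiler can fold it into the generated code or remove branches that depend on it.

```cpp
#include <cstddef>

// Generic version: the stride is only known at run time, so every index
// computation multiplies by a value the compiler cannot see.
inline float sum_strided_runtime(const float* data, std::size_t count,
                                 std::size_t stride) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < count; ++i) {
    acc += data[i * stride];
  }
  return acc;
}

// Specialized version: the stride is a template (compile-time) parameter.
// For Stride == 1 the multiplication is dropped entirely; for other values
// the compiler sees a constant and can simplify the address arithmetic.
template <std::size_t Stride>
float sum_strided_static(const float* data, std::size_t count) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < count; ++i) {
    if constexpr (Stride == 1) {
      acc += data[i];
    } else {
      acc += data[i * Stride];
    }
  }
  return acc;
}
```

Both versions compute the same result. A JIT approach can, in principle, generate the specialized form for values that are only discovered at run time and then reuse the compiled kernel, which is what the caching described above is for.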
Logging
Logging to stdout or to a file is now fully supported. Logging is useful for seeing which code path MatX takes and for dumping verbose information about each function. Note that logging requires the header, which is not available in all C++20 compilers.
Documentation
Added a tag showing the MatX version in which each operator was added.
Compile-time Properties
Specify compile-time properties on an operator for finer-grained control of its operation. For example, change the accumulation type of an operator.
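To make the accumulation-type example concrete, the following is a small self-contained C++ sketch of the general technique of selecting an accumulator type through a compile-time property. It is not the MatX API: `AccumType`, `DefaultProps`, `accum_of`, and `sum_with_props` are hypothetical names invented for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical property tag: request accumulation in type T.
template <typename T>
struct AccumType {};

// Hypothetical default property set: no overrides.
struct DefaultProps {};

// Resolve the accumulator type: the element type by default...
template <typename Elem, typename Props>
struct accum_of {
  using type = Elem;
};

// ...or the requested type when an AccumType<T> property is supplied.
template <typename Elem, typename T>
struct accum_of<Elem, AccumType<T>> {
  using type = T;
};

// Sum n elements, accumulating in the type chosen at compile time.
template <typename Props = DefaultProps, typename Elem>
auto sum_with_props(const Elem* data, std::size_t n) {
  using Acc = typename accum_of<Elem, Props>::type;
  Acc acc{};
  for (std::size_t i = 0; i < n; ++i) {
    acc += static_cast<Acc>(data[i]);
  }
  return acc;
}
```

Under these assumptions, `sum_with_props(data, n)` on `float` data accumulates in `float`, while `sum_with_props<AccumType<double>>(data, n)` accumulates the same data in `double`, which can preserve small contributions that single precision would round away. Because the choice is made at compile time, it adds no runtime branching.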

Full Changelog

CUDA JIT Support by @cliffburdick in #1071
Add unsafe aliased memory checking system by @cliffburdick in #1079
Add comprehensive logging system and exception disabling support by @cliffburdick in #1080
Use new python dlpack interface, fixing warnings by @simonbyrne in #1082
Replaced all uses of SFINAE with concepts for better error messages by @cliffburdick in #1081
Added JIT capabilities into all operators except transform operators. by @cliffburdick in #1085
Clang/nvc++ fixes by @cliffburdick in #1083
Add -lineinfo/--extended-lambda to the MatX interface target by @tbensonatl in #1087
Remove extraneous lambda capture by @cliffburdick in #1086
Use native nvcc flag when architectures aren't specified by @cliffburdick in #1091
Remove extra this pointer from frexp lambda by @cliffburdick in #1090
Use cudaMemcpyAsync rather than kernel when possible by @cliffburdick in #1088
Add helper functions to clear MatX caches and allocations by @tbensonatl in #1092
Use the underlying memory pointer to determine where memory resides in ToDlPack by @dylan-eustice in #1093
Handle argmin/argmax tuple accumulators in CUB by @Aminsed in #1096
Most non-transform operators working with JIT by @cliffburdick in #1094
Add include for cinttypes in print.h by @cliffburdick in #1099
Add MATX_EN_NVTIFF option by @tmartin-gh in #1101
Changed cudaExecutor to be const& by @cliffburdick in #1104
Use cuda::std::accumulate in tensor.h by @cliffburdick in #1102
Disable if compiler doesn't support it by @cliffburdick in #1109
Use cuda::std::tuple instead of thrust::tuple by @miscco in #1110
Add SAR backprojection transform by @tbensonatl in #1108
Change to Rank() on Type by @cliffburdick in #1112
Add helpers for compile-time operator properties by @tbensonatl in #1114
Added version each operator was added to docs by @cliffburdick in #1116
Fixes for 32-bit builds. Tested w/ gcc 11.4 and CTK 12.9 by @tbensonatl in #1120
Add fltflt division and fltflt operator overloads by @tbensonatl in #1121
Export pybind11 and remove visibility flag by @cliffburdick in #1111
Add FMA function for the fltflt data type by @tbensonatl in #1123
Tylera/gtc 2025 tutorials by @cliffburdick in #900
cuBLASDx support by @cliffburdick in #1122
Avoid warning about unused variables by @miscco in #1125
Add nvbench-based benchmarks for the fltflt data types by @tbensonatl in #1124
add link to ust blog post by @aartbik in #1126
Allow host-pinned pointers in SetVals() by @cliffburdick in #1128
Fix flags for aarch64 containers using FFTW by @cliffburdick in #1127
Add fltflt rounding and fmod functions by @tbensonatl in #1129
gcc16 warning patch on pybind by @cliffburdick in #1131
Update CCCL and deprecate old CTK by @cliffburdick in #1130
Added unwrap operator by @cliffburdick in #1133

New Contributors

@Aminsed made their first contribution in #1096

Full Changelog: v0.9.4...v1.0.0
