I've possibly encountered a memory leak in cuTENSOR-based contractions. The memory allocated in each call does not seem to be fully freed, even after explicit calls to the garbage collector. The same, or at least a similar, amount of memory appears to accumulate on both the CPU and the GPU (observed via calls to CUDA.reclaim() followed by CUDA.used_memory()). I've had no luck changing where input/output/intermediate tensors are stored via CUDAAllocator.
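For reference, here is a minimal sketch of the kind of loop I use to observe the accumulation; the tensor shapes, index labels, and iteration count are placeholders rather than the attached example:

```julia
using CUDA, cuTENSOR, TensorOperations

function contract_once(A, B)
    # @tensor on CuArrays dispatches to the cuTENSOR-based contraction path
    @tensor C[i, l] := A[i, j, k] * B[j, k, l]
    return C
end

A = CUDA.rand(Float64, 64, 64, 64)
B = CUDA.rand(Float64, 64, 64, 64)

for n in 1:100
    contract_once(A, B)
    GC.gc(true)      # explicit garbage collection
    CUDA.reclaim()   # return cached GPU memory to the driver
    println(n, ": GPU bytes in use = ", CUDA.used_memory())
end
```

With this pattern the reported GPU usage (and the process's host memory) keeps growing across iterations instead of returning to a baseline.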
I've constructed and attached a minimal working example of the issue. It may be worth mentioning that this leak was found during the transition from TensorOperations v3.2.5 & cuTENSOR v1 to TensorOperations v5.2.0 with cuTENSOR v2.
Minimal Working Example
Output