Allocation goes up with integration time when using CUDA for mesolve #533

### Bug Description

When `mesolve` runs on CUDA arrays, the memory and allocation count reported by `@benchmark` grow with the integration time (the final time in `tlist`).

### Code to Reproduce the Bug

```julia
using QuantumToolbox
using CUDA
CUDA.allowscalar(false) # Avoid unexpected scalar indexing
using BenchmarkTools

N = 20 # cutoff of the Hilbert space dimension
ω = 1.0 # frequency of the harmonic oscillator
γ = 0.1 # damping rate

a_gpu = cu(destroy(N)) # The only difference in the code is the cu() function

H_gpu = ω * a_gpu' * a_gpu

ψ0_gpu = cu(fock(N, 3))

c_ops = [sqrt(γ) * a_gpu]
e_ops = [a_gpu' * a_gpu]

tlist = [0, 100] # time list: only the initial and final times
sol = mesolve(H_gpu, ψ0_gpu, tlist, c_ops, e_ops = e_ops) # first run, includes compilation
@benchmark mesolve($H_gpu, $ψ0_gpu, $tlist, $c_ops, e_ops = $e_ops, progress_bar = Val(false))
```

### Code Output

```shell
BenchmarkTools.Trial: 34 samples with 1 evaluation per sample.
 Range (min … max):  117.296 ms … 212.421 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     144.434 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   149.778 ms ±  18.603 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

                █▄▄▁  █              ▁                           
  ▆▁▆▁▁▁▁▆▁▆▁▆▁▆████▆▁█▆▆▁▆▁▁▁▁▁▆▁▆▆▁█▁▆▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆ ▁
  117 ms           Histogram: frequency by time          212 ms <

 Memory estimate: 2.93 MiB, allocs estimate: 125233.
```

### Expected Behaviour

Allocations should not grow with the integration time when no additional time points are saved.
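
One way to check this expectation is to measure the allocation of a single `mesolve` call for several final times while always saving only two time points. The following is a minimal sketch, not a definitive test: `mesolve_allocs` is a hypothetical helper (not part of QuantumToolbox), the final times are arbitrary illustrative values, and `@allocated` only counts host-side (CPU) allocations, which is also what the `@benchmark` memory estimate reports.

```julia
using QuantumToolbox
using CUDA
CUDA.allowscalar(false)

# Same system as in the reproduction script above
N = 20
ω = 1.0
γ = 0.1
a_gpu = cu(destroy(N))
H_gpu = ω * a_gpu' * a_gpu
ψ0_gpu = cu(fock(N, 3))
c_ops = [sqrt(γ) * a_gpu]
e_ops = [a_gpu' * a_gpu]

# Hypothetical helper (not part of QuantumToolbox): host-side bytes allocated
# by a single mesolve call that saves only the times 0 and tf.
function mesolve_allocs(H, ψ0, c_ops, e_ops, tf)
    tlist = [0.0, tf]
    # Warm-up call so that compilation is not counted in the measurement
    mesolve(H, ψ0, tlist, c_ops, e_ops = e_ops, progress_bar = Val(false))
    return @allocated mesolve(H, ψ0, tlist, c_ops, e_ops = e_ops, progress_bar = Val(false))
end

for tf in (10.0, 100.0, 1000.0) # illustrative final times
    bytes = mesolve_allocs(H_gpu, ψ0_gpu, c_ops, e_ops, tf)
    println("tf = ", tf, ": ", bytes, " bytes allocated")
end
```

If the expected behaviour held, the printed byte counts would stay roughly constant across the different final times.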

### Your Environment

```shell
julia> QuantumToolbox.about()

 QuantumToolbox.jl: Quantum Toolbox in Julia
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
Copyright © QuTiP team 2022 and later.
Current admin team:
    Alberto Mercurio and Yi-Te Huang

Package information:
====================================
Julia              Ver. 1.11.6
QuantumToolbox     Ver. 0.34.0
SciMLOperators     Ver. 1.5.0
LinearSolve        Ver. 3.26.0
OrdinaryDiffEqCore Ver. 1.30.0

System information:
====================================
OS       : Linux (x86_64-linux-gnu)
CPU      : 12 × AMD Ryzen 5 2600X Six-Core Processor
Memory   : 7.726 GB
WORD_SIZE: 64
LIBM     : libopenlibm
LLVM     : libLLVM-16.0.6 (ORCJIT, znver1)
BLAS     : libopenblas64_.so (ilp64)
Threads  : 1 (on 12 virtual cores)

julia> CUDA.versioninfo()
CUDA toolchain: 
- runtime 13.0, local installation
- driver 580.97.0 for 13.0
- compiler 13.0

CUDA libraries: 
- CUBLAS: 13.0.0
- CURAND: 10.4.0
- CUFFT: 12.0.0
- CUSOLVER: 12.0.3
- CUSPARSE: 12.6.2
- CUPTI: 2025.3.0 (API 130000.0.0)
- NVML: 13.0.0+580.76.4

Julia packages: 
- CUDA: 5.8.3
- CUDA_Driver_jll: 13.0.0+0
- CUDA_Compiler_jll: 0.2.0+1
- CUDA_Runtime_jll: 0.19.0+0
- CUDA_Runtime_Discovery: 1.0.0

Toolchain:
- Julia: 1.11.6
- LLVM: 16.0.6

Preferences:
- CUDA_Runtime_jll.local: true

1 device:
  0: NVIDIA GeForce RTX 2060 (sm_75, 1.769 GiB / 6.000 GiB available)
```

### Additional Context

_No response_
