Data Leak in DataLoader ? #211

@camilodlt

MLUtils.DataLoader does not seem to release memory, even when the object is no longer reachable and the garbage collector is run explicitly inside the loop:

➜  tmp julia --project -t 16
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org/
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.6 (2025-07-09)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using MLUtils

julia> function random_data(n=1000)
           return [rand(Float32, 1000,1000, 10) for i in 1:n]
       end
random_data (generic function with 2 methods)

julia> function test(i)
           data = random_data(i*10)
           dl = DataLoader(data; batchsize = 100)
           nothing
       end
test (generic function with 1 method)

julia> for i in 1:100
           @info "RAM left : $(Sys.free_memory()/(2^20))"
           test(i*10)
           GC.gc(true)
           GC.gc()
       end
[ Info: RAM left : 108832.73046875
[ Info: RAM left : 105014.9375
[ Info: RAM left : 100209.421875
[ Info: RAM left : 96386.6953125
[ Info: RAM left : 92556.125
[ Info: RAM left : 88755.32421875
[ Info: RAM left : 84891.40625
[ Info: RAM left : 82037.58984375
[ Info: RAM left : 77238.71875
[ Info: RAM left : 74415.28515625
[ Info: RAM left : 69477.1484375
[ Info: RAM left : 65713.12109375
[ Info: RAM left : 62047.74609375
[ Info: RAM left : 58244.33203125
[ Info: RAM left : 53576.16015625
[ Info: RAM left : 50848.9375

Running the GC after the loop has finished does free all of the RAM.
Running the same loop without the DataLoader frees the memory at every iteration.
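
For comparison, the control case mentioned above (the same loop with no DataLoader constructed) might look roughly like the sketch below. The name test_without_loader, the dims keyword and the smaller array size are assumptions added here so that it runs quickly on a small machine; they are not part of the original report.

```julia
# Control case: same allocation pattern as `test`, but no DataLoader is built.
# `dims` is a hypothetical keyword; the report used (1000, 1000, 10).
function random_data(n=1000; dims=(100, 100, 10))
    return [rand(Float32, dims...) for _ in 1:n]
end

# Hypothetical counterpart of `test` that only allocates the data.
function test_without_loader(i)
    data = random_data(i * 10)
    nothing
end

for i in 1:3
    # Sys.free_memory() returns bytes; dividing by 2^20 reports MiB.
    @info "RAM left (MiB): $(Sys.free_memory() / 2^20)"
    test_without_loader(i * 10)
    GC.gc(true)
    GC.gc()
end
```

If the reported free memory stays roughly flat here but keeps dropping once the DataLoader line is added back, that would point at the DataLoader keeping the data alive rather than at the allocation itself.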
