Remove setdiff in kfolds #209

Merged: CarloLucibello merged 1 commit into JuliaML:main from alexiscltrn:main on Aug 10, 2025
@alexiscltrn
Contributor

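Rather than using setdiff to compute the training indices (which can allocate large vectors on big datasets), compute them analytically from the fold sizes and offsets. In the benchmarks below, for kfolds(Int(1e5), 5), this cuts the memory estimate from 20.39 MiB to 3.05 MiB and the median time from 25.023 ms to 335.147 μs.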
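
To illustrate the idea, here is a minimal sketch that computes the training indices both ways and checks that they agree. It is not necessarily the code merged in this PR: the helper names `val_ranges`, `train_setdiff`, and `train_analytic` are hypothetical, and the sketch assumes each validation fold is a contiguous range whose bounds follow from the fold sizes and offsets.

```julia
# Hypothetical helper: contiguous validation ranges for n observations in k folds.
function val_ranges(n::Integer, k::Integer)
    2 <= k <= n || throw(ArgumentError("need 2 <= k <= n"))
    sizes = fill(n ÷ k, k)
    sizes[1:(n % k)] .+= 1                  # spread the remainder over the first folds
    offsets = cumsum(sizes) .- sizes .+ 1   # first index of each fold
    return [o:(o + s - 1) for (o, s) in zip(offsets, sizes)]
end

# Baseline (assumed shape of the old approach): a set difference per fold.
train_setdiff(n, k) = [setdiff(1:n, v) for v in val_ranges(n, k)]

# Analytic variant: everything before the fold, then everything after it.
train_analytic(n, k) =
    [vcat(1:(first(v) - 1), (last(v) + 1):n) for v in val_ranges(n, k)]

# Both variants should produce identical training indices.
@assert train_analytic(10, 3) == train_setdiff(10, 3)
```

The analytic variant only concatenates two ranges per fold, so it avoids the per-element membership work that `setdiff` has to do.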

kfolds with setdiff

julia> @benchmark kfolds(Int(1e5), 5) seconds = 600
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  22.937 ms … 272.079 ms  ┊ GC (min … max): 0.00% … 80.58%
 Time  (median):     25.023 ms               ┊ GC (median):    3.32%
 Time  (mean ± σ):   26.898 ms ±   9.199 ms  ┊ GC (mean ± σ):  5.33% ±  5.44%

  ▃▄█▇▆▆▅▄▄▄▄▃▂▂▂▁▁▁▁                                          ▂
  ████████████████████████▇▇▇▇▆▆▆▅▆▆▆▄▆▆▅▅▅▆▄▆▆▅▁▅▅▅▅▁▅▄▅▃▃▄▅▄ █
  22.9 ms       Histogram: log(frequency) by time      57.2 ms <

 Memory estimate: 20.39 MiB, allocs estimate: 130.

kfolds without setdiff

julia> @benchmark kfolds(Int(1e5), 5) seconds = 600
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  235.259 μs …   7.080 ms  ┊ GC (min … max):  0.00% … 88.45%
 Time  (median):     335.147 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   454.221 μs ± 296.795 μs  ┊ GC (mean ± σ):  15.21% ± 17.02%

     ▆█▇▅▃▃▂▂▂▂▁                                ▁▁▁ ▁           ▁
  ▂▂███████████████▇█▇▆▆▆▆▇▆▆▄▅▆▇▅▆▇▇▇▇▇▇▇████████████▇▇▇▇▇▇▇▇▆ █
  235 μs        Histogram: log(frequency) by time       1.43 ms <

 Memory estimate: 3.05 MiB, allocs estimate: 25.

@CarloLucibello merged commit 9207b0b into JuliaML:main on Aug 10, 2025
6 of 7 checks passed
@CarloLucibello
Member

thanks!