Sampling from Distributions.product_distribution allocates (a lot) #1954

@JADekker

Description

Hi, I noticed that sampling a matrix of values from a Distributions.product_distribution makes a lot of allocations (in my example, more allocations than sampled values!) when I call rand(..., N) directly on a product distribution whose components are different distributions. My naive implementations (rand_quick, rand_quick2) beat the default implementation by roughly a factor of 10 on my device and cut allocations to a constant number in the matrix case. rand_quick also gives a modest speed-up on my device in the vector case. Am I misusing product_distribution, or is there room for improvement? See this Discourse thread for the initial question that led to this issue: https://discourse.julialang.org/t/product-distribution-allocates-a-lot/126771/2.

using Distributions, BenchmarkTools, Random
Random.seed!(42)

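# Single draw: fill a preallocated vector with one sample per component.
function rand_quick(d::Product)
    N_out = Vector{Float64}(undef, length(d.v))
    for (i, dist) in enumerate(d.v)
        N_out[i] = rand(dist)
    end
    return N_out
end
# Single draw via a comprehension.
rand_quick2(d::Product) = [rand(dist) for dist in d.v]
# N draws: sample each component into a column, then transpose so the result
# is length(d.v) × N, the same layout as rand(d, N).
function rand_quick(d::Product, N::Int64)
    N_out = Matrix{Float64}(undef, N, length(d.v))
    for (i, dist) in enumerate(d.v)
        N_out[:, i] .= rand(dist, N)
    end
    return permutedims(N_out)
end
# N draws: build one 1 × N row per component and stack the rows.
rand_quick2(d::Product, N::Int64) = vcat([rand(dist, N)' for dist in d.v]...)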

function run_tests(N::Int64)
    v_dists = [Exponential(1.0), Normal(0.0, 1.0), LogNormal(0.0, 1.0)]
    d = product_distribution(v_dists)
    display("Benchmarking single draws")
    display(@benchmark rand($d))
    display(@benchmark rand_quick($d))
    display(@benchmark rand_quick2($d))
    display("Benchmarking samples of size $N")
    display(@benchmark rand($d, $N))
    display(@benchmark rand_quick($d, $N))
    display(@benchmark rand_quick2($d, $N))
    return nothing
end

run_tests(1_000_000)

gives

"Benchmarking single draws"
BenchmarkTools.Trial: 10000 samples with 836 evaluations per sample.
 Range (min … max):  143.840 ns …  67.004 μs  ┊ GC (min … max): 0.00% … 99.73%
 Time  (median):     158.044 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   168.521 ns ± 669.044 ns  ┊ GC (mean ± σ):  4.36% ±  2.43%

       ▄▇▇█▇▃▂▁ ▁▁▃▄▄                                            
  ▁▂▃▆▇███████████████▆▄▄▃▃▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▂▁▂▁▁▁▁▁▁▁▁▁ ▃
  144 ns           Histogram: frequency by time          210 ns <

 Memory estimate: 128 bytes, allocs estimate: 4.
BenchmarkTools.Trial: 10000 samples with 858 evaluations per sample.
 Range (min … max):  136.995 ns …  69.691 μs  ┊ GC (min … max): 0.00% … 99.72%
 Time  (median):     151.759 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   170.850 ns ± 724.923 ns  ┊ GC (mean ± σ):  4.52% ±  2.61%

     █▆▄▁ ▂▆▁                                                    
  ▁▂█████▇███▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  137 ns           Histogram: frequency by time          253 ns <

 Memory estimate: 128 bytes, allocs estimate: 4.
BenchmarkTools.Trial: 10000 samples with 545 evaluations per sample.
 Range (min … max):  209.939 ns …  96.199 μs  ┊ GC (min … max): 0.00% … 99.72%
 Time  (median):     224.007 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   238.015 ns ± 960.152 ns  ┊ GC (mean ± σ):  4.21% ±  1.78%

   ▁▄▃▅▇▇▇▇███▇▆▆▆▆▄▃▃▃▃▂▂▂▁▁▁▁▁▁▁▁▁      ▁                     ▃
  ▆████████████████████████████████████████████▇▇█▇▇▇▇▇█▇▆▆▇▆▇▆ █
  210 ns        Histogram: log(frequency) by time        290 ns <

 Memory estimate: 160 bytes, allocs estimate: 6.
"Benchmarking samples of size 1000000"
BenchmarkTools.Trial: 31 samples with 1 evaluation per sample.
 Range (min … max):  149.399 ms … 194.403 ms  ┊ GC (min … max): 1.35% … 25.34%
 Time  (median):     156.147 ms               ┊ GC (median):    4.97%
 Time  (mean ± σ):   161.630 ms ±   9.801 ms  ┊ GC (mean ± σ):  9.54% ±  5.40%

    ▁▄  ▄▄▁              ▄ ▁▁ █                                  
  ▆▁██▆▁███▆▁▁▁▁▁▁▁▁▁▁▁▆▁█▆██▆█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆ ▁
  149 ms           Histogram: frequency by time          194 ms <

 Memory estimate: 114.43 MiB, allocs estimate: 5999491.
BenchmarkTools.Trial: 323 samples with 1 evaluation per sample.
 Range (min … max):  14.128 ms … 63.315 ms  ┊ GC (min … max): 4.72% … 77.16%
 Time  (median):     15.284 ms              ┊ GC (median):    6.86%
 Time  (mean ± σ):   15.520 ms ±  2.754 ms  ┊ GC (mean ± σ):  7.90% ±  4.68%

        ▁▁ ▁▄▃▃▄▄▄▄▄▇▃█▁▅▄▁▄ ▃▆ ▁                              
  ▃▁▁▅▃▇██▆████████████████████▇█▅▆▅█▄▃▆▆▃▁▃▄▅▁▁▃▆▃▄▃▄▁▃▁▃▃▃▃ ▅
  14.1 ms         Histogram: frequency by time        17.4 ms <

 Memory estimate: 68.67 MiB, allocs estimate: 22.
BenchmarkTools.Trial: 345 samples with 1 evaluation per sample.
 Range (min … max):  13.138 ms … 40.207 ms  ┊ GC (min … max): 0.00% … 66.52%
 Time  (median):     14.345 ms              ┊ GC (median):    6.93%
 Time  (mean ± σ):   14.497 ms ±  1.854 ms  ┊ GC (mean ± σ):  7.89% ±  4.86%

                  ▂   ▇█▄▄ ▄▁▇▅▆▇▄▂▂▄ ▂ ▁                      
  ▃▃▁▁▁▁▁▁▁▃▁▃▅▃▄███▇▇███████████████▆█▄█▆▅▃▅▄▃▃▃▃▃▃▃▃▃▁▁▁▁▁▃ ▄
  13.1 ms         Histogram: frequency by time        15.7 ms <

 Memory estimate: 45.78 MiB, allocs estimate: 20.
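
The workarounds above still allocate one temporary vector of length N per component. A minimal sketch of a variant that writes directly into a preallocated length(dists) × N matrix is shown below. The names rand_rows! and fill_row! are hypothetical helpers for illustration and are not part of Distributions.jl, and this variant has not been benchmarked here. The inner loop sits behind a function barrier so that each component distribution is dispatched on once per row rather than once per draw.

```julia
using Distributions, Random

# Hypothetical helper: fill row `i` of `out` with draws from `dist`.
# Acts as a function barrier, so the loop is compiled for the concrete
# type of `dist` even when `dists` has an abstract element type.
function fill_row!(rng::AbstractRNG, out::AbstractMatrix, i::Integer, dist)
    for j in axes(out, 2)
        out[i, j] = rand(rng, dist)
    end
    return out
end

# Hypothetical helper: row i of `out` receives samples of dists[i].
function rand_rows!(rng::AbstractRNG, out::AbstractMatrix, dists)
    size(out, 1) == length(dists) ||
        throw(DimensionMismatch("rows of `out` must match number of distributions"))
    for (i, dist) in enumerate(dists)
        fill_row!(rng, out, i, dist)
    end
    return out
end

dists = [Exponential(1.0), Normal(0.0, 1.0), LogNormal(0.0, 1.0)]
out = Matrix{Float64}(undef, length(dists), 1_000)
rand_rows!(Random.default_rng(), out, dists)
```

Writing row by row touches memory with a stride in Julia's column-major layout, so whether this beats the column-then-transpose approach of rand_quick would need to be measured.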
