Hi, I noticed that sampling a matrix of values from a `Distributions.product_distribution` makes a lot of allocations (in my example, more allocations than sampled values!) when I call `rand(..., N)` directly on a `product_distribution` whose components are different distributions. My naive implementations (`rand_quick`, `rand_quick2`) beat the default implementation by roughly a factor of 10 on my device and cut allocations to a constant amount in the matrix case. `rand_quick` also provides a modest speed-up on my device in the vector case. Am I misusing `product_distribution`, or is there room for improvement here? See this Discourse thread for my initial question that led to this issue: https://discourse.julialang.org/t/product-distribution-allocates-a-lot/126771/2.
using Distributions, BenchmarkTools, Random

Random.seed!(42)

function rand_quick(d::Product)
    N_out = Vector{Float64}(undef, length(d.v))
    for (i, dist) in enumerate(d.v)
        N_out[i] = rand(dist)
    end
    return N_out
end

rand_quick2(d::Product) = [rand(dist) for dist in d.v]

function rand_quick(d::Product, N::Int64)
    N_out = Matrix{Float64}(undef, N, length(d.v))
    for (i, dist) in enumerate(d.v)
        N_out[:, i] .= rand(dist, N)
    end
    return permutedims(N_out)
end

rand_quick2(d::Product, N::Int64) = vcat([rand(dist, N)' for dist in d.v]...)

function run_tests(N::Int64)
    v_dists = [Exponential(1.0), Normal(0.0, 1.0), LogNormal(0.0, 1.0)]
    d = product_distribution(v_dists)
    display("Benchmarking single draws")
    display(@benchmark rand($d))
    display(@benchmark rand_quick($d))
    display(@benchmark rand_quick2($d))
    display("Benchmarking samples of size $N")
    display(@benchmark rand($d, $N))
    display(@benchmark rand_quick($d, $N))
    display(@benchmark rand_quick2($d, $N))
    return nothing
end

run_tests(1_000_000)
gives
"Benchmarking single draws"
BenchmarkTools.Trial: 10000 samples with 836 evaluations per sample.
Range (min … max): 143.840 ns … 67.004 μs ┊ GC (min … max): 0.00% … 99.73%
Time (median): 158.044 ns ┊ GC (median): 0.00%
Time (mean ± σ): 168.521 ns ± 669.044 ns ┊ GC (mean ± σ): 4.36% ± 2.43%
▄▇▇█▇▃▂▁ ▁▁▃▄▄
▁▂▃▆▇███████████████▆▄▄▃▃▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▂▁▂▁▁▁▁▁▁▁▁▁ ▃
144 ns Histogram: frequency by time 210 ns <
Memory estimate: 128 bytes, allocs estimate: 4.
BenchmarkTools.Trial: 10000 samples with 858 evaluations per sample.
Range (min … max): 136.995 ns … 69.691 μs ┊ GC (min … max): 0.00% … 99.72%
Time (median): 151.759 ns ┊ GC (median): 0.00%
Time (mean ± σ): 170.850 ns ± 724.923 ns ┊ GC (mean ± σ): 4.52% ± 2.61%
█▆▄▁ ▂▆▁
▁▂█████▇███▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
137 ns Histogram: frequency by time 253 ns <
Memory estimate: 128 bytes, allocs estimate: 4.
BenchmarkTools.Trial: 10000 samples with 545 evaluations per sample.
Range (min … max): 209.939 ns … 96.199 μs ┊ GC (min … max): 0.00% … 99.72%
Time (median): 224.007 ns ┊ GC (median): 0.00%
Time (mean ± σ): 238.015 ns ± 960.152 ns ┊ GC (mean ± σ): 4.21% ± 1.78%
▁▄▃▅▇▇▇▇███▇▆▆▆▆▄▃▃▃▃▂▂▂▁▁▁▁▁▁▁▁▁ ▁ ▃
▆████████████████████████████████████████████▇▇█▇▇▇▇▇█▇▆▆▇▆▇▆ █
210 ns Histogram: log(frequency) by time 290 ns <
Memory estimate: 160 bytes, allocs estimate: 6.
"Benchmarking samples of size 1000000"
BenchmarkTools.Trial: 31 samples with 1 evaluation per sample.
Range (min … max): 149.399 ms … 194.403 ms ┊ GC (min … max): 1.35% … 25.34%
Time (median): 156.147 ms ┊ GC (median): 4.97%
Time (mean ± σ): 161.630 ms ± 9.801 ms ┊ GC (mean ± σ): 9.54% ± 5.40%
▁▄ ▄▄▁ ▄ ▁▁ █
▆▁██▆▁███▆▁▁▁▁▁▁▁▁▁▁▁▆▁█▆██▆█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆ ▁
149 ms Histogram: frequency by time 194 ms <
Memory estimate: 114.43 MiB, allocs estimate: 5999491.
BenchmarkTools.Trial: 323 samples with 1 evaluation per sample.
Range (min … max): 14.128 ms … 63.315 ms ┊ GC (min … max): 4.72% … 77.16%
Time (median): 15.284 ms ┊ GC (median): 6.86%
Time (mean ± σ): 15.520 ms ± 2.754 ms ┊ GC (mean ± σ): 7.90% ± 4.68%
▁▁ ▁▄▃▃▄▄▄▄▄▇▃█▁▅▄▁▄ ▃▆ ▁
▃▁▁▅▃▇██▆████████████████████▇█▅▆▅█▄▃▆▆▃▁▃▄▅▁▁▃▆▃▄▃▄▁▃▁▃▃▃▃ ▅
14.1 ms Histogram: frequency by time 17.4 ms <
Memory estimate: 68.67 MiB, allocs estimate: 22.
BenchmarkTools.Trial: 345 samples with 1 evaluation per sample.
Range (min … max): 13.138 ms … 40.207 ms ┊ GC (min … max): 0.00% … 66.52%
Time (median): 14.345 ms ┊ GC (median): 6.93%
Time (mean ± σ): 14.497 ms ± 1.854 ms ┊ GC (mean ± σ): 7.89% ± 4.86%
▂ ▇█▄▄ ▄▁▇▅▆▇▄▂▂▄ ▂ ▁
▃▃▁▁▁▁▁▁▁▃▁▃▅▃▄███▇▇███████████████▆█▄█▆▅▃▅▄▃▃▃▃▃▃▃▃▃▁▁▁▁▁▃ ▄
13.1 ms Histogram: frequency by time 15.7 ms <
Memory estimate: 45.78 MiB, allocs estimate: 20.
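
For comparison, here is a rough sketch of a third variant that writes into a preallocated matrix instead of building per-distribution temporaries. It was not benchmarked above, so whether it helps is untested. `rand_rows!` is a hypothetical helper name, not part of Distributions.jl, and the sketch assumes the components are univariate distributions that support the in-place `rand!(rng, dist, A)` method.

```julia
using Distributions, Random

# Hypothetical helper (not part of Distributions.jl): fill a preallocated
# length(dists) × N matrix, one row per component distribution.
function rand_rows!(rng::AbstractRNG, out::AbstractMatrix{<:Real}, dists)
    size(out, 1) == length(dists) ||
        throw(DimensionMismatch("out must have one row per distribution"))
    for (i, dist) in enumerate(dists)
        # rand! fills the row view in place rather than allocating a new vector
        rand!(rng, dist, view(out, i, :))
    end
    return out
end

dists = [Exponential(1.0), Normal(0.0, 1.0), LogNormal(0.0, 1.0)]
out = Matrix{Float64}(undef, length(dists), 1_000)
rand_rows!(Random.default_rng(), out, dists)
```

The output has the same layout as `rand(d, N)` (one column per draw), so it could be timed with `@benchmark rand_rows!($rng, $out, $dists)` alongside the cases above. Row views of a column-major matrix are strided, so this may trade fewer allocations for less cache-friendly writes.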