Dispatch for drawing multiples#1985

Open
mmikhasenko wants to merge 16 commits into JuliaStats:master from mmikhasenko:rand-n-dispatch

Conversation

@mmikhasenko mmikhasenko commented Jun 17, 2025

Closes #1984

Implementation

  • Dispatch for MixtureModel
  • Dispatch for Truncated

Checkpoints

  • Sanity checks
  • Speed

For the mixture model:

julia> using Distributions
julia> using BenchmarkTools

julia> d = MixtureModel([Normal(1,0.1), Normal(2,0.1), Normal(3,0.1)], [0.3,0.3,0.4])
julia> @btime rand($d, 10_000);
  73.708 μs (7 allocations: 125.67 KiB)

julia> @btime [rand($d) for _ in 1:10_000];
  124.125 μs (3 allocations: 96.06 KiB)

For the truncated distribution:

t = truncated(Normal(), 0, 3)
@btime rand($t, 10_000);  # 114.625 μs (6 allocations: 256.12 KiB)
@btime [rand($t) for _ in 1:10_000];  # 169.875 μs (3 allocations: 96.06 KiB)

Visual

using Plots  # stephist comes from Plots.jl
d = MixtureModel([Normal(1,0.1), Normal(2,0.1), Normal(3,0.1)], [0.3,0.3,0.4])
s1 = [rand(d) for _ in 1:10000]
s2 = rand(d, 10000)
let
    plot()
    stephist!(s1, bins=range(0, 4, 100), lab="rand(d) n times")
    stephist!(s2, bins=range(0, 4, 100), lab="rand(d,n)")
end

[figure: overlaid step histograms of s1 and s2, which agree]

t = truncated(Normal(), 0, 3)
r1 = [rand(t) for _ in 1:10000]
r2 = rand(t, 10000)
let
    plot()
    stephist!(r1, bins=range(0, 4, 100), lab="rand(t) n times")
    stephist!(r2, bins=range(0, 4, 100), lab="rand(t,n)")
end

[figure: overlaid step histograms of r1 and r2, which agree]

@mmikhasenko mmikhasenko marked this pull request as draft June 17, 2025 20:52

codecov-commenter commented Jun 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.42%. Comparing base (f1ff9e8) to head (ad53e2d).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1985      +/-   ##
==========================================
- Coverage   86.44%   86.42%   -0.03%     
==========================================
  Files         147      147              
  Lines        8838     8898      +60     
==========================================
+ Hits         7640     7690      +50     
- Misses       1198     1208      +10     


Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
mmikhasenko and others added 5 commits June 18, 2025 16:00

mmikhasenko commented Jun 18, 2025

@devmotion thanks a lot for the review.

  • The questions should be addressed.
  • Truncated distributions with small mass: I've fixed the broken pipeline for heavily truncated distributions with little remaining probability mass. It now falls back to the scalar rand path when more than 10M draws would be needed; this cutoff is somewhat arbitrary, so suggestions are welcome.
  • The docstrings are removed.
  • Added sanity checks comparing samples from rand(d) and rand(d, n).

No docs are added
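For reference, the sanity check in the last bullet could be sketched like this (illustrative code, not the PR's test suite; the tolerances are arbitrary):

```julia
using Distributions, Random, Statistics

# Illustrative check: the scalar path and the vectorized path should
# produce statistically indistinguishable samples.
d = MixtureModel([Normal(1, 0.1), Normal(2, 0.1), Normal(3, 0.1)], [0.3, 0.3, 0.4])
Random.seed!(1234)
s1 = [rand(d) for _ in 1:100_000]  # rand(d) called n times
s2 = rand(d, 100_000)              # rand(d, n)

# Means and standard deviations should agree within Monte Carlo error.
@assert isapprox(mean(s1), mean(s2); atol=0.02)
@assert isapprox(std(s1), std(s2); atol=0.02)
```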

@mmikhasenko

The last item I'm thinking about is the 10M cap for switching between algorithms.
Switching between algorithms is generally bad for predictability, so the design of Truncated with such a switch is not ideal.
But we will not address that in this PR.

I'm planning to proceed with a similar if/else on the 0.25 remaining-mass threshold of the distribution. Similar (not-ideal) patterns will be easy to identify and update in the future.
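For concreteness, a hypothetical sketch of that kind of threshold switch (names and structure are illustrative, not the actual PR code):

```julia
using Distributions, Random

# Hypothetical regime switch for sampling a truncated distribution.
# d0 is the untruncated base distribution; [lower, upper] the truncation bounds.
function rand_truncated_sketch(rng, d0, lower, upper, n)
    tp = cdf(d0, upper) - cdf(d0, lower)  # probability mass inside the bounds
    if tp > 0.25
        # Regime 1: rejection sampling, cheap when most of the mass survives.
        out = Float64[]  # element type hardcoded for this sketch
        while length(out) < n
            batch = rand(rng, d0, max(n, 100))
            append!(out, filter(x -> lower <= x <= upper, batch))
        end
        return resize!(out, n)
    else
        # Regime 2: inverse-CDF sampling, robust when little mass remains.
        lcdf = cdf(d0, lower)
        return [quantile(d0, lcdf + rand(rng) * tp) for _ in 1:n]
    end
end
```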

@mmikhasenko

@devmotion thanks for the last comments.

  • The truncated case was easy to fix.
  • I've switched the mixture model code to a rand + resize strategy.

@mmikhasenko mmikhasenko marked this pull request as ready for review June 23, 2025 19:31
@mmikhasenko

Re-evaluated the benchmarks in the header;
the rand(d, n) version is faster for both models.

@mmikhasenko

Comments are resolved; this PR is ready for review.

@devmotion

As a general comment before any detailed review:

the rand-n version is faster for both models

Can we test different numbers of samples? n = 10000 is a bit extreme. Can we check n = 1, n = 5, n = 10, n = 50, n = 100, n = 500, n = 1000, etc. as well?
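Such a sweep could be scripted along these lines (assuming BenchmarkTools.jl; timings are machine-dependent):

```julia
using Distributions, BenchmarkTools, Printf

d = MixtureModel([Normal(1, 0.1), Normal(2, 0.1), Normal(3, 0.1)], [0.3, 0.3, 0.4])
for n in (1, 5, 10, 50, 100, 500, 1000, 10_000)
    t_vec = @belapsed rand($d, $n)               # vectorized path under test
    t_loop = @belapsed [rand($d) for _ in 1:$n]  # scalar baseline
    @printf("n = %6d   rand(d,n): %.3e s   loop: %.3e s\n", n, t_vec, t_loop)
end
```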

@mmikhasenko

It would be great to proceed with merging.
The issue is blocking usage of the package.

@devmotion could you please check, if there is anything essential to test/fix?

thanks

rand(rng::AbstractRNG, d::MixtureModel{Univariate}) =
    rand(rng, component(d, rand(rng, d.prior)))

function rand(rng::AbstractRNG, d::MixtureModel{Univariate}, n::Int)

This is the wrong dispatch, isn't it? If one wants to draw multiple samples from a distribution d, automatically Distributions dispatches to drawing samples with sampler(d). In the case of mixture models this is MixtureSampler. So this should actually be defined for MixtureSampler{Univariate}, not MixtureModel{Univariate} AFAICT?

However, generally for univariate distributions one also shouldn't define rand(rng, dist, n) but only the in-place method _rand!(rng, sampler(dist), out) (#1905 will fix possibly incorrectly allocated output arrays) if multiple samples can be generated more efficiently:

return rand!(rng, sampler(s), out)
function rand!(rng::AbstractRNG, s::Sampleable{Univariate}, A::AbstractArray{<:Real})
    return _rand!(rng, sampler(s), A)
end
function _rand!(rng::AbstractRNG, sampler::Sampleable{Univariate}, A::AbstractArray{<:Real})
    for i in eachindex(A)
        A[i] = rand(rng, sampler)
    end
    return A
end

So AFAICT we should only define

Suggested change
function rand(rng::AbstractRNG, d::MixtureModel{Univariate}, n::Int)
function _rand!(rng::AbstractRNG, d::MixtureSampler{Univariate}, x::AbstractArray{<:Real})

Alternatively, if we never want to use MixtureSampler for sampling MixtureModel{Univariate}, we should define sampler(d::MixtureModel{Univariate}) = d (or limit the definition below to sampler(d::MixtureModel{Multivariate}) = MixtureSampler(d)) and here define

Suggested change
function rand(rng::AbstractRNG, d::MixtureModel{Univariate}, n::Int)
function _rand!(rng::AbstractRNG, d::MixtureModel{Univariate}, x::AbstractArray{<:Real})
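As a toy illustration of the machinery described above (not part of the PR): a minimal Sampleable only needs a scalar rand method for rand(s, n) to route through sampler and the generic _rand! loop:

```julia
using Distributions, Random

# Toy univariate sampler. Defining scalar rand is enough: rand(s, n) falls
# back to sampler(s) and the generic _rand! loop in Distributions.
struct ToyUniform <: Sampleable{Univariate, Continuous} end

Base.rand(rng::Random.AbstractRNG, ::ToyUniform) = rand(rng)

s = ToyUniform()
x = rand(s, 5)  # works without defining rand(rng, s, n) explicitly
```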

Comment on lines +483 to +504
# Find the component with the maximum count to minimize resizing
max_count_idx = argmax(counts)
max_count = counts[max_count_idx]

# Sample from the component with maximum count first and use it directly
x = rand(rng, component(d, max_count_idx), max_count)

# Resize to the full size and continue with other components
resize!(x, n)
offset = max_count

for i in eachindex(counts)
    if i != max_count_idx
        ni = counts[i]
        if ni > 0
            c = component(d, i)
            last_offset = offset + ni - 1
            rand!(rng, c, @view(x[(begin+offset):(begin+last_offset)]))
            offset = last_offset + 1
        end
    end
end

For the in-place method, it seems this could be simplified to

Suggested change
# Find the component with the maximum count to minimize resizing
max_count_idx = argmax(counts)
max_count = counts[max_count_idx]
# Sample from the component with maximum count first and use it directly
x = rand(rng, component(d, max_count_idx), max_count)
# Resize to the full size and continue with other components
resize!(x, n)
offset = max_count
for i in eachindex(counts)
    if i != max_count_idx
        ni = counts[i]
        if ni > 0
            c = component(d, i)
            last_offset = offset + ni - 1
            rand!(rng, c, @view(x[(begin+offset):(begin+last_offset)]))
            offset = last_offset + 1
        end
    end
end
offset = 0
for (c, ni) in zip(components(d), counts)
    last_offset = offset + ni - 1
    rand!(rng, c, @view(x[(begin+offset):(begin+last_offset)]))
    offset = last_offset + 1
end

end
end

function rand(rng::AbstractRNG, d::Truncated, n::Int)

Suggested change
function rand(rng::AbstractRNG, d::Truncated, n::Int)
function _rand!(rng::AbstractRNG, d::Truncated, x::AbstractArray{<:Real})

And if there are any precomputations that could be factored out (doesn't seem to be the case?), then we should think about defining a dedicated sampler.

if tp > 0.25
    # Regime 1: Rejection sampling with batch optimization
    # Get the correct type and memory by sampling from the untruncated distribution
    samples = rand(rng, d0, n)

Suggested change
samples = rand(rng, d0, n)
rand!(rng, d0, x)

samples = rand(rng, d0, n)
n_collected = 0
max_batch = 0
batch_buffer = Vector{eltype(samples)}()

A separate batch buffer seems unnecessary; in particular, the resizing might be inefficient. Instead of copying from a separate vector, we could just use the output vector, move samples around there, and keep track of the last accepted index.
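That in-place idea could look roughly like this hypothetical helper (not the PR code): keep accepted values at the front of the output array, tracking the last accepted index:

```julia
# Compact values inside [lower, upper] to the front of x, in place.
# Returns the index of the last accepted sample; no side buffer needed.
function compact_accepted!(x::AbstractVector, lower, upper)
    j = firstindex(x) - 1
    for i in eachindex(x)
        v = x[i]
        if lower <= v <= upper
            j += 1
            x[j] = v
        end
    end
    return j
end
```

The caller could then refill the tail beyond the returned index with fresh draws from the untruncated distribution and repeat until the whole array is accepted.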

Comment on lines +275 to +277
# Sample one value first to determine the correct type
sample_type = typeof(quantile(d0, d.lcdf + rand(rng) * d.tp))
samples = Vector{sample_type}(undef, n)

Suggested change
# Sample one value first to determine the correct type
sample_type = typeof(quantile(d0, d.lcdf + rand(rng) * d.tp))
samples = Vector{sample_type}(undef, n)

sample_type = typeof(quantile(d0, d.lcdf + rand(rng) * d.tp))
samples = Vector{sample_type}(undef, n)
for i in 1:n
    samples[i] = quantile(d0, d.lcdf + rand(rng) * d.tp)

We should probably at least use a Random.Sampler for rand(rng); maybe it's even faster to call rand(rng, n) (despite the allocation).
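For illustration, hoisting a Random.Sampler out of the loop might look like this sketch (unbenchmarked):

```julia
using Random

rng = Random.default_rng()
# Pre-build a sampler for uniform Float64 draws once, outside the hot loop.
sp = Random.Sampler(rng, Float64)
vals = [rand(rng, sp) for _ in 1:1000]
```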

Comment on lines +284 to +286
# Sample one value first to determine the correct type
sample_type = typeof(invlogcdf(d0, logaddexp(d.loglcdf, d.logtp - randexp(rng))))
samples = Vector{sample_type}(undef, n)

Suggested change
# Sample one value first to determine the correct type
sample_type = typeof(invlogcdf(d0, logaddexp(d.loglcdf, d.logtp - randexp(rng))))
samples = Vector{sample_type}(undef, n)

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

rand(d, 1000) is not propagated to components

3 participants