Skip to content

Julia Hypergeometric sampler#2032

Open
oscardssmith wants to merge 6 commits intoJuliaStats:masterfrom
oscardssmith:os/native-Hypergeometric-Sampler
Open

Julia Hypergeometric sampler#2032
oscardssmith wants to merge 6 commits intoJuliaStats:masterfrom
oscardssmith:os/native-Hypergeometric-Sampler

Conversation

@oscardssmith
Copy link
Contributor

alternative to #2027 that implements the Sampler interface for faster repeated random numbers.

julia> d = Hypergeometric(60,80,40)
Hypergeometric(ns=60, nf=80, n=40)

julia> @benchmark spl=sampler($d)
BenchmarkTools.Trial: 10000 samples with 900 evaluations per sample.
 Range (min … max):  123.613 ns … 159.059 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     124.582 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   125.135 ns ±   2.068 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▂▅▇▇▇█▆▄▂▁▁▁▁▁▁▁   ▁▁▁▁▂▁▁▁ ▁▁                               ▂
  ▅████████████████████████████████████▇▇▇▇▇▅▆▅▆▅▅▆▅▄▄▄▅▄▅▅▄▂▃▄ █
  124 ns        Histogram: log(frequency) by time        133 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand($spl)
BenchmarkTools.Trial: 10000 samples with 991 evaluations per sample.
 Range (min … max):  40.188 ns … 77.342 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     45.353 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.502 ns ±  1.749 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       ▁▁▃▅▅▆▇▇█▇▆▇▆▄▄▃▁▁                      
  ▂▁▁▂▂▁▂▂▂▂▂▂▃▃▃▄▄▅▅▇▇███████████████████▇▆▅▅▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂ ▄
  40.2 ns         Histogram: frequency by time        50.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0

So #2027 has 165ns rand times, while this has 125ns setup and 45ns for a 3.75x speedup when generating multiple numbers.

@devmotion
Copy link
Member

I would prefer continuing with #1994. There are a few things to address and improve in #1994 but the test setup in #1994 is superior to this PR. Moreover, by continuing with or basing a new PR on #1994 we can properly give credit to the original author of #1994. I think it's not fine to just copy (parts of) #1994 without giving credit (even though legally you can do that, of course).

@oscardssmith
Copy link
Contributor Author

I don't have a preference for how we do this so long as one of these PRs is merged (or some combination). I am not a maintainer of the repo, so I can not edit #1994 to make it address code review, so as I see it, there are ~3 options.

  1. Merge Implementing Hypergeometric sampling to remove RFunctions dependency #1994 as is and I followup with a PR to address review comments and impliment section 4.2 of the algorithm (which is a ~2x speedup in sampling in the cases I've tested)

  2. I can rebase this PR on top of rand(Normal(0, BigFloat) vs. rand(Truncated(Normal(0, BigFloat), -BigFloat, BigFloat) #1944 so that the commit history contains @JaoCR's contributions and test improvements

  3. If given permissions I could edit rand(Normal(0, BigFloat) vs. rand(Truncated(Normal(0, BigFloat), -BigFloat, BigFloat) #1944 to address code review (or if not, you or the original author could do so).

Which of these seems like the right approach to you? Happy do go with any of them.

@codecov-commenter
Copy link

codecov-commenter commented Feb 7, 2026

Codecov Report

❌ Patch coverage is 97.50000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.58%. Comparing base (3d304c2) to head (4135f78).

Files with missing lines Patch % Lines
src/samplers/hypergeometric.jl 97.46% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2032      +/-   ##
==========================================
+ Coverage   86.44%   86.58%   +0.14%     
==========================================
  Files         147      148       +1     
  Lines        8837     8907      +70     
==========================================
+ Hits         7639     7712      +73     
+ Misses       1198     1195       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@oscardssmith
Copy link
Contributor Author

@devmotion any thoughts on which of the above proposals seems best to you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants