
Conversation

@christiangnrd
Member

Close #474

In my very unscientific tests (generating a bunch of values and plotting a histogram), quality seems similar to MPS, but without the NaN generation (#474).
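
For anyone wanting to repeat that kind of spot-check locally, here is a minimal sketch (my own, not the PR's actual test code; it assumes the usual `Metal.randn` entry point and checks summary statistics instead of plotting a histogram):

```julia
using Metal, Statistics

# Draw normal samples on the GPU and copy them back to the host.
x = Array(Metal.randn(Float32, 2^20))

# The failure mode from #474: NaNs in the output.
@assert !any(isnan, x)

# Crude distribution check: mean ≈ 0 and std ≈ 1 for a standard normal.
println((mean = mean(x), std = std(x)))
```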

In a future PR, if we really want to default to Apple-provided random number generation when supported, we could wrap the MPSGraph RNG functionality, but that may incur a performance hit.
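
To make that idea concrete, a purely hypothetical sketch of such a fallback; every helper below is a made-up stand-in, not existing Metal.jl or MPSGraph API:

```julia
using Metal

# Hypothetical capability check (no such function exists in Metal.jl).
mps_rng_supported(::Type) = false

# Hypothetical MPSGraph-backed generator (not wrapped yet, per the comment above).
mpsgraph_randn!(A) = error("MPSGraph RNG wrapper not implemented")

function default_randn!(A::MtlArray{T}) where {T}
    if mps_rng_supported(T)
        mpsgraph_randn!(A)   # Apple-provided generator
    else
        Metal.randn!(A)      # native in-kernel RNG, as used by this PR
    end
end
```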


@codecov

codecov bot commented Dec 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.58%. Comparing base (6d1af96) to head (330289c).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #727      +/-   ##
==========================================
- Coverage   82.62%   82.58%   -0.04%
==========================================
  Files          62       62
  Lines        2866     2860       -6
==========================================
- Hits         2368     2362       -6
  Misses        498      498
```


Contributor

@github-actions github-actions bot left a comment


Metal Benchmarks

| Benchmark suite | Current: 330289c | Previous: 6d1af96 | Ratio |
| --- | --- | --- | --- |
| latency/precompile | 25075880292 ns | 25160140042 ns | 1.00 |
| latency/ttfp | 2279012750 ns | 2277207000 ns | 1.00 |
| latency/import | 1450336541 ns | 1442078125 ns | 1.01 |
| integration/metaldevrt | 872083 ns | 867958 ns | 1.00 |
| integration/byval/slices=1 | 1569667 ns | 1566500 ns | 1.00 |
| integration/byval/slices=3 | 8935333 ns | 8820791.5 ns | 1.01 |
| integration/byval/reference | 1559042 ns | 1558125 ns | 1.00 |
| integration/byval/slices=2 | 2602916 ns | 2640459 ns | 0.99 |
| kernel/indexing | 618583 ns | 612292 ns | 1.01 |
| kernel/indexing_checked | 612896 ns | 616562.5 ns | 0.99 |
| kernel/launch | 11542 ns | 13250 ns | 0.87 |
| kernel/rand | 571833 ns | 569542 ns | 1.00 |
| array/construct | 6375 ns | 6667 ns | 0.96 |
| array/broadcast | 604666.5 ns | 611708 ns | 0.99 |
| array/random/randn/Float32 | 941916 ns | 846208 ns | 1.11 |
| array/random/randn!/Float32 | 747792 ns | 627625 ns | 1.19 |
| array/random/rand!/Int64 | 555104.5 ns | 564792 ns | 0.98 |
| array/random/rand!/Float32 | 579958 ns | 588250 ns | 0.99 |
| array/random/rand/Int64 | 776584 ns | 757521 ns | 1.03 |
| array/random/rand/Float32 | 606208 ns | 621000 ns | 0.98 |
| array/accumulate/Int64/1d | 1249416 ns | 1217584 ns | 1.03 |
| array/accumulate/Int64/dims=1 | 1828583 ns | 1838104 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 2153083 ns | 2176020.5 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 11661062.5 ns | 11497375 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 9777520.5 ns | 9787125 ns | 1.00 |
| array/accumulate/Float32/1d | 1124791.5 ns | 1135000 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 1558667 ns | 1574459 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 1878250 ns | 1885208.5 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 9815521 ns | 9860583 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 7210958.5 ns | 7186084 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 1347500 ns | 1380521 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 1083791 ns | 1100083 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 1137333.5 ns | 1249709 ns | 0.91 |
| array/reductions/reduce/Int64/dims=1L | 2028125 ns | 2008333 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2L | 4237250 ns | 4231042 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 1005333 ns | 1068333 ns | 0.94 |
| array/reductions/reduce/Float32/dims=1 | 829625 ns | 841958 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 836000 ns | 863417 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1L | 1326166.5 ns | 1293958 ns | 1.02 |
| array/reductions/reduce/Float32/dims=2L | 1798917 ns | 1858375 ns | 0.97 |
| array/reductions/mapreduce/Int64/1d | 1547500 ns | 1378416 ns | 1.12 |
| array/reductions/mapreduce/Int64/dims=1 | 1090375 ns | 1097479 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2 | 1302625 ns | 1146146 ns | 1.14 |
| array/reductions/mapreduce/Int64/dims=1L | 1998396 ns | 2047937 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2L | 3610958 ns | 3644979.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 1047709 ns | 1002770.5 ns | 1.04 |
| array/reductions/mapreduce/Float32/dims=1 | 826459 ns | 843292 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=2 | 846542 ns | 863750 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1L | 1327042 ns | 1323667 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 1805875 ns | 1826250 ns | 0.99 |
| array/private/copyto!/gpu_to_gpu | 636000 ns | 631000 ns | 1.01 |
| array/private/copyto!/cpu_to_gpu | 801479.5 ns | 766666 ns | 1.05 |
| array/private/copyto!/gpu_to_cpu | 796083 ns | 796292 ns | 1.00 |
| array/private/iteration/findall/int | 1567458.5 ns | 1584104.5 ns | 0.99 |
| array/private/iteration/findall/bool | 1408333 ns | 1407375 ns | 1.00 |
| array/private/iteration/findfirst/int | 2084542 ns | 2078145.5 ns | 1.00 |
| array/private/iteration/findfirst/bool | 2046458.5 ns | 2053875 ns | 1.00 |
| array/private/iteration/scalar | 4152458 ns | 4242792 ns | 0.98 |
| array/private/iteration/logical | 2652395.5 ns | 2673833.5 ns | 0.99 |
| array/private/iteration/findmin/1d | 2511833 ns | 2539375 ns | 0.99 |
| array/private/iteration/findmin/2d | 1818542 ns | 1823250 ns | 1.00 |
| array/private/copy | 563333 ns | 600125 ns | 0.94 |
| array/shared/copyto!/gpu_to_gpu | 81833 ns | 83916.5 ns | 0.98 |
| array/shared/copyto!/cpu_to_gpu | 81292 ns | 83208 ns | 0.98 |
| array/shared/copyto!/gpu_to_cpu | 82875 ns | 82667 ns | 1.00 |
| array/shared/iteration/findall/int | 1533416 ns | 1586791.5 ns | 0.97 |
| array/shared/iteration/findall/bool | 1389958 ns | 1416458 ns | 0.98 |
| array/shared/iteration/findfirst/int | 1635166 ns | 1674583 ns | 0.98 |
| array/shared/iteration/findfirst/bool | 1628208 ns | 1655708 ns | 0.98 |
| array/shared/iteration/scalar | 191458.5 ns | 208208 ns | 0.92 |
| array/shared/iteration/logical | 2150604.5 ns | 2272208 ns | 0.95 |
| array/shared/iteration/findmin/1d | 2105125 ns | 2145542 ns | 0.98 |
| array/shared/iteration/findmin/2d | 1804042 ns | 1829583 ns | 0.99 |
| array/shared/copy | 260229 ns | 248416 ns | 1.05 |
| array/permutedims/4d | 2392625 ns | 2399458 ns | 1.00 |
| array/permutedims/2d | 1159334 ns | 1179542 ns | 0.98 |
| array/permutedims/3d | 1671250 ns | 1689042 ns | 0.99 |
| metal/synchronization/stream | 19375 ns | 19104.5 ns | 1.01 |
| metal/synchronization/context | 20042 ns | 20334 ns | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@christiangnrd
Member Author

There seems to be a bit of a performance hit. Maybe we keep using MPS for uniform rand, since those weren't causing issues?
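
For a rough local comparison outside the CI harness, a sketch assuming BenchmarkTools is available (`Metal.@sync` waits for the GPU, so the kernel time is actually measured):

```julia
using Metal, BenchmarkTools

A = MtlArray{Float32}(undef, 2^24)

@btime Metal.@sync Metal.rand!($A)    # uniform generation
@btime Metal.@sync Metal.randn!($A)   # normal generation, the path this PR changes
```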

@rveltz

rveltz commented Jan 9, 2026

Yes, but the rand is buggy, so is it a good comparison?

@christiangnrd
Member Author

> Yes, but the rand is buggy, so is it a good comparison?

I was under the assumption that the issue this PR attempts to fix (your #474) only happens with the normally distributed randn, not the uniformly distributed rand.
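
A quick way to check that split is to count NaNs from each generator; on an affected build, per #474, only randn! should ever report a nonzero count:

```julia
using Metal

A = MtlArray{Float32}(undef, 2^22)

Metal.rand!(A)                                    # uniform: not implicated in #474
println("rand!  NaNs: ", count(isnan, Array(A)))

Metal.randn!(A)                                   # normal: the buggy path
println("randn! NaNs: ", count(isnan, Array(A)))
```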

@rveltz

rveltz commented Jan 15, 2026

You are right, sorry.

@christiangnrd force-pushed the defrand branch 2 times, most recently from bc43d2e to 379b7d3 on January 28, 2026 at 02:50
@christiangnrd merged commit 3ef42d9 into main on January 29, 2026
17 checks passed
@christiangnrd deleted the defrand branch on January 29, 2026 at 01:25

Development

Successfully merging this pull request may close these issues: Metal.randn! produces NaN (#474)