Use Metal.jl native rand by default #727
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | main | #727 | +/- |
|---|---|---|---|
| Coverage | 82.62% | 82.58% | -0.04% |
| Files | 62 | 62 | |
| Lines | 2866 | 2860 | -6 |
| Hits | 2368 | 2362 | -6 |
| Misses | 498 | 498 | |
Metal Benchmarks
| Benchmark suite | Current: 330289c | Previous: 6d1af96 | Ratio |
|---|---|---|---|
| latency/precompile | 25075880292 ns | 25160140042 ns | 1.00 |
| latency/ttfp | 2279012750 ns | 2277207000 ns | 1.00 |
| latency/import | 1450336541 ns | 1442078125 ns | 1.01 |
| integration/metaldevrt | 872083 ns | 867958 ns | 1.00 |
| integration/byval/slices=1 | 1569667 ns | 1566500 ns | 1.00 |
| integration/byval/slices=3 | 8935333 ns | 8820791.5 ns | 1.01 |
| integration/byval/reference | 1559042 ns | 1558125 ns | 1.00 |
| integration/byval/slices=2 | 2602916 ns | 2640459 ns | 0.99 |
| kernel/indexing | 618583 ns | 612292 ns | 1.01 |
| kernel/indexing_checked | 612896 ns | 616562.5 ns | 0.99 |
| kernel/launch | 11542 ns | 13250 ns | 0.87 |
| kernel/rand | 571833 ns | 569542 ns | 1.00 |
| array/construct | 6375 ns | 6667 ns | 0.96 |
| array/broadcast | 604666.5 ns | 611708 ns | 0.99 |
| array/random/randn/Float32 | 941916 ns | 846208 ns | 1.11 |
| array/random/randn!/Float32 | 747792 ns | 627625 ns | 1.19 |
| array/random/rand!/Int64 | 555104.5 ns | 564792 ns | 0.98 |
| array/random/rand!/Float32 | 579958 ns | 588250 ns | 0.99 |
| array/random/rand/Int64 | 776584 ns | 757521 ns | 1.03 |
| array/random/rand/Float32 | 606208 ns | 621000 ns | 0.98 |
| array/accumulate/Int64/1d | 1249416 ns | 1217584 ns | 1.03 |
| array/accumulate/Int64/dims=1 | 1828583 ns | 1838104 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 2153083 ns | 2176020.5 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 11661062.5 ns | 11497375 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 9777520.5 ns | 9787125 ns | 1.00 |
| array/accumulate/Float32/1d | 1124791.5 ns | 1135000 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 1558667 ns | 1574459 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 1878250 ns | 1885208.5 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 9815521 ns | 9860583 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 7210958.5 ns | 7186084 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 1347500 ns | 1380521 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 1083791 ns | 1100083 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 1137333.5 ns | 1249709 ns | 0.91 |
| array/reductions/reduce/Int64/dims=1L | 2028125 ns | 2008333 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2L | 4237250 ns | 4231042 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 1005333 ns | 1068333 ns | 0.94 |
| array/reductions/reduce/Float32/dims=1 | 829625 ns | 841958 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 836000 ns | 863417 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1L | 1326166.5 ns | 1293958 ns | 1.02 |
| array/reductions/reduce/Float32/dims=2L | 1798917 ns | 1858375 ns | 0.97 |
| array/reductions/mapreduce/Int64/1d | 1547500 ns | 1378416 ns | 1.12 |
| array/reductions/mapreduce/Int64/dims=1 | 1090375 ns | 1097479 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2 | 1302625 ns | 1146146 ns | 1.14 |
| array/reductions/mapreduce/Int64/dims=1L | 1998396 ns | 2047937 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2L | 3610958 ns | 3644979.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 1047709 ns | 1002770.5 ns | 1.04 |
| array/reductions/mapreduce/Float32/dims=1 | 826459 ns | 843292 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=2 | 846542 ns | 863750 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1L | 1327042 ns | 1323667 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 1805875 ns | 1826250 ns | 0.99 |
| array/private/copyto!/gpu_to_gpu | 636000 ns | 631000 ns | 1.01 |
| array/private/copyto!/cpu_to_gpu | 801479.5 ns | 766666 ns | 1.05 |
| array/private/copyto!/gpu_to_cpu | 796083 ns | 796292 ns | 1.00 |
| array/private/iteration/findall/int | 1567458.5 ns | 1584104.5 ns | 0.99 |
| array/private/iteration/findall/bool | 1408333 ns | 1407375 ns | 1.00 |
| array/private/iteration/findfirst/int | 2084542 ns | 2078145.5 ns | 1.00 |
| array/private/iteration/findfirst/bool | 2046458.5 ns | 2053875 ns | 1.00 |
| array/private/iteration/scalar | 4152458 ns | 4242792 ns | 0.98 |
| array/private/iteration/logical | 2652395.5 ns | 2673833.5 ns | 0.99 |
| array/private/iteration/findmin/1d | 2511833 ns | 2539375 ns | 0.99 |
| array/private/iteration/findmin/2d | 1818542 ns | 1823250 ns | 1.00 |
| array/private/copy | 563333 ns | 600125 ns | 0.94 |
| array/shared/copyto!/gpu_to_gpu | 81833 ns | 83916.5 ns | 0.98 |
| array/shared/copyto!/cpu_to_gpu | 81292 ns | 83208 ns | 0.98 |
| array/shared/copyto!/gpu_to_cpu | 82875 ns | 82667 ns | 1.00 |
| array/shared/iteration/findall/int | 1533416 ns | 1586791.5 ns | 0.97 |
| array/shared/iteration/findall/bool | 1389958 ns | 1416458 ns | 0.98 |
| array/shared/iteration/findfirst/int | 1635166 ns | 1674583 ns | 0.98 |
| array/shared/iteration/findfirst/bool | 1628208 ns | 1655708 ns | 0.98 |
| array/shared/iteration/scalar | 191458.5 ns | 208208 ns | 0.92 |
| array/shared/iteration/logical | 2150604.5 ns | 2272208 ns | 0.95 |
| array/shared/iteration/findmin/1d | 2105125 ns | 2145542 ns | 0.98 |
| array/shared/iteration/findmin/2d | 1804042 ns | 1829583 ns | 0.99 |
| array/shared/copy | 260229 ns | 248416 ns | 1.05 |
| array/permutedims/4d | 2392625 ns | 2399458 ns | 1.00 |
| array/permutedims/2d | 1159334 ns | 1179542 ns | 0.98 |
| array/permutedims/3d | 1671250 ns | 1689042 ns | 0.99 |
| metal/synchronization/stream | 19375 ns | 19104.5 ns | 1.01 |
| metal/synchronization/context | 20042 ns | 20334 ns | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
There seems to be a bit of a performance hit with uniform rand. Maybe we keep using MPS for those, since they weren't causing issues?
Yes but the
I was under the assumption that the issue this is attempting to fix (yours, #474) only happens with the normally distributed
You are right, sorry.
Close #474
By my very unscientific tests (generating a bunch of values and plotting a histogram), quality seems similar to MPS, but without the NaN generation (#474).
In a future PR, if we really want to default to Apple-provided random number generation when supported, we could wrap the MPSGraph RNG functionality, but that may incur a performance hit.