Revert "Synchronize using MTLSharedEvents" #645
This reverts commit 1942968.
Your PR requires formatting changes to meet the project's style guidelines. The suggested changes:

```diff
diff --git a/src/state.jl b/src/state.jl
index 3a0512e5..8f2c9399 100644
--- a/src/state.jl
+++ b/src/state.jl
@@ -65,7 +65,7 @@ Create a new MTLCommandBuffer from the global command queue, commit it to the qu
 and simply wait for it to be completed. Since command buffers *should* execute in a
 First-In-First-Out manner, this synchronizes the GPU.
 """
-@autoreleasepool function synchronize(queue::MTLCommandQueue=global_queue(device()))
+@autoreleasepool function synchronize(queue::MTLCommandQueue = global_queue(device()))
     cmdbuf = MTLCommandBuffer(queue)
     commit!(cmdbuf)
     wait_completed(cmdbuf)
```
Codecov Report

✅ All modified and coverable lines are covered by tests.

```diff
@@            Coverage Diff             @@
##             main     #645      +/-   ##
==========================================
- Coverage   80.63%   80.61%   -0.03%
==========================================
  Files          61       61
  Lines        2722     2713       -9
==========================================
- Hits         2195     2187       -8
+ Misses        527      526       -1
```
Metal Benchmarks

| Benchmark suite | Current: a52e795 | Previous: 1942968 | Ratio |
|---|---|---|---|
| latency/precompile | 9944983208 ns | 9844653958 ns | 1.01 |
| latency/ttfp | 4027831958 ns | 3972040229 ns | 1.01 |
| latency/import | 1293257604.5 ns | 1275530958.5 ns | 1.01 |
| integration/metaldevrt | 906749.5 ns | 828500 ns | 1.09 |
| integration/byval/slices=1 | 1646000 ns | 1536750 ns | 1.07 |
| integration/byval/slices=3 | 19605000 ns | 9632625 ns | 2.04 |
| integration/byval/reference | 1626604 ns | 1543583 ns | 1.05 |
| integration/byval/slices=2 | 2759708 ns | 2621958.5 ns | 1.05 |
| kernel/indexing | 525292 ns | 567792 ns | 0.93 |
| kernel/indexing_checked | 557541 ns | 569292 ns | 0.98 |
| kernel/launch | 9333 ns | 9208 ns | 1.01 |
| array/construct | 6042 ns | 6625 ns | 0.91 |
| array/broadcast | 529708 ns | 583375 ns | 0.91 |
| array/random/randn/Float32 | 907500 ns | 784333 ns | 1.16 |
| array/random/randn!/Float32 | 597958 ns | 623250 ns | 0.96 |
| array/random/rand!/Int64 | 552875 ns | 547458 ns | 1.01 |
| array/random/rand!/Float32 | 553250 ns | 585291 ns | 0.95 |
| array/random/rand/Int64 | 1016375 ns | 771250 ns | 1.32 |
| array/random/rand/Float32 | 786395.5 ns | 622687 ns | 1.26 |
| array/accumulate/Int64/1d | 1382250 ns | 1277104.5 ns | 1.08 |
| array/accumulate/Int64/dims=1 | 1936458.5 ns | 1868333 ns | 1.04 |
| array/accumulate/Int64/dims=2 | 2322604.5 ns | 2183625 ns | 1.06 |
| array/accumulate/Int64/dims=1L | 12289250 ns | 11737104 ns | 1.05 |
| array/accumulate/Int64/dims=2L | 10154916.5 ns | 9771416.5 ns | 1.04 |
| array/accumulate/Float32/1d | 1190125 ns | 1142833 ns | 1.04 |
| array/accumulate/Float32/dims=1 | 1672625 ns | 1570458 ns | 1.07 |
| array/accumulate/Float32/dims=2 | 2093771 ns | 1931625 ns | 1.08 |
| array/accumulate/Float32/dims=1L | 10500250 ns | 9864375 ns | 1.06 |
| array/accumulate/Float32/dims=2L | 7549250 ns | 7308021 ns | 1.03 |
| array/reductions/reduce/Int64/1d | 1280583 ns | 1373353.5 ns | 0.93 |
| array/reductions/reduce/Int64/dims=1 | 1176959 ns | 1069291.5 ns | 1.10 |
| array/reductions/reduce/Int64/dims=2 | 1320709 ns | 1193292 ns | 1.11 |
| array/reductions/reduce/Int64/dims=1L | 2223417 ns | 2113062.5 ns | 1.05 |
| array/reductions/reduce/Int64/dims=2L | 3594292 ns | 3456458 ns | 1.04 |
| array/reductions/reduce/Float32/1d | 772083 ns | 971625 ns | 0.79 |
| array/reductions/reduce/Float32/dims=1 | 848833 ns | 808458 ns | 1.05 |
| array/reductions/reduce/Float32/dims=2 | 760354 ns | 768979 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 1852834 ns | 1739041 ns | 1.07 |
| array/reductions/reduce/Float32/dims=2L | 1904437.5 ns | 1772125 ns | 1.07 |
| array/reductions/mapreduce/Int64/1d | 1302542 ns | 1456146 ns | 0.89 |
| array/reductions/mapreduce/Int64/dims=1 | 1179292 ns | 1074875 ns | 1.10 |
| array/reductions/mapreduce/Int64/dims=2 | 1302542 ns | 1206417 ns | 1.08 |
| array/reductions/mapreduce/Int64/dims=1L | 2228459 ns | 2119292 ns | 1.05 |
| array/reductions/mapreduce/Int64/dims=2L | 3585125 ns | 3444375 ns | 1.04 |
| array/reductions/mapreduce/Float32/1d | 779687.5 ns | 990792 ns | 0.79 |
| array/reductions/mapreduce/Float32/dims=1 | 855833.5 ns | 810062.5 ns | 1.06 |
| array/reductions/mapreduce/Float32/dims=2 | 754750 ns | 761104 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 1856917 ns | 1740812.5 ns | 1.07 |
| array/reductions/mapreduce/Float32/dims=2L | 1908416 ns | 1781292 ns | 1.07 |
| array/private/copyto!/gpu_to_gpu | 563833 ns | 651375 ns | 0.87 |
| array/private/copyto!/cpu_to_gpu | 705541.5 ns | 805542 ns | 0.88 |
| array/private/copyto!/gpu_to_cpu | 764792 ns | 817667 ns | 0.94 |
| array/private/iteration/findall/int | 1641312.5 ns | 1646500 ns | 1.00 |
| array/private/iteration/findall/bool | 1558625 ns | 1444584 ns | 1.08 |
| array/private/iteration/findfirst/int | 1793292 ns | 1754958.5 ns | 1.02 |
| array/private/iteration/findfirst/bool | 1733750 ns | 1703625 ns | 1.02 |
| array/private/iteration/scalar | 3396333 ns | 4772500 ns | 0.71 |
| array/private/iteration/logical | 2720896 ns | 2536917 ns | 1.07 |
| array/private/iteration/findmin/1d | 1772875 ns | 1815666 ns | 0.98 |
| array/private/iteration/findmin/2d | 1549021 ns | 1431750 ns | 1.08 |
| array/private/copy | 804542 ns | 538167 ns | 1.49 |
| array/shared/copyto!/gpu_to_gpu | 84625 ns | 86375 ns | 0.98 |
| array/shared/copyto!/cpu_to_gpu | 79584 ns | 86583 ns | 0.92 |
| array/shared/copyto!/gpu_to_cpu | 80083 ns | 84833 ns | 0.94 |
| array/shared/iteration/findall/int | 1645292 ns | 1609874.5 ns | 1.02 |
| array/shared/iteration/findall/bool | 1549937.5 ns | 1464354 ns | 1.06 |
| array/shared/iteration/findfirst/int | 1466208 ns | 1377750 ns | 1.06 |
| array/shared/iteration/findfirst/bool | 1411916.5 ns | 1319166 ns | 1.07 |
| array/shared/iteration/scalar | 160125 ns | 217500 ns | 0.74 |
| array/shared/iteration/logical | 2508708.5 ns | 2288708.5 ns | 1.10 |
| array/shared/iteration/findmin/1d | 1438625 ns | 1421750 ns | 1.01 |
| array/shared/iteration/findmin/2d | 1546874.5 ns | 1430854.5 ns | 1.08 |
| array/shared/copy | 210917 ns | 248666 ns | 0.85 |
| array/permutedims/4d | 2597333 ns | 2438438 ns | 1.07 |
| array/permutedims/2d | 1288417 ns | 1193250 ns | 1.08 |
| array/permutedims/3d | 1945750.5 ns | 1768458 ns | 1.10 |
| metal/synchronization/stream | 14875 ns | 19916 ns | 0.75 |
| metal/synchronization/context | 15854.5 ns | 20375 ns | 0.78 |
This comment was automatically generated by workflow using github-action-benchmark.
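For reading the table: the Ratio column is the Current time (a52e795, this PR) divided by the Previous time (1942968, the commit being reverted), so a value above 1 means the revert is slower on that benchmark and a value below 1 means it is faster. Two rows from the table, worked out:

```math
\text{Ratio} = \frac{t_{\text{current}}}{t_{\text{previous}}}, \qquad
\frac{19605000\ \text{ns}}{9632625\ \text{ns}} \approx 2.04 \ (\texttt{integration/byval/slices=3}), \qquad
\frac{14875\ \text{ns}}{19916\ \text{ns}} \approx 0.75 \ (\texttt{metal/synchronization/stream})
```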
Reverts #633

The `MTLCaptureManager` failures seem to be related. There is not much information online about how `stopCapture` works, but it seems that the capture manager's `isCapturing` property does not change to `false` until the command buffer status is `MTLCommandBufferStatusCompleted`, which the new event-based synchronization method does not wait for. Will reland after determining how to handle this for capturing.