Test GPUArrays reverse
#648
base: main
Your PR requires formatting changes to meet the project's style guidelines. The suggested changes are:
diff --git a/lib/mtl/capture.jl b/lib/mtl/capture.jl
index c2c1a77a..c101c5b7 100644
--- a/lib/mtl/capture.jl
+++ b/lib/mtl/capture.jl
@@ -59,7 +59,8 @@ function MTLCaptureDescriptor()
end
# TODO: Add capture state
-function MTLCaptureDescriptor(obj::Union{MTLDevice,MTLCommandQueue,MTLCaptureScope},
+function MTLCaptureDescriptor(
+ obj::Union{MTLDevice, MTLCommandQueue, MTLCaptureScope},
destination::MTLCaptureDestination;
folder::String=nothing)
desc = MTLCaptureDescriptor()
@@ -110,7 +111,8 @@ end
Start GPU frame capture using the default capture object and specifying capture descriptor parameters directly.
"""
-function startCapture(obj::Union{MTLDevice,MTLCommandQueue,MTLCaptureScope},
+function startCapture(
+ obj::Union{MTLDevice, MTLCommandQueue, MTLCaptureScope},
destination::MTLCaptureDestination=MTLCaptureDestinationGPUTraceDocument;
folder::String=nothing)
if destination == MTLCaptureDestinationGPUTraceDocument && folder === nothing
diff --git a/perf/array.jl b/perf/array.jl
index 008ab4d6..b86a675e 100644
--- a/perf/array.jl
+++ b/perf/array.jl
@@ -63,12 +63,12 @@ gpu_vec_ints = reshape(gpu_mat_ints, length(gpu_mat_ints))
let group = addgroup!(group, "reverse")
group["1d"] = @benchmarkable Metal.@sync reverse($gpu_vec)
group["1dL"] = @benchmarkable Metal.@sync reverse($gpu_vec_long)
- group["2d"] = @benchmarkable Metal.@sync reverse($gpu_mat; dims=1)
- group["2dL"] = @benchmarkable Metal.@sync reverse($gpu_mat_long; dims=1)
+ group["2d"] = @benchmarkable Metal.@sync reverse($gpu_mat; dims = 1)
+ group["2dL"] = @benchmarkable Metal.@sync reverse($gpu_mat_long; dims = 1)
group["1d_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_vec)
group["1dL_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_vec_long)
- group["2d_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_mat; dims=1)
- group["2dL_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_mat_long; dims=2)
+ group["2d_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_mat; dims = 1)
+ group["2dL_inplace"] = @benchmarkable Metal.@sync reverse!($gpu_mat_long; dims = 2)
end
# 'evals=1' added to prevent hang when running benchmarks of CI
diff --git a/perf/runbenchmarks.jl b/perf/runbenchmarks.jl
index 17bf4ea0..98aa3153 100644
--- a/perf/runbenchmarks.jl
+++ b/perf/runbenchmarks.jl
@@ -1,7 +1,7 @@
# benchmark suite execution and codespeed submission
using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="reverse")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "reverse")
using Metal
diff --git a/test/runtests.jl b/test/runtests.jl
index 081fc280..42f00908 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -1,5 +1,5 @@
using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="reverse")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "reverse")
using Distributed
using Dates
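For context, the `reverse` benchmarks touched in the diff above call `reverse` and `reverse!` with a `dims` keyword. The following is a minimal sketch of those semantics on plain CPU `Array`s. This is an assumption made for illustration only: the PR itself benchmarks Metal GPU arrays, and the small matrix `A` is hypothetical rather than one of the benchmark inputs. In-place `reverse!` with `dims` on a multidimensional array needs Julia 1.6 or newer.

```julia
# Hypothetical 2x3 input; not one of the arrays used in perf/array.jl.
A = [1 2 3;
     4 5 6]

# dims = 1 reverses along the first dimension (the row order is flipped).
# A new array is returned and A is left untouched.
B = reverse(A; dims = 1)

# dims = 2 reverses along the second dimension (the column order is flipped).
C = reverse(A; dims = 2)

# reverse! applies the same operation in place (Julia 1.6+ for matrices).
D = copy(A)
reverse!(D; dims = 2)

# Without dims, a vector is reversed end to end.
v = reverse([1, 2, 3, 4])
```

On a GPU array the call signatures are expected to be the same, which is what lets the benchmark suite reuse these calls unchanged.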
4c15cc1 to 108f6d1
Metal Benchmarks
| Benchmark suite | Current: e3ce3ae | Previous: 18d5d95 | Ratio |
|---|---|---|---|
| latency/precompile | 10708835208.5 ns | 10738355812.5 ns | 1.00 |
| latency/ttfp | 5088232041.5 ns | 5093095000 ns | 1.00 |
| latency/import | 1309493875 ns | 1307420042 ns | 1.00 |
| integration/metaldevrt | 961416 ns | 916521 ns | 1.05 |
| integration/byval/slices=1 | 1650541 ns | 1655958 ns | 1.00 |
| integration/byval/slices=3 | 9014521 ns | 8745792 ns | 1.03 |
| integration/byval/reference | 1635625 ns | 1624791 ns | 1.01 |
| integration/byval/slices=2 | 2705666 ns | 2721500 ns | 0.99 |
| kernel/indexing | 697500 ns | 696291 ns | 1.00 |
| kernel/indexing_checked | 701625 ns | 696584 ns | 1.01 |
| kernel/launch | 14208 ns | 12416 ns | 1.14 |
| array/reverse/1d | 671833.5 ns | | |
| array/reverse/2dL_inplace | 3033708.5 ns | | |
| array/reverse/1dL | 2367708 ns | | |
| array/reverse/2d | 1462187.5 ns | | |
| array/reverse/1d_inplace | 728917 ns | | |
| array/reverse/2d_inplace | 931084 ns | | |
| array/reverse/2dL | 6684125 ns | | |
| array/reverse/1dL_inplace | 1072542 ns | | |
| array/construct | 6167 ns | 5792 ns | 1.06 |
| array/broadcast | 681958 ns | 665584 ns | 1.02 |
| array/accumulate/Int64/1d | 1379520.5 ns | 1360750 ns | 1.01 |
| array/accumulate/Int64/dims=1 | 1926125 ns | 1916333 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 2291958 ns | 2278146 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 11902291 ns | 12001125 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 9814458 ns | 9901666 ns | 0.99 |
| array/accumulate/Float32/1d | 1268500 ns | 1245417 ns | 1.02 |
| array/accumulate/Float32/dims=1 | 1676771 ns | 1669792 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 2014833.5 ns | 2007458 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 10007083 ns | 9976625 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 7379750.5 ns | 7388583.5 ns | 1.00 |
| array/random/randn/Float32 | 816333 ns | 864875 ns | 0.94 |
| array/random/randn!/Float32 | 652459 ns | 625250 ns | 1.04 |
| array/random/rand!/Int64 | 578437.5 ns | 565916.5 ns | 1.02 |
| array/random/rand!/Float32 | 607375 ns | 583083 ns | 1.04 |
| array/random/rand/Int64 | 780084 ns | 729917 ns | 1.07 |
| array/random/rand/Float32 | 613833 ns | 597917 ns | 1.03 |
| array/reductions/reduce/Int64/1d | 1383125 ns | 1339354 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1 | 1167479.5 ns | 1166145.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 1331229.5 ns | 1307792 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1L | 2071458 ns | 2095291 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 3649895.5 ns | 3597937.5 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 1093458.5 ns | 986792 ns | 1.11 |
| array/reductions/reduce/Float32/dims=1 | 907417 ns | 909041.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 788458 ns | 770916 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1L | 1415375 ns | 1410916.5 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 1933916 ns | 1933084 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 1433562.5 ns | 1350250 ns | 1.06 |
| array/reductions/mapreduce/Int64/dims=1 | 1173875 ns | 1211709 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=2 | 1341708 ns | 1314292 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=1L | 2114542 ns | 2102020.5 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2L | 3624312.5 ns | 3605750 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 1063146 ns | 1059291 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 907792 ns | 901500 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 787604.5 ns | 777833 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1L | 1410687.5 ns | 1401750 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 1955167 ns | 1936959 ns | 1.01 |
| array/private/copyto!/gpu_to_gpu | 681625 ns | 660167 ns | 1.03 |
| array/private/copyto!/cpu_to_gpu | 827916 ns | 805666 ns | 1.03 |
| array/private/copyto!/gpu_to_cpu | 833833 ns | 827000 ns | 1.01 |
| array/private/iteration/findall/int | 1699916 ns | 1676666 ns | 1.01 |
| array/private/iteration/findall/bool | 1494563 ns | 1481375 ns | 1.01 |
| array/private/iteration/findfirst/int | 2047687.5 ns | 2050625 ns | 1.00 |
| array/private/iteration/findfirst/bool | 1953937.5 ns | 1853416.5 ns | 1.05 |
| array/private/iteration/scalar | 5613812.5 ns | 3817291 ns | 1.47 |
| array/private/iteration/logical | 2852041 ns | 2757458 ns | 1.03 |
| array/private/iteration/findmin/1d | 2056708 ns | 2049500 ns | 1.00 |
| array/private/iteration/findmin/2d | 1638417 ns | 1633375 ns | 1.00 |
| array/private/copy | 560834 ns | 561500 ns | 1.00 |
| array/shared/copyto!/gpu_to_gpu | 83875 ns | 84416 ns | 0.99 |
| array/shared/copyto!/cpu_to_gpu | 83250 ns | 82541 ns | 1.01 |
| array/shared/copyto!/gpu_to_cpu | 89958 ns | 83917 ns | 1.07 |
| array/shared/iteration/findall/int | 1677542 ns | 1689292 ns | 0.99 |
| array/shared/iteration/findall/bool | 1406750 ns | 1503229 ns | 0.94 |
| array/shared/iteration/findfirst/int | 1457917 ns | 1457104.5 ns | 1.00 |
| array/shared/iteration/findfirst/bool | 1439916 ns | 1442750 ns | 1.00 |
| array/shared/iteration/scalar | 161041 ns | 155375 ns | 1.04 |
| array/shared/iteration/logical | 2408104 ns | 2449333.5 ns | 0.98 |
| array/shared/iteration/findmin/1d | 1581708 ns | 1513312.5 ns | 1.05 |
| array/shared/iteration/findmin/2d | 1646458 ns | 1636250 ns | 1.01 |
| array/shared/copy | 243917 ns | 257896 ns | 0.95 |
| array/permutedims/4d | 2558750 ns | 2525958 ns | 1.01 |
| array/permutedims/2d | 1293708 ns | 1278583 ns | 1.01 |
| array/permutedims/3d | 1853417 ns | 1824583.5 ns | 1.02 |
| metal/synchronization/stream | 15084 ns | 14541 ns | 1.04 |
| metal/synchronization/context | 15417 ns | 15042 ns | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
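The Ratio column in the benchmark table appears to be the current timing divided by the previous one, rounded to two decimals, so values above 1.00 mean the current commit was slower. The sketch below checks that reading against three rows copied from the table. The `ratio` helper and the dictionary layout are hypothetical, and the formula is an assumption inferred from the numbers, not taken from the benchmark workflow's source.

```julia
# Timings in nanoseconds, copied from three rows of the benchmark table.
current = Dict(
    "kernel/launch" => 14208.0,
    "array/private/iteration/scalar" => 5613812.5,
    "array/random/randn/Float32" => 816333.0,
)
previous = Dict(
    "kernel/launch" => 12416.0,
    "array/private/iteration/scalar" => 3817291.0,
    "array/random/randn/Float32" => 864875.0,
)

# Assumed definition: ratio = current / previous, rounded to two decimals.
ratio(name) = round(current[name] / previous[name]; digits = 2)

# Print each benchmark name next to its recomputed ratio.
for name in sort(collect(keys(current)))
    println(rpad(name, 34), ratio(name))
end
```

Under this assumption the three recomputed values agree with the table (1.14, 1.47, and 0.94). The `array/reverse/*` rows have no previous measurement, so no ratio can be formed for them.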
8cde64c to 53a0c88
c1d78e5 to 543c8ee
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #648   +/-   ##
=======================================
  Coverage   80.53%   80.53%
=======================================
  Files          61       61
  Lines        2779     2779
=======================================
  Hits         2238     2238
  Misses        541      541
Let's mark this as draft until it pulls from a dev branch on GPUArrays.
No description provided.