Initial compatibility with CUDA 13 #2834
Your PR no longer requires formatting changes. Thank you for your contribution!
CUDA.jl Benchmarks
| Benchmark suite | Current: 55b4150 | Previous: 205c238 | Ratio |
|---|---|---|---|
| latency/precompile | 42965722378.5 ns | 42934926801 ns | 1.00 |
| latency/ttfp | 6988682035 ns | 7008552789 ns | 1.00 |
| latency/import | 3569251927 ns | 3569139582 ns | 1.00 |
| integration/volumerhs | 9623821 ns | 9606581 ns | 1.00 |
| integration/byval/slices=1 | 147113 ns | 147311 ns | 1.00 |
| integration/byval/slices=3 | 426103.5 ns | 426127 ns | 1.00 |
| integration/byval/reference | 144992 ns | 145282 ns | 1.00 |
| integration/byval/slices=2 | 286580 ns | 286537 ns | 1.00 |
| integration/cudadevrt | 103446 ns | 103674 ns | 1.00 |
| kernel/indexing | 14148 ns | 14638.5 ns | 0.97 |
| kernel/indexing_checked | 14760 ns | 15045 ns | 0.98 |
| kernel/occupancy | 669.1572327044025 ns | 669.9465408805031 ns | 1.00 |
| kernel/launch | 2136.222222222222 ns | 2202.4444444444443 ns | 0.97 |
| kernel/rand | 18262 ns | 17466 ns | 1.05 |
| array/reverse/1d | 19620 ns | 20143 ns | 0.97 |
| array/reverse/2d | 24878 ns | 24692 ns | 1.01 |
| array/reverse/1d_inplace | 10385 ns | 11332 ns | 0.92 |
| array/reverse/2d_inplace | 12033 ns | 13662 ns | 0.88 |
| array/copy | 21221 ns | 21281 ns | 1.00 |
| array/iteration/findall/int | 157623.5 ns | 159966.5 ns | 0.99 |
| array/iteration/findall/bool | 139804 ns | 141602 ns | 0.99 |
| array/iteration/findfirst/int | 164485 ns | 163419 ns | 1.01 |
| array/iteration/findfirst/bool | 166975 ns | 165377 ns | 1.01 |
| array/iteration/scalar | 72738.5 ns | 76152 ns | 0.96 |
| array/iteration/logical | 215702 ns | 219912.5 ns | 0.98 |
| array/iteration/findmin/1d | 45956 ns | 47580 ns | 0.97 |
| array/iteration/findmin/2d | 96547.5 ns | 97060 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 45046.5 ns | 43742.5 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1 | 49930.5 ns | 47519.5 ns | 1.05 |
| array/reductions/reduce/Int64/dims=2 | 62791 ns | 62503 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88804 ns | 89134 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87453 ns | 87634.5 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 33964 ns | 35637 ns | 0.95 |
| array/reductions/reduce/Float32/dims=1 | 49514 ns | 51967.5 ns | 0.95 |
| array/reductions/reduce/Float32/dims=2 | 59373 ns | 59824 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 52442 ns | 52680 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69888 ns | 70568 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 43216 ns | 43514 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 46828 ns | 46605.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 62231 ns | 62143.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 88831 ns | 89174 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 86455 ns | 87305.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34051 ns | 35464 ns | 0.96 |
| array/reductions/mapreduce/Float32/dims=1 | 51567.5 ns | 42505.5 ns | 1.21 |
| array/reductions/mapreduce/Float32/dims=2 | 59608 ns | 60252 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 52631 ns | 52803 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69982.5 ns | 70795 ns | 0.99 |
| array/broadcast | 20298 ns | 20737 ns | 0.98 |
| array/copyto!/gpu_to_gpu | 11138.5 ns | 13192 ns | 0.84 |
| array/copyto!/cpu_to_gpu | 214056 ns | 217123 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 283193.5 ns | 287100 ns | 0.99 |
| array/accumulate/Int64/1d | 124851.5 ns | 126109 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 83703.5 ns | 84201 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 158323 ns | 158968 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1710623 ns | 1710638 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966448.5 ns | 967410.5 ns | 1.00 |
| array/accumulate/Float32/1d | 109017.5 ns | 109994 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80827 ns | 81343 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 148200 ns | 148659 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1619508 ns | 1619411 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698507 ns | 699433 ns | 1.00 |
| array/construct | 1296.6 ns | 1288.5 ns | 1.01 |
| array/random/randn/Float32 | 43715 ns | 45344 ns | 0.96 |
| array/random/randn!/Float32 | 24865 ns | 25330 ns | 0.98 |
| array/random/rand!/Int64 | 27314 ns | 27554 ns | 0.99 |
| array/random/rand!/Float32 | 8707 ns | 8908.333333333334 ns | 0.98 |
| array/random/rand/Int64 | 29719 ns | 30218 ns | 0.98 |
| array/random/rand/Float32 | 12933 ns | 13361 ns | 0.97 |
| array/permutedims/4d | 59903.5 ns | 60397 ns | 0.99 |
| array/permutedims/2d | 54172 ns | 54394 ns | 1.00 |
| array/permutedims/3d | 55007 ns | 55362 ns | 0.99 |
| array/sorting/1d | 2757365 ns | 2758561 ns | 1.00 |
| array/sorting/by | 3343832 ns | 3368461 ns | 0.99 |
| array/sorting/2d | 1080853 ns | 1089562 ns | 0.99 |
| cuda/synchronization/stream/auto | 1031.2 ns | 1066.6 ns | 0.97 |
| cuda/synchronization/stream/nonblocking | 7171.2 ns | 7691.3 ns | 0.93 |
| cuda/synchronization/stream/blocking | 811.1578947368421 ns | 844.0121951219512 ns | 0.96 |
| cuda/synchronization/context/auto | 1164.7 ns | 1211.4 ns | 0.96 |
| cuda/synchronization/context/nonblocking | 7068 ns | 6881.1 ns | 1.03 |
| cuda/synchronization/context/blocking | 890.0363636363636 ns | 924.7692307692307 ns | 0.96 |
This comment was automatically generated by workflow using github-action-benchmark.
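The Ratio column is consistent with Current divided by Previous (for example, 51567.5 / 42505.5 rounds to 1.21 for the `mapreduce/Float32/dims=1` row), so a value above 1.00 means the PR's commit took longer. The following is a minimal Python sketch, assuming that definition, that recomputes a few rows copied from the table and labels them. The `ratio` and `classify` helpers and the 1.10 / 0.90 noise thresholds are hypothetical illustrations, not part of github-action-benchmark or CUDA.jl.

```python
# Sketch only: assumes Ratio = Current / Previous, which matches the table rows.
# Times are in nanoseconds, so a ratio above 1.0 means the current commit is slower.

ROWS = {
    # benchmark name: (current_ns, previous_ns), copied from the table above
    "kernel/rand": (18262, 17466),
    "array/reverse/2d_inplace": (12033, 13662),
    "array/reductions/mapreduce/Float32/dims=1": (51567.5, 42505.5),
    "array/copyto!/gpu_to_gpu": (11138.5, 13192),
    "latency/import": (3569251927, 3569139582),
}


def ratio(current_ns: float, previous_ns: float) -> float:
    """Relative time of the current commit versus the previous one."""
    return current_ns / previous_ns


def classify(r: float, slower: float = 1.10, faster: float = 0.90) -> str:
    """Label a ratio using hypothetical noise thresholds."""
    if r >= slower:
        return "slower"
    if r <= faster:
        return "faster"
    return "unchanged"


if __name__ == "__main__":
    for name, (cur, prev) in ROWS.items():
        r = ratio(cur, prev)
        print(f"{name}: {r:.2f} ({classify(r)})")
```

With these illustrative thresholds, only `mapreduce/Float32/dims=1` (1.21) would be flagged as slower, while `reverse/2d_inplace` (0.88) and `copyto!/gpu_to_gpu` (0.84) would be flagged as faster. Single-run GPU timings are noisy, so any real cutoff should be chosen from repeated measurements.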
@maleadt It seems that NVIDIA broke a few headers in the 13.0 release. If you find a way to update the wrappers, I can take care of the high-level interfaces.
Closes #2831
NVTX is disabled because of NVIDIA/NVTX#125.