Update headers for CUDA 13. #2842
CUDA.jl 5.8.3 supports CUDA toolkit v13.0, which removed the entire legacy CUSOLVER API. See JuliaGPU/CUDA.jl#2842
a3f8168 to 14d8969
14d8969 to ec4b31b
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | master | #2842 | +/- |
|---|---|---|---|
| Coverage | 89.64% | 74.83% | -14.82% |
| Files | 150 | 150 | |
| Lines | 13229 | 13143 | -86 |
| Hits | 11859 | 9835 | -2024 |
| Misses | 1370 | 3308 | +1938 |
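The coverage percentages in the Codecov report can be cross-checked from its own Hits and Lines counts. The snippet below is a minimal sketch, assuming coverage is simply hits divided by coverable lines; the function name `coverage_percent` is hypothetical and not part of Codecov. The recomputed delta may differ from the reported one in the last digit because of rounding.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage: covered lines over coverable lines.

    Assumption: this is how the report derives its Coverage row.
    """
    return 100.0 * hits / lines


# Counts copied from the Codecov report for master and for this PR.
master_hits, master_lines = 11859, 13229
pr_hits, pr_lines = 9835, 13143

master_cov = coverage_percent(master_hits, master_lines)
pr_cov = coverage_percent(pr_hits, pr_lines)

print(f"master coverage: {master_cov:.2f}%")
print(f"PR coverage:     {pr_cov:.2f}%")
print(f"change:          {pr_cov - master_cov:+.2f} percentage points")

# Misses are the coverable lines that no test touched.
print(f"master misses:   {master_lines - master_hits}")
print(f"PR misses:       {pr_lines - pr_hits}")
```

Since Lines fell by only 86 while Hits fell by 2024, the drop comes almost entirely from lines that exist on both sides but were not executed in this run, not from the lines the PR modified.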
ec7a874 to 5f95b39
CUDA.jl Benchmarks
| Benchmark suite | Current: 5f95b39 | Previous: 756ce54 | Ratio |
|---|---|---|---|
| latency/precompile | 43831307498.5 ns | 43117380449.5 ns | 1.02 |
| latency/ttfp | 7094033713 ns | 7014904934 ns | 1.01 |
| latency/import | 3642236512 ns | 3575439492 ns | 1.02 |
| integration/volumerhs | 9610433.5 ns | 9615802.5 ns | 1.00 |
| integration/byval/slices=1 | 147060 ns | 147023 ns | 1.00 |
| integration/byval/slices=3 | 426048 ns | 425923 ns | 1.00 |
| integration/byval/reference | 145057 ns | 145240 ns | 1.00 |
| integration/byval/slices=2 | 286519 ns | 286571 ns | 1.00 |
| integration/cudadevrt | 103588 ns | 103642 ns | 1.00 |
| kernel/indexing | 14142 ns | 14460 ns | 0.98 |
| kernel/indexing_checked | 14907 ns | 15152 ns | 0.98 |
| kernel/occupancy | 669.7341772151899 ns | 668.3417721518987 ns | 1.00 |
| kernel/launch | 2207.5555555555557 ns | 2221.6666666666665 ns | 0.99 |
| kernel/rand | 14871 ns | 18586 ns | 0.80 |
| array/reverse/1d | 19901.5 ns | 20105 ns | 0.99 |
| array/reverse/2d | 25094 ns | 25222 ns | 0.99 |
| array/reverse/1d_inplace | 10568 ns | 10753 ns | 0.98 |
| array/reverse/2d_inplace | 12379 ns | 12440 ns | 1.00 |
| array/copy | 20842 ns | 21263 ns | 0.98 |
| array/iteration/findall/int | 157210 ns | 157788 ns | 1.00 |
| array/iteration/findall/bool | 139030.5 ns | 139822 ns | 0.99 |
| array/iteration/findfirst/int | 157488 ns | 165033 ns | 0.95 |
| array/iteration/findfirst/bool | 158053 ns | 167860 ns | 0.94 |
| array/iteration/scalar | 71112 ns | 74462 ns | 0.96 |
| array/iteration/logical | 207092 ns | 216613 ns | 0.96 |
| array/iteration/findmin/1d | 46638 ns | 47358 ns | 0.98 |
| array/iteration/findmin/2d | 96764 ns | 97030 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 45630 ns | 44082.5 ns | 1.04 |
| array/reductions/reduce/Int64/dims=1 | 49248.5 ns | 49423.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 63710.5 ns | 62870 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1L | 89052 ns | 89243 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 89447.5 ns | 88709 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 34252 ns | 35342 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1 | 41759.5 ns | 52276 ns | 0.80 |
| array/reductions/reduce/Float32/dims=2 | 59534 ns | 60280 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 52407 ns | 52693 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 70314.5 ns | 70553.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 45363 ns | 44275 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=1 | 50614 ns | 52605 ns | 0.96 |
| array/reductions/mapreduce/Int64/dims=2 | 62382.5 ns | 61775 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 89024 ns | 89085 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88006 ns | 87203 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 34346 ns | 35440 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=1 | 51585.5 ns | 42076.5 ns | 1.23 |
| array/reductions/mapreduce/Float32/dims=2 | 59671 ns | 60334 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 52869 ns | 53110.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 70678.5 ns | 70458 ns | 1.00 |
| array/broadcast | 20182 ns | 20315 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11290 ns | 12955 ns | 0.87 |
| array/copyto!/cpu_to_gpu | 214786 ns | 217419 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 284493 ns | 283111 ns | 1.00 |
| array/accumulate/Int64/1d | 125305.5 ns | 125814 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83877 ns | 84926 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 159257 ns | 159079.5 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1720678 ns | 1719956 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 967972 ns | 968078 ns | 1.00 |
| array/accumulate/Float32/1d | 109759 ns | 109898 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 81027 ns | 81238 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 148485 ns | 148942.5 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1629583 ns | 1628483 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 701228 ns | 701941 ns | 1.00 |
| array/construct | 1283.3 ns | 1308.95 ns | 0.98 |
| array/random/randn/Float32 | 44309 ns | 45219 ns | 0.98 |
| array/random/randn!/Float32 | 24914 ns | 25287 ns | 0.99 |
| array/random/rand!/Int64 | 27544 ns | 27594 ns | 1.00 |
| array/random/rand!/Float32 | 8810 ns | 8898.333333333334 ns | 0.99 |
| array/random/rand/Int64 | 30125 ns | 38444 ns | 0.78 |
| array/random/rand/Float32 | 13095 ns | 13410 ns | 0.98 |
| array/permutedims/4d | 60327 ns | 60388 ns | 1.00 |
| array/permutedims/2d | 54178.5 ns | 54648 ns | 0.99 |
| array/permutedims/3d | 54798 ns | 55774.5 ns | 0.98 |
| array/sorting/1d | 2756295 ns | 2757370 ns | 1.00 |
| array/sorting/by | 3342522 ns | 3343262.5 ns | 1.00 |
| array/sorting/2d | 1080089 ns | 1080991 ns | 1.00 |
| cuda/synchronization/stream/auto | 1023.3 ns | 1045.4 ns | 0.98 |
| cuda/synchronization/stream/nonblocking | 8331.4 ns | 7545.299999999999 ns | 1.10 |
| cuda/synchronization/stream/blocking | 805.7765957446809 ns | 818.7 ns | 0.98 |
| cuda/synchronization/context/auto | 1167.4 ns | 1196.3 ns | 0.98 |
| cuda/synchronization/context/nonblocking | 7158.8 ns | 7154.1 ns | 1.00 |
| cuda/synchronization/context/blocking | 891.9629629629629 ns | 929.3684210526316 ns | 0.96 |
This comment was automatically generated by workflow using github-action-benchmark.
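The Ratio column in the benchmark comment above is consistent with Current divided by Previous, so values below 1 mean the current commit was faster. The sketch below recomputes a few rows under that assumption; the helper `ratio` and the `SLOWDOWN_THRESHOLD` value are hypothetical choices for illustration and are not taken from the benchmark workflow's configuration.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current time over previous time; > 1 means the current commit is slower."""
    return current_ns / previous_ns


# (current ns, previous ns) pairs copied from the benchmark table.
rows = {
    "latency/import": (3642236512, 3575439492),
    "kernel/rand": (14871, 18586),
    "array/reductions/mapreduce/Float32/dims=1": (51585.5, 42076.5),
}

# Hypothetical cutoff for illustration only, not the workflow's setting.
SLOWDOWN_THRESHOLD = 1.2

for name, (current_ns, previous_ns) in rows.items():
    r = ratio(current_ns, previous_ns)
    note = "  <- slower than threshold" if r > SLOWDOWN_THRESHOLD else ""
    print(f"{name}: {r:.2f}{note}")
```

Single-run ratios on short kernels can be noisy, so one outlier such as the 1.23 entry is weak evidence of a real regression on its own.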
Passes CI locally, at least mostly. So we can go ahead with this.