CUDA.jl Benchmarks
| Benchmark suite | Current: 77d00a4 | Previous: ecb27a7 | Ratio |
|---|---|---|---|
| latency/precompile | 4583534095.5 ns | 4095102458 ns | 1.12 |
| latency/ttfp | 4409300069 ns | 14632855735 ns | 0.30 |
| latency/import | 3825213452.5 ns | 3384844378 ns | 1.13 |
| integration/volumerhs | 9435275.5 ns | 9443415 ns | 1.00 |
| integration/byval/slices=1 | 145955 ns | 145838 ns | 1.00 |
| integration/byval/slices=3 | 423262 ns | 422984 ns | 1.00 |
| integration/byval/reference | 144037 ns | 144026 ns | 1.00 |
| integration/byval/slices=2 | 284655 ns | 284568.5 ns | 1.00 |
| integration/cudadevrt | 102687 ns | 102657 ns | 1.00 |
| kernel/indexing | 13636 ns | 13364 ns | 1.02 |
| kernel/indexing_checked | 14263 ns | 14057 ns | 1.01 |
| kernel/occupancy | 730.0555555555557 ns | 669.0063694267516 ns | 1.09 |
| kernel/launch | 2228.5555555555557 ns | 2129.7 ns | 1.05 |
| kernel/rand | 15954 ns | 14468 ns | 1.10 |
| array/reverse/1d | 18844 ns | 18380 ns | 1.03 |
| array/reverse/2dL_inplace | 66117 ns | 65953 ns | 1.00 |
| array/reverse/1dL | 69408 ns | 68908.5 ns | 1.01 |
| array/reverse/2d | 21233 ns | 20404 ns | 1.04 |
| array/reverse/1d_inplace | 10320.333333333334 ns | 8469.666666666666 ns | 1.22 |
| array/reverse/2d_inplace | 11664 ns | 10199 ns | 1.14 |
| array/reverse/2dL | 73235 ns | 72407 ns | 1.01 |
| array/reverse/1dL_inplace | 66058 ns | 65821 ns | 1.00 |
| array/copy | 19083 ns | 18385 ns | 1.04 |
| array/iteration/findall/int | 150383.5 ns | 148396.5 ns | 1.01 |
| array/iteration/findall/bool | 133286 ns | 131596 ns | 1.01 |
| array/iteration/findfirst/int | 84538 ns | 82845 ns | 1.02 |
| array/iteration/findfirst/bool | 82499 ns | 81148.5 ns | 1.02 |
| array/iteration/scalar | 68951 ns | 68544 ns | 1.01 |
| array/iteration/logical | 203658 ns | 197158 ns | 1.03 |
| array/iteration/findmin/1d | 88631 ns | 86597.5 ns | 1.02 |
| array/iteration/findmin/2d | 117501 ns | 116930 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 44048 ns | 42549 ns | 1.04 |
| array/reductions/reduce/Int64/dims=1 | 43000.5 ns | 43265 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 60230 ns | 59433 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1L | 88058 ns | 87460 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2L | 85594 ns | 84755 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 35508 ns | 34245 ns | 1.04 |
| array/reductions/reduce/Float32/dims=1 | 46980 ns | 39504 ns | 1.19 |
| array/reductions/reduce/Float32/dims=2 | 57520 ns | 56666.5 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1L | 52341 ns | 51754 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 70395 ns | 69136 ns | 1.02 |
| array/reductions/mapreduce/Int64/1d | 43667 ns | 42523 ns | 1.03 |
| array/reductions/mapreduce/Int64/dims=1 | 42924.5 ns | 41960.5 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=2 | 60047 ns | 59584 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 88007 ns | 87504 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2L | 85593 ns | 84897 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 35487 ns | 34058 ns | 1.04 |
| array/reductions/mapreduce/Float32/dims=1 | 40739 ns | 39925.5 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=2 | 57473 ns | 56515 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=1L | 52213 ns | 51712 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 70041 ns | 68949 ns | 1.02 |
| array/broadcast | 20699 ns | 20526 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 11605 ns | 11259 ns | 1.03 |
| array/copyto!/cpu_to_gpu | 217253 ns | 215190.5 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 286557 ns | 280520 ns | 1.02 |
| array/accumulate/Int64/1d | 119099 ns | 118622 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80970 ns | 79915 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 157467 ns | 156043 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 1695571 ns | 1705195.5 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 962881 ns | 961432 ns | 1.00 |
| array/accumulate/Float32/1d | 102052 ns | 101378 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 77635 ns | 76864 ns | 1.01 |
| array/accumulate/Float32/dims=2 | 144878 ns | 143449 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 1586265 ns | 1592307.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 658698 ns | 660195 ns | 1.00 |
| array/construct | 1330.8 ns | 1314.9 ns | 1.01 |
| array/random/randn/Float32 | 38785 ns | 37667 ns | 1.03 |
| array/random/randn!/Float32 | 31766.5 ns | 31476 ns | 1.01 |
| array/random/rand!/Int64 | 34407 ns | 34492 ns | 1.00 |
| array/random/rand!/Float32 | 8622.666666666666 ns | 8454.333333333334 ns | 1.02 |
| array/random/rand/Int64 | 30001 ns | 37106 ns | 0.81 |
| array/random/rand/Float32 | 13451 ns | 12822 ns | 1.05 |
| array/permutedims/4d | 51855 ns | 52569 ns | 0.99 |
| array/permutedims/2d | 52665 ns | 52223 ns | 1.01 |
| array/permutedims/3d | 53154 ns | 52428.5 ns | 1.01 |
| array/sorting/1d | 2737384 ns | 2743369 ns | 1.00 |
| array/sorting/by | 3305916 ns | 3313654 ns | 1.00 |
| array/sorting/2d | 1069615 ns | 1074587 ns | 1.00 |
| cuda/synchronization/stream/auto | 1041.9 ns | 1044.4615384615386 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 7511 ns | 8092.2 ns | 0.93 |
| cuda/synchronization/stream/blocking | 835.8018867924528 ns | 845.875 ns | 0.99 |
| cuda/synchronization/context/auto | 1187.8 ns | 1177.7 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 8148.4 ns | 7929.299999999999 ns | 1.03 |
| cuda/synchronization/context/blocking | 948.2 ns | 932.4857142857143 ns | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.
Before:
After:
Or, even more spectacular: the profiler
Before:
After:
Depends on JuliaGPU/GPUCompiler.jl#776