Simplify specifying benchmark output file #2814
CUDA.jl Benchmarks
| Benchmark suite | Current: 1f9b6e3 | Previous: e561e7a | Ratio |
|---|---|---|---|
| latency/precompile | 42928882636 ns | 43393378645 ns | 0.99 |
| latency/ttfp | 6986973732 ns | 7099882121 ns | 0.98 |
| latency/import | 3555865382 ns | 3463869374 ns | 1.03 |
| integration/volumerhs | 9611650.5 ns | 9623663 ns | 1.00 |
| integration/byval/slices=1 | 146998 ns | 146714 ns | 1.00 |
| integration/byval/slices=3 | 426161 ns | 425787 ns | 1.00 |
| integration/byval/reference | 145103 ns | 144967 ns | 1.00 |
| integration/byval/slices=2 | 286545 ns | 286209 ns | 1.00 |
| integration/cudadevrt | 103507 ns | 103426 ns | 1.00 |
| kernel/indexing | 14323 ns | 14196 ns | 1.01 |
| kernel/indexing_checked | 14916 ns | 14906 ns | 1.00 |
| kernel/occupancy | 694.0466666666666 ns | 759.2189781021898 ns | 0.91 |
| kernel/launch | 2142.6666666666665 ns | 2287.222222222222 ns | 0.94 |
| kernel/rand | 18371 ns | 15792 ns | 1.16 |
| array/reverse/1d | 19772 ns | 19624 ns | 1.01 |
| array/reverse/2d | 24931 ns | 24928.5 ns | 1.00 |
| array/reverse/1d_inplace | 10538 ns | 10448 ns | 1.01 |
| array/reverse/2d_inplace | 12113 ns | 12006 ns | 1.01 |
| array/copy | 20859 ns | 20990 ns | 0.99 |
| array/iteration/findall/int | 158557 ns | 159128.5 ns | 1.00 |
| array/iteration/findall/bool | 140476 ns | 139832 ns | 1.00 |
| array/iteration/findfirst/int | 163335.5 ns | 162546 ns | 1.00 |
| array/iteration/findfirst/bool | 166096 ns | 164393.5 ns | 1.01 |
| array/iteration/scalar | 72107 ns | 72740 ns | 0.99 |
| array/iteration/logical | 215497.5 ns | 216803.5 ns | 0.99 |
| array/iteration/findmin/1d | 46807 ns | 45968 ns | 1.02 |
| array/iteration/findmin/2d | 96415.5 ns | 96433 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43110 ns | 44555 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1 | 46734 ns | 48607 ns | 0.96 |
| array/reductions/reduce/Int64/dims=2 | 62883 ns | 63682.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 89091 ns | 88842 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88266 ns | 89417.5 ns | 0.99 |
| array/reductions/reduce/Float32/1d | 34797 ns | 34490 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 51815 ns | 50554 ns | 1.02 |
| array/reductions/reduce/Float32/dims=2 | 59786 ns | 59726 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52383 ns | 52852 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 70338 ns | 70052.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43238.5 ns | 45547 ns | 0.95 |
| array/reductions/mapreduce/Int64/dims=1 | 51677 ns | 48423.5 ns | 1.07 |
| array/reductions/mapreduce/Int64/dims=2 | 62618.5 ns | 61443 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=1L | 89035 ns | 88888 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 87315.5 ns | 87908.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34746 ns | 34245.5 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1 | 41916.5 ns | 47287 ns | 0.89 |
| array/reductions/mapreduce/Float32/dims=2 | 59891 ns | 59743 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52857.5 ns | 53154 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 70310 ns | 70503 ns | 1.00 |
| array/broadcast | 19896 ns | 20866 ns | 0.95 |
| array/copyto!/gpu_to_gpu | 11175 ns | 12817 ns | 0.87 |
| array/copyto!/cpu_to_gpu | 215649.5 ns | 213873 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 283084 ns | 284406 ns | 1.00 |
| array/accumulate/Int64/1d | 125491 ns | 125170 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83763 ns | 83519 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 158115 ns | 158002 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709771 ns | 1709945.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966596 ns | 966571 ns | 1.00 |
| array/accumulate/Float32/1d | 109531 ns | 109737 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 81191 ns | 80823.5 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 148152 ns | 147778 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1619394 ns | 1619194 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698520 ns | 698530 ns | 1.00 |
| array/construct | 1289.8 ns | 1279.85 ns | 1.01 |
| array/random/randn/Float32 | 43954 ns | 47253.5 ns | 0.93 |
| array/random/randn!/Float32 | 24863 ns | 24573 ns | 1.01 |
| array/random/rand!/Int64 | 27237 ns | 27294 ns | 1.00 |
| array/random/rand!/Float32 | 8784 ns | 8724.333333333334 ns | 1.01 |
| array/random/rand/Int64 | 38266 ns | 29633 ns | 1.29 |
| array/random/rand/Float32 | 13013 ns | 12902 ns | 1.01 |
| array/permutedims/4d | 60387 ns | 61250.5 ns | 0.99 |
| array/permutedims/2d | 54355.5 ns | 54865 ns | 0.99 |
| array/permutedims/3d | 55314 ns | 55511 ns | 1.00 |
| array/sorting/1d | 2758071 ns | 2757710 ns | 1.00 |
| array/sorting/by | 3369468.5 ns | 3344132.5 ns | 1.01 |
| array/sorting/2d | 1088835 ns | 1080389 ns | 1.01 |
| cuda/synchronization/stream/auto | 1044.6666666666667 ns | 1015.8333333333334 ns | 1.03 |
| cuda/synchronization/stream/nonblocking | 8191.4 ns | 7618.9 ns | 1.08 |
| cuda/synchronization/stream/blocking | 843.5106382978723 ns | 799.1530612244898 ns | 1.06 |
| cuda/synchronization/context/auto | 1161 ns | 1164.1 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 8420.4 ns | 7651.4 ns | 1.10 |
| cuda/synchronization/context/blocking | 902.1276595744681 ns | 895.8490566037735 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Error unrelated
Your PR no longer requires formatting changes. Thank you for your contribution!
Can you elaborate on why this is needed? It seems like a very hacky way of feeding this into the script; why not parse an actual (but optional) CLI argument?
Fair enough. I'll do that.
Benchmark output file is now an optional argument.
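For illustration only, a minimal Julia sketch of what "an optional argument" could look like. This is not the code from the PR: the helper name `output_path`, the default file name `benchmarkresults.json`, and the usage string are all assumptions made up for the example.

```julia
# Hypothetical helper, not taken from the PR: choose the benchmark output
# file from the command-line arguments, falling back to a default name.
# The default "benchmarkresults.json" is an assumed placeholder.
function output_path(args::AbstractVector{<:AbstractString};
                     default::AbstractString="benchmarkresults.json")
    if isempty(args)
        # No argument given: keep the previous behaviour and use the default.
        return String(default)
    elseif length(args) == 1
        # Exactly one argument: treat it as the output file name.
        return String(args[1])
    else
        # Anything else is a usage error.
        throw(ArgumentError("usage: julia <script>.jl [output_file]"))
    end
end

# In a script, Julia exposes the command-line arguments as the global `ARGS`:
#     results_file = output_path(ARGS)
```

With something along these lines, running the script with no argument behaves as before, while passing a file name redirects the results without editing the script.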
FYI, it's good to use [ci skip] or [only benchmarks] or so to avoid kicking off a whole CI run.
Thank you!
I didn't bother because my latest push to #2815 kicked off a whole CI run anyway...
I'm not sure how that's relevant? Just because CI runs on another PR anyway doesn't mean it needs to run here needlessly.
I should have been more clear in my last response. The linked PR has an
I see. That's surprising, I'm not sure what regressed there.
Makes no changes to CI, but makes it easier to specify the benchmark output filename without having to edit the file. I will open an equivalent PR for Metal if this is approved here.