christiangnrd
Member

This makes no changes to CI, but it makes it easier to specify the benchmark output filename without having to edit the script. I will open an equivalent PR for Metal if this is approved here.

Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 1f9b6e3 | Previous: e561e7a | Ratio |
|---|---|---|---|
| latency/precompile | 42928882636 ns | 43393378645 ns | 0.99 |
| latency/ttfp | 6986973732 ns | 7099882121 ns | 0.98 |
| latency/import | 3555865382 ns | 3463869374 ns | 1.03 |
| integration/volumerhs | 9611650.5 ns | 9623663 ns | 1.00 |
| integration/byval/slices=1 | 146998 ns | 146714 ns | 1.00 |
| integration/byval/slices=3 | 426161 ns | 425787 ns | 1.00 |
| integration/byval/reference | 145103 ns | 144967 ns | 1.00 |
| integration/byval/slices=2 | 286545 ns | 286209 ns | 1.00 |
| integration/cudadevrt | 103507 ns | 103426 ns | 1.00 |
| kernel/indexing | 14323 ns | 14196 ns | 1.01 |
| kernel/indexing_checked | 14916 ns | 14906 ns | 1.00 |
| kernel/occupancy | 694.0466666666666 ns | 759.2189781021898 ns | 0.91 |
| kernel/launch | 2142.6666666666665 ns | 2287.222222222222 ns | 0.94 |
| kernel/rand | 18371 ns | 15792 ns | 1.16 |
| array/reverse/1d | 19772 ns | 19624 ns | 1.01 |
| array/reverse/2d | 24931 ns | 24928.5 ns | 1.00 |
| array/reverse/1d_inplace | 10538 ns | 10448 ns | 1.01 |
| array/reverse/2d_inplace | 12113 ns | 12006 ns | 1.01 |
| array/copy | 20859 ns | 20990 ns | 0.99 |
| array/iteration/findall/int | 158557 ns | 159128.5 ns | 1.00 |
| array/iteration/findall/bool | 140476 ns | 139832 ns | 1.00 |
| array/iteration/findfirst/int | 163335.5 ns | 162546 ns | 1.00 |
| array/iteration/findfirst/bool | 166096 ns | 164393.5 ns | 1.01 |
| array/iteration/scalar | 72107 ns | 72740 ns | 0.99 |
| array/iteration/logical | 215497.5 ns | 216803.5 ns | 0.99 |
| array/iteration/findmin/1d | 46807 ns | 45968 ns | 1.02 |
| array/iteration/findmin/2d | 96415.5 ns | 96433 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43110 ns | 44555 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1 | 46734 ns | 48607 ns | 0.96 |
| array/reductions/reduce/Int64/dims=2 | 62883 ns | 63682.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 89091 ns | 88842 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88266 ns | 89417.5 ns | 0.99 |
| array/reductions/reduce/Float32/1d | 34797 ns | 34490 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 51815 ns | 50554 ns | 1.02 |
| array/reductions/reduce/Float32/dims=2 | 59786 ns | 59726 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52383 ns | 52852 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 70338 ns | 70052.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43238.5 ns | 45547 ns | 0.95 |
| array/reductions/mapreduce/Int64/dims=1 | 51677 ns | 48423.5 ns | 1.07 |
| array/reductions/mapreduce/Int64/dims=2 | 62618.5 ns | 61443 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=1L | 89035 ns | 88888 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 87315.5 ns | 87908.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34746 ns | 34245.5 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1 | 41916.5 ns | 47287 ns | 0.89 |
| array/reductions/mapreduce/Float32/dims=2 | 59891 ns | 59743 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52857.5 ns | 53154 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 70310 ns | 70503 ns | 1.00 |
| array/broadcast | 19896 ns | 20866 ns | 0.95 |
| array/copyto!/gpu_to_gpu | 11175 ns | 12817 ns | 0.87 |
| array/copyto!/cpu_to_gpu | 215649.5 ns | 213873 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 283084 ns | 284406 ns | 1.00 |
| array/accumulate/Int64/1d | 125491 ns | 125170 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83763 ns | 83519 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 158115 ns | 158002 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709771 ns | 1709945.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966596 ns | 966571 ns | 1.00 |
| array/accumulate/Float32/1d | 109531 ns | 109737 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 81191 ns | 80823.5 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 148152 ns | 147778 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1619394 ns | 1619194 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698520 ns | 698530 ns | 1.00 |
| array/construct | 1289.8 ns | 1279.85 ns | 1.01 |
| array/random/randn/Float32 | 43954 ns | 47253.5 ns | 0.93 |
| array/random/randn!/Float32 | 24863 ns | 24573 ns | 1.01 |
| array/random/rand!/Int64 | 27237 ns | 27294 ns | 1.00 |
| array/random/rand!/Float32 | 8784 ns | 8724.333333333334 ns | 1.01 |
| array/random/rand/Int64 | 38266 ns | 29633 ns | 1.29 |
| array/random/rand/Float32 | 13013 ns | 12902 ns | 1.01 |
| array/permutedims/4d | 60387 ns | 61250.5 ns | 0.99 |
| array/permutedims/2d | 54355.5 ns | 54865 ns | 0.99 |
| array/permutedims/3d | 55314 ns | 55511 ns | 1.00 |
| array/sorting/1d | 2758071 ns | 2757710 ns | 1.00 |
| array/sorting/by | 3369468.5 ns | 3344132.5 ns | 1.01 |
| array/sorting/2d | 1088835 ns | 1080389 ns | 1.01 |
| cuda/synchronization/stream/auto | 1044.6666666666667 ns | 1015.8333333333334 ns | 1.03 |
| cuda/synchronization/stream/nonblocking | 8191.4 ns | 7618.9 ns | 1.08 |
| cuda/synchronization/stream/blocking | 843.5106382978723 ns | 799.1530612244898 ns | 1.06 |
| cuda/synchronization/context/auto | 1161 ns | 1164.1 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 8420.4 ns | 7651.4 ns | 1.10 |
| cuda/synchronization/context/blocking | 902.1276595744681 ns | 895.8490566037735 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@christiangnrd
Member Author

The CI error is unrelated to this PR.

Contributor

github-actions bot commented Jul 22, 2025

Your PR no longer requires formatting changes. Thank you for your contribution!

@maleadt
Member

maleadt commented Jul 23, 2025

Can you elaborate why this is needed? It seems like a very hacky way of feeding this into the script, why not parse an actual (but optional) CLI argument?

@christiangnrd
Member Author

> Can you elaborate why this is needed? It seems like a very hacky way of feeding this into the script, why not parse an actual (but optional) CLI argument?

Fair enough. I'll do that.

@christiangnrd
Member Author

The benchmark output file is now an optional CLI argument.
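
For readers unfamiliar with the pattern, here is a minimal sketch of how a Julia script can accept an optional output-file argument. It is an illustration only, not the code merged in this PR: the function name `benchmark_output_path` and the default filename `benchmarkresults.json` are assumptions.

```julia
# Hypothetical helper: the merged script may be structured differently.
# The default filename below is an assumption, not taken from the PR.
function benchmark_output_path(args::AbstractVector{<:AbstractString};
                               default::AbstractString = "benchmarkresults.json")
    # Accept zero or one positional argument; anything more is a usage error.
    length(args) > 1 && error("expected at most one argument (the output file), got $(length(args))")
    # Fall back to the default when no argument was given.
    return isempty(args) ? String(default) : String(first(args))
end

# In a script, this would typically be called with Julia's global ARGS:
# output_path = benchmark_output_path(ARGS)
```

With this shape, running the script with no arguments keeps the existing behaviour, while passing a single path redirects the results without editing the file.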

Member

@maleadt maleadt left a comment

FYI, it's good to use [ci skip] or [only benchmarks] or so to avoid kicking off a whole CI run.

@maleadt maleadt merged commit 4f38802 into JuliaGPU:master Jul 23, 2025
1 of 3 checks passed
@christiangnrd christiangnrd deleted the bencheasy branch July 23, 2025 19:49
@christiangnrd
Member Author

christiangnrd commented Jul 23, 2025

Thank you!

> FYI, it's good to use [ci skip] or [only benchmarks] or so to avoid kicking off a whole CI run.

I didn't bother because my latest push to #2815 kicked off a whole CI run anyway...

@maleadt
Member

maleadt commented Jul 23, 2025

> I didn't bother because my latest push to #2815 kicked off a whole CI run anyway...

I'm not sure how that's relevant. Just because CI runs on another PR anyway doesn't mean it needs to run here needlessly.

@christiangnrd
Member Author

I should have been clearer in my last response. The linked PR has [only benchmarks] in the commit description, but the full CI ran anyway, so I assumed that the CI config would also ignore the tag here. In the future I'll make sure to properly tag my commits for CI regardless.

@maleadt
Member

maleadt commented Jul 23, 2025

I see. That's surprising; I'm not sure what regressed there.
