
Conversation

@maleadt (Member) commented Aug 6, 2025

Closes #2831

NVTX support is disabled because of NVIDIA/NVTX#125

github-actions bot (Contributor) commented Aug 7, 2025

Your PR no longer requires formatting changes. Thank you for your contribution!

github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 55b4150 | Previous: 205c238 | Ratio |
| --- | --- | --- | --- |
| latency/precompile | 42965722378.5 ns | 42934926801 ns | 1.00 |
| latency/ttfp | 6988682035 ns | 7008552789 ns | 1.00 |
| latency/import | 3569251927 ns | 3569139582 ns | 1.00 |
| integration/volumerhs | 9623821 ns | 9606581 ns | 1.00 |
| integration/byval/slices=1 | 147113 ns | 147311 ns | 1.00 |
| integration/byval/slices=3 | 426103.5 ns | 426127 ns | 1.00 |
| integration/byval/reference | 144992 ns | 145282 ns | 1.00 |
| integration/byval/slices=2 | 286580 ns | 286537 ns | 1.00 |
| integration/cudadevrt | 103446 ns | 103674 ns | 1.00 |
| kernel/indexing | 14148 ns | 14638.5 ns | 0.97 |
| kernel/indexing_checked | 14760 ns | 15045 ns | 0.98 |
| kernel/occupancy | 669.1572327044025 ns | 669.9465408805031 ns | 1.00 |
| kernel/launch | 2136.222222222222 ns | 2202.4444444444443 ns | 0.97 |
| kernel/rand | 18262 ns | 17466 ns | 1.05 |
| array/reverse/1d | 19620 ns | 20143 ns | 0.97 |
| array/reverse/2d | 24878 ns | 24692 ns | 1.01 |
| array/reverse/1d_inplace | 10385 ns | 11332 ns | 0.92 |
| array/reverse/2d_inplace | 12033 ns | 13662 ns | 0.88 |
| array/copy | 21221 ns | 21281 ns | 1.00 |
| array/iteration/findall/int | 157623.5 ns | 159966.5 ns | 0.99 |
| array/iteration/findall/bool | 139804 ns | 141602 ns | 0.99 |
| array/iteration/findfirst/int | 164485 ns | 163419 ns | 1.01 |
| array/iteration/findfirst/bool | 166975 ns | 165377 ns | 1.01 |
| array/iteration/scalar | 72738.5 ns | 76152 ns | 0.96 |
| array/iteration/logical | 215702 ns | 219912.5 ns | 0.98 |
| array/iteration/findmin/1d | 45956 ns | 47580 ns | 0.97 |
| array/iteration/findmin/2d | 96547.5 ns | 97060 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 45046.5 ns | 43742.5 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1 | 49930.5 ns | 47519.5 ns | 1.05 |
| array/reductions/reduce/Int64/dims=2 | 62791 ns | 62503 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88804 ns | 89134 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87453 ns | 87634.5 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 33964 ns | 35637 ns | 0.95 |
| array/reductions/reduce/Float32/dims=1 | 49514 ns | 51967.5 ns | 0.95 |
| array/reductions/reduce/Float32/dims=2 | 59373 ns | 59824 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 52442 ns | 52680 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69888 ns | 70568 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 43216 ns | 43514 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 46828 ns | 46605.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 62231 ns | 62143.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 88831 ns | 89174 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 86455 ns | 87305.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34051 ns | 35464 ns | 0.96 |
| array/reductions/mapreduce/Float32/dims=1 | 51567.5 ns | 42505.5 ns | 1.21 |
| array/reductions/mapreduce/Float32/dims=2 | 59608 ns | 60252 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 52631 ns | 52803 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69982.5 ns | 70795 ns | 0.99 |
| array/broadcast | 20298 ns | 20737 ns | 0.98 |
| array/copyto!/gpu_to_gpu | 11138.5 ns | 13192 ns | 0.84 |
| array/copyto!/cpu_to_gpu | 214056 ns | 217123 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 283193.5 ns | 287100 ns | 0.99 |
| array/accumulate/Int64/1d | 124851.5 ns | 126109 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 83703.5 ns | 84201 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 158323 ns | 158968 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1710623 ns | 1710638 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966448.5 ns | 967410.5 ns | 1.00 |
| array/accumulate/Float32/1d | 109017.5 ns | 109994 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80827 ns | 81343 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 148200 ns | 148659 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1619508 ns | 1619411 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698507 ns | 699433 ns | 1.00 |
| array/construct | 1296.6 ns | 1288.5 ns | 1.01 |
| array/random/randn/Float32 | 43715 ns | 45344 ns | 0.96 |
| array/random/randn!/Float32 | 24865 ns | 25330 ns | 0.98 |
| array/random/rand!/Int64 | 27314 ns | 27554 ns | 0.99 |
| array/random/rand!/Float32 | 8707 ns | 8908.333333333334 ns | 0.98 |
| array/random/rand/Int64 | 29719 ns | 30218 ns | 0.98 |
| array/random/rand/Float32 | 12933 ns | 13361 ns | 0.97 |
| array/permutedims/4d | 59903.5 ns | 60397 ns | 0.99 |
| array/permutedims/2d | 54172 ns | 54394 ns | 1.00 |
| array/permutedims/3d | 55007 ns | 55362 ns | 0.99 |
| array/sorting/1d | 2757365 ns | 2758561 ns | 1.00 |
| array/sorting/by | 3343832 ns | 3368461 ns | 0.99 |
| array/sorting/2d | 1080853 ns | 1089562 ns | 0.99 |
| cuda/synchronization/stream/auto | 1031.2 ns | 1066.6 ns | 0.97 |
| cuda/synchronization/stream/nonblocking | 7171.2 ns | 7691.3 ns | 0.93 |
| cuda/synchronization/stream/blocking | 811.1578947368421 ns | 844.0121951219512 ns | 0.96 |
| cuda/synchronization/context/auto | 1164.7 ns | 1211.4 ns | 0.96 |
| cuda/synchronization/context/nonblocking | 7068 ns | 6881.1 ns | 1.03 |
| cuda/synchronization/context/blocking | 890.0363636363636 ns | 924.7692307692307 ns | 0.96 |
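The Ratio column appears to be current/previous timing rounded to two decimals, so values below 1.00 indicate this PR's commit is faster on that benchmark. A quick sanity check against two rows of the table, assuming that interpretation:

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Ratio as reported by the benchmark comment: current over previous."""
    return round(current_ns / previous_ns, 2)

# kernel/indexing: 14148 ns vs 14638.5 ns
print(ratio(14148, 14638.5))  # 0.97
# kernel/rand: 18262 ns vs 17466 ns
print(ratio(18262, 17466))    # 1.05
```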

This comment was automatically generated by a workflow using github-action-benchmark.

@maleadt merged commit c05359d into master on Aug 8, 2025 (3 checks passed).
@maleadt deleted the tb/cuda13 branch on August 8, 2025, 06:04.
@amontoison (Member) commented Aug 8, 2025

@maleadt It seems that NVIDIA broke a few headers in the 13.0 release.
I wanted to start updating the interfaces of cusparse, cusolver, cublas, etc., but I am unable to regenerate the Julia wrappers from the scripts in res/wrap.

If you find a way to update the wrappers, I can take care of the high-level interfaces.

@maleadt (Member, Author) commented Aug 11, 2025

#2842

Successfully merging this pull request may close these issues: Support for CUDA 13.