Simplify sparse type mappings #3076

Open
kshyatt wants to merge 1 commit into master from ksh/sparse
kshyatt (Member) commented Apr 1, 2026

github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 35efee6 | Previous: a9a687c | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 101257 ns | 102250.5 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 76950 ns | 77483 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1583804 ns | 1593535 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 143805.5 ns | 144723 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 657210 ns | 661407 ns | 0.99 |
| array/accumulate/Int64/1d | 118672 ns | 120091 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 80367 ns | 81265.5 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1694731 ns | 1707476 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 155985 ns | 158147 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 961924 ns | 962833.5 ns | 1.00 |
| array/broadcast | 20770 ns | 20861 ns | 1.00 |
| array/construct | 1388.2 ns | 1305.3 ns | 1.06 |
| array/copy | 18725 ns | 18977 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 218074 ns | 218984.5 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 283862 ns | 288110 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11475 ns | 11568 ns | 0.99 |
| array/iteration/findall/bool | 132338.5 ns | 133871.5 ns | 0.99 |
| array/iteration/findall/int | 149890 ns | 150576 ns | 1.00 |
| array/iteration/findfirst/bool | 81470.5 ns | 82372 ns | 0.99 |
| array/iteration/findfirst/int | 83673 ns | 84621.5 ns | 0.99 |
| array/iteration/findmin/1d | 89238 ns | 85921.5 ns | 1.04 |
| array/iteration/findmin/2d | 117077 ns | 117665 ns | 1.00 |
| array/iteration/logical | 198877 ns | 204118.5 ns | 0.97 |
| array/iteration/scalar | 68356 ns | 69670.5 ns | 0.98 |
| array/permutedims/2d | 52308 ns | 53206.5 ns | 0.98 |
| array/permutedims/3d | 52757.5 ns | 53542.5 ns | 0.99 |
| array/permutedims/4d | 52335 ns | 52933.5 ns | 0.99 |
| array/random/rand/Float32 | 13095 ns | 13218 ns | 0.99 |
| array/random/rand/Int64 | 37392 ns | 34787 ns | 1.07 |
| array/random/rand!/Float32 | 8582.333333333334 ns | 8525.333333333334 ns | 1.01 |
| array/random/rand!/Int64 | 34425 ns | 34352 ns | 1.00 |
| array/random/randn/Float32 | 43847 ns | 38965 ns | 1.13 |
| array/random/randn!/Float32 | 31665 ns | 31554 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 35002 ns | 35525 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 40238 ns | 40453.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 51798 ns | 52015 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56726 ns | 56943 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69040 ns | 69861 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 43445 ns | 43677 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 44863 ns | 42176 ns | 1.06 |
| array/reductions/mapreduce/Int64/dims=1L | 87853 ns | 87856 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59601 ns | 59673 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 85042.5 ns | 84933 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34990 ns | 35822.5 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1 | 44183 ns | 42636 ns | 1.04 |
| array/reductions/reduce/Float32/dims=1L | 51921.5 ns | 51953 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 56857 ns | 57183 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 69514 ns | 69983 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 43111 ns | 43946 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 44815.5 ns | 52864 ns | 0.85 |
| array/reductions/reduce/Int64/dims=1L | 87592 ns | 88045 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 59547 ns | 59735 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84652 ns | 84891 ns | 1.00 |
| array/reverse/1d | 18693 ns | 18427 ns | 1.01 |
| array/reverse/1dL | 69235 ns | 68976.5 ns | 1.00 |
| array/reverse/1dL_inplace | 66105 ns | 66026 ns | 1.00 |
| array/reverse/1d_inplace | 8618 ns | 10297.333333333334 ns | 0.84 |
| array/reverse/2d | 20891 ns | 20813 ns | 1.00 |
| array/reverse/2dL | 72885 ns | 72948 ns | 1.00 |
| array/reverse/2dL_inplace | 66111 ns | 66137 ns | 1.00 |
| array/reverse/2d_inplace | 10379 ns | 11426 ns | 0.91 |
| array/sorting/1d | 2735855 ns | 2736028 ns | 1.00 |
| array/sorting/2d | 1068748 ns | 1076314.5 ns | 0.99 |
| array/sorting/by | 3304911.5 ns | 3305633 ns | 1.00 |
| cuda/synchronization/context/auto | 1197.6 ns | 1177.1 ns | 1.02 |
| cuda/synchronization/context/blocking | 937.9428571428572 ns | 930.1428571428571 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 8403.2 ns | 7222.2 ns | 1.16 |
| cuda/synchronization/stream/auto | 1037.9 ns | 1023.8235294117648 ns | 1.01 |
| cuda/synchronization/stream/blocking | 788.2277227722773 ns | 829.5588235294117 ns | 0.95 |
| cuda/synchronization/stream/nonblocking | 7547.1 ns | 7804.5 ns | 0.97 |
| integration/byval/reference | 144068 ns | 144003 ns | 1.00 |
| integration/byval/slices=1 | 145857 ns | 145940 ns | 1.00 |
| integration/byval/slices=2 | 284769 ns | 284688 ns | 1.00 |
| integration/byval/slices=3 | 423136.5 ns | 423167.5 ns | 1.00 |
| integration/cudadevrt | 102708 ns | 102604 ns | 1.00 |
| integration/volumerhs | 9430527 ns | 9452453.5 ns | 1.00 |
| kernel/indexing | 13544.5 ns | 13481 ns | 1.00 |
| kernel/indexing_checked | 14227 ns | 14205 ns | 1.00 |
| kernel/launch | 2210.5555555555557 ns | 2129 ns | 1.04 |
| kernel/occupancy | 663.6863354037267 ns | 660.2115384615385 ns | 1.01 |
| kernel/rand | 14679 ns | 18430 ns | 0.80 |
| latency/import | 3805555922 ns | 3804590639.5 ns | 1.00 |
| latency/precompile | 4587970948 ns | 4586818051.5 ns | 1.00 |
| latency/ttfp | 4385584841.5 ns | 4387745362.5 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
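
For readers skimming the table, the Ratio column appears to be the current time divided by the previous time, rounded to two decimals, so values above 1.00 mean the current commit was slower. The sketch below reproduces that arithmetic for a few rows copied from the table. The `classify` helper and its 10% threshold are hypothetical, chosen only for illustration; they are not taken from the benchmark workflow's configuration.

```python
# Rows copied from the benchmark table: name -> (current ns, previous ns).
rows = {
    "array/construct": (1388.2, 1305.3),
    "array/random/randn/Float32": (43847, 38965),
    "cuda/synchronization/context/nonblocking": (8403.2, 7222.2),
    "array/reductions/reduce/Int64/dims=1": (44815.5, 52864),
    "kernel/rand": (14679, 18430),
}


def ratio(current_ns, previous_ns):
    """Current time over previous time, rounded as in the Ratio column."""
    return round(current_ns / previous_ns, 2)


def classify(current_ns, previous_ns, threshold=0.10):
    """Label a row using a hypothetical +/-10% band (an assumption)."""
    r = current_ns / previous_ns
    if r > 1 + threshold:
        return "slower"
    if r < 1 - threshold:
        return "faster"
    return "unchanged"


for name, (current, previous) in rows.items():
    print(f"{name}: ratio={ratio(current, previous):.2f} "
          f"({classify(current, previous)})")
```

Microbenchmarks in the sub-microsecond to tens-of-microseconds range can fluctuate between runs, so a single row outside the band is a prompt to rerun rather than proof of a regression.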
