Fix compat in profile #3074

Open
vchuravy wants to merge 1 commit into master from vc/fix_compat

@vchuravy (Member) commented Apr 1, 2026

No description provided.

@maleadt (Member) left a comment

Doesn't it need an additional empty line too?

@github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 3151c6c | Previous: 0b05451 | Ratio |
| --- | --- | --- | --- |
| latency/precompile | 44931145637 ns | 4612556023 ns | 9.74 |
| latency/ttfp | 12943499837 ns | 4408562143 ns | 2.94 |
| latency/import | 3563511009 ns | 3821839641 ns | 0.93 |
| integration/volumerhs | 9440699.5 ns | 9448669.5 ns | 1.00 |
| integration/byval/slices=1 | 146095 ns | 145839 ns | 1.00 |
| integration/byval/slices=3 | 423324 ns | 423425 ns | 1.00 |
| integration/byval/reference | 144115 ns | 144031 ns | 1.00 |
| integration/byval/slices=2 | 285137 ns | 284559 ns | 1.00 |
| integration/cudadevrt | 102657.5 ns | 102661 ns | 1.00 |
| kernel/indexing | 13221 ns | 13693 ns | 0.97 |
| kernel/indexing_checked | 13979 ns | 14309 ns | 0.98 |
| kernel/occupancy | 668.2151898734177 ns | 663.8086419753087 ns | 1.01 |
| kernel/launch | 2156.3333333333335 ns | 2210.6666666666665 ns | 0.98 |
| kernel/rand | 14582 ns | 16005 ns | 0.91 |
| array/reverse/1d | 18785 ns | 18568 ns | 1.01 |
| array/reverse/2dL_inplace | 66191 ns | 66179 ns | 1.00 |
| array/reverse/1dL | 69282 ns | 69124 ns | 1.00 |
| array/reverse/2d | 21586 ns | 21081 ns | 1.02 |
| array/reverse/1d_inplace | 10600.666666666666 ns | 8671.333333333334 ns | 1.22 |
| array/reverse/2d_inplace | 10583 ns | 10405 ns | 1.02 |
| array/reverse/2dL | 73490 ns | 73273 ns | 1.00 |
| array/reverse/1dL_inplace | 66213 ns | 66158 ns | 1.00 |
| array/copy | 18706 ns | 18801 ns | 0.99 |
| array/iteration/findall/int | 149764 ns | 149677 ns | 1.00 |
| array/iteration/findall/bool | 132247 ns | 132815 ns | 1.00 |
| array/iteration/findfirst/int | 84317 ns | 84077 ns | 1.00 |
| array/iteration/findfirst/bool | 82632 ns | 81834 ns | 1.01 |
| array/iteration/scalar | 67579 ns | 67842 ns | 1.00 |
| array/iteration/logical | 201911 ns | 205668 ns | 0.98 |
| array/iteration/findmin/1d | 89632 ns | 88143.5 ns | 1.02 |
| array/iteration/findmin/2d | 117317 ns | 117602 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43074 ns | 43879 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 42319 ns | 42363.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59769 ns | 59729 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 87694 ns | 87783 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84746 ns | 84600.5 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34775.5 ns | 35554 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1 | 39790 ns | 40471.5 ns | 0.98 |
| array/reductions/reduce/Float32/dims=2 | 57018 ns | 57348 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 51911 ns | 52155 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69921 ns | 70051 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 42828 ns | 43658 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1 | 42715 ns | 42850 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59449 ns | 60283 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 87721 ns | 87856 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 84538 ns | 84886 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34528 ns | 35439 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=1 | 40228 ns | 49602 ns | 0.81 |
| array/reductions/mapreduce/Float32/dims=2 | 56891 ns | 57112 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 51793 ns | 52066 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 69287 ns | 70220.5 ns | 0.99 |
| array/broadcast | 20728 ns | 20835 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11282 ns | 11409 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 215496.5 ns | 217482 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 283460.5 ns | 284547 ns | 1.00 |
| array/accumulate/Int64/1d | 118849 ns | 119536 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 79803 ns | 80581.5 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 155916 ns | 156671 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1694625.5 ns | 1706437 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 961590 ns | 962104 ns | 1.00 |
| array/accumulate/Float32/1d | 101671 ns | 102057 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 76866 ns | 77830 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 143826 ns | 144868 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1585623 ns | 1586085 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 657591 ns | 661758 ns | 0.99 |
| array/construct | 1291.55 ns | 1349.1 ns | 0.96 |
| array/random/randn/Float32 | 38465 ns | 42622 ns | 0.90 |
| array/random/randn!/Float32 | 31529 ns | 31841 ns | 0.99 |
| array/random/rand!/Int64 | 34185 ns | 34412 ns | 0.99 |
| array/random/rand!/Float32 | 8498.666666666666 ns | 8601 ns | 0.99 |
| array/random/rand/Int64 | 37008.5 ns | 37538 ns | 0.99 |
| array/random/rand/Float32 | 13002.5 ns | 13018 ns | 1.00 |
| array/permutedims/4d | 52636.5 ns | 52619.5 ns | 1.00 |
| array/permutedims/2d | 52787 ns | 52940 ns | 1.00 |
| array/permutedims/3d | 52988 ns | 53310 ns | 0.99 |
| array/sorting/1d | 2735841 ns | 2735465.5 ns | 1.00 |
| array/sorting/by | 3304400 ns | 3315555 ns | 1.00 |
| array/sorting/2d | 1069201 ns | 1072718 ns | 1.00 |
| cuda/synchronization/stream/auto | 1036.6470588235295 ns | 1042.1 ns | 0.99 |
| cuda/synchronization/stream/nonblocking | 7682 ns | 7867.299999999999 ns | 0.98 |
| cuda/synchronization/stream/blocking | 836.8019801980198 ns | 828.4299065420561 ns | 1.01 |
| cuda/synchronization/context/auto | 1203 ns | 1198.8 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 7482.1 ns | 7295 ns | 1.03 |
| cuda/synchronization/context/blocking | 954.5142857142857 ns | 929.6410256410256 ns | 1.03 |

This comment was automatically generated by a workflow using github-action-benchmark.
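
The Ratio column in the table above is consistent with the current time divided by the previous time, rounded to two decimals, so values above 1 indicate a slowdown. The following is a minimal Python sketch of that arithmetic on four rows copied from the table. The helper names `ratio` and `flag_slowdowns` and the 1.2 threshold are illustrative assumptions; they are not taken from github-action-benchmark or CUDA.jl.

```python
# Sketch: reproduce the Ratio column (assumed to be current / previous)
# and list benchmarks whose ratio exceeds a hypothetical threshold.

# (current ns, previous ns), copied from the benchmark table.
ROWS = {
    "latency/precompile": (44931145637, 4612556023),
    "latency/ttfp": (12943499837, 4408562143),
    "array/reverse/1d_inplace": (10600.666666666666, 8671.333333333334),
    "array/reductions/mapreduce/Float32/dims=1": (40228, 49602),
}

# Hypothetical cut-off chosen for illustration only.
SLOWDOWN_THRESHOLD = 1.2


def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current / previous, rounded to two decimals like the table."""
    return round(current_ns / previous_ns, 2)


def flag_slowdowns(rows: dict, threshold: float = SLOWDOWN_THRESHOLD) -> dict:
    """Map benchmark name to ratio for every row whose ratio exceeds the threshold."""
    flagged = {}
    for name, (current_ns, previous_ns) in rows.items():
        r = ratio(current_ns, previous_ns)
        if r > threshold:
            flagged[name] = r
    return flagged


if __name__ == "__main__":
    for name, r in flag_slowdowns(ROWS).items():
        print(f"{name}: {r:.2f}x")
```

With this threshold, the two latency rows and `array/reverse/1d_inplace` are flagged, while the `mapreduce/Float32/dims=1` row (ratio below 1) is not.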
