@vchuravy
Member

No description provided.

Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@vchuravy vchuravy marked this pull request as ready for review February 17, 2025 09:38
@vchuravy vchuravy merged commit da32c2b into main Feb 17, 2025
28 of 35 checks passed
@vchuravy vchuravy deleted the users/vchuravy/02-17-switch_format.yml_to_cuda.jl_style branch February 17, 2025 09:43
@github-actions
Contributor

Benchmark Results

| Benchmark | main | 0c3bc1c... | main/0c3bc1cc6b22bf... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.738 ± 0.007 μs | 0.738 ± 0.0093 μs | 1 |
| saxpy/default/Float16/1048576 | 0.176 ± 0.0084 ms | 0.174 ± 0.008 ms | 1.02 |
| saxpy/default/Float16/16384 | 3.35 ± 0.053 μs | 3.34 ± 0.025 μs | 1 |
| saxpy/default/Float16/2048 | 0.914 ± 0.011 μs | 0.914 ± 0.011 μs | 1 |
| saxpy/default/Float16/256 | 0.596 ± 0.0053 μs | 0.592 ± 0.0077 μs | 1.01 |
| saxpy/default/Float16/262144 | 0.0444 ± 0.00091 ms | 0.0442 ± 0.00043 ms | 1 |
| saxpy/default/Float16/32768 | 6.04 ± 0.097 μs | 6.01 ± 0.052 μs | 1 |
| saxpy/default/Float16/4096 | 1.32 ± 0.026 μs | 1.32 ± 0.025 μs | 1 |
| saxpy/default/Float16/512 | 0.654 ± 0.0068 μs | 0.649 ± 0.0091 μs | 1.01 |
| saxpy/default/Float16/64 | 0.565 ± 0.005 μs | 0.559 ± 0.0062 μs | 1.01 |
| saxpy/default/Float16/65536 | 11.7 ± 0.18 μs | 11.7 ± 0.12 μs | 1 |
| saxpy/default/Float32/1024 | 0.638 ± 0.011 μs | 0.646 ± 0.012 μs | 0.988 |
| saxpy/default/Float32/1048576 | 0.209 ± 0.038 ms | 0.238 ± 0.019 ms | 0.875 |
| saxpy/default/Float32/16384 | 2.84 ± 0.36 μs | 2.78 ± 0.19 μs | 1.02 |
| saxpy/default/Float32/2048 | 0.761 ± 0.027 μs | 0.76 ± 0.055 μs | 1 |
| saxpy/default/Float32/256 | 0.566 ± 0.0057 μs | 0.577 ± 0.0077 μs | 0.982 |
| saxpy/default/Float32/262144 | 0.0448 ± 0.0041 ms | 0.0577 ± 0.0034 ms | 0.777 |
| saxpy/default/Float32/32768 | 5.5 ± 0.81 μs | 5.28 ± 0.29 μs | 1.04 |
| saxpy/default/Float32/4096 | 1.13 ± 0.064 μs | 1.15 ± 0.11 μs | 0.982 |
| saxpy/default/Float32/512 | 0.602 ± 0.0082 μs | 0.609 ± 0.011 μs | 0.988 |
| saxpy/default/Float32/64 | 0.558 ± 0.0051 μs | 0.564 ± 0.0056 μs | 0.989 |
| saxpy/default/Float32/65536 | 12.2 ± 1.2 μs | 12.4 ± 0.77 μs | 0.983 |
| saxpy/default/Float64/1024 | 0.742 ± 0.026 μs | 0.753 ± 0.072 μs | 0.986 |
| saxpy/default/Float64/1048576 | 0.488 ± 0.054 ms | 0.523 ± 0.037 ms | 0.934 |
| saxpy/default/Float64/16384 | 5.4 ± 0.73 μs | 5.29 ± 0.34 μs | 1.02 |
| saxpy/default/Float64/2048 | 1.13 ± 0.081 μs | 1.15 ± 0.1 μs | 0.976 |
| saxpy/default/Float64/256 | 0.571 ± 0.0078 μs | 0.575 ± 0.0066 μs | 0.993 |
| saxpy/default/Float64/262144 | 0.0915 ± 0.01 ms | 0.0907 ± 0.012 ms | 1.01 |
| saxpy/default/Float64/32768 | 12.3 ± 1.1 μs | 12.4 ± 0.73 μs | 0.985 |
| saxpy/default/Float64/4096 | 1.73 ± 0.23 μs | 1.69 ± 0.15 μs | 1.02 |
| saxpy/default/Float64/512 | 0.624 ± 0.01 μs | 0.626 ± 0.01 μs | 0.997 |
| saxpy/default/Float64/64 | 0.549 ± 0.0065 μs | 0.556 ± 0.0065 μs | 0.988 |
| saxpy/default/Float64/65536 | 23.7 ± 2.5 μs | 28.6 ± 1.5 μs | 0.829 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.17 ± 0.027 μs | 2.19 ± 0.028 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.158 ± 0.0088 ms | 0.158 ± 0.0088 ms | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.44 ± 0.13 μs | 4.43 ± 0.072 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.35 ± 0.032 μs | 2.36 ± 0.03 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.81 ± 0.033 μs | 2.81 ± 0.031 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.0426 ± 0.0028 ms | 0.0427 ± 0.0015 ms | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.86 ± 0.22 μs | 6.9 ± 0.2 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.67 ± 0.046 μs | 2.67 ± 0.041 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.25 ± 0.036 μs | 3.26 ± 0.036 μs | 0.999 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.51 ± 0.21 μs | 2.51 ± 0.21 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.5 ± 0.31 μs | 12.7 ± 0.37 μs | 0.981 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.21 ± 0.035 μs | 2.21 ± 0.03 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.204 ± 0.025 ms | 0.245 ± 0.017 ms | 0.836 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.55 ± 0.71 μs | 4.39 ± 0.33 μs | 1.04 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.36 ± 0.04 μs | 2.37 ± 0.059 μs | 0.997 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.66 ± 0.058 μs | 2.68 ± 0.046 μs | 0.993 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0493 ± 0.0048 ms | 0.0613 ± 0.0029 ms | 0.805 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.52 ± 0.71 μs | 7.35 ± 0.61 μs | 1.02 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.65 ± 0.062 μs | 2.66 ± 0.096 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.67 ± 0.037 μs | 2.69 ± 0.032 μs | 0.992 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.7 ± 5.7 μs | 2.72 ± 5.3 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 14.4 ± 1.3 μs | 15.3 ± 0.97 μs | 0.941 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.32 ± 0.062 μs | 2.31 ± 0.074 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.537 ± 0.051 ms | 0.516 ± 0.023 ms | 1.04 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.23 ± 0.61 μs | 7.28 ± 0.46 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.59 ± 0.08 μs | 2.61 ± 0.1 μs | 0.994 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.63 ± 0.053 μs | 2.65 ± 0.063 μs | 0.996 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.1 ± 0.014 ms | 0.118 ± 0.007 ms | 0.849 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 14.6 ± 1.5 μs | 15.2 ± 0.98 μs | 0.962 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.2 ± 0.24 μs | 3.17 ± 0.24 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.65 ± 0.055 μs | 2.65 ± 0.065 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.6 ± 0.065 μs | 2.59 ± 0.066 μs | 1 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 26.5 ± 2.5 μs | 31.2 ± 1.4 μs | 0.849 |
| time_to_load | 0.325 ± 0.0018 s | 0.321 ± 0.0058 s | 1.01 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).
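
As a reading aid for the Benchmark Results table, here is a minimal Python sketch of how its last column appears to relate to the two timing columns. It rests on assumptions the page does not confirm: that the ratio is the first timing divided by the second (as the header `main/0c3bc1cc6b22bf...` suggests), and that the value after `±` is a standard-deviation-like spread. The helper names `ratio` and `within_noise` and the threshold `k=2.0` are hypothetical, chosen only for illustration, and are not part of the benchmark workflow.

```python
import math


def ratio(first_mean: float, second_mean: float) -> float:
    """Ratio of the two mean timings (first column over second column)."""
    return first_mean / second_mean


def within_noise(mean_a: float, spread_a: float,
                 mean_b: float, spread_b: float, k: float = 2.0) -> bool:
    """Heuristic check (an assumption, not the CI's rule): the two means
    differ by no more than k times their combined spread."""
    return abs(mean_a - mean_b) <= k * math.hypot(spread_a, spread_b)


# saxpy/default/Float64/65536: 23.7 ± 2.5 μs vs 28.6 ± 1.5 μs, reported ratio 0.829
r_f64 = ratio(23.7, 28.6)
noisy_f64 = within_noise(23.7, 2.5, 28.6, 1.5)

# saxpy/default/Float32/262144: 0.0448 ± 0.0041 ms vs 0.0577 ± 0.0034 ms, reported ratio 0.777
r_f32 = ratio(0.0448, 0.0577)
noisy_f32 = within_noise(0.0448, 0.0041, 0.0577, 0.0034)

print(f"Float64/65536:  ratio={r_f64:.3f}, within noise: {noisy_f64}")
print(f"Float32/262144: ratio={r_f32:.3f}, within noise: {noisy_f32}")
```

Ratios recomputed from the rounded table values can differ from the reported ones in the last digit, presumably because the report divides unrounded timings. Under this heuristic, a ratio well below 1 does not by itself indicate a real change: some of the larger deviations in the table still fall inside the combined spread of the two measurements.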
