@vchuravy

No description provided.

@github-actions

Benchmark Results

| Benchmark | main | 6bf407b... | main/6bf407b1bfc3fa... |
|---|---|---|---|
| saxpy/default/Float16/1024 | 0.721 ± 0.0065 μs | 0.756 ± 0.0078 μs | 0.954 |
| saxpy/default/Float16/1048576 | 0.173 ± 0.0082 ms | 0.174 ± 0.0084 ms | 0.994 |
| saxpy/default/Float16/16384 | 3.32 ± 0.025 μs | 3.34 ± 0.028 μs | 0.993 |
| saxpy/default/Float16/2048 | 0.895 ± 0.01 μs | 0.934 ± 0.012 μs | 0.958 |
| saxpy/default/Float16/256 | 0.575 ± 0.0045 μs | 0.603 ± 0.0055 μs | 0.954 |
| saxpy/default/Float16/262144 | 0.0437 ± 0.00032 ms | 0.0439 ± 0.00061 ms | 0.996 |
| saxpy/default/Float16/32768 | 6 ± 0.059 μs | 6.02 ± 0.055 μs | 0.996 |
| saxpy/default/Float16/4096 | 1.29 ± 0.024 μs | 1.32 ± 0.026 μs | 0.975 |
| saxpy/default/Float16/512 | 0.637 ± 0.0056 μs | 0.671 ± 0.0067 μs | 0.95 |
| saxpy/default/Float16/64 | 0.55 ± 0.0042 μs | 0.576 ± 0.0069 μs | 0.956 |
| saxpy/default/Float16/65536 | 11.6 ± 0.093 μs | 11.6 ± 0.1 μs | 0.997 |
| saxpy/default/Float32/1024 | 0.636 ± 0.014 μs | 0.662 ± 0.011 μs | 0.961 |
| saxpy/default/Float32/1048576 | 0.192 ± 0.033 ms | 0.197 ± 0.03 ms | 0.973 |
| saxpy/default/Float32/16384 | 2.8 ± 0.43 μs | 2.88 ± 0.31 μs | 0.971 |
| saxpy/default/Float32/2048 | 0.741 ± 0.026 μs | 0.769 ± 0.072 μs | 0.963 |
| saxpy/default/Float32/256 | 0.564 ± 0.0086 μs | 0.581 ± 0.0083 μs | 0.971 |
| saxpy/default/Float32/262144 | 0.0448 ± 0.0042 ms | 0.045 ± 0.0042 ms | 0.996 |
| saxpy/default/Float32/32768 | 5.35 ± 0.61 μs | 5.49 ± 0.44 μs | 0.974 |
| saxpy/default/Float32/4096 | 1.12 ± 0.08 μs | 1.18 ± 0.12 μs | 0.95 |
| saxpy/default/Float32/512 | 0.602 ± 0.0093 μs | 0.623 ± 0.0097 μs | 0.966 |
| saxpy/default/Float32/64 | 0.555 ± 0.0057 μs | 0.569 ± 0.007 μs | 0.975 |
| saxpy/default/Float32/65536 | 12 ± 1.3 μs | 12.3 ± 1.1 μs | 0.975 |
| saxpy/default/Float64/1024 | 0.739 ± 0.027 μs | 0.769 ± 0.077 μs | 0.96 |
| saxpy/default/Float64/1048576 | 0.49 ± 0.048 ms | 0.483 ± 0.044 ms | 1.02 |
| saxpy/default/Float64/16384 | 5.33 ± 0.57 μs | 5.45 ± 0.41 μs | 0.978 |
| saxpy/default/Float64/2048 | 1.12 ± 0.1 μs | 1.2 ± 0.13 μs | 0.936 |
| saxpy/default/Float64/256 | 0.567 ± 0.008 μs | 0.59 ± 0.0066 μs | 0.961 |
| saxpy/default/Float64/262144 | 0.0901 ± 0.0094 ms | 0.0891 ± 0.0077 ms | 1.01 |
| saxpy/default/Float64/32768 | 12 ± 1.2 μs | 12.1 ± 0.99 μs | 0.991 |
| saxpy/default/Float64/4096 | 1.7 ± 0.25 μs | 1.73 ± 0.14 μs | 0.98 |
| saxpy/default/Float64/512 | 0.618 ± 0.012 μs | 0.647 ± 0.0085 μs | 0.955 |
| saxpy/default/Float64/64 | 0.545 ± 0.0058 μs | 0.566 ± 0.007 μs | 0.963 |
| saxpy/default/Float64/65536 | 23.3 ± 2.4 μs | 24.2 ± 2.7 μs | 0.964 |
| saxpy/static workgroup=(1024,)/Float16/1024 | 2.2 ± 0.024 μs | 2.2 ± 0.029 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/1048576 | 0.159 ± 0.0092 ms | 0.159 ± 0.0095 ms | 0.997 |
| saxpy/static workgroup=(1024,)/Float16/16384 | 4.44 ± 0.077 μs | 4.43 ± 0.072 μs | 1 |
| saxpy/static workgroup=(1024,)/Float16/2048 | 2.36 ± 0.025 μs | 2.37 ± 0.03 μs | 0.999 |
| saxpy/static workgroup=(1024,)/Float16/256 | 2.78 ± 0.033 μs | 2.84 ± 0.037 μs | 0.981 |
| saxpy/static workgroup=(1024,)/Float16/262144 | 0.0418 ± 0.001 ms | 0.0418 ± 0.0015 ms | 0.999 |
| saxpy/static workgroup=(1024,)/Float16/32768 | 6.89 ± 0.16 μs | 6.85 ± 0.17 μs | 1.01 |
| saxpy/static workgroup=(1024,)/Float16/4096 | 2.68 ± 0.035 μs | 2.69 ± 0.04 μs | 0.998 |
| saxpy/static workgroup=(1024,)/Float16/512 | 3.24 ± 0.037 μs | 3.28 ± 0.043 μs | 0.987 |
| saxpy/static workgroup=(1024,)/Float16/64 | 2.48 ± 0.21 μs | 2.53 ± 0.22 μs | 0.98 |
| saxpy/static workgroup=(1024,)/Float16/65536 | 12.5 ± 0.27 μs | 12.5 ± 0.35 μs | 1 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 2.2 ± 0.031 μs | 2.23 ± 0.033 μs | 0.986 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.2 ± 0.024 ms | 0.237 ± 0.02 ms | 0.842 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 4.36 ± 0.29 μs | 4.41 ± 0.27 μs | 0.988 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 2.36 ± 0.063 μs | 2.39 ± 0.059 μs | 0.988 |
| saxpy/static workgroup=(1024,)/Float32/256 | 2.68 ± 0.055 μs | 2.69 ± 0.052 μs | 0.995 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.0479 ± 0.0049 ms | 0.0606 ± 0.0044 ms | 0.791 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 7.3 ± 0.46 μs | 7.46 ± 0.45 μs | 0.979 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 2.65 ± 0.094 μs | 2.68 ± 0.086 μs | 0.988 |
| saxpy/static workgroup=(1024,)/Float32/512 | 2.69 ± 0.031 μs | 2.71 ± 0.037 μs | 0.993 |
| saxpy/static workgroup=(1024,)/Float32/64 | 2.69 ± 4.6 μs | 2.73 ± 5.5 μs | 0.988 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 14.5 ± 1.9 μs | 15.4 ± 1.1 μs | 0.941 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 2.3 ± 0.075 μs | 2.35 ± 0.076 μs | 0.979 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.506 ± 0.05 ms | 0.504 ± 0.054 ms | 1 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 7.15 ± 0.4 μs | 7.25 ± 0.44 μs | 0.986 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 2.57 ± 0.074 μs | 2.63 ± 0.089 μs | 0.976 |
| saxpy/static workgroup=(1024,)/Float64/256 | 2.62 ± 0.057 μs | 2.67 ± 0.058 μs | 0.982 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.0974 ± 0.0091 ms | 0.1 ± 0.014 ms | 0.972 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 14.3 ± 1.5 μs | 14.6 ± 1.5 μs | 0.981 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 3.12 ± 0.2 μs | 3.17 ± 0.14 μs | 0.983 |
| saxpy/static workgroup=(1024,)/Float64/512 | 2.64 ± 0.081 μs | 2.69 ± 0.062 μs | 0.984 |
| saxpy/static workgroup=(1024,)/Float64/64 | 2.58 ± 0.066 μs | 2.63 ± 0.063 μs | 0.98 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 26.2 ± 2.7 μs | 26.4 ± 2.7 μs | 0.99 |
| time_to_load | 0.319 ± 0.0024 s | 0.321 ± 0.0043 s | 0.995 |
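The ratio column appears to be the main timing divided by the PR timing (for example, 0.721 / 0.756 ≈ 0.954), so a value below 1 means the PR run took longer. The minimal Python sketch below assumes that interpretation: it recomputes the ratio for a few rows copied from the table and flags any that fall below a hypothetical 0.9 threshold. It ignores the ± spread, so a flagged row is a candidate for a re-run rather than a confirmed regression.

```python
# Rows copied from the benchmark table: (benchmark, main time, PR time).
# Both timings in a row share the same unit, so the ratio is unit-free.
ROWS = [
    ("saxpy/default/Float16/1024", 0.721, 0.756),
    ("saxpy/static workgroup=(1024,)/Float32/1048576", 0.2, 0.237),
    ("saxpy/static workgroup=(1024,)/Float32/262144", 0.0479, 0.0606),
    ("time_to_load", 0.319, 0.321),
]


def ratio(main_time: float, pr_time: float) -> float:
    """Assumed definition of the ratio column: main / PR (< 1.0 means the PR is slower)."""
    return main_time / pr_time


def flag_slowdowns(rows, threshold=0.9):
    """Return (name, ratio rounded to 3 places) for rows below the hypothetical threshold."""
    flagged = []
    for name, main_t, pr_t in rows:
        r = ratio(main_t, pr_t)
        if r < threshold:
            flagged.append((name, round(r, 3)))
    return flagged


if __name__ == "__main__":
    for name, r in flag_slowdowns(ROWS):
        print(f"{name}: {r}")
```

With these rows the sketch flags the two static-workgroup Float32 cases (1048576 and 262144 elements). The recomputed 0.844 differs slightly from the table's 0.842 because the displayed timings are rounded.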

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@vchuravy vchuravy merged commit fd14499 into main Jan 23, 2025
34 of 37 checks passed
@vchuravy vchuravy deleted the vc/add_type_to_base_function branch January 23, 2025 09:27
vchuravy added a commit that referenced this pull request Jan 28, 2025