Conversation

@vchuravy
Member

No description provided.

@vchuravy marked this pull request as ready for review February 14, 2025 09:24
Member Author

This stack of pull requests is managed by Graphite.

@vchuravy requested a review from leios February 14, 2025 09:25
@github-actions
Contributor

Benchmark Results

Benchmark main dce672d... main/dce672dbd5e0bb...
saxpy/default/Float16/1024 0.753 ± 0.0076 μs 0.74 ± 0.0073 μs 1.02
saxpy/default/Float16/1048576 0.173 ± 0.0018 ms 0.173 ± 0.0012 ms 1
saxpy/default/Float16/16384 3.35 ± 0.028 μs 3.34 ± 0.026 μs 1.01
saxpy/default/Float16/2048 0.931 ± 0.01 μs 0.916 ± 0.011 μs 1.02
saxpy/default/Float16/256 0.603 ± 0.0058 μs 0.591 ± 0.0053 μs 1.02
saxpy/default/Float16/262144 0.0439 ± 0.00039 ms 0.044 ± 0.00025 ms 0.998
saxpy/default/Float16/32768 6.04 ± 0.062 μs 6.01 ± 0.057 μs 1
saxpy/default/Float16/4096 1.33 ± 0.024 μs 1.31 ± 0.027 μs 1.01
saxpy/default/Float16/512 0.666 ± 0.0062 μs 0.652 ± 0.0057 μs 1.02
saxpy/default/Float16/64 0.572 ± 0.0046 μs 0.563 ± 0.0046 μs 1.02
saxpy/default/Float16/65536 11.7 ± 0.091 μs 11.6 ± 0.11 μs 1.01
saxpy/default/Float32/1024 0.634 ± 0.013 μs 0.643 ± 0.012 μs 0.987
saxpy/default/Float32/1048576 0.228 ± 0.021 ms 0.229 ± 0.012 ms 0.995
saxpy/default/Float32/16384 2.8 ± 0.2 μs 2.78 ± 0.14 μs 1.01
saxpy/default/Float32/2048 0.753 ± 0.072 μs 0.76 ± 0.071 μs 0.991
saxpy/default/Float32/256 0.561 ± 0.0063 μs 0.574 ± 0.0073 μs 0.977
saxpy/default/Float32/262144 0.0555 ± 0.0034 ms 0.0568 ± 0.003 ms 0.977
saxpy/default/Float32/32768 5.33 ± 0.38 μs 5.34 ± 0.38 μs 0.998
saxpy/default/Float32/4096 1.11 ± 0.08 μs 1.15 ± 0.11 μs 0.963
saxpy/default/Float32/512 0.597 ± 0.0092 μs 0.61 ± 0.011 μs 0.979
saxpy/default/Float32/64 0.549 ± 0.0046 μs 0.566 ± 0.0053 μs 0.971
saxpy/default/Float32/65536 12.4 ± 0.66 μs 12.3 ± 0.61 μs 1.01
saxpy/default/Float64/1024 0.741 ± 0.04 μs 0.75 ± 0.073 μs 0.989
saxpy/default/Float64/1048576 0.491 ± 0.022 ms 0.495 ± 0.022 ms 0.992
saxpy/default/Float64/16384 5.32 ± 0.38 μs 5.38 ± 0.54 μs 0.989
saxpy/default/Float64/2048 1.13 ± 0.092 μs 1.14 ± 0.098 μs 0.991
saxpy/default/Float64/256 0.564 ± 0.0076 μs 0.574 ± 0.0075 μs 0.983
saxpy/default/Float64/262144 0.114 ± 0.0067 ms 0.114 ± 0.0045 ms 1
saxpy/default/Float64/32768 12.5 ± 0.63 μs 12.5 ± 0.66 μs 0.998
saxpy/default/Float64/4096 1.69 ± 0.16 μs 1.67 ± 0.1 μs 1.01
saxpy/default/Float64/512 0.616 ± 0.013 μs 0.625 ± 0.01 μs 0.986
saxpy/default/Float64/64 0.543 ± 0.0039 μs 0.547 ± 0.0076 μs 0.992
saxpy/default/Float64/65536 28.5 ± 1.4 μs 28.5 ± 1.4 μs 0.998
saxpy/static workgroup=(1024,)/Float16/1024 2.2 ± 0.024 μs 2.19 ± 0.025 μs 1.01
saxpy/static workgroup=(1024,)/Float16/1048576 0.157 ± 0.0024 ms 0.157 ± 0.0029 ms 0.999
saxpy/static workgroup=(1024,)/Float16/16384 4.45 ± 0.086 μs 4.43 ± 0.073 μs 1
saxpy/static workgroup=(1024,)/Float16/2048 2.38 ± 0.023 μs 2.36 ± 0.029 μs 1.01
saxpy/static workgroup=(1024,)/Float16/256 2.84 ± 0.033 μs 2.79 ± 0.031 μs 1.02
saxpy/static workgroup=(1024,)/Float16/262144 0.0416 ± 0.0012 ms 0.042 ± 0.0015 ms 0.991
saxpy/static workgroup=(1024,)/Float16/32768 6.86 ± 0.18 μs 6.87 ± 0.16 μs 0.999
saxpy/static workgroup=(1024,)/Float16/4096 2.71 ± 0.032 μs 2.67 ± 0.039 μs 1.01
saxpy/static workgroup=(1024,)/Float16/512 3.29 ± 0.036 μs 3.24 ± 0.033 μs 1.01
saxpy/static workgroup=(1024,)/Float16/64 2.54 ± 0.22 μs 2.49 ± 0.22 μs 1.02
saxpy/static workgroup=(1024,)/Float16/65536 12.5 ± 0.21 μs 12.4 ± 0.24 μs 1.01
saxpy/static workgroup=(1024,)/Float32/1024 2.23 ± 0.029 μs 2.19 ± 0.028 μs 1.01
saxpy/static workgroup=(1024,)/Float32/1048576 0.233 ± 0.018 ms 0.237 ± 0.011 ms 0.98
saxpy/static workgroup=(1024,)/Float32/16384 4.4 ± 0.3 μs 4.37 ± 0.29 μs 1.01
saxpy/static workgroup=(1024,)/Float32/2048 2.37 ± 0.051 μs 2.36 ± 0.07 μs 1.01
saxpy/static workgroup=(1024,)/Float32/256 2.67 ± 0.041 μs 2.68 ± 0.039 μs 0.997
saxpy/static workgroup=(1024,)/Float32/262144 0.0588 ± 0.004 ms 0.0606 ± 0.0023 ms 0.97
saxpy/static workgroup=(1024,)/Float32/32768 7.42 ± 0.37 μs 7.46 ± 0.51 μs 0.994
saxpy/static workgroup=(1024,)/Float32/4096 2.66 ± 0.079 μs 2.66 ± 0.084 μs 0.997
saxpy/static workgroup=(1024,)/Float32/512 2.69 ± 0.029 μs 2.71 ± 0.033 μs 0.993
saxpy/static workgroup=(1024,)/Float32/64 2.69 ± 4.5 μs 2.71 ± 5.3 μs 0.993
saxpy/static workgroup=(1024,)/Float32/65536 15.1 ± 0.81 μs 15.3 ± 0.86 μs 0.986
saxpy/static workgroup=(1024,)/Float64/1024 2.41 ± 0.064 μs 2.31 ± 0.085 μs 1.04
saxpy/static workgroup=(1024,)/Float64/1048576 0.506 ± 0.021 ms 0.493 ± 0.023 ms 1.03
saxpy/static workgroup=(1024,)/Float64/16384 7.32 ± 0.4 μs 7.33 ± 0.44 μs 0.999
saxpy/static workgroup=(1024,)/Float64/2048 2.65 ± 0.078 μs 2.6 ± 0.083 μs 1.02
saxpy/static workgroup=(1024,)/Float64/256 2.67 ± 0.051 μs 2.64 ± 0.053 μs 1.01
saxpy/static workgroup=(1024,)/Float64/262144 0.118 ± 0.0068 ms 0.117 ± 0.0045 ms 1
saxpy/static workgroup=(1024,)/Float64/32768 15.2 ± 0.86 μs 15.2 ± 0.86 μs 1
saxpy/static workgroup=(1024,)/Float64/4096 3.19 ± 0.19 μs 3.14 ± 0.14 μs 1.02
saxpy/static workgroup=(1024,)/Float64/512 2.68 ± 0.064 μs 2.65 ± 0.066 μs 1.01
saxpy/static workgroup=(1024,)/Float64/64 2.62 ± 0.065 μs 2.6 ± 0.072 μs 1.01
saxpy/static workgroup=(1024,)/Float64/65536 31 ± 1.4 μs 31.1 ± 1.2 μs 0.997
time_to_load 0.315 ± 0.0027 s 0.314 ± 0.00099 s 1
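
The flattened table does not label its last column. A minimal sketch, assuming that column is the `main` timing divided by the timing for commit `dce672d...`, can check this reading against a few rows copied from the table. The names `ROWS`, `ROW_RE`, `parse_row`, and `recomputed_ratio` are hypothetical helpers for illustration, not part of the benchmark workflow. Because the printed timings are already rounded, the recomputed ratio is only expected to agree approximately.

```python
import re

# Rows copied verbatim from the benchmark table above.
ROWS = [
    "saxpy/default/Float16/1024 0.753 ± 0.0076 μs 0.74 ± 0.0073 μs 1.02",
    "saxpy/static workgroup=(1024,)/Float64/1024 2.41 ± 0.064 μs 2.31 ± 0.085 μs 1.04",
    "time_to_load 0.315 ± 0.0027 s 0.314 ± 0.00099 s 1",
]

# Assumed layout: name, "<value> ± <spread> <unit>" for main,
# the same for the PR commit, then the reported ratio.
ROW_RE = re.compile(
    r"^(?P<name>.+?) "
    r"(?P<main>[\d.]+) ± [\d.]+ (?P<unit_main>\S+) "
    r"(?P<pr>[\d.]+) ± [\d.]+ (?P<unit_pr>\S+) "
    r"(?P<ratio>[\d.]+)$"
)


def parse_row(line):
    """Split one table row into (name, main, pr, reported_ratio)."""
    m = ROW_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    if m["unit_main"] != m["unit_pr"]:
        # The sketch does not convert units; every row above uses one unit.
        raise ValueError(f"mixed units in row: {line!r}")
    return m["name"], float(m["main"]), float(m["pr"]), float(m["ratio"])


def recomputed_ratio(line):
    """Return main / PR from the rounded timings printed in the row."""
    _, main, pr, _ = parse_row(line)
    return main / pr


if __name__ == "__main__":
    for row in ROWS:
        name, _, _, reported = parse_row(row)
        print(f"{name}: reported {reported}, recomputed {recomputed_ratio(row):.3f}")
```

Under this reading, a ratio above 1 means the PR commit was faster than `main` for that row, and a ratio below 1 means it was slower. Most rows differ by less than their stated spread.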

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@vchuravy merged commit 1516dce into main Feb 14, 2025
36 of 39 checks passed
@vchuravy deleted the 02-14-fix_indicies-_indices_typo_everywhere branch February 14, 2025 09:48

3 participants