new saturated kernels #808

Ka-zam · 2025-12-26T17:16:15Z

New saturated sum kernels

marcusmueller

should be named saturated_add (or add_saturated) to keep in style with the existing add-named kernels (we use sum only for "complicated" polynomial sums, not element-wise stuff)

kernels/volk/volk_16i_x2_saturated_sum_16i.h

Ka-zam · 2025-12-26T20:23:57Z

#807 handles this log2 failure

jdemel

LGTM

I'm impressed by how many kernels you implement. Thanks a lot.

jdemel · 2026-01-15T20:09:06Z

Can you rebase this PR on latest main? The error should be fixed but I'd like to make sure before I merge anything. Thanks!

Signed-off-by: Magnus Lundmark <[email protected]>

Ka-zam · 2026-01-16T09:30:05Z

LGTM

I'm impressed by how many kernels you implement. Thanks a lot.

We should be in a better place now! 😄

jdemel · 2026-01-16T20:41:03Z

There's still @marcusmueller comment on naming.
To stick with our naming convention and thus go the way of least surprise, the kernels should be named add_saturated. We use the term sum like e.g. NumPy does when all elements in a vector get summed up.

Signed-off-by: Magnus Lundmark <[email protected]>

Ka-zam · 2026-01-17T12:00:47Z

There's still @marcusmueller comment on naming. To stick with our naming convention and thus go the way of least surprise, the kernels should be named add_saturated. We use the term sum like e.g. NumPy does when all elements in a vector get summed up.

Oops, forgot to remove the unused kernels. Should be good now.

$ volk_profile -R satu

RUN_VOLK_TESTS: volk_8i_x2_add_saturated_8i(vlen=131071, iter=1987, tol=1e-06)
arch                       |         time |     throughput |    max_abs |
---------------------------+--------------+----------------+------------+
generic                    |    7.6207 ms |  102529.3 MB/s |          - |
u_sse2                     |    8.3234 ms |   93873.1 MB/s |    0.0e+00 |
u_avx2                     |    6.2392 ms |  125231.5 MB/s |    0.0e+00 | *
a_sse2                     |    7.7166 ms |  101255.1 MB/s |    0.0e+00 |
a_avx2                     |    6.2236 ms |  125545.4 MB/s |    0.0e+00 | *
Best aligned arch          | a_avx2 (1.22x)
Best unaligned arch        | u_avx2 (1.22x)
--------------------------------------------------------------------------------

RUN_VOLK_TESTS: volk_8u_x2_add_saturated_8u(vlen=131071, iter=1987, tol=1e-06)
arch                       |         time |     throughput |    max_abs |
---------------------------+--------------+----------------+------------+
generic                    |    6.2801 ms |  124415.0 MB/s |          - |
u_sse2                     |    6.0746 ms |  128624.6 MB/s |    0.0e+00 | *
u_avx2                     |    6.1945 ms |  126134.9 MB/s |    0.0e+00 |
a_sse2                     |    5.9526 ms |  131260.8 MB/s |    0.0e+00 |
a_avx2                     |    5.9446 ms |  131437.5 MB/s |    0.0e+00 | *
Best aligned arch          | a_avx2 (1.06x)
Best unaligned arch        | u_sse2 (1.03x)
--------------------------------------------------------------------------------

RUN_VOLK_TESTS: volk_16i_x2_add_saturated_16i(vlen=131071, iter=1987, tol=1e-06)
arch                       |         time |     throughput |    max_abs |
---------------------------+--------------+----------------+------------+
generic                    |   13.6151 ms |  114775.7 MB/s |          - | *
u_sse2                     |   16.0701 ms |   97241.9 MB/s |    0.0e+00 |
u_avx2                     |   13.9325 ms |  112161.2 MB/s |    0.0e+00 |
a_sse2                     |   16.0001 ms |   97667.6 MB/s |    0.0e+00 |
a_avx2                     |   14.4234 ms |  108344.3 MB/s |    0.0e+00 |
Best aligned arch          | generic
Best unaligned arch        | generic
--------------------------------------------------------------------------------

RUN_VOLK_TESTS: volk_16u_x2_add_saturated_16u(vlen=131071, iter=1987, tol=1e-06)
arch                       |         time |     throughput |    max_abs |
---------------------------+--------------+----------------+------------+
generic                    |   14.0463 ms |  111253.0 MB/s |          - |
u_sse2                     |   16.9538 ms |   92173.2 MB/s |    0.0e+00 |
u_avx2                     |   13.7991 ms |  113246.0 MB/s |    0.0e+00 | *
a_sse2                     |   13.6123 ms |  114800.1 MB/s |    0.0e+00 | *
a_avx2                     |   13.9583 ms |  111954.2 MB/s |    0.0e+00 |
Best aligned arch          | a_sse2 (1.03x)
Best unaligned arch        | u_avx2 (1.02x)
--------------------------------------------------------------------------------

Session summary (4 kernels):
  Average speedup vs generic: 1.08x
  Max speedup vs generic: 1.22x (volk_8i_x2_add_saturated_8i)

marcusmueller requested changes Dec 26, 2025

View reviewed changes

kernels/volk/volk_16i_x2_saturated_sum_16i.h Outdated Show resolved Hide resolved

Ka-zam force-pushed the saturated_sum_kernels branch 3 times, most recently from 73647f0 to 0d71ec1 Compare December 26, 2025 20:05

jdemel approved these changes Jan 15, 2026

View reviewed changes

add saturated sum kernels

8487ce2

Signed-off-by: Magnus Lundmark <[email protected]>

Ka-zam force-pushed the saturated_sum_kernels branch from 0d71ec1 to 8487ce2 Compare January 16, 2026 08:37

deleted unused kernels

382ac93

Signed-off-by: Magnus Lundmark <[email protected]>

marcusmueller self-requested a review January 17, 2026 12:59

marcusmueller approved these changes Jan 17, 2026

View reviewed changes

jdemel merged commit f921334 into gnuradio:main Jan 18, 2026
35 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new saturated kernels #808

new saturated kernels #808

Uh oh!

Ka-zam commented Dec 26, 2025

Uh oh!

marcusmueller left a comment

Uh oh!

Uh oh!

Ka-zam commented Dec 26, 2025

Uh oh!

jdemel left a comment

Uh oh!

jdemel commented Jan 15, 2026

Uh oh!

Ka-zam commented Jan 16, 2026

Uh oh!

jdemel commented Jan 16, 2026

Uh oh!

Ka-zam commented Jan 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

new saturated kernels #808

new saturated kernels #808

Uh oh!

Conversation

Ka-zam commented Dec 26, 2025

Uh oh!

marcusmueller left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Ka-zam commented Dec 26, 2025

Uh oh!

jdemel left a comment

Choose a reason for hiding this comment

Uh oh!

jdemel commented Jan 15, 2026

Uh oh!

Ka-zam commented Jan 16, 2026

Uh oh!

jdemel commented Jan 16, 2026

Uh oh!

Ka-zam commented Jan 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants