Replace the "clever trick" with the popcount intrinsic #19

Open
quad wants to merge 1 commit into wrog:main from quad:push-skwwtmnmntmz
@quad quad commented Oct 6, 2025

Both [GCC][1] and [clang][2] have long supported the `__builtin_popcount` family of bit operation builtins.

Let's use them!

(One [C23][3] day, we could `#include <stdbit.h>` and `stdc_count_ones`.)

[1]: https://gcc.gnu.org/onlinedocs/gcc/Bit-Operation-Builtins.html#index-_005f_005fbuiltin_005fpopcountg "Bit Operation Builtins (GCC)"
[2]: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-popcountg "Clang Language Extensions - __builtin_popcountg"
[3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf "C International Standard (ISO/IEC 9899:2024)"
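
For illustration, here is a rough sketch of how the C23 route could sit next to the builtin. It assumes a compiler that understands `__has_include` (GCC and clang accept it as an extension before C23). `count_ones32` and `SKETCH_HAVE_STDBIT_H` are made-up names for this example, not anything from the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Use the C23 header only when the toolchain actually ships it. */
#if defined(__has_include)
#  if __has_include(<stdbit.h>)
#    include <stdbit.h>
#    define SKETCH_HAVE_STDBIT_H 1   /* hypothetical macro for this sketch */
#  endif
#endif

/* count_ones32 is a hypothetical helper name used only for this sketch. */
static int
count_ones32(uint32_t x)
{
#ifdef SKETCH_HAVE_STDBIT_H
    /* C23 type-generic macro from <stdbit.h> */
    return (int) stdc_count_ones(x);
#else
    /* GCC/clang builtin; unsigned long is at least 32 bits wide */
    return __builtin_popcountl((unsigned long) x);
#endif
}
```

Either branch should give the same answer, e.g. `count_ones32(0xF0F0F0F0)` is 16 and `count_ones32(UINT32_MAX)` is 32.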

---

I ran this test to confirm identical behaviour:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static int
old_count_set_bits(uint32_t x)
{
    register uint32_t i = x;	/* take no chances! */

    /* clever trick for adding bits together in parallel to count them */
    i = ((i & 0xAAAAAAAA) >> 1) + (i & ~0xAAAAAAAA);
    i = ((i & 0xCCCCCCCC) >> 2) + (i & ~0xCCCCCCCC);
    i = ((i & 0xF0F0F0F0) >> 4) + (i & ~0xF0F0F0F0);
    i = ((i & 0xFF00FF00) >> 8) + (i & ~0xFF00FF00);
    i = ((i & 0xFFFF0000) >> 16) + (i & ~0xFFFF0000);

    return i;
}

static int
new_count_set_bits(uint32_t x)
{
    return __builtin_popcountg(x);
}

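int
main(void)
{
    for (uint32_t x = 0; x < UINT32_MAX; x++)
    {
        assert(old_count_set_bits(x) == new_count_set_bits(x));

        /* progress report each time the low 28 bits are zero */
        if (0 == (x & (UINT32_MAX >> 4)))
        {
            printf(
                "%llu%% (%x / %x)\n",
                ((unsigned long long) x * 100) / UINT32_MAX,
                x,
                UINT32_MAX
            );
        }
    }

    /* the loop stops before UINT32_MAX, so check the last value separately */
    assert(old_count_set_bits(UINT32_MAX) == new_count_set_bits(UINT32_MAX));
}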
```

It's still magic that computers can now count to UINT32_MAX, in seconds!
@wrog
Owner

wrog commented Oct 14, 2025

> It's still magic that computers can now count to UINT32_MAX, in seconds

Yeah, sometimes it's nice to be living in The Future.

@wrog
Owner

wrog commented Oct 14, 2025

So, for this, I would want the autoconf thingie that checks whether the compiler can actually do this (is this in C11?) on the platform in question -- currently we have something known to work everywhere and I'd like to preserve that state of affairs, even if we almost certainly want to use the builtin if it's available -- and then be #ifdeffing out the guts of count_set_bits (and maybe making it conditionally inline cf. how Unicode routines in utf.h/utf-ctype.h work) rather than replacing it everywhere.
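
As a rough sketch of the `#ifdef` arrangement described above (not the project's actual code): `HAVE_BUILTIN_POPCOUNT` is a hypothetical macro that a configure-time check would define when a test program calling the builtin compiles and links. Nothing in the repository defines it today, and the `static inline` placement is likewise only an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * HAVE_BUILTIN_POPCOUNT is hypothetical: the idea is that configure
 * would define it (e.g. in config.h) after probing the compiler.
 */
static inline int
count_set_bits(uint32_t x)
{
#ifdef HAVE_BUILTIN_POPCOUNT
    /* unsigned long is at least 32 bits wide, so no bits are lost */
    return __builtin_popcountl((unsigned long) x);
#else
    /* portable fallback: the existing parallel-add trick */
    uint32_t i = x;

    i = ((i & 0xAAAAAAAA) >> 1) + (i & ~0xAAAAAAAA);
    i = ((i & 0xCCCCCCCC) >> 2) + (i & ~0xCCCCCCCC);
    i = ((i & 0xF0F0F0F0) >> 4) + (i & ~0xF0F0F0F0);
    i = ((i & 0xFF00FF00) >> 8) + (i & ~0xFF00FF00);
    i = ((i & 0xFFFF0000) >> 16) + (i & ~0xFFFF0000);

    return (int) i;
#endif
}
```

The sketch uses `__builtin_popcountl` rather than the type-generic `__builtin_popcountg` on the assumption that the older, non-generic builtins are available on more compiler versions. Callers would not need to change, since the function keeps its name and signature either way.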

Also now wondering about similar issues in bf_random() (see numbers.c) which might then make this worth a new fake system header...

@quad
Author

quad commented Oct 14, 2025 via email

@wrog
Owner

wrog commented Oct 14, 2025

Right now we're at C99. I think I could be talked into C11 (since that's been 15 years and we've actually already got one POSIX 2007ism that I know of [the `%ms` in `scanf`]) but I at least need to see what's in there. Baby steps :-)

@quad
Author

quad commented Jan 31, 2026

I will never get around to taking a crack at the m4 madness because, as it turns out, becoming a father has made me bereft of free moments. (unsubscribing!)
