lemire commented Nov 26, 2025

At the cost of a larger roaring.h file, this PR significantly improves the performance of roaring_bitmap_contains: in the RandomAccess benchmark, the time per iteration drops from 5359 ns to 2758 ns.

Before

-----------------------------------------------------------------------
Benchmark             Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------
RandomAccess       5359 ns         5351 ns       128009 GHz=3.25364 cycles=10.955k instructions=33.398k

After

-----------------------------------------------------------------------
Benchmark             Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------
RandomAccess       2758 ns         2754 ns       203315 GHz=3.2813 cycles=7.652k instructions=22.296k

By allowing more inlining of container_iterator_next and container_iterator_prev, this PR also helps the performance of other functions. In my measurements, IterateAll64 is now faster.

Before

-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
IterateAll64    3045904 ns      3034746 ns          244

After

-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
IterateAll64    2700777 ns      2698000 ns          277
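
The mechanism behind the results above is to make the definition of a small, hot function visible to its callers so that the compiler can inline it. That can be sketched with a toy example. This is not CRoaring code: `toy_array_container_t`, `toy_array_contains`, and `toy_count_hits` are hypothetical names used only for illustration, and the sketch assumes a sorted array of 16-bit values as the container.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy container (not a CRoaring type): a sorted array of
 * 16-bit values. */
typedef struct {
    const uint16_t *values;
    size_t size;
} toy_array_container_t;

/* Defined `static inline`, as it would be if the definition lived in a
 * header. Every translation unit that includes the header sees the body,
 * so the compiler is free to inline the binary search into the caller
 * instead of emitting a call to another object file. */
static inline bool toy_array_contains(const toy_array_container_t *c,
                                      uint16_t x) {
    size_t lo = 0, hi = c->size;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c->values[mid] < x) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo < c->size && c->values[lo] == x;
}

/* A caller with a tight loop, similar in spirit to a random-access
 * benchmark. With the definition above visible, the per-query call
 * overhead can be optimized away. */
static size_t toy_count_hits(const toy_array_container_t *c,
                             const uint16_t *queries, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        hits += toy_array_contains(c, queries[i]) ? 1 : 0;
    }
    return hits;
}
```

The trade-off is the one stated at the top of this PR: the header grows, and every including file compiles the function body, in exchange for giving the optimizer a chance to inline across what used to be a call boundary.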

It also seems to help the RandomAccess64Cpp benchmark, but not the RandomAccess64 results, so roaring64_bitmap_contains is NOT improved much (if at all) by this PR.

Before:

------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
RandomAccess64          3086 ns         3083 ns       228588
RandomAccess64Cpp       2110 ns         2109 ns       329877

After:

------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
RandomAccess64          2970 ns         2967 ns       221617
RandomAccess64Cpp       1801 ns         1799 ns       409146
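
For a quick comparison across the tables above, the before/after ratios can be computed directly from the reported "Time" columns. The helper below is only a convenience sketch; `speedup` and `print_speedups` are hypothetical names, and the inputs are the numbers copied from the benchmark output in this comment.

```c
#include <stdio.h>

/* Ratio of the "before" time to the "after" time; values above 1.0 mean
 * the "after" build is faster. */
static double speedup(double before_ns, double after_ns) {
    return before_ns / after_ns;
}

/* Prints the ratios for the four benchmarks reported in this PR. */
static void print_speedups(void) {
    printf("RandomAccess:      %.2fx\n", speedup(5359.0, 2758.0));
    printf("IterateAll64:      %.2fx\n", speedup(3045904.0, 2700777.0));
    printf("RandomAccess64:    %.2fx\n", speedup(3086.0, 2970.0));
    printf("RandomAccess64Cpp: %.2fx\n", speedup(2110.0, 1801.0));
}
```

Calling `print_speedups()` from any `main` gives roughly 1.94x for RandomAccess, 1.13x for IterateAll64, 1.04x for RandomAccess64, and 1.17x for RandomAccess64Cpp, which matches the reading that roaring64_bitmap_contains is barely affected.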

Fixes: #755

lemire commented Nov 27, 2025

cc @madscientist

SLieve left a comment

Looks good, thanks!

lemire commented Dec 1, 2025

@Dr-Emann? I'd like to have some kind of community consensus before merging this, as it is quite a change.

lemire merged commit 89ad03d into master on Dec 5, 2025
33 of 34 checks passed
lemire commented Dec 5, 2025

@Dr-Emann Thanks for the excellent review.

Linked issue: Improve inlining of some key 32-bit functions