Skip to content

INLINABLE insert and delete, bench, small doc additions.#19

Open
robinp wants to merge 7 commits intolarskuhtz:masterfrom
robinp:master
Open

INLINABLE insert and delete, bench, small doc additions.#19
robinp wants to merge 7 commits intolarskuhtz:masterfrom
robinp:master

Conversation

@robinp
Copy link

@robinp robinp commented Aug 29, 2025

Added benchmarks for the main public interface ops, also marked insert and delete INLINABLE, so the callsite can optimize further. This had some small but noticable impact on the benchmark.

insert: 16ms -> 11ms
delete: 26ms -> 16ms

Robin Palotai added 5 commits August 29, 2025 19:34
So callsite can specialize or inline them as it wills. Some noticable
imprevement on benchmarks:

```
insert: 16ms -> 11ms
delete: 26ms -> 16ms

```

Also added the splitCuckooFilter function so benchmark can start from a
pre-filled CF. Could have made it that either it only returns a new CF,
or returns a pair whose fst is the original CF (with the same ByteArray)
but with the split gen.. but those leave too much opportunity for
accidental misuse.
Copy link
Owner

@larskuhtz larskuhtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR. The perf improvements look really nice.

I added a couple suggestions in the review. I also like to understand why the split is needed in the delete benchmark.

deleteBench label n cf0 xs = bench (label <> " - " <> show n) $ whnfIO f
where
f = do
(cf, _) <- splitCuckooFilter cf0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of splitting the filter here before doing the deletes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to pre-create the filter once, and then the benchmark would only time the deletions (not the filter creation). The split was just a way to prevent the original filter from being mutated across the different bench iterations.

But maybe there are other ways to get this, without split - I learned there is perRunEnv in criterion that lets one create the input separately from the measurement. That would let one get rid of the split functionality.

I'm open to whichever way is preferred, or alternatives.

Comment on lines +380 to +384
-- Note: if the potentially inserted elements have many repetitions, and you
-- don't intend to delete items, rather just use the filter as a set, then it is
-- advised to perform a 'member' check before insert. Otherwise repeated items
-- will take up slots and cause high load or even premature insertion failure.
--
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

robinp and others added 2 commits September 10, 2025 18:50
Co-authored-by: Lars Kuhtz <lakuhtz@gmail.com>
Co-authored-by: Lars Kuhtz <lakuhtz@gmail.com>
@robinp
Copy link
Author

robinp commented Sep 10, 2025

@larskuhtz thank you for the comments! Oops, sorry for the random typo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants