
@wenkokke (Collaborator)

This PR adds salt to the Bloom filter API.

I expect the CI to break, since I have not modified the usage of the bloomfilter package in lsm-tree.

@dcoutts (Collaborator) left a comment

This is looking good so far.

A few minor suggestions:

  • Argument order.
  • Do we really need a class for hashing with salt? It's only used internally.
  • Where the `Salt` type lives.

And then we can move on to the uses in lsm-tree.

@wenkokke wenkokke force-pushed the wenkokke/bloomfilter-salt branch from a0d8b27 to caeb000 May 29, 2025 11:22
@wenkokke wenkokke force-pushed the wenkokke/bloomfilter-salt branch 4 times, most recently from 2fc1622 to f13d95b June 6, 2025 13:05
@wenkokke wenkokke force-pushed the wenkokke/bloomfilter-salt branch from f13d95b to fe4ec82 June 16, 2025 12:45
@jorisdral jorisdral force-pushed the wenkokke/bloomfilter-salt branch from fe4ec82 to 6b676f4 June 23, 2025 14:37
@jorisdral jorisdral marked this pull request as draft June 24, 2025 16:20
@jorisdral jorisdral self-assigned this Jun 24, 2025
@jorisdral jorisdral force-pushed the wenkokke/bloomfilter-salt branch from 642aff3 to 7ff8e01 June 25, 2025 09:43
jorisdral and others added 2 commits June 25, 2025 11:44
Making the hash salt configurable improves security, because bloom filter hashes
are not cryptographic hashes.

The hash salt can be configured on a session-wide basis only. That is, all bloom
filters for all tables in a single session use the same salt. As a result, batch
lookup performance is not impacted by the salt. The performance currently relies
on being able to compute a hash only once for multiple bloom filters, which is
only possible if the salt for each bloom filter is the same.
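A minimal sketch of why one session-wide salt keeps batch lookups cheap: each key is hashed once with the salt, and that single hash is reused to probe every table's filter. The FNV-1a-style salted hash, the double-hashing probe scheme, and all names below (`saltedHash`, `Bloom`, `fromKeys`, `batchLookup`) are hypothetical illustrations, not the bloomfilter package's API.

```haskell
import Data.Bits (setBit, shiftR, testBit, xor, (.&.))
import Data.Char (ord)
import Data.Word (Word64)

-- Hypothetical session-wide salt.
type Salt = Word64

-- Toy salted hash: FNV-1a with the salt mixed into the offset basis.
-- This is an assumption for illustration, not the hash lsm-tree uses.
saltedHash :: Salt -> String -> Word64
saltedHash salt = foldl step (0xcbf29ce484222325 `xor` salt)
  where
    step h c = (h `xor` fromIntegral (ord c)) * 0x100000001b3

-- Toy Bloom filter: m bits packed into an Integer, k probes per key.
data Bloom = Bloom { bloomBits :: Integer, bloomM :: Int, bloomK :: Int }

-- Derive k bit positions from one 64-bit hash (double hashing).
probes :: Int -> Int -> Word64 -> [Int]
probes m k h =
  [ fromIntegral ((h1 + fromIntegral i * h2) `mod` fromIntegral m) | i <- [0 .. k - 1] ]
  where
    h1 = h .&. 0xffffffff
    h2 = h `shiftR` 32

-- Insert or query with an already-computed hash.
insertHash :: Word64 -> Bloom -> Bloom
insertHash h b = b { bloomBits = foldl setBit (bloomBits b) (probes (bloomM b) (bloomK b) h) }

elemHash :: Word64 -> Bloom -> Bool
elemHash h b = all (testBit (bloomBits b)) (probes (bloomM b) (bloomK b) h)

-- Build a filter for one table's keys.
fromKeys :: Salt -> Int -> Int -> [String] -> Bloom
fromKeys salt m k = foldr (insertHash . saltedHash salt) (Bloom 0 m k)

-- Batch lookup: each key is hashed once, and that hash is reused for every
-- table's filter. This only works because all filters share the session salt.
batchLookup :: Salt -> [Bloom] -> [String] -> [[Bool]]
batchLookup salt filters keys =
  [ map (elemHash h) filters | key <- keys, let h = saltedHash salt key ]
```

With a different salt per filter, `batchLookup` would instead need one `saltedHash` call per key per filter.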

For now the user has the responsibility of passing in the same salt each time a
session is restored. We will change this in the next few commits.

Co-authored-by: Wen Kokke <[email protected]>
@jorisdral jorisdral force-pushed the wenkokke/bloomfilter-salt branch from 7ff8e01 to 72e0b7f June 25, 2025 09:45
This helped troubleshoot a bug that we will address in the next commit.
`HasBlockIO` is passed into `openSession` by the user, so it should also be
closed by the user. If the session closed it, the user could not reuse the
`HasBlockIO` afterwards.
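A minimal sketch of this ownership rule, using a hypothetical `MockHasBlockIO` in place of the real `HasBlockIO`: the user opens the handle, runs two sessions on it one after the other, and closes it once at the end. `withMockSession` and `reuseAcrossSessions` are illustrative names, not lsm-tree API.

```haskell
import Control.Exception (finally)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for HasBlockIO: it only records whether it was closed.
newtype MockHasBlockIO = MockHasBlockIO (IORef Bool)

newMockHasBlockIO :: IO MockHasBlockIO
newMockHasBlockIO = MockHasBlockIO <$> newIORef False

closeMockHasBlockIO :: MockHasBlockIO -> IO ()
closeMockHasBlockIO (MockHasBlockIO closed) = writeIORef closed True

isClosed :: MockHasBlockIO -> IO Bool
isClosed (MockHasBlockIO closed) = readIORef closed

-- Hypothetical session scope: it borrows the HasBlockIO and, when the session
-- ends, releases only its own resources (none in this mock), never the handle.
withMockSession :: MockHasBlockIO -> (MockHasBlockIO -> IO a) -> IO a
withMockSession hbio action = action hbio

-- The user owns the handle: open it, use it for two sessions, close it once.
-- Returns (still open between sessions, closed at the end).
reuseAcrossSessions :: IO (Bool, Bool)
reuseAcrossSessions = do
  hbio <- newMockHasBlockIO
  let useTwice = do
        withMockSession hbio (\_ -> pure ())
        open <- not <$> isClosed hbio -- still usable after the first session
        withMockSession hbio (\_ -> pure ())
        pure open
  openBetween <- useTwice `finally` closeMockHasBlockIO hbio
  closedAtEnd <- isClosed hbio
  pure (openBetween, closedAtEnd)
```

Here `reuseAcrossSessions` returns `(True, True)`: the handle survives the first session and is closed only by the user's `finally`.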

This bug popped up in the test we add in the next commit.
…the time)

If a session is restored with a salt that differs from the one it was created
with, lookups will often return incorrect results. We will fix this in the next
few commits.
... to `withKeepSessionOpen` and `withKeepTableOpen` respectively. This is to
avoid name conflicts in the next few commits.
Only the former requires a salt; the latter does not. We still keep a version of
`openSession` that defers to `newSession` or `restoreSession` based on the
session directory contents, because we expect it to be useful for users.
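A hypothetical sketch of the dispatch that `openSession` keeps, assuming an empty session directory means "create" and anything else means "restore" (the real check may differ). `openSessionSketch` and `OpenedSession` are illustrative names; in practice the listing could come from `listDirectory` in the `directory` package.

```haskell
import Data.Word (Word64)

-- Hypothetical salt type.
type Salt = Word64

-- What the dispatcher decided; a stand-in for a real session handle.
data OpenedSession
  = Created Salt -- newSession path: the caller's salt is used (and persisted)
  | Restored     -- restoreSession path: the salt is recovered from disk instead
  deriving (Show, Eq)

-- openSession-style dispatch on the session directory's listing.
-- Assumption: an empty directory means a fresh session.
openSessionSketch :: Salt -> [FilePath] -> OpenedSession
openSessionSketch salt entries
  | null entries = Created salt
  | otherwise    = Restored
```

The salt argument only matters on the create path, which is why a restored session has to get its salt from somewhere other than the caller.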

In `newSession`, the input salt is written to a new metadata file in the session
directory. In `restoreSession`, we read this metadata file back into memory to
extract the salt.
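A minimal sketch of persisting and recovering the salt, assuming a hypothetical metadata file named `metadata` holding a single `salt: <decimal>` line; the real file name and encoding may differ.

```haskell
import Data.List (stripPrefix)
import Data.Word (Word64)
import Text.Read (readMaybe)

-- Hypothetical salt type.
type Salt = Word64

-- Hypothetical metadata format: a single line "salt: <decimal>".
encodeMetadata :: Salt -> String
encodeMetadata salt = "salt: " ++ show salt ++ "\n"

decodeMetadata :: String -> Either String Salt
decodeMetadata contents = case lines contents of
  [line]
    | Just rest <- stripPrefix "salt: " line
    , Just salt <- readMaybe rest
    -> Right salt
  _ -> Left "malformed session metadata"

-- Hypothetical location of the metadata file inside the session directory.
metadataPath :: FilePath -> FilePath
metadataPath dir = dir ++ "/metadata"

-- newSession side: persist the salt supplied by the caller.
writeSessionSalt :: FilePath -> Salt -> IO ()
writeSessionSalt dir salt = writeFile (metadataPath dir) (encodeMetadata salt)

-- restoreSession side: recover the salt from disk instead of taking it as an argument.
readSessionSalt :: FilePath -> IO (Either String Salt)
readSessionSalt dir = decodeMetadata <$> readFile (metadataPath dir)
```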
…salt

This prevents opening snapshots with the wrong salt, which would lead to
incorrect lookup results.
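A minimal sketch of such a check, assuming each snapshot's metadata records the salt its Bloom filters were built with. `SnapshotMeta`, `SaltMismatch`, and `checkSnapshotSalt` are hypothetical names, not lsm-tree's API.

```haskell
import Data.Word (Word64)

-- Hypothetical salt type.
type Salt = Word64

-- Hypothetical snapshot metadata: records the salt used for its Bloom filters.
data SnapshotMeta = SnapshotMeta { snapshotName :: String, snapshotSalt :: Salt }
  deriving (Show, Eq)

data OpenSnapshotError = SaltMismatch { sessionSaltOf :: Salt, snapshotSaltOf :: Salt }
  deriving (Show, Eq)

-- Refuse to open a snapshot whose filters were hashed with a different salt,
-- because probing them with the session's salt could miss keys that are present.
checkSnapshotSalt :: Salt -> SnapshotMeta -> Either OpenSnapshotError SnapshotMeta
checkSnapshotSalt sessionSalt meta
  | snapshotSalt meta == sessionSalt = Right meta
  | otherwise = Left (SaltMismatch sessionSalt (snapshotSalt meta))
```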
@jorisdral jorisdral force-pushed the wenkokke/bloomfilter-salt branch 2 times, most recently from 4cecb21 to 5e7cf18 June 25, 2025 13:28
@jorisdral jorisdral marked this pull request as ready for review June 25, 2025 13:28
@jorisdral jorisdral requested a review from dcoutts June 25, 2025 14:05
@jorisdral jorisdral enabled auto-merge June 26, 2025 07:33
@jorisdral jorisdral dismissed dcoutts’s stale review June 26, 2025 07:34

comments resolved

@jorisdral jorisdral added this pull request to the merge queue Jun 26, 2025
Merged via the queue into main with commit 6276783 Jun 26, 2025
30 checks passed
@jorisdral jorisdral deleted the wenkokke/bloomfilter-salt branch June 26, 2025 08:13
jorisdral added a commit that referenced this pull request Jul 1, 2025
There are a few places where functions in the public API create a `HasBlockIO`
for the user, but since #742, closing a session no longer closes the
`HasBlockIO` it contains. So wherever the public API opens a `HasBlockIO`
itself, it has to take extra care to close it again.
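A minimal sketch of the pattern, with a hypothetical `MockHasBlockIO` whose shared counter tracks open handles: a convenience wrapper that opens its own handle releases it with `bracket`, so it is closed even if the body throws. `withOwnedHasBlockIO` and `demo` are illustrative names only.

```haskell
import Control.Exception (ErrorCall (..), bracket, throwIO, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for HasBlockIO; the shared counter tracks open handles.
newtype MockHasBlockIO = MockHasBlockIO (IORef Int)

openMockHasBlockIO :: IORef Int -> IO MockHasBlockIO
openMockHasBlockIO live = do
  modifyIORef' live (+ 1)
  pure (MockHasBlockIO live)

closeMockHasBlockIO :: MockHasBlockIO -> IO ()
closeMockHasBlockIO (MockHasBlockIO live) = modifyIORef' live (subtract 1)

-- Hypothetical public-API-style wrapper: it opens the handle itself, so it
-- must also close it, because closing a session no longer does.
-- (The session the handle would be passed to is omitted from this mock.)
withOwnedHasBlockIO :: IORef Int -> (MockHasBlockIO -> IO a) -> IO a
withOwnedHasBlockIO live = bracket (openMockHasBlockIO live) closeMockHasBlockIO

-- Runs the wrapper once normally and once with a throwing body, then returns
-- the number of handles still open and the caught exception.
demo :: IO (Int, Either ErrorCall ())
demo = do
  live <- newIORef 0
  withOwnedHasBlockIO live (\_ -> pure ())
  r <- try (withOwnedHasBlockIO live (\_ -> throwIO (ErrorCall "boom")))
  n <- readIORef live
  pure (n, r)
```

Running `demo` reports 0 open handles, with the thrown `ErrorCall` returned as a `Left`.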

4 participants