|
1 | | -# A fast, space efficient Bloom filter implementation |
| 1 | +# bloomfilter-blocked |
2 | 2 |
|
3 | | -Copyright 2008, 2009, 2010, 2011 Bryan O'Sullivan <[email protected]>. |
| 3 | +`bloomfilter-blocked` is a Haskell library providing multiple fast and efficient |
| 4 | +implementations of [bloom filters][bloom-filter:wiki]. It is a full rewrite of |
| 5 | +the [`bloomfilter`][bloomfilter:hackage] package, originally authored by Bryan |
| 6 | +O'Sullivan.
4 | 7 |
|
5 | | -This package provides both mutable and immutable Bloom filter data |
6 | | -types, along with a family of hash function and an easy-to-use |
7 | | -interface. |
| 8 | +A bloom filter is a space-efficient data structure representing a set that can |
| 9 | +be probabilistically queried for set membership. The set membership query returns
| 10 | +no false negatives, but it might return false positives. That is, if an element
| 11 | +was added to a bloom filter, then a subsequent query definitely returns `True`.
| 12 | +If an element was *not* added to a filter, then a subsequent query may still
| 13 | +return `True` even though `False` would be the correct answer. The probability of false
| 14 | +positives -- the false positive rate (FPR) -- is configurable, as we will |
| 15 | +describe later. |
8 | 16 |
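The no-false-negatives property described in this paragraph can be illustrated with a toy classic bloom filter. This is a minimal sketch for intuition only, not the API of `bloomfilter-blocked`: the names `ToyBloom`, `positions`, `insert`, `member` and `fromList` are hypothetical, the bit array is stored in an `Integer` for brevity, and the two string hashes (FNV-1a and a djb2 variant) stand in for the library's real hashing scheme.

```haskell
import Data.Bits (setBit, testBit, xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

-- | Toy filter: number of bits, number of hash probes per element, and the
-- bit array itself (an Integer used as an unbounded bit vector).
data ToyBloom = ToyBloom
  { numBits   :: Int
  , numHashes :: Int
  , bits      :: Integer
  }

-- | 64-bit FNV-1a over the characters of a string (wraps on overflow).
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- | A second hash (djb2 variant), used as the stride for double hashing.
djb2 :: String -> Word64
djb2 = foldl' step 5381
  where
    step h c = h * 33 + fromIntegral (ord c)

-- | The bit positions probed for an element: (h1 + i * h2) mod numBits.
-- Assumes numBits > 0.
positions :: ToyBloom -> String -> [Int]
positions bf x =
  [ fromIntegral ((h1 + fromIntegral i * h2) `mod` m)
  | i <- [0 .. numHashes bf - 1]
  ]
  where
    h1 = fnv1a x
    h2 = djb2 x
    m  = fromIntegral (numBits bf) :: Word64

-- | An empty filter with the given number of bits and hash probes.
empty :: Int -> Int -> ToyBloom
empty m k = ToyBloom m k 0

-- | Inserting sets every probed bit.
insert :: String -> ToyBloom -> ToyBloom
insert x bf = bf { bits = foldl' setBit (bits bf) (positions bf x) }

-- | A query succeeds only if every probed bit is set. Bits are never
-- cleared, so an inserted element is always found; an element that was
-- never inserted may still be reported if other insertions set its bits.
member :: String -> ToyBloom -> Bool
member x bf = all (testBit (bits bf)) (positions bf x)

-- | Build a filter from a list of elements.
fromList :: Int -> Int -> [String] -> ToyBloom
fromList m k = foldl' (flip insert) (empty m k)
```

Loaded into GHCi, `member "apple" (fromList 1024 3 ["apple", "banana"])` evaluates to `True`, because insertion set exactly the bits that the query checks. A query for a string that was never inserted usually returns `False`, but nothing rules out `True`, which is the false positive case.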
|
9 | | -To build: |
| 17 | +The library includes two implementations of bloom filters: classic, and blocked. |
10 | 18 |
|
11 | | - cabal install bloomfilter |
| 19 | +* **Classic** bloom filters, found in the `Data.BloomFilter.Classic` module: a |
| 20 | + default implementation that is faithful to the canonical description of a |
| 21 | + bloom filter data structure. |
12 | 22 |
|
13 | | -For examples of usage, see the Haddock documentation and the files in |
14 | | -the examples directory. |
| 23 | +* **Blocked** bloom filters, found in the `Data.BloomFilter.Blocked` module: an
| 24 | + implementation that optimises the memory layout of a classic bloom filter for |
| 25 | + speed (cheaper CPU cache reads), at the cost of a slightly higher FPR for the |
| 26 | + same amount of assigned memory. |
15 | 27 |
|
| 28 | +The FPR decreases as more memory is assigned to the filter. It
| 29 | +increases as more elements are added to the set. The user can
| 30 | +configure how much memory is assigned to a filter, and the user also controls
| 31 | +how many elements are added to a set. Each implementation comes with helper |
| 32 | +functions, like `sizeForFPR` and `sizeForBits`, that the user can leverage to |
| 33 | +configure filters. |
16 | 34 |
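The relationship between memory, element count and FPR can be made concrete with the standard textbook approximation for a classic bloom filter. The symbols are assumptions introduced here, not names from the library: $m$ is the number of bits, $n$ the number of inserted elements, $k$ the number of hash probes per element, and $\varepsilon$ the FPR. Whether `sizeForFPR` and `sizeForBits` use exactly this formula is not stated in the README.

$$
\varepsilon \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\ln 2
$$

For example, with $m/n = 10$ bits per element and $k = 7$:

$$
\varepsilon \approx \left(1 - e^{-0.7}\right)^{7} \approx 0.5034^{7} \approx 0.0082
$$

That is an FPR of roughly 0.8%. Increasing $m$ for a fixed $n$ lowers $\varepsilon$, while adding elements to a filter of fixed size raises it. Blocked filters trade a somewhat higher FPR than this estimate for better cache behaviour.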
|
17 | | -# Get involved! |
| 35 | +Both immutable (`Bloom`) and mutable (`MBloom`) bloom filters, including |
| 36 | +functions to convert between the two, are provided for each implementation. Note |
| 37 | +however that a (mutable) bloom filter cannot be resized once created, and that
| 38 | +elements cannot be deleted once inserted.
18 | 39 |
|
19 | | -Please report bugs via the |
20 | | -[github issue tracker](https://github.com/haskell-pkg-janitors/bloomfilter). |
| 40 | +For more information about the library and examples of how to use it, see the |
| 41 | +Haddock documentation of the different modules. |
21 | 42 |
|
22 | | -Master [git repository](https://github.com/haskell-pkg-janitors/bloomfilter): |
| 43 | +# Usage notes |
23 | 44 |
|
24 | | -* `git clone git://github.com/haskell-pkg-janitors/bloomfilter.git` |
| 45 | +Users should take the following into account:
25 | 46 |
|
| 47 | +* This package is not supported on 32-bit systems.
26 | 48 |
|
27 | | -# Authors |
| 49 | +# Differences from the `bloomfilter` package |
28 | 50 |
|
29 | | -This library is written by Bryan O'Sullivan, <[email protected]>. |
| 51 | +The library is a full rewrite of the [`bloomfilter`][bloomfilter:hackage] |
| 52 | +package, originally authored by Bryan O'Sullivan <[email protected]>. The main |
| 53 | +differences are: |
| 54 | + |
| 55 | +* `bloomfilter-blocked` supports both classic and blocked bloom filters, whereas |
| 56 | + `bloomfilter` only supports the former. |
| 57 | +* `bloomfilter-blocked` supports bloom filters of arbitrary sizes, whereas |
| 58 | + `bloomfilter` limits the sizes to powers of two. |
| 59 | +* `bloomfilter-blocked` supports sizes up to `2^48` for classic bloom filters |
| 60 | + and up to `2^41` for blocked bloom filters, instead of `2^32`. |
| 61 | +* In `bloomfilter-blocked`, the `Bloom` and `MBloom` types are parameterised |
| 62 | + over a `Hashable` type class, instead of having a `a -> [Hash]` typed field. |
| 63 | + This separation in `bloomfilter-blocked` allows clean (de-)serialisation of |
| 64 | + filters as the hashing scheme is static. |
| 65 | +* `bloomfilter-blocked` uses `XXH3` for hashing instead of Jenkins'
| 66 | + `lookup3`, which `bloomfilter` uses.
| 67 | + |
| 68 | + |
| 69 | +<!-- Sources --> |
| 70 | + |
| 71 | +[bloom-filter:wiki]: https://en.wikipedia.org/wiki/Bloom_filter |
| 72 | +[bloomfilter:hackage]: https://hackage.haskell.org/package/bloomfilter |