A high-performance, well-documented Bloom Filter implementation in Rust with configurable parameters, comprehensive error handling, and advanced features.
A Bloom Filter is a space-efficient probabilistic data structure designed to test whether an element is a member of a set. It returns either "possibly in the set" or "definitely not in the set". False positive matches are possible, but false negatives are not.
- Efficient membership tests with no false negatives
- Configurable false positive probability
- Comprehensive error handling
- Optimized hash function implementation using double hashing
- Well-documented API with usage examples
- Extensive test suite
- Support for any hashable type
- Builder pattern for flexible initialization
- Serialization/deserialization support (optional)
- Ability to clear the filter without reallocation
- Merge operation for combining compatible filters
- Element count and false positive rate estimation
Add this to your Cargo.toml:
[dependencies]
frbf = "0.1.0"To create a new Bloom Filter:
use frbf::BloomFilter;
let bloom_filter_result = BloomFilter::new(1000, 0.01);
match bloom_filter_result {
Ok(mut bloom_filter) => {
// Use the bloom filter...
},
Err(e) => {
println!("Error creating Bloom Filter: {}", e);
}
}This attempts to create a new Bloom Filter optimized for 1000 items and a 1% false positive probability. The implementation automatically calculates the optimal number of bits and hash functions based on these parameters.
To add an item to the Bloom Filter:
bloom_filter.add(&"hello");You can add any type that implements the Hash trait:
// Add integers
bloom_filter.add(&42);
// Add tuples
bloom_filter.add(&(1, "test"));
// Add custom structs that implement Hash
#[derive(Hash)]
struct User {
id: u64,
name: String,
}
let user = User { id: 1, name: "Alice".to_string() };
bloom_filter.add(&user);To check if an item is present in the Bloom Filter:
if bloom_filter.check(&"hello") {
println!("'hello' might be in the set!");
} else {
println!("'hello' is definitely not in the set.");
}You can access filter properties:
// Get the bit array size
let bit_size = bloom_filter.bit_size();
// Get the number of hash functions
let hash_count = bloom_filter.hash_count();
// Get the expected number of elements
let expected = bloom_filter.expected_elements();
// Get the number of elements that have been added
let inserted = bloom_filter.inserted_elements();The Bloom Filter is optimized for space efficiency and performance:
- Time complexity for both add and check operations: O(k) where k is the number of hash functions
- Space complexity: O(m) where m is the number of bits in the filter
For optimal performance, choose the expected number of elements and false positive probability carefully:
- Higher number of expected elements or lower false positive probability will result in a larger bit array
- Very low false positive probabilities may result in a high number of hash functions, impacting performance
The implementation returns a Result type when creating a new Bloom Filter or performing operations like merging, which can contain the following errors:
BloomFilterError::InvalidProbability- When the false positive probability is not between 0 and 1BloomFilterError::MemoryAllocationFailed- When memory allocation for the bit array failsBloomFilterError::IncompatibleFilters- When attempting to merge filters with different bit sizes or hash counts
For more flexible initialization, you can use the builder pattern:
use frbf::BloomFilterBuilder;
let mut filter = BloomFilterBuilder::new(1000)
.false_positive_probability(0.001)
.hash_seeds([42, 43, 44, 45], [99, 98, 97, 96]) // Custom hash seeds
.build()
.unwrap();
filter.add(&"hello");
assert!(filter.check(&"hello"));You can clear the filter without reallocating memory:
// Add some items
filter.add(&"hello");
filter.add(&"world");
// Clear the filter
filter.clear();
// The filter is now empty
assert!(!filter.check(&"hello"));
assert!(!filter.check(&"world"));You can merge two compatible bloom filters:
let mut filter1 = BloomFilter::new(1000, 0.01).unwrap();
let mut filter2 = BloomFilter::new(1000, 0.01).unwrap();
filter1.add(&"hello");
filter2.add(&"world");
// Merge filter2 into filter1
filter1.merge(&filter2).unwrap();
// filter1 now contains items from both filters
assert!(filter1.check(&"hello"));
assert!(filter1.check(&"world"));You can estimate the number of elements in the filter and the current false positive rate:
// Add some elements
for i in 0..1000 {
filter.add(&i);
}
// Estimate the number of elements
let estimated_count = filter.estimate_elements();
println!("Estimated element count: {}", estimated_count);
// Estimate the current false positive rate
let estimated_fpp = filter.estimate_false_positive_rate();
println!("Estimated false positive rate: {}", estimated_fpp);To enable serialization support, add the serde-support feature to your Cargo.toml:
[dependencies]
frbf = { version = "0.1.0", features = ["serde-support"] }Then you can serialize and deserialize the bloom filter:
use serde_json;
let mut filter = BloomFilter::new(1000, 0.01).unwrap();
filter.add(&"hello");
// Serialize
let serialized = serde_json::to_string(&filter).unwrap();
// Deserialize
let deserialized: BloomFilter = serde_json::from_str(&serialized).unwrap();
// The deserialized filter contains the same elements
assert!(deserialized.check(&"hello"));You can verify the actual false positive rate of your filter:
let n = 10000;
let expected_fpp = 0.01;
let mut bloom_filter = BloomFilter::new(n, expected_fpp).unwrap();
// Add n items
for i in 0..n {
bloom_filter.add(&i);
}
// Check for false positives with another n items that weren't added
let mut false_positives = 0;
for i in n..(2 * n) {
if bloom_filter.check(&i) {
false_positives += 1;
}
}
let actual_fpp = false_positives as f64 / n as f64;
println!("Expected FPP: {}, Actual FPP: {}", expected_fpp, actual_fpp);- Once an element is added to the Bloom Filter, it cannot be removed
- The false positive rate increases as more elements are added to the filter
- Very large expected element counts may result in memory allocation failures
This project is licensed under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.