Commit 4d54d5c

lots of work
1 parent 018c66d commit 4d54d5c

12 files changed: +1792 -43 lines changed

.github/workflows/pytests.yml

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+name: Python Tests
+
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-latest]
+        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
+
+    steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Install system dependencies (Linux)
+      if: runner.os == 'Linux'
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y python3-dev
+
+    - name: Install system dependencies (macOS)
+      if: runner.os == 'macOS'
+      run: |
+        # Xcode command line tools should be available by default
+
+    - name: Install system dependencies (Windows)
+      if: runner.os == 'Windows'
+      run: |
+        # Windows typically has MSVC available
+
+    - name: Install Python dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install setuptools wheel cffi xxhash pytest
+
+    - name: Build CFFI extension
+      run: |
+        python setup.py build_ext
+
+    - name: Install package
+      run: |
+        python setup.py install
+
+    - name: Run tests
+      run: |
+        python -m pytest tests/ -v --tb=short
+
+    - name: Test import and basic functionality
+      run: |
+        python -c "
+        import pyfusefilter
+        print('✓ Import successful')
+
+        # Test basic functionality
+        f = pyfusefilter.Xor8([1, 2, 3, 4, 5])
+        print(f'✓ Filter created: {f}')
+        print(f'✓ Contains 3: {f.contains(3)}')
+        print(f'✓ Does not contain 10: {not f.contains(10)}')
+
+        # Test serialization
+        serialized = f.serialize()
+        recovered = pyfusefilter.Xor8.deserialize(serialized)
+        print(f'✓ Serialization works: {recovered.contains(3)}')
+        "
+
+  build-wheel:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: "3.11"
+
+    - name: Install build dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install build setuptools wheel cffi
+
+    - name: Install system dependencies
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y python3-dev
+
+    - name: Build wheel
+      run: |
+        python -m build --wheel
+
+    - name: Store wheel
+      uses: actions/upload-artifact@v3
+      with:
+        name: wheel
+        path: dist/*.whl
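
The `strategy.matrix` in the `test` job expands to one job per combination of `os` and `python-version`, and `fail-fast: false` lets the remaining combinations keep running when one of them fails. A minimal Python sketch of that expansion, using only the values listed in the workflow (the variable names are illustrative and not part of the repository):

```python
from itertools import product

# Values copied from the workflow's strategy.matrix.
OSES = ["ubuntu-latest", "macos-latest", "windows-latest"]
PYTHON_VERSIONS = ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]

# GitHub Actions schedules one job per (os, python-version) pair.
jobs = [
    {"os": os_name, "python-version": version}
    for os_name, version in product(OSES, PYTHON_VERSIONS)
]

print(f"{len(jobs)} jobs in the test matrix")
for job in jobs[:3]:
    print(job)
```

Each push or pull request therefore triggers every one of these `test` jobs plus the single `build-wheel` job.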

README.md

Lines changed: 69 additions & 18 deletions
@@ -3,21 +3,26 @@
 Python bindings for [C](https://github.com/FastFilter/xor_singleheader) implementation of [Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters](https://arxiv.org/abs/1912.08258)
 and of [Binary Fuse Filters: Fast and Smaller Than Xor Filters](https://arxiv.org/abs/2201.01174).

+If you have sets using much memory (e.g., thousands or millions of URLs) and you want to
+quickly filter out elements that are not in the set, these filters offer both great
+performance, and a small memory usage.


 ## Installation
 `pip install pyfusefilter`
-### From Source
-```
-git clone --recurse-submodules https://github.com/glitzflitz/pyfusefilter
-cd pyfusefilter
-python setup.py build_ext
-python setup.py install
-```
+
+
+
+
+
 ## Usage

+
+See our [API documentation](docs/index.html).
+
 The filters Xor8 and Fuse8 use slightly over a byte of memory per entry, with a false positive rate of about 0.39%.
-The filters Xor16 and Fuse16 use slightly over two bytes of memory per entry, with a false positive rate of about 0.0015%.
+The filters Xor16 and Fuse16 use slightly over two bytes of memory per entry, with a false positive rate of about 0.0015%. For large sets, Fuse8 and Fuse16 filters use slightly more memory and they can be built
+faster.



@@ -26,8 +31,7 @@ The filters Xor16 and Fuse16 use slightly over two bytes of memory per entry, wi
 >>>
 >>> #Supports unicode strings and heterogeneous types
 >>> test_str = ["","", 51, 0.0, 12.3]
->>> filter = Xor8(len(test_str)) #or Xor16(size)
->>> filter.populate(test_str)
+>>> filter = Xor8(test_str)
 True
 >>> filter.contains("")
 True
@@ -41,6 +45,10 @@ False
 60
 ```

+
+The `size_in_bytes()` function gives the memory usage of the filter itself. It does not count
+the Python overhead which adds a few bytes to the actual memory usage.
+
 You can serialize a filter with the `serialize()` method which returns a buffer, and you can recover the filter with the `deserialize(buffer)` method, which returns a filter:

 ```py
@@ -50,20 +58,25 @@ You can serialize a filter with the `serialize()` method which returns a buffer,
 > recoverfilter = Xor8.deserialize(open('/tmp/output', 'rb').read())
 ```

+The serialization format is as concise as possible and will typically use a few bytes
+less than `size_in_bytes()`.
+
 ## Measuring data usage

-The `size_in_bytes()` function gives the memory usage of the filter itself. The actual memory usage is slightly higher (there is a small constant overhead) due to
+The actual memory usage is slightly higher (there is a small constant overhead) due to
 Python metadata.

 ```python
-from pyfusefilter import Xor8, Fuse8
+from pyfusefilter import Xor8, Fuse8

-N = 100
-while (N < 10000000):
-    filter = Xor8(len(data))
-    fusefilter = Fuse8(len(data))
-    print(N, filter.size_in_bytes()/N, fusefilter.size_in_bytes()/N)
-    N *= 10
+N = 100
+while (N < 10000000):
+    # filters can be initialized with an integer, the memory is allocated, but unused.
+    # call 'populate' to fill them with data.
+    filter = Xor8(len(data))
+    fusefilter = Fuse8(len(data))
+    print(N, filter.size_in_bytes()/N, fusefilter.size_in_bytes()/N)
+    N *= 10

 ```

@@ -82,6 +95,44 @@ For large sets (contain millions of keys), Fuse8/Fuse16 filters are faster and s
 1130536
 ```

+### From Source
+
+Assuming that your Python interpreter is called `python`.
+
+```bash
+# Clone the repository with submodules
+git clone --recurse-submodules https://github.com/glitzflitz/pyfusefilter
+cd pyfusefilter
+
+# If you forgot --recurse-submodules, initialize submodules now
+git submodule update --init --recursive
+
+# Create and activate virtual environment
+python -m venv pyfuseenv
+source pyfuseenv/bin/activate # On Windows: pyfuseenv\Scripts\activate
+
+# Install build dependencies
+python -m pip install setuptools wheel cffi xxhash
+
+# Build the CFFI extension
+python setup.py build_ext
+
+# Install the package
+python setup.py install
+
+# Optional: Run tests to verify installation
+python -m pip install pytest
+python -m pytest tests/ -v
+
+# Generate documentation
+python -m pip install pdoc
+python -m pdoc pyfusefilter --output-dir docs
+```
+
+**Notes:**
+- The build process compiles C code using your system's C compiler
+- On macOS, you may need to install Xcode command line tools: `xcode-select --install`
+- On Linux, install development headers: `apt-get install python3-dev` (Ubuntu/Debian) or `yum install python3-devel` (CentOS/RHEL)


 ## References
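
The false-positive rates quoted in the README hunks above (about 0.39% for Xor8/Fuse8 and about 0.0015% for Xor16/Fuse16) are what one would expect if each key is stored as a fingerprint of 8 or 16 bits, so that a key outside the set matches by chance with probability 2^-bits. That fingerprint model is an assumption made here for illustration; the README itself only states the rates. A short sketch of the arithmetic (the helper name is hypothetical and not part of `pyfusefilter`):

```python
def false_positive_rate(fingerprint_bits: int) -> float:
    """Chance that a random non-member matches a fingerprint of this width.

    Assumes uniformly distributed fingerprints (illustrative model only).
    """
    return 2.0 ** -fingerprint_bits


for name, bits in [("Xor8/Fuse8", 8), ("Xor16/Fuse16", 16)]:
    rate = false_positive_rate(bits)
    # Expected false positives per million lookups of keys not in the set.
    per_million = rate * 1_000_000
    print(f"{name}: {rate * 100:.4f}% (~{per_million:.0f} per million lookups)")
```

Under this model, moving from an 8-bit to a 16-bit variant roughly doubles the memory per entry while cutting the false-positive rate by a factor of 256.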

docs/index.html

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+<!doctype html>
+<html>
+<head>
+<meta charset="utf-8">
+<meta http-equiv="refresh" content="0; url=./pyfusefilter.html"/>
+</head>
+</html>
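
The stub page above contains only a `meta http-equiv="refresh"` tag with a delay of `0`, so a browser opening `docs/index.html` is sent straight to `./pyfusefilter.html`. As a rough illustration, the following Python sketch extracts the delay and target from the same markup using the standard library; the `RefreshFinder` class is a hypothetical helper and not part of the repository.

```python
from html.parser import HTMLParser

# The redirect stub, copied from docs/index.html in this commit.
STUB = """<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; url=./pyfusefilter.html"/>
</head>
</html>"""


class RefreshFinder(HTMLParser):
    """Records the (delay, url) pair of the first meta refresh tag seen."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_refresh = (attr.get("http-equiv") or "").lower() == "refresh"
        if tag == "meta" and is_refresh and self.target is None:
            # The content attribute has the form "<seconds>; url=<target>".
            delay, _, rest = (attr.get("content") or "").partition(";")
            url = rest.strip()
            if url.lower().startswith("url="):
                url = url[4:]
            self.target = (float(delay), url)


finder = RefreshFinder()
finder.feed(STUB)
print(finder.target)
```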

docs/pyfusefilter.html

Lines changed: 1315 additions & 0 deletions
Large diffs are not rendered by default.

docs/search.js

Lines changed: 46 additions & 0 deletions
Some generated files are not rendered by default.
