Commit e642adb (parent a1997be): Update README.md

1 file changed (README.md), 49 additions, 2 deletions

# fastfilter_java

Fast Approximate Membership Filters (Java).

The following filter types are currently implemented:

* Bloom filter: the 'standard' algorithm
* Blocked Bloom filter: faster than regular Bloom filters, but needs a bit more space
* Counting Bloom filter: allows removing entries, but needs 4 times more space
* Succinct counting Bloom filter: about half the space of regular counting Bloom filters; faster lookup but slower add / remove
* Succinct counting blocked Bloom filter: same lookup speed as the blocked Bloom filter
* Cuckoo filter: 8- and 16-bit variants; uses cuckoo hashing to store fingerprints
* Cuckoo+ filter: 8- and 16-bit variants; needs a bit less space than regular cuckoo filters
* Golomb Compressed Set (GCS): needs less space than cuckoo filters, but lookup is slow
* Minimal Perfect Hash filter: needs less space than cuckoo filters, but lookup is slow
* Xor filter: 8- and 16-bit variants; needs less space than cuckoo filters, with faster lookup
* Xor+ filter: 8- and 16-bit variants; compressed xor filter
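
To illustrate the first entry in the list: a standard Bloom filter sets k bits per added key and reports a key as present only if all k of its bits are set. The sketch below is a minimal illustration, not the fastfilter_java implementation; the class name `SimpleBloomFilter`, the `long` keys, the MurmurHash3 finalizer used for mixing, and the double-hashing scheme that derives all k bit positions from one 64-bit hash are our own choices.

```java
import java.util.BitSet;

/** Hypothetical minimal Bloom filter over 64-bit keys (not the library's class). */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int k; // number of hash functions (bit positions per key)

    SimpleBloomFilter(int expectedEntries, int bitsPerKey) {
        this.numBits = Math.max(64, expectedEntries * bitsPerKey);
        this.bits = new BitSet(numBits);
        // k = bitsPerKey * ln 2 minimizes the false positive rate
        this.k = Math.max(1, (int) Math.round(bitsPerKey * Math.log(2)));
    }

    /** 64-bit mixer (the MurmurHash3 finalizer) so nearby keys get unrelated hashes. */
    private static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    /** Double hashing: the i-th position is h1 + i * h2, reduced modulo numBits. */
    private int index(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(long key) {
        long h = mix(key);
        for (int i = 0; i < k; i++) {
            bits.set(index(h, i));
        }
    }

    /** false means "definitely not added"; true means "probably added". */
    boolean mightContain(long key) {
        long h = mix(key);
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(h, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Because `add` sets every bit that `mightContain` later checks, an added key is never reported missing; a key that was never added is reported present only when all of its k bits happen to have been set by other keys. With 10 bits per key the constructor picks k = 7, which gives a false positive rate of roughly 1%.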

# Password Lookup

Included is a tool to build a filter from a list of known password hashes, and a tool to do lookups. That way, the password list can be queried locally without requiring a large file: the filter is only about 650 MB, while the original file is about 11 GB. This comes at the cost of some false positives: an unknown password is reported as known with about 1% probability.
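
These sizes can be sanity-checked with a little arithmetic. The sketch below assumes 1 MB = 10^6 bytes and uses the figure of about 550 million passwords given in the next section. As an assumption not stated in this README, it models the filter as a standard Bloom filter with the optimal number of hash functions, whose false positive rate at m/n bits per entry is about e^(-(m/n)(ln 2)^2). The class name `FilterSpaceEstimate` is hypothetical.

```java
/** Back-of-the-envelope space check for the password filter (assumes a Bloom filter). */
final class FilterSpaceEstimate {
    /** Bits of filter spent on each stored entry (m / n). */
    static double bitsPerEntry(double filterBytes, double entries) {
        return filterBytes * 8 / entries;
    }

    /** Bloom filter false positive rate with the optimal k = (m/n) ln 2, i.e. (1/2)^k. */
    static double bloomFalsePositiveRate(double bitsPerEntry) {
        double ln2 = Math.log(2);
        return Math.exp(-bitsPerEntry * ln2 * ln2);
    }

    public static void main(String[] args) {
        double bits = bitsPerEntry(650e6, 550e6);
        double fpp = bloomFalsePositiveRate(bits);
        System.out.printf("%.2f bits per entry, about %.2f%% false positives%n", bits, fpp * 100);
    }
}
```

This works out to roughly 9.5 bits per password and a false positive rate of roughly 1%, in line with the figure stated above.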

## Generate the Password Filter File

Download the latest SHA-1 password file that is ordered by hash, for example the file pwned-passwords-sha1-ordered-by-hash-v4.7z (10 GB) from https://haveibeenpwned.com/passwords, with about 550 million passwords.
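
The ordered-by-hash file is plain text with one entry per line: a SHA-1 hash written as 40 hex digits, a colon, and the number of times that password was seen. This README does not say how BuildFilterFile turns such a line into a filter key; the sketch below assumes the key is simply the first 64 bits of the hash, and the class name `PwnedLine` is hypothetical.

```java
/** Hypothetical parser for one "HASH:COUNT" line of the ordered-by-hash file. */
final class PwnedLine {
    final long key;  // assumed key: the first 64 bits of the SHA-1 hash
    final int count; // how often the password was seen

    private PwnedLine(long key, int count) {
        this.key = key;
        this.count = count;
    }

    static PwnedLine parse(String line) {
        int colon = line.indexOf(':');
        if (colon != 40) {
            throw new IllegalArgumentException("expected 40 hex digits before ':' in: " + line);
        }
        // 16 hex digits = 64 bits; parseUnsignedLong also accepts values above Long.MAX_VALUE
        long key = Long.parseUnsignedLong(line.substring(0, 16), 16);
        // trim() drops a trailing carriage return if the file has Windows line endings
        int count = Integer.parseInt(line.substring(colon + 1).trim());
        return new PwnedLine(key, count);
    }
}
```

Dropping the remaining 96 hash bits is unlikely to matter for this use, since an approximate membership filter keeps only a few bits per entry anyway.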

If you have enough disk space, you can extract the hash file (25 GB) and convert it as follows:

    mvn clean install
    cat hash.txt | java -cp target/fastfilter*.jar org.fastfilter.tools.BuildFilterFile filter.bin

Converting takes about 2 to 3 minutes, depending on hardware.

To save disk space, you can instead extract the file on the fly (on Mac OS X, using Keka):

    /Applications/Keka.app/Contents/Resources/keka7z e -so pass.7z | java -cp target/fastfilter*.jar org.fastfilter.tools.BuildFilterFile filter.bin

Both commands generate the filter file filter.bin (about 640 MB).
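
Both commands pipe the hash list into BuildFilterFile on standard input, which is why the extracted text never has to be stored on disk. The following minimal sketch shows such a streaming loop; it is not BuildFilterFile's actual code. It assumes the hash, colon, count line format, derives a hypothetical 64-bit key from the first 16 hex digits, and passes keys to two `LongConsumer` callbacks that stand in for the main filter and a hypothetical second filter of passwords seen 10 or more times. The class name `StreamingBuild` is also hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.LongConsumer;

/** Hypothetical streaming build loop: reads one line at a time, never the whole file. */
final class StreamingBuild {
    static final int COMMON_THRESHOLD = 10; // "seen 10 times or more"

    /** Passes every key to allKeys, and frequently seen keys also to commonKeys; returns entries read. */
    static long build(Reader input, LongConsumer allKeys, LongConsumer commonKeys) {
        long entries = 0;
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int colon = line.indexOf(':');
                if (colon < 16) {
                    continue; // skip blank or malformed lines
                }
                // Assumed key derivation: first 64 bits (16 hex digits) of the SHA-1 hash
                long key = Long.parseUnsignedLong(line.substring(0, 16), 16);
                int count = Integer.parseInt(line.substring(colon + 1).trim());
                allKeys.accept(key);
                if (count >= COMMON_THRESHOLD) {
                    commonKeys.accept(key);
                }
                entries++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical usage: cat hash.txt | java StreamingBuild
        long n = build(new InputStreamReader(System.in, StandardCharsets.US_ASCII),
                key -> { /* add key to the main filter */ },
                key -> { /* add key to the "common" filter */ });
        System.err.println(n + " entries read");
    }
}
```

Taking a `Reader` rather than a file name keeps the loop agnostic to where the text comes from, so `cat` and an on-the-fly `7z` extraction look the same to it.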

## Check Passwords

    java -cp target/fastfilter*.jar org.fastfilter.tools.PasswordLookup filter.bin

Enter a password to see whether it is in the list. If it is, the tool is guaranteed to show either "Found" or "Found; common"; the latter means the password was seen 10 or more times. A password that is not in the list shows "Not found" with more than 99% probability, and is falsely reported as "Found" or "Found; common" with less than 1% probability.
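
The lookup side can be sketched as follows: hash the typed password with SHA-1 (the hash used in the downloaded file), derive a 64-bit key, and ask two membership tests. The key derivation (first 8 bytes of the hash), the two-filter design for "common" passwords, the `LongPredicate` stand-ins for the filters, and the class name `LookupSketch` are assumptions for illustration; PasswordLookup may organize this differently. `MessageDigest.getInstance("SHA-1")` is standard JDK API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.LongPredicate;

/** Hypothetical lookup logic producing the three answers described above. */
final class LookupSketch {
    /** Assumed key derivation: first 8 bytes of SHA-1(password), big-endian. */
    static long key(String password) {
        try {
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                    .digest(password.getBytes(StandardCharsets.UTF_8));
            long k = 0;
            for (int i = 0; i < 8; i++) {
                k = (k << 8) | (sha1[i] & 0xff);
            }
            return k;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 must be supported by every Java platform", e);
        }
    }

    /** all and common stand in for two filters; both may give false positives, never false negatives. */
    static String lookup(String password, LongPredicate all, LongPredicate common) {
        long k = key(password);
        if (!all.test(k)) {
            return "Not found"; // always correct: a filter has no false negatives
        }
        return common.test(k) ? "Found; common" : "Found";
    }
}
```

Since a membership filter never reports an added key as missing, "Not found" is always correct; only the two "Found" answers can be false positives.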