Commit 7c10cd4

Update README.md
1 parent 771c992 commit 7c10cd4

1 file changed: +8 -6 lines changed

README.md

Lines changed: 8 additions & 6 deletions
@@ -14,18 +14,18 @@ The following filter types are currently implemented:
 * Xor filter: 8 and 16 bit variants; needs less space than cuckoo filters, with faster lookup
 * Xor+ filter: 8 and 16 bit variants; compressed xor filter

-# Password Lookup Tool
+## Password Lookup Tool

 Included is a tool to build a filter from a list of known password (hashes), and a tool to do lookups. That way, the password list can be queried locally, without requiring a large file. The filter is only 650 MB, instead of the original file which is 11 GB. At the cost of some false positives (unknown passwords reported as known, with about 1% probability).

-## Generate the Password Filter File
+### Generate the Password Filter File

 Download the latest SHA-1 password file that is ordered by hash,
-for example the file pwned-passwords-sha1-ordered-by-hash-v4.7z (10 GB)
+for example the file pwned-passwords-sha1-ordered-by-hash-v4.7z (~10 GB)
 from https://haveibeenpwned.com/passwords
 with about 550 million passwords.

-If you have enough disk space, you can extract the hash file (25 GB),
+If you have enough disk space, you can extract the hash file (~25 GB),
 and convert it as follows:

 mvn clean install
@@ -37,9 +37,9 @@ To save disk space, you can extract the file on the fly (Mac OS X using Keka):
 /Applications/Keka.app/Contents/Resources/keka7z e -so
 pass.7z | java -cp target/fastfilter*.jar org.fastfilter.tools.BuildFilterFile filter.bin

-Both will generate a file named filter.bin (640 MB).
+Both will generate a file named filter.bin (~630 MB).

-## Check Passwords
+### Check Passwords

 java -cp target/fastfilter*.jar org.fastfilter.tools.PasswordLookup filter.bin

@@ -48,3 +48,5 @@ If yes, it will (for sure) either show "Found", or "Found; common",
 which means it was seen 10 times or more often.
 Passwords not in the list will show "Not found" with more than 99% probability,
 and with less than 1% probability "Found" or "Found; common".
+
+Internally, the tool uses a xor+ filter (see above) with 8 bits per fingerprint. One bit of the key is either 0 (regular) or 1 (common), and so two lookups are made per password. Because two lookups are made, the false positive rate is twice of what it would be with just one lookup (0.0078 instead of 0.0039). A regular Bloom filter with the same guarantees would be ~760 MB.
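
The false-positive figures in the added paragraph can be checked with a little arithmetic, and the two-lookup scheme is easy to sketch. The Java below is an illustration only, not the code in `org.fastfilter.tools.PasswordLookup`: the class name `TwoLookupSketch`, the choice of the lowest key bit as the regular/common flag, and the use of an exact `Set<Long>` as a stand-in for the xor+ filter are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names and key layout are assumptions, not the fastfilter API.
class TwoLookupSketch {

    // An n-bit fingerprint matches a random unknown key with probability 2^-n.
    public static double singleLookupRate(int fingerprintBits) {
        return Math.pow(2, -fingerprintBits);
    }

    // Probability that at least one of two independent lookups is a false positive.
    public static double twoLookupRate(int fingerprintBits) {
        double p = singleLookupRate(fingerprintBits);
        return 1 - (1 - p) * (1 - p);
    }

    // Assumed layout: the lowest bit of the key is the flag (0 = regular, 1 = common).
    public static long regularKey(long hash) {
        return hash & ~1L;
    }

    public static long commonKey(long hash) {
        return hash | 1L;
    }

    // Two membership tests per password: one for each value of the flag bit.
    public static String lookup(Set<Long> filter, long hash) {
        if (filter.contains(commonKey(hash))) {
            return "Found; common";
        }
        if (filter.contains(regularKey(hash))) {
            return "Found";
        }
        return "Not found";
    }

    public static void main(String[] args) {
        System.out.println("one lookup:  " + singleLookupRate(8));
        System.out.println("two lookups: " + twoLookupRate(8));

        // Exact set used in place of the real filter, so no false positives occur here.
        Set<Long> filter = new HashSet<>();
        filter.add(commonKey(42L));
        filter.add(regularKey(100L));
        System.out.println(lookup(filter, 42L));
        System.out.println(lookup(filter, 100L));
        System.out.println(lookup(filter, 200L));
    }
}
```

With 8-bit fingerprints one lookup gives 1/256, about 0.0039, and two lookups give 511/65536, about 0.0078, which matches the numbers quoted in the commit. The sketch does not attempt to reproduce the ~760 MB Bloom filter estimate, since that depends on the key count and Bloom filter parameters, which the diff does not state.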
