Open
apache/arrow-rs
#9206
Description
Describe the bug, including details regarding any error messages, version, and platform.
The bits/key values in this table do not seem to match results given by the formula used in the Parquet C++ and Parquet Java implementations.
Using that formula (the same in both implementations) I get this table:
| Bits of space per insert | False positive probability |
|---|---|
| 5.8 | 10 % |
| 9.7 | 1 % |
| 14.6 | 0.1 % |
| 21 | 0.01 % |
| 29.6 | 0.001 % |
In Python:
>>> import math
>>> fpp = 0.1 ; 8/math.log(1/(1 - fpp**0.125))
5.7725418439029506
>>> fpp = 0.01 ; 8/math.log(1/(1 - fpp**0.125))
9.681526738735679
>>> fpp = 0.001 ; 8/math.log(1/(1 - fpp**0.125))
14.607697478479535
>>> fpp = 0.0001 ; 8/math.log(1/(1 - fpp**0.125))
21.045409233894773
>>> fpp = 0.00001 ; 8/math.log(1/(1 - fpp**0.125))
29.555488704606017
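
For convenience, the same calculation can be wrapped in a small script that regenerates the whole table in one go. This is only a sketch of the expression used in the REPL session above; the function name `bits_per_insert` is made up for illustration and is not an API of any Parquet implementation.

```python
import math


def bits_per_insert(fpp: float) -> float:
    """Bits of space per inserted value for a target false positive probability.

    Hypothetical helper: it evaluates the same expression as the REPL
    session above, 8 / ln(1 / (1 - fpp ** (1/8))).
    """
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    return 8 / math.log(1 / (1 - fpp ** 0.125))


if __name__ == "__main__":
    # Print the rows in the same markdown layout as the table in this issue.
    print("| Bits of space per insert | False positive probability |")
    print("|---|---|")
    for fpp in (0.1, 0.01, 0.001, 0.0001, 0.00001):
        print(f"| {bits_per_insert(fpp):.1f} | {fpp * 100:g} % |")
```

Rounded to one decimal place, the results agree with the table above.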