Description
My understanding of the HLL algorithm (which may be flawed, in which case please correct me and close this issue) is that for any fixed set of input values, the accuracy of an estimate from an HLL built from those values should increase as the "m" value used in the HLL increases.
I.e., if you build two HLL instances with different `log2m` settings and add the exact same set of (raw) values to both, then the HLL with the larger `log2m` will give you more accurate results than the HLL with the smaller `log2m` setting.
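As a rough sketch of that expectation, assuming the commonly cited HyperLogLog approximation that the relative standard error is about 1.04 / sqrt(m) with m = 2^log2m registers, the expected error for a few settings can be tabulated. The function name `hll_relative_std_error` is hypothetical and not part of any HLL library.

```python
import math


def hll_relative_std_error(log2m: int) -> float:
    """Approximate relative standard error for an HLL with m = 2**log2m registers.

    Assumes the textbook approximation 1.04 / sqrt(m); a specific
    implementation may behave differently, especially at low cardinalities.
    """
    m = 1 << log2m  # number of registers
    return 1.04 / math.sqrt(m)


if __name__ == "__main__":
    # Larger log2m should give a smaller expected relative error.
    for log2m in (11, 13, 15):
        err = hll_relative_std_error(log2m)
        print(f"log2m={log2m}  m={1 << log2m}  expected relative error ~ {err:.4%}")
```

This figure describes the spread of estimates across many different input sets. It is not a guarantee for any single fixed set, so it only captures the trend I would expect to see on average.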
In my testing, however, I'm frequently encountering situations where "smaller" HLL instances produce more accurate cardinality estimates, which I can't explain.
I've created a reproducible test case that demonstrates the problem, which I will post as a separate comment.