I'm interested in the case where a variable takes on discrete values. I created a tdigest notebook to illustrate what might be an interesting issue.
Suppose I have sampled many rolls of a die. If I add a tiny amount of noise then tdigest works just fine as a nice representation of the data, with quite an accurate cdf and percentiles.
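To make the noise trick concrete, here is a minimal sketch of what it amounts to, using only the standard library. The `HACK` flag name comes from the notebook; the noise width `NOISE_SCALE`, the `roll` helper and the sample size are assumptions for illustration and are not taken from it.

```python
import random

HACK = True          # flag name from the notebook; everything else here is assumed
NOISE_SCALE = 1e-6   # hypothetical jitter width, far smaller than the gap between faces


def roll(rng):
    """Return one die roll as a float, optionally jittered by a tiny amount."""
    value = float(rng.randint(1, 6))
    if HACK:
        # The jitter makes almost every sample distinct, so a sketch that
        # groups identical values no longer sees only six distinct inputs.
        value += rng.uniform(-NOISE_SCALE, NOISE_SCALE)
    return value


rng = random.Random(0)
samples = [roll(rng) for _ in range(10_000)]
print("distinct values:", len(set(samples)))
```

With `HACK = False` the same script reports six distinct values, which matches the six-centroid case described next.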
However, if you run the same notebook with HACK=False then only six centroids are created. This leads to gross inaccuracy in both the cdf and the percentiles.
Is there a trick that would let tdigest handle cases like this without my hack?
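For anyone who wants to see the size of the error without running the notebook, here is a rough, library-free sketch. It assumes a simplified interpolation rule (each centroid's weight is centred on its mean and the cdf is interpolated linearly between centroid means), which may differ from what tdigest actually does internally. The helpers `empirical_cdf` and `centroid_cdf` are hypothetical names, not part of any library.

```python
import random
from collections import Counter


def empirical_cdf(samples, x):
    """Exact fraction of samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def centroid_cdf(centroids, x):
    """Simplified interpolated cdf over (mean, weight) pairs sorted by mean.

    Assumption: half of a centroid's weight lies below its mean, and the cdf
    is linear between neighbouring centroid means.
    """
    total = sum(w for _, w in centroids)
    points = []
    cum = 0.0
    for mean, w in centroids:
        points.append((mean, (cum + w / 2) / total))
        cum += w
    if x < points[0][0]:
        return 0.0
    if x >= points[-1][0]:
        return 1.0
    for (x0, q0), (x1, q1) in zip(points, points[1:]):
        if x0 <= x < x1:
            return q0 + (q1 - q0) * (x - x0) / (x1 - x0)


random.seed(0)
rolls = [random.randint(1, 6) for _ in range(60_000)]
centroids = sorted(Counter(rolls).items())  # six (face, count) centroids

for face in range(1, 7):
    print(face,
          round(empirical_cdf(rolls, face), 3),
          round(centroid_cdf(centroids, face), 3))
```

Under this assumed rule the interpolated cdf at a face sits about half a face's probability below the true step cdf (roughly 5/12 instead of 1/2 at face 3), which is the kind of discrepancy I mean.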