I was wondering whether there is a way to calculate the mean and standard deviation of the distribution summarized by a T-Digest.
Disclaimer: I don't really understand all the clever math behind T-Digest.
Mean:
It seems that calculating the average of all centroid means, weighted by their respective counts, should get me close enough.
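A minimal sketch of the weighted average described above, assuming each centroid is available as a plain (mean, count) pair. The `Centroid` alias and `digest_mean` function are hypothetical names for illustration, not part of the TDigest class. If each centroid stores the exact mean of the samples assigned to it, this weighted average recovers the sample mean up to floating-point rounding.

```python
from typing import List, Tuple

# Hypothetical representation: each centroid is a (mean, count) pair.
# This is an illustrative assumption, not the TDigest class's own API.
Centroid = Tuple[float, float]


def digest_mean(centroids: List[Centroid]) -> float:
    """Count-weighted average of the centroid means."""
    total_count = sum(count for _, count in centroids)
    if total_count == 0:
        raise ValueError("empty digest")
    return sum(mean * count for mean, count in centroids) / total_count


# Usage: three centroids summarizing 1 + 2 + 1 = 4 samples.
print(digest_mean([(1.0, 1), (2.5, 2), (4.0, 1)]))  # 2.5
```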
Variance:
I was thinking of:
- Calculate the mean.
- For each centroid, calculate (centroid - mean)^2 * count, then sum the results over all centroids.
- The variance is then the sum from the second step divided by sum(centroid.count).
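The three steps above can be sketched as follows, again assuming hypothetical (mean, count) pairs rather than any real TDigest method. Dividing by the total count gives the population variance, as in the steps; dividing by the total count minus one would give the sample variance instead.

```python
import math
from typing import List, Tuple

# Hypothetical representation: each centroid is a (mean, count) pair.
Centroid = Tuple[float, float]


def digest_variance(centroids: List[Centroid]) -> float:
    """Population variance, treating every sample as if it sat on its centroid."""
    total_count = sum(c for _, c in centroids)
    if total_count == 0:
        raise ValueError("empty digest")
    # Step 1: count-weighted mean of the centroids.
    mean = sum(m * c for m, c in centroids) / total_count
    # Step 2: sum of (centroid - mean)^2 * count over all centroids.
    sum_sq = sum((m - mean) ** 2 * c for m, c in centroids)
    # Step 3: divide by the total count.
    return sum_sq / total_count


def digest_stddev(centroids: List[Centroid]) -> float:
    """Standard deviation as the square root of the variance above."""
    return math.sqrt(digest_variance(centroids))


print(digest_variance([(1.0, 1), (2.5, 2), (4.0, 1)]))  # 1.125
```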
This is obviously only an approximation: it assumes that all the samples represented by a centroid have the same value, so it ignores the spread within each centroid and underestimates the true variance. However, I can't really figure out a reasonable way to get more accurate values.
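To illustrate the limitation described above, the following sketch groups four known samples into two hypothetical centroids and compares the centroid-based estimate with the exact population variance of the raw data. The grouping is invented for the example; a real digest chooses its own centroids.

```python
import statistics
from typing import List, Tuple

Centroid = Tuple[float, float]  # hypothetical (mean, count) pair


def naive_variance(centroids: List[Centroid]) -> float:
    """Variance assuming every sample sits exactly on its centroid's mean."""
    n = sum(c for _, c in centroids)
    mean = sum(m * c for m, c in centroids) / n
    return sum((m - mean) ** 2 * c for m, c in centroids) / n


# Raw samples and a hypothetical grouping of them into two centroids.
groups = [[1.0, 2.0], [3.0, 4.0]]
samples = [x for g in groups for x in g]
centroids = [(statistics.fmean(g), len(g)) for g in groups]  # [(1.5, 2), (3.5, 2)]

exact = statistics.pvariance(samples)  # population variance of the raw data
approx = naive_variance(centroids)
print(exact, approx)  # 1.25 1.0
```

By the law of total variance, the gap (0.25 here) is the count-weighted average of the variance inside each centroid, which cannot be recovered from means and counts alone.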
Do you know a better way to calculate the mean and variance? Would it be possible to add the corresponding methods to the TDigest class?