While looking at td_quantile_of I think I've found another bug. My knowledge of Go is pretty limited, so I'm not sure whether it's present in the other repository, and unlike the previous issue I don't have a reproducer yet.
When computing the quantile, the code does this:
if (val == n->mean) {
    // technically this needs to find all of the nodes which contain this value and sum their weight
    double count_at_value = n->count;
    for (i += 1; i < h->merged_nodes && h->nodes[i].mean == n->mean; i++) {
        count_at_value += h->nodes[i].count;
    }
    return (k + (count_at_value/2)) / h->merged_count;
}
which makes perfect sense - there may be multiple centroids with the same mean, and we need to account for all of them (and then take 1/2 of the count). So far so good.
But when the value is not exactly equal to a mean of a centroid, the code ignores this possibility. It simply does this:
node_t *nr = n;
node_t *nl = n-1;
and entirely ignores the possibility that there might be multiple centroids with the same mean, both for nr and nl. So the weights used for the linear approximation will be somewhat bogus, because they don't represent the values from all the relevant centroids.
I think the code should do the same lookup as in the val == n->mean branch for both nr and nl.
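To make the suggestion concrete, here is a rough sketch of the lookup I have in mind. It is not the library's code: the node_t below is a simplified stand-in with only the mean and count fields, and the helper names run_weight_left and run_weight_right are made up for illustration. How the summed weights would feed into the interpolation is left out, since that depends on the surrounding code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the centroid type (assumption: only the two
   fields that appear in the snippets above are modelled). */
typedef struct {
    double mean;
    double count;
} node_t;

/* Hypothetical helper: total count of the run of centroids that share
   nodes[i].mean, scanning from index i towards lower indexes.
   This is what would be used for nl. */
static double run_weight_left(const node_t *nodes, size_t n, size_t i) {
    assert(i < n);
    double mean = nodes[i].mean;
    double w = 0.0;
    size_t j = i;
    for (;;) {
        w += nodes[j].count;
        /* stop at the start of the array or when the mean changes */
        if (j == 0 || nodes[j - 1].mean != mean) {
            break;
        }
        j--;
    }
    return w;
}

/* Hypothetical helper: total count of the run of centroids that share
   nodes[i].mean, scanning from index i towards higher indexes.
   This is what would be used for nr. */
static double run_weight_right(const node_t *nodes, size_t n, size_t i) {
    assert(i < n);
    double mean = nodes[i].mean;
    double w = 0.0;
    for (size_t j = i; j < n && nodes[j].mean == mean; j++) {
        w += nodes[j].count;
    }
    return w;
}
```

For example, with centroids (mean, count) = (1, 2), (2, 3), (2, 4), (5, 1), (5, 6) and a value between 2 and 5, nl is index 2 and nr is index 3. The current code would use weights 4 and 1, while the helpers return 7 for each side.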