
possible bug in td_quantile_of #4

@tvondra

While looking at td_quantile_of, I think I've found another bug. My knowledge of Go is pretty limited, so I'm not sure whether it's also present in the other repository. Unlike the previous issue, I don't have a reproducer demonstrating it yet.

When computing the quantile, the code does this:

     if (val == n->mean) {
          // technically this needs to find all of the nodes which contain this value and sum their weight
          double count_at_value = n->count;
          for (i += 1; i < h->merged_nodes && h->nodes[i].mean == n->mean; i++) {
               count_at_value += h->nodes[i].count;
          }
          return (k + (count_at_value/2)) / h->merged_count;
     }

which makes perfect sense - there may be multiple centroids with the same mean, and we need to account for all of them (and then take half of their combined count). So far so good.
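To make the arithmetic of that branch concrete, here is a minimal standalone sketch. The type `centroid_t` and the function `quantile_at_mean` are hypothetical stand-ins written for this illustration, not the library's actual definitions; `k` is assumed to be the total weight of all centroids before index `i`, and `total` plays the role of `h->merged_count`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's node type. */
typedef struct {
    double mean;
    double count;
} centroid_t;

/*
 * Mirrors the val == n->mean branch quoted above.
 * i     : index of the first centroid whose mean equals the value
 * k     : total weight of all centroids before index i (assumption)
 * total : total weight of the digest
 */
double quantile_at_mean(const centroid_t *nodes, size_t n_nodes,
                        size_t i, double k, double total)
{
    assert(i < n_nodes);

    /* Sum the weight of every following centroid sharing the same mean. */
    double count_at_value = nodes[i].count;
    for (size_t j = i + 1; j < n_nodes && nodes[j].mean == nodes[i].mean; j++) {
        count_at_value += nodes[j].count;
    }

    /* Half of the combined weight is treated as lying below the value. */
    return (k + count_at_value / 2) / total;
}
```

For centroids (mean, count) = (1, 1), (2, 2), (2, 3), (3, 4) and a value of 2, `k` is 1 and the two matching centroids contribute 5, so the result is (1 + 2.5) / 10 = 0.35.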

But when the value is not exactly equal to a mean of a centroid, the code ignores this possibility. It simply does this:

     node_t *nr = n;
     node_t *nl = n-1;

and never checks whether there are multiple centroids with the same mean, for either nr or nl. So the weights used for the linear approximation will be somewhat bogus, because they don't represent the values from all the relevant centroids.

I think the code should do the same lookup as in the val == n->mean branch for both nr and nl.
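A rough sketch of what that lookup could look like, assuming the nodes are sorted by mean. The names `sketch_node_t`, `weight_run_left` and `weight_run_right` are hypothetical and only illustrate the idea. For nl, any other centroids with the same mean would sit to its left; for nr, they would sit to its right. How the combined weights then feed into the interpolation (and whether k needs adjusting to match) is left open here.

```c
#include <assert.h>

/* Hypothetical stand-in for the library's node type. */
typedef struct {
    double mean;
    double count;
} sketch_node_t;

/* Combined count of node idx and all nodes to its LEFT with the same mean.
 * Intended for nl, the centroid just below the value. */
double weight_run_left(const sketch_node_t *nodes, int n_nodes, int idx)
{
    assert(idx >= 0 && idx < n_nodes);

    double w = nodes[idx].count;
    for (int j = idx - 1; j >= 0 && nodes[j].mean == nodes[idx].mean; j--) {
        w += nodes[j].count;
    }
    return w;
}

/* Combined count of node idx and all nodes to its RIGHT with the same mean.
 * Intended for nr, the centroid just above the value. */
double weight_run_right(const sketch_node_t *nodes, int n_nodes, int idx)
{
    assert(idx >= 0 && idx < n_nodes);

    double w = nodes[idx].count;
    for (int j = idx + 1; j < n_nodes && nodes[j].mean == nodes[idx].mean; j++) {
        w += nodes[j].count;
    }
    return w;
}
```

With centroids (1, 1), (1, 2), (3, 4), (3, 5), (4, 1) and a value between 1 and 3, nl is the node at index 1 and nr the node at index 2; the helpers would report weights of 3 and 9 instead of the single-node counts 2 and 4.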
