suggestions #1

@belm0

Hello, I noticed the link to this project from carsonfarmer/streamhist. I've been tinkering with the optimization and correctness of this algorithm for about a year, and contributed some correctness fixes to streamhist. Most of the performance work I did hasn't been published yet.

Suggestions after browsing this project:

Consider a few small changes to the algorithm to make the implementation "exact" when the histogram is below capacity. Ideally, in this mode it should match the output of a well-tested library like numpy. I contributed such support to streamhist:

carsonfarmer/streamhist#11

Support efficient reporting of multiple quantile points. This is a common use case, the numpy and streamhist APIs both support it, and streamhist is an example of computing the sums once in advance and reusing them for each quantile.
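To make the two suggestions above concrete, here is a minimal sketch, not code from streamhist or distogram. The function name `quantiles_from_bins` and the mid-rank interpolation rule are assumptions chosen for illustration. The cumulative sum is computed once and shared by every requested quantile, and when every bin holds a single point (the below-capacity case) the result agrees with `numpy.quantile` using its default linear interpolation.

```python
import numpy as np


def quantiles_from_bins(values, counts, qs):
    """Estimate several quantiles from sorted histogram bins in one pass.

    values: bin centroids, sorted ascending (hypothetical representation)
    counts: number of points merged into each bin (all > 0)
    qs:     quantile points in [0, 1]
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    qs = np.asarray(qs, dtype=float)

    # Cumulative counts are computed once and reused for every quantile.
    cum = np.cumsum(counts)
    total = cum[-1]

    # Assumption: place each centroid at the middle of the 0-based ranks
    # its points occupy. With all counts == 1 this is simply 0, 1, 2, ...
    mid_rank = cum - (counts + 1.0) / 2.0

    # Same target rank that numpy's default "linear" method uses.
    target = qs * (total - 1.0)
    return np.interp(target, mid_rank, values)


# Below capacity: one point per bin, so the bins are just the sorted data.
data = np.array([5.0, 1.0, 3.0, 2.0, 8.0])
qs = [0.0, 0.25, 0.5, 0.9, 1.0]
exact = quantiles_from_bins(np.sort(data), np.ones(len(data)), qs)
```

Once bins have been merged (counts greater than 1) the result is only an estimate, and a real implementation may well use a different interpolation rule than the one assumed here.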

Beyond that, one thing I've developed that is missing here is a fast implementation based on numba + numpy, for projects that cannot use pypy. It was about 10-15x faster than streamhist the last time I measured. The dependencies could be optional. Would an enhancement like that fit into distogram?
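One common way to keep such dependencies optional is sketched below. It is an illustration of the pattern only, not the unpublished implementation mentioned above. The helper `closest_pair_index` is hypothetical: it finds the two adjacent bins with the smallest gap, which is assumed here to be the kind of tight loop that would benefit from compilation. If numba is not installed, the decorator falls back to a no-op and the same function runs as plain Python.

```python
import numpy as np

try:
    from numba import njit  # optional accelerator
except ImportError:
    def njit(*args, **kwargs):
        """No-op stand-in that supports both @njit and @njit(...)."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        def wrap(func):
            return func

        return wrap


@njit
def closest_pair_index(values):
    """Return i such that values[i] and values[i + 1] are the closest
    adjacent pair. Expects a sorted 1-D float array with >= 2 entries."""
    best = 0
    best_gap = values[1] - values[0]
    for i in range(1, len(values) - 1):
        gap = values[i + 1] - values[i]
        if gap < best_gap:
            best_gap = gap
            best = i
    return best


idx = closest_pair_index(np.array([1.0, 4.0, 4.5, 9.0]))
```

With this layout, numba would only need to be listed as an optional extra rather than a hard requirement. Any speedup depends on the workload and is not measured here.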
