Feature Request: Support bin width/step size in fit(Histogram, ...) #985

@anurag-mds

This idea came out of a discussion with some community members on Slack (see the thread).

Currently, fit(Histogram, data) lets users control binning only through the number of bins (nbins, which defaults to Sturges' rule) or through explicit bin edges. This is flexible, but a common workflow in scientific computing is to specify bins by a fixed width (e.g., "one bin every 0.5 units"), independent of the data range.

Although this can be done manually with fit(Histogram, data, min:width:max), the user has to calculate the bounds and handle potential edge-alignment issues by hand.
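To make the manual workaround concrete, here is a minimal sketch of the bookkeeping involved. The sample data and variable names are made up for illustration; only Base Julia is used, so the resulting range would still have to be passed to fit as the edges argument.

```julia
# Illustrative data and width (assumptions, not from the issue).
data = [0.3, 1.2, 2.9, 4.1]
width = 0.5

# Naive range: starts at the minimum and stops short of the maximum
# whenever (max - min) is not a multiple of the width.
naive = minimum(data):width:maximum(data)

# Manual fix: snap both bounds outward to multiples of the width.
lo = floor(minimum(data) / width) * width
hi = ceil(maximum(data) / width) * width
edges = lo:width:hi
```

Here the naive range ends below the largest data point, which is the kind of alignment issue a built-in bin-width option would hide from the user.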

Reference:
The current fit implementations for Histogram are centered around these methods:

StatsBase.jl/src/hist.jl

Lines 298 to 307 in 7388a8e

fit(::Type{Histogram{T}},v::AbstractVector, edg::AbstractVector; closed::Symbol=:left) where {T} =
    fit(Histogram{T},(v,), (edg,), closed=closed)
fit(::Type{Histogram{T}},v::AbstractVector; closed::Symbol=:left, nbins=sturges(length(v))) where {T} =
    fit(Histogram{T},(v,); closed=closed, nbins=nbins)
fit(::Type{Histogram{T}},v::AbstractVector, wv::AbstractWeights, edg::AbstractVector; closed::Symbol=:left) where {T} =
    fit(Histogram{T},(v,), wv, (edg,), closed=closed)
fit(::Type{Histogram{T}},v::AbstractVector, wv::AbstractWeights; closed::Symbol=:left, nbins=sturges(length(v))) where {T} =
    fit(Histogram{T}, (v,), wv; closed=closed, nbins=nbins)
fit(::Type{Histogram}, v::AbstractVector, wv::AbstractWeights{W}, args...; kwargs...) where {W} = fit(Histogram{W}, v, wv, args...; kwargs...)

Proposed Change:
Add a keyword argument or helper type to allow binning by width. Examples:

# Option 1: keyword argument
fit(Histogram, data; width=0.5)

# Option 2: helper type
fit(Histogram, data, BinWidth(0.5))

My proposal is to extend this dispatch pattern. Currently, we have:

1. fit(..., edges) (User provides the range)

2. fit(..., nbins) (Internal histrange calculates the range)

I will add a third path:

3. fit(..., binwidth) (A simple range min:binwidth:max is generated and passed to the existing edges method).
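A rough sketch of what the third path could look like follows. Both BinWidth and binwidth_edges are hypothetical names used for illustration only; neither exists in StatsBase. The sketch only builds the edges, uses Base Julia, and does not guard against floating-point rounding at the boundaries.

```julia
# Hypothetical helper type from Option 2; not part of StatsBase.
struct BinWidth{T<:Real}
    width::T
    function BinWidth(width::T) where {T<:Real}
        width > 0 || throw(ArgumentError("bin width must be positive"))
        new{T}(width)
    end
end

# Hypothetical: build a fixed-width edge range that covers every value in `v`.
function binwidth_edges(v::AbstractVector{<:Real}, bw::BinWidth; closed::Symbol=:left)
    closed in (:left, :right) || throw(ArgumentError("closed must be :left or :right"))
    isempty(v) && throw(ArgumentError("cannot derive edges from empty data"))
    lo, hi = extrema(v)
    w = bw.width
    first_edge = floor(lo / w) * w   # snap down to a multiple of the width
    last_edge = ceil(hi / w) * w     # snap up to a multiple of the width
    # Bins are half-open, so a value sitting exactly on the open outer edge
    # would be excluded; extend the range by one bin on that side.
    if closed === :left && last_edge <= hi
        last_edge += w
    elseif closed === :right && first_edge >= lo
        first_edge -= w
    end
    return first_edge:w:last_edge
end

edges = binwidth_edges([0.3, 1.2, 2.9, 4.1], BinWidth(0.5))
# With StatsBase loaded, the new method could then forward to the existing one:
# fit(Histogram, data, edges; closed=closed)
```

Snapping to multiples of the width, rather than starting at the raw minimum, is one possible design choice; it keeps edges stable when the data changes slightly.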

Additional Considerations:

Alignment/anchor: Optional parameter to control where the first bin edge starts.

Variable resolution: Optional method for "equal-count" bins using data quantiles.
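The two considerations above could be sketched as follows. The function names anchored_edges and equal_count_edges are hypothetical and not part of StatsBase; quantile comes from the Statistics standard library. This is only one way to interpret "anchor" and "equal-count", under the assumption that the anchor is a value that must lie on the edge grid.

```julia
using Statistics  # standard library, provides `quantile`

# Hypothetical: fixed-width edges shifted so that `anchor` lies on the edge grid.
function anchored_edges(v, width; anchor=zero(width))
    lo, hi = extrema(v)
    first_edge = anchor + floor((lo - anchor) / width) * width
    last_edge = anchor + ceil((hi - anchor) / width) * width
    return first_edge:width:last_edge
end

# Hypothetical: edges at evenly spaced quantiles, so each bin holds
# roughly the same number of points.
equal_count_edges(v, nbins::Integer) = quantile(v, range(0, 1; length=nbins + 1))

shifted = anchored_edges([0.3, 1.2, 2.9, 4.1], 0.5; anchor=0.25)
by_count = equal_count_edges(collect(1:8), 4)
```

Note that quantile-based edges can repeat when the data contains many duplicate values, so a real implementation would probably need to deduplicate them.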

This simplifies the API for new users, aligns with common libraries (NumPy, matplotlib), and reduces errors in manual bin calculations.
