Description
This idea came out of a discussion on Slack with some community members: see the thread.
Currently, fit(Histogram, data) requires users to specify either the number of bins (nbins) or the exact bin edges. While this is flexible, a common workflow in scientific computing is to specify bins by a fixed width (e.g., "one bin every 0.5 units"), independent of the data range.
Although this can be done by hand with fit(Histogram, data, min:width:max), the caller has to compute the bounds and deal with edge alignment issues, such as a range that stops short of the data maximum.
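To make the alignment issue concrete, here is a small sketch using only Base Julia. The data values and the variable names (`naive`, `padded`) are made up for illustration, and the padding rule assumes left-closed bins of the form [a, b).

```julia
data = [0.0, 0.4, 1.3]
width = 0.5

# Naive edges: a range stops at the last multiple of `width` that fits,
# so this one ends at 1.0 and the observation 1.3 lies beyond the last edge.
naive = minimum(data):width:maximum(data)

# Manual fix: count the whole bins needed so that the last edge lies
# strictly above the maximum, then build the range from that count.
nbins = floor(Int, (maximum(data) - minimum(data)) / width) + 1
padded = range(minimum(data); step=width, length=nbins + 1)
```

Here `naive` covers 0.0 to 1.0 while `padded` covers 0.0 to 1.5, which is the kind of bookkeeping a width-based method could take over.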
Reference:
The current fit implementations for Histogram are centered around these methods:
Lines 298 to 307 in 7388a8e
fit(::Type{Histogram{T}},v::AbstractVector, edg::AbstractVector; closed::Symbol=:left) where {T} =
    fit(Histogram{T},(v,), (edg,), closed=closed)
fit(::Type{Histogram{T}},v::AbstractVector; closed::Symbol=:left, nbins=sturges(length(v))) where {T} =
    fit(Histogram{T},(v,); closed=closed, nbins=nbins)
fit(::Type{Histogram{T}},v::AbstractVector, wv::AbstractWeights, edg::AbstractVector; closed::Symbol=:left) where {T} =
    fit(Histogram{T},(v,), wv, (edg,), closed=closed)
fit(::Type{Histogram{T}},v::AbstractVector, wv::AbstractWeights; closed::Symbol=:left, nbins=sturges(length(v))) where {T} =
    fit(Histogram{T}, (v,), wv; closed=closed, nbins=nbins)
fit(::Type{Histogram}, v::AbstractVector, wv::AbstractWeights{W}, args...; kwargs...) where {W} = fit(Histogram{W}, v, wv, args...; kwargs...)
Proposed Change:
Add a keyword argument or helper type to allow binning by width. Examples:
# Option 1: keyword argument
fit(Histogram, data; width=0.5)
# Option 2: helper type
fit(Histogram, data, BinWidth(0.5))
My proposal is to extend this dispatch pattern. Currently, we have two paths:
1. fit(..., edges) (the user provides the range)
2. fit(..., nbins) (the internal histrange calculates the range)
I propose adding a third path:
3. fit(..., binwidth) (a simple range min:binwidth:max is generated and passed to the existing edges method).
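The third path could look roughly like the sketch below. Both `BinWidth` and `binwidth_edges` are hypothetical names that do not exist in StatsBase, and the choice to extend the range one bin past the maximum is my assumption for left-closed bins, not settled behavior.

```julia
# Hypothetical helper type; not part of StatsBase.
struct BinWidth{T<:Real}
    width::T
    function BinWidth(width::T) where {T<:Real}
        width > 0 || throw(ArgumentError("bin width must be positive"))
        return new{T}(width)
    end
end

# Build edges that start at the data minimum and advance in steps of
# `bw.width` until the last edge lies strictly above the data maximum.
function binwidth_edges(v::AbstractVector{<:Real}, bw::BinWidth)
    lo, hi = extrema(v)
    nbins = floor(Int, (hi - lo) / bw.width) + 1
    return range(lo; step=bw.width, length=nbins + 1)
end

edges = binwidth_edges([0.2, 0.9, 1.4, 2.1], BinWidth(0.5))
```

The resulting range would then be forwarded to the existing edges method, e.g. fit(Histogram, v, edges), so no new binning logic would be needed.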
Additional Considerations:
Alignment/anchor: Optional parameter to control where the first bin edge starts.
Variable resolution: Optional method for "equal-count" bins using data quantiles.
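Both considerations can be sketched as edge generators. The function names `anchored_edges` and `equal_count_edges`, the `anchor` keyword, and its default of 0 are all hypothetical and only meant to show one possible shape; `quantile` comes from the Statistics standard library.

```julia
using Statistics  # standard library; provides `quantile`

# Hypothetical: fixed-width edges placed on the grid anchor + k * width,
# starting at the last grid point at or below the data minimum.
function anchored_edges(v::AbstractVector{<:Real}, width::Real; anchor::Real=0)
    lo, hi = extrema(v)
    first_edge = anchor + floor((lo - anchor) / width) * width
    nbins = floor(Int, (hi - first_edge) / width) + 1
    return range(first_edge; step=width, length=nbins + 1)
end

# Hypothetical: edges at evenly spaced quantiles, giving roughly equal counts.
# `unique` drops repeated edges that arise when the data contain many ties.
function equal_count_edges(v::AbstractVector{<:Real}, nbins::Integer)
    return unique(quantile(v, range(0, 1; length=nbins + 1)))
end

a = anchored_edges([0.3, 0.9, 1.6], 0.5)
q = equal_count_edges(collect(1.0:8.0), 4)
```

Note that the outermost quantile edges coincide with the data extremes, so with half-open bins one extreme lands on an open boundary; a real implementation would need to widen one end slightly.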
This simplifies the API for new users, aligns with common libraries (NumPy, matplotlib), and reduces errors in manual bin calculations.