Commit d0eecda

fix pre-commit failures
1 parent 6650f3e commit d0eecda

2 files changed

+24
-2
lines changed

doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ Other enhancements
 - :meth:`Styler.format_index_names` can now be used to format the index and column names (:issue:`48936` and :issue:`47489`)
 - :class:`.errors.DtypeWarning` improved to include column names when mixed data types are detected (:issue:`58174`)
 - :func:`DataFrame.to_excel` argument ``merge_cells`` now accepts a value of ``"columns"`` to only merge :class:`MultiIndex` column header header cells (:issue:`35384`)
+- :func:`cut` now supports a string for ``bins`` kwarg by dispatching to ``numpy.histogram_bin_edges``. (:issue:`59165`)
 - :meth:`DataFrame.corrwith` now accepts ``min_periods`` as optional arguments, as in :meth:`DataFrame.corr` and :meth:`Series.corr` (:issue:`9490`)
 - :meth:`DataFrame.cummin`, :meth:`DataFrame.cummax`, :meth:`DataFrame.cumprod` and :meth:`DataFrame.cumsum` methods now have a ``numeric_only`` parameter (:issue:`53072`)
 - :meth:`DataFrame.fillna` and :meth:`Series.fillna` can now accept ``value=None``; for non-object dtype the corresponding NA value will be used (:issue:`57723`)
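
The whatsnew entry above can be illustrated without relying on the new string argument. The following is a minimal sketch, assuming only the existing public APIs ``numpy.histogram_bin_edges`` and ``pandas.cut``: it computes the edges explicitly and then bins against them, which is what the entry describes. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

x = np.array([1, 7, 5, 4])

# Step 1: let NumPy pick the bin edges with the "auto" estimator.
edges = np.histogram_bin_edges(x, bins="auto")  # array([1., 3., 5., 7.])

# Step 2: bin against those explicit edges. Explicit edges are not
# extended, so with right-closed intervals the minimum of x (1) falls
# outside the first interval (1.0, 3.0] and becomes NaN.
result = pd.cut(x, edges)

# include_lowest=True keeps the minimum in the first interval instead.
result_incl = pd.cut(x, edges, include_lowest=True)
```

Here ``result`` reproduces the docstring example added in this commit; ``include_lowest=True`` is shown only as one way to avoid the leading NaN.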

pandas/core/reshape/tile.py

Lines changed: 23 additions & 2 deletions
@@ -75,7 +75,7 @@ def cut(
     ----------
     x : array-like
         The input array to be binned. Must be 1-dimensional.
-    bins : int, sequence of scalars, or IntervalIndex
+    bins : int, str, sequence of scalars or IntervalIndex
         The criteria to bin by.
 
         * int : Defines the number of equal-width bins in the range of `x`. The
@@ -85,6 +85,14 @@ def cut(
           width. No extension of the range of `x` is done.
         * IntervalIndex : Defines the exact bins to be used. Note that
           IntervalIndex for `bins` must be non-overlapping.
+        * str : If bins is a string from a list of accepted strings, bin
+          calculation is dispatched to np.histogram_bin_edges, which then
+          uses the method chosen to calculate the optimal bin width and
+          consequently the number of bins from the data that falls within the
+          requested range.
+          Supported strings = ["auto", "fd", "doane", "scott",
+          "stone", "rice", "sturges", "sqrt"]
+          Please check np.histogram_bin_edges documentation for more details.
 
     right : bool, default True
         Indicates whether `bins` includes the rightmost edge or not. If
@@ -130,7 +138,7 @@ def cut(
 
     bins : numpy.ndarray or IntervalIndex.
         The computed or specified bins. Only returned when `retbins=True`.
-        For scalar or sequence `bins`, this is an ndarray with the computed
+        For scalar, str or sequence `bins`, this is an ndarray with the computed
         bins. If set `duplicates=drop`, `bins` will drop non-unique bin. For
         an IntervalIndex `bins`, this is equal to `bins`.
 
@@ -142,6 +150,8 @@ def cut(
         fixed set of values.
     Series : One-dimensional array with axis labels (including time series).
     IntervalIndex : Immutable Index implementing an ordered, sliceable set.
+    np.histogram_bin_edges : Bin calculation is dispatched to this function when
+        `bins` is a string.
 
     Notes
     -----
@@ -239,6 +249,12 @@ def cut(
     >>> pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
     [NaN, (0.0, 1.0], NaN, (2.0, 3.0], (4.0, 5.0]]
     Categories (3, interval[int64, right]): [(0, 1] < (2, 3] < (4, 5]]
+
+    Passing a str for `bins` dispatches the bin calculation to np.histogram_bin_edges.
+
+    >>> pd.cut(np.array([1, 7, 5, 4]), "auto")
+    [NaN, (5.0, 7.0], (3.0, 5.0], (3.0, 5.0]]
+    Categories (3, interval[float64, right]): [(1.0, 3.0] < (3.0, 5.0] < (5.0, 7.0]]
     """
     # NOTE: this binning code is changed a bit from histogram for var(x) == 0
 

@@ -253,6 +269,11 @@ def cut(
         if bins.is_overlapping:
             raise ValueError("Overlapping IntervalIndex is not accepted.")
 
+    elif isinstance(bins, str):
+        # GH 59165
+        # Raises ValueError if string is not supported
+        bins = np.histogram_bin_edges(x, bins)
+
     else:
         bins = Index(bins)
         if not bins.is_monotonic_increasing:
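
The new branch can be sketched outside pandas as a small wrapper. This is an illustration only, not the pandas implementation: ``cut_with_estimator`` is a hypothetical helper name, and the sketch assumes the documented behaviour that ``numpy.histogram_bin_edges`` raises ``ValueError`` for an unrecognised estimator string.

```python
import numpy as np
import pandas as pd


def cut_with_estimator(x, bins, **kwargs):
    """Hypothetical wrapper mirroring the `elif isinstance(bins, str)` branch."""
    if isinstance(bins, str):
        # Delegate edge computation to NumPy; an unknown estimator
        # name makes NumPy raise ValueError, which propagates as-is.
        bins = np.histogram_bin_edges(np.asarray(x), bins)
    # From here on, bins is handled like any explicit sequence of edges.
    return pd.cut(x, bins, **kwargs)


data = [1, 7, 5, 4]
binned = cut_with_estimator(data, "sturges")

try:
    cut_with_estimator(data, "not-an-estimator")
except ValueError as exc:
    error_message = str(exc)
```

Letting NumPy validate the string keeps the list of accepted estimators in one place, at the cost of surfacing a NumPy error message to the caller.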
