Commit 5bb9e8d

Merge branch 'main' into get_dummies
2 parents 85dd62c + 02267e5 commit 5bb9e8d

25 files changed (+331 / -84 lines)

.github/workflows/package-checks.yml

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ jobs:
           fetch-depth: 0
 
       - name: Set up Python
-        uses: mamba-org/setup-micromamba@v1
+        uses: mamba-org/setup-micromamba@v2
         with:
           environment-name: recipe-test
           create-args: >-

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
@@ -165,7 +165,7 @@ jobs:
           CIBW_PLATFORM: ${{ matrix.buildplat[1] == 'pyodide_wasm32' && 'pyodide' || 'auto' }}
 
       - name: Set up Python
-        uses: mamba-org/setup-micromamba@v1
+        uses: mamba-org/setup-micromamba@v2
         with:
           environment-name: wheel-env
           # Use a fixed Python, since we might have an unreleased Python not

ci/code_checks.sh

Lines changed: 0 additions & 10 deletions
@@ -96,7 +96,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.Series.dt.tz_localize PR01,PR02" \
         -i "pandas.Series.dt.unit GL08" \
         -i "pandas.Series.pad PR01,SA01" \
-        -i "pandas.Series.sparse.from_coo PR07,SA01" \
         -i "pandas.Timedelta.max PR02" \
         -i "pandas.Timedelta.min PR02" \
         -i "pandas.Timedelta.resolution PR02" \
@@ -106,13 +105,11 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.Timestamp.resolution PR02" \
         -i "pandas.Timestamp.tzinfo GL08" \
         -i "pandas.Timestamp.year GL08" \
-        -i "pandas.api.types.is_float PR01,SA01" \
         -i "pandas.api.types.is_integer PR01,SA01" \
         -i "pandas.api.types.is_iterator PR07,SA01" \
         -i "pandas.api.types.is_re_compilable PR07,SA01" \
         -i "pandas.api.types.pandas_dtype PR07,RT03,SA01" \
         -i "pandas.arrays.ArrowExtensionArray PR07,SA01" \
-        -i "pandas.arrays.DatetimeArray SA01" \
         -i "pandas.arrays.IntegerArray SA01" \
         -i "pandas.arrays.IntervalArray.left SA01" \
         -i "pandas.arrays.IntervalArray.length SA01" \
@@ -163,7 +160,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.errors.DuplicateLabelError SA01" \
         -i "pandas.errors.IntCastingNaNError SA01" \
         -i "pandas.errors.InvalidIndexError SA01" \
-        -i "pandas.errors.InvalidVersion SA01" \
         -i "pandas.errors.NullFrequencyError SA01" \
         -i "pandas.errors.NumExprClobberingError SA01" \
         -i "pandas.errors.NumbaUtilError SA01" \
@@ -172,24 +168,18 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.errors.PerformanceWarning SA01" \
         -i "pandas.errors.PossibleDataLossError SA01" \
         -i "pandas.errors.PossiblePrecisionLoss SA01" \
-        -i "pandas.errors.SpecificationError SA01" \
         -i "pandas.errors.UndefinedVariableError PR01,SA01" \
         -i "pandas.errors.UnsortedIndexError SA01" \
         -i "pandas.errors.UnsupportedFunctionCall SA01" \
         -i "pandas.errors.ValueLabelTypeMismatch SA01" \
         -i "pandas.infer_freq SA01" \
         -i "pandas.io.json.build_table_schema PR07,RT03,SA01" \
-        -i "pandas.io.stata.StataReader.data_label SA01" \
-        -i "pandas.io.stata.StataReader.value_labels RT03,SA01" \
         -i "pandas.io.stata.StataReader.variable_labels RT03,SA01" \
         -i "pandas.io.stata.StataWriter.write_file SA01" \
         -i "pandas.json_normalize RT03,SA01" \
-        -i "pandas.period_range RT03,SA01" \
         -i "pandas.plotting.andrews_curves RT03,SA01" \
-        -i "pandas.plotting.lag_plot RT03,SA01" \
         -i "pandas.plotting.scatter_matrix PR07,SA01" \
         -i "pandas.set_eng_float_format RT03,SA01" \
-        -i "pandas.testing.assert_extension_array_equal SA01" \
         -i "pandas.tseries.offsets.BDay PR02,SA01" \
         -i "pandas.tseries.offsets.BQuarterBegin.is_on_offset GL08" \
         -i "pandas.tseries.offsets.BQuarterBegin.n GL08" \
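Each -i entry above pairs a documented object with the numpydoc validation codes that the docstring check currently tolerates for it, so deleting an entry means those checks are now enforced. As a rough illustration, the hypothetical parse_ignores helper below (not part of pandas' CI tooling) reads such entries into a mapping and labels each code with a paraphrase of its numpydoc meaning. The four removed entries it is fed correspond to docstrings that gain the missing Parameters, description, or See Also sections elsewhere in this diff.

```python
from __future__ import annotations

import shlex

# Paraphrased meanings of the numpydoc validation codes seen in this hunk.
CODE_MEANINGS = {
    "GL08": "The object does not have a docstring",
    "PR01": "Parameters not documented",
    "PR02": "Unknown parameters",
    "PR07": "Parameter has no description",
    "RT03": "Return value has no description",
    "SA01": "See Also section not found",
}


def parse_ignores(fragment: str) -> dict[str, set[str]]:
    """Map object names in `-i "name CODES"` entries to their ignored codes.

    Hypothetical helper for illustration; not part of pandas' CI scripts.
    """
    # Join backslash-continued lines, then tokenize like a POSIX shell.
    tokens = shlex.split(fragment.replace("\\\n", " "))
    ignores: dict[str, set[str]] = {}
    for flag, value in zip(tokens, tokens[1:]):
        # Keep only `-i "name CODES"` pairs; bare `-i CODE` values have no space.
        if flag == "-i" and " " in value:
            name, codes = value.split(" ", 1)
            ignores[name] = set(codes.split(","))
    return ignores


# Four of the entries this commit deletes from ci/code_checks.sh.
removed = r'''
    -i "pandas.api.types.is_float PR01,SA01" \
    -i "pandas.arrays.DatetimeArray SA01" \
    -i "pandas.Series.sparse.from_coo PR07,SA01" \
    -i "pandas.testing.assert_extension_array_equal SA01" \
'''

for name, codes in sorted(parse_ignores(removed).items()):
    print(name, "->", "; ".join(CODE_MEANINGS[c] for c in sorted(codes)))
```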

doc/source/whatsnew/v3.0.0.rst

Lines changed: 63 additions & 0 deletions
@@ -203,6 +203,67 @@ In cases with mixed-resolution inputs, the highest resolution is used:
     In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
     Out[2]: dtype('<M8[ns]')
 
+.. _whatsnew_300.api_breaking.value_counts_sorting:
+
+Changed behavior in :meth:`DataFrame.value_counts` and :meth:`DataFrameGroupBy.value_counts` when ``sort=False``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In previous versions of pandas, :meth:`DataFrame.value_counts` with ``sort=False`` would sort the result by row labels (as was documented). This was nonintuitive and inconsistent with :meth:`Series.value_counts` which would maintain the order of the input. Now :meth:`DataFrame.value_counts` will maintain the order of the input.
+
+.. ipython:: python
+
+    df = pd.DataFrame(
+        {
+            "a": [2, 2, 2, 2, 1, 1, 1, 1],
+            "b": [2, 1, 3, 1, 2, 3, 1, 1],
+        }
+    )
+    df
+
+*Old behavior*
+
+.. code-block:: ipython
+
+    In [3]: df.value_counts(sort=False)
+    Out[3]:
+    a  b
+    1  1    2
+       2    1
+       3    1
+    2  1    2
+       2    1
+       3    1
+    Name: count, dtype: int64
+
+*New behavior*
+
+.. ipython:: python
+
+    df.value_counts(sort=False)
+
+This change also applies to :meth:`.DataFrameGroupBy.value_counts`. Here, there are two options for sorting: one ``sort`` passed to :meth:`DataFrame.groupby` and one passed directly to :meth:`.DataFrameGroupBy.value_counts`. The former will determine whether to sort the groups, the latter whether to sort the counts. All non-grouping columns will maintain the order of the input *within groups*.
+
+*Old behavior*
+
+.. code-block:: ipython
+
+    In [5]: df.groupby("a", sort=True).value_counts(sort=False)
+    Out[5]:
+    a  b
+    1  1    2
+       2    1
+       3    1
+    2  1    2
+       2    1
+       3    1
+    dtype: int64
+
+*New behavior*
+
+.. ipython:: python
+
+    df.groupby("a", sort=True).value_counts(sort=False)
+
 .. _whatsnew_300.api_breaking.deps:
 
 Increased minimum version for Python
@@ -682,6 +743,7 @@ Sparse
 ^^^^^^
 - Bug in :class:`SparseDtype` for equal comparison with na fill value. (:issue:`54770`)
 - Bug in :meth:`DataFrame.sparse.from_spmatrix` which hard coded an invalid ``fill_value`` for certain subtypes. (:issue:`59063`)
+- Bug in :meth:`DataFrame.sparse.to_dense` which ignored subclassing and always returned an instance of :class:`DataFrame` (:issue:`59913`)
 
 ExtensionArray
 ^^^^^^^^^^^^^^
@@ -700,6 +762,7 @@ Other
 - Bug in :func:`eval` on :class:`ExtensionArray` on including division ``/`` failed with a ``TypeError``. (:issue:`58748`)
 - Bug in :func:`eval` where the names of the :class:`Series` were not preserved when using ``engine="numexpr"``. (:issue:`10239`)
 - Bug in :func:`eval` with ``engine="numexpr"`` returning unexpected result for float division. (:issue:`59736`)
+- Bug in :func:`to_numeric` raising ``TypeError`` when ``arg`` is a :class:`Timedelta` or :class:`Timestamp` scalar. (:issue:`59944`)
 - Bug in :func:`unique` on :class:`Index` not always returning :class:`Index` (:issue:`57043`)
 - Bug in :meth:`DataFrame.apply` where passing ``engine="numba"`` ignored ``args`` passed to the applied function (:issue:`58712`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using NumPy attributes via ``@`` notation, e.g., ``df.eval("@np.floor(a)")``. (:issue:`58041`)
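The ordering rules in the value_counts entry can be mimicked without pandas. The sketch below is plain Python and only models row order (it does not call pandas and ignores dropna and normalize); it reuses the df data from the entry and counts (a, b) pairs with collections.Counter, which keeps keys in order of first appearance. The old order it computes matches the "Old behavior" output shown in the entry; the other two follow the rules the entry states.

```python
from collections import Counter

# Same data as the ``df`` in the whatsnew entry, one (a, b) tuple per row.
a = [2, 2, 2, 2, 1, 1, 1, 1]
b = [2, 1, 3, 1, 2, 3, 1, 1]
rows = list(zip(a, b))

# Counter keeps keys in the order they first appear in the input.
counts = Counter(rows)

# Old DataFrame.value_counts(sort=False): result ordered by row labels.
old_order = sorted(counts.items())

# New DataFrame.value_counts(sort=False): input order is preserved.
new_order = list(counts.items())

# New groupby("a", sort=True).value_counts(sort=False): groups are sorted by
# "a", and Python's stable sort keeps the input order inside each group.
grouped_order = sorted(counts.items(), key=lambda item: item[0][0])

for label, order in [("old", old_order), ("new", new_order), ("groupby", grouped_order)]:
    print(label, order)
```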

pandas/_libs/lib.pyx

Lines changed: 14 additions & 0 deletions
@@ -1089,9 +1089,23 @@ def is_float(obj: object) -> bool:
     """
     Return True if given object is float.
 
+    This method checks whether the passed object is a float type. It
+    returns `True` if the object is a float, and `False` otherwise.
+
+    Parameters
+    ----------
+    obj : object
+        The object to check for float type.
+
     Returns
     -------
     bool
+        `True` if the object is of float type, otherwise `False`.
+
+    See Also
+    --------
+    api.types.is_integer : Check if an object is of integer type.
+    api.types.is_numeric_dtype : Check if an object is of numeric type.
 
     Examples
     --------
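A small usage sketch of the contract the new is_float docstring describes, assuming pandas and NumPy are importable. It only classifies scalars: an ndarray of floats is not itself a float, and dtype-level questions are what functions such as is_numeric_dtype (listed under See Also) are for.

```python
import numpy as np
from pandas.api.types import is_float, is_integer

# Python floats and NumPy floating scalars count as floats.
print(is_float(1.5), is_float(np.float32(1.5)))

# Integers, numeric-looking strings, bools and arrays do not.
print(is_float(1), is_float("1.5"), is_float(True), is_float(np.array([1.5])))

# is_integer, listed under See Also, is the integer counterpart.
print(is_integer(1), is_integer(1.0))
```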

pandas/_libs/tslibs/nattype.pyi

Lines changed: 17 additions & 0 deletions
@@ -9,6 +9,7 @@ from typing import (
     Literal,
     NoReturn,
     TypeAlias,
+    overload,
 )
 
 import numpy as np
@@ -159,15 +160,31 @@ class NaTType:
     # inject Period properties
     @property
     def qyear(self) -> float: ...
+    # comparisons
     def __eq__(self, other: object) -> bool: ...
     def __ne__(self, other: object) -> bool: ...
     __lt__: _NatComparison
     __le__: _NatComparison
     __gt__: _NatComparison
     __ge__: _NatComparison
+    # unary operators
+    def __pos__(self) -> Self: ...
+    def __neg__(self) -> Self: ...
+    # binary operators
     def __sub__(self, other: Self | timedelta | datetime) -> Self: ...
     def __rsub__(self, other: Self | timedelta | datetime) -> Self: ...
     def __add__(self, other: Self | timedelta | datetime) -> Self: ...
     def __radd__(self, other: Self | timedelta | datetime) -> Self: ...
+    def __mul__(self, other: float) -> Self: ...  # analogous to timedelta
+    def __rmul__(self, other: float) -> Self: ...
+    @overload  # analogous to timedelta
+    def __truediv__(self, other: Self | timedelta) -> float: ...  # Literal[NaN]
+    @overload
+    def __truediv__(self, other: float) -> Self: ...
+    @overload  # analogous to timedelta
+    def __floordiv__(self, other: Self | timedelta) -> float: ...  # Literal[NaN]
+    @overload
+    def __floordiv__(self, other: float) -> Self: ...
+    # other
     def __hash__(self) -> int: ...
     def as_unit(self, unit: str, round_ok: bool = ...) -> NaTType: ...
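The new stub entries appear to describe runtime behavior of NaT so that type checkers can see it. A quick check, assuming a reasonably recent pandas: negating NaT or multiplying or dividing it by a number gives NaT back, while dividing by a timedelta-like gives a float NaN. This mirrors datetime.timedelta, where td / td is a float and td / number is a timedelta.

```python
import math
from datetime import timedelta

import pandas as pd

neg, pos = -pd.NaT, +pd.NaT                  # unary operators: NaT
scaled = pd.NaT * 2.5                        # NaT * float: NaT
halved = pd.NaT / 2                          # NaT / number: NaT
ratio = pd.NaT / pd.Timedelta(days=1)        # NaT / timedelta-like: float NaN
floor_ratio = pd.NaT // timedelta(hours=1)   # NaT // timedelta: float NaN

print(neg, pos, scaled, halved, ratio, floor_ratio)
print(math.isnan(ratio), math.isnan(floor_ratio))
```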

pandas/_testing/asserters.py

Lines changed: 10 additions & 0 deletions
@@ -701,6 +701,10 @@ def assert_extension_array_equal(
     """
     Check that left and right ExtensionArrays are equal.
 
+    This method compares two ``ExtensionArray`` instances for equality,
+    including checks for missing values, the dtype of the arrays, and
+    the exactness of the comparison (or tolerance when comparing floats).
+
     Parameters
     ----------
     left, right : ExtensionArray
@@ -726,6 +730,12 @@ def assert_extension_array_equal(
 
         .. versionadded:: 2.0.0
 
+    See Also
+    --------
+    testing.assert_series_equal : Check that left and right ``Series`` are equal.
+    testing.assert_frame_equal : Check that left and right ``DataFrame`` are equal.
+    testing.assert_index_equal : Check that left and right ``Index`` are equal.
+
     Notes
     -----
     Missing values are checked separately from valid values.
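A brief sketch of the checks the new extended summary lists (missing-value positions, dtype, and exact or tolerance-based value comparison), assuming pandas' nullable Int64, Int32 and Float64 dtypes are available. The raises_assertion helper is a hypothetical convenience that only reports whether the assertion fails.

```python
import pandas as pd
import pandas.testing as tm

left = pd.array([1, 2, None], dtype="Int64")
right = pd.array([1, 2, None], dtype="Int64")

# Same values, same missing-value positions and same dtype: no error.
tm.assert_extension_array_equal(left, right)


def raises_assertion(a, b) -> bool:
    """Hypothetical helper: True if the two arrays are reported unequal."""
    try:
        tm.assert_extension_array_equal(a, b)
    except AssertionError:
        return True
    return False


value_mismatch = raises_assertion(left, pd.array([1, 3, None], dtype="Int64"))
dtype_mismatch = raises_assertion(left, pd.array([1, 2, None], dtype="Int32"))
na_mismatch = raises_assertion(left, pd.array([1, 2, 3], dtype="Int64"))

# Float arrays are compared with a tolerance by default, so rounding noise passes.
float_noise = raises_assertion(
    pd.array([0.1 + 0.2], dtype="Float64"), pd.array([0.3], dtype="Float64")
)
print(value_mismatch, dtype_mismatch, na_mismatch, float_noise)
```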

pandas/core/arrays/datetimes.py

Lines changed: 8 additions & 0 deletions
@@ -205,6 +205,14 @@ class DatetimeArray(dtl.TimelikeOps, dtl.DatelikeOps): # type: ignore[misc]
     -------
     None
 
+    See Also
+    --------
+    DatetimeIndex : Immutable Index for datetime-like data.
+    Series : One-dimensional labeled array capable of holding datetime-like data.
+    Timestamp : Pandas replacement for python datetime.datetime object.
+    to_datetime : Convert argument to datetime.
+    period_range : Return a fixed frequency PeriodIndex.
+
     Examples
     --------
     >>> pd.arrays.DatetimeArray._from_sequence(
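The See Also entries place DatetimeArray next to the public datetime API. A minimal sketch, assuming pandas is installed, that reaches a DatetimeArray through to_datetime and DatetimeIndex.array instead of the private _from_sequence constructor used in the docstring example. The exact datetime64 unit is deliberately not checked, since resolution inference differs between pandas versions.

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-01", "2024-01-02"])  # DatetimeIndex
arr = idx.array                                      # the backing DatetimeArray
ser = pd.Series(arr)                                 # Series holding the same values

print(type(idx).__name__, type(arr).__name__, ser.dtype)
print(arr[0])  # individual elements are Timestamp objects
```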

pandas/core/arrays/sparse/accessor.py

Lines changed: 17 additions & 3 deletions
@@ -88,9 +88,17 @@ def from_coo(cls, A, dense_index: bool = False) -> Series:
         """
         Create a Series with sparse values from a scipy.sparse.coo_matrix.
 
+        This method takes a ``scipy.sparse.coo_matrix`` (coordinate format) as input and
+        returns a pandas ``Series`` where the non-zero elements are represented as
+        sparse values. The index of the Series can either include only the coordinates
+        of non-zero elements (default behavior) or the full sorted set of coordinates
+        from the matrix if ``dense_index`` is set to `True`.
+
         Parameters
         ----------
         A : scipy.sparse.coo_matrix
+            The sparse matrix in coordinate format from which the sparse Series
+            will be created.
         dense_index : bool, default False
             If False (default), the index consists of only the
             coords of the non-null entries of the original coo_matrix.
@@ -102,6 +110,12 @@ def from_coo(cls, A, dense_index: bool = False) -> Series:
         s : Series
             A Series with sparse values.
 
+        See Also
+        --------
+        DataFrame.sparse.from_spmatrix : Create a new DataFrame from a scipy sparse
+            matrix.
+        scipy.sparse.coo_matrix : A sparse matrix in COOrdinate format.
+
         Examples
         --------
         >>> from scipy import sparse
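A sketch of the default dense_index=False behavior described in the new summary, assuming SciPy is installed alongside pandas. It uses a small 3x4 coo_matrix with three stored entries; the resulting Series is indexed by the (row, column) pairs of those stored entries only, in sorted order. dense_index=True is not exercised here.

```python
import pandas as pd
from scipy import sparse

# 3x4 matrix with three stored entries: (1, 0) -> 3.0, (0, 2) -> 1.0, (0, 3) -> 2.0
A = sparse.coo_matrix(([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4))

ss = pd.Series.sparse.from_coo(A)  # default dense_index=False
print(ss)
print(ss.sparse.to_dense())
```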
@@ -369,10 +383,10 @@ def to_dense(self) -> DataFrame:
         1  1
         2  0
         """
-        from pandas import DataFrame
-
         data = {k: v.array.to_dense() for k, v in self._parent.items()}
-        return DataFrame(data, index=self._parent.index, columns=self._parent.columns)
+        return self._parent._constructor(
+            data, index=self._parent.index, columns=self._parent.columns
+        )
 
     def to_coo(self) -> spmatrix:
         """
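The to_dense fix swaps a hard-coded DataFrame(...) call for self._parent._constructor(...), so a DataFrame subclass that overrides _constructor gets an instance of its own class back. The toy classes below (Frame, MyFrame and ToyAccessor are all hypothetical stand-ins, not pandas classes) contrast the two patterns in plain Python.

```python
class Frame:
    """Hypothetical stand-in for DataFrame."""

    def __init__(self, data):
        self.data = dict(data)

    @property
    def _constructor(self):
        return Frame


class MyFrame(Frame):
    """Hypothetical user subclass that overrides _constructor."""

    @property
    def _constructor(self):
        return MyFrame


class ToyAccessor:
    """Hypothetical accessor holding a reference to its parent frame."""

    def __init__(self, parent):
        self._parent = parent

    def to_dense_old(self):
        # Old pattern: always builds the base class, dropping the subclass.
        return Frame(self._parent.data)

    def to_dense_new(self):
        # New pattern: asks the parent which class to build.
        return self._parent._constructor(self._parent.data)


acc = ToyAccessor(MyFrame({"A": [0, 1, 0]}))
print(type(acc.to_dense_old()).__name__, type(acc.to_dense_new()).__name__)
```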

pandas/core/frame.py

Lines changed: 8 additions & 2 deletions
@@ -7266,7 +7266,11 @@ def value_counts(
         normalize : bool, default False
             Return proportions rather than frequencies.
         sort : bool, default True
-            Sort by frequencies when True. Sort by DataFrame column values when False.
+            Sort by frequencies when True. Preserve the order of the data when False.
+
+            .. versionchanged:: 3.0.0
+
+                Prior to 3.0.0, ``sort=False`` would sort by the columns values.
         ascending : bool, default False
             Sort in ascending order.
         dropna : bool, default True
@@ -7372,7 +7376,9 @@ def value_counts(
             subset = self.columns.tolist()
 
         name = "proportion" if normalize else "count"
-        counts = self.groupby(subset, dropna=dropna, observed=False)._grouper.size()
+        counts = self.groupby(
+            subset, sort=False, dropna=dropna, observed=False
+        )._grouper.size()
         counts.name = name
 
         if sort: