Commit 7382f3d

Merge remote-tracking branch 'upstream/main' into ref/index_equiv
2 parents 002d98a + c9f876c commit 7382f3d

20 files changed, +265 -612 lines changed

doc/source/user_guide/scale.rst

Lines changed: 28 additions & 0 deletions
@@ -374,5 +374,33 @@ datasets.
 
 You see more dask examples at https://examples.dask.org.
 
+Use Modin
+---------
+
+Modin_ is a scalable dataframe library, which aims to be a drop-in replacement API for pandas and
+provides the ability to scale pandas workflows across nodes and CPUs available. It is also able
+to work with larger than memory datasets. To start working with Modin you just need
+to replace a single line of code, namely, the import statement.
+
+.. code-block:: ipython
+
+    # import pandas as pd
+    import modin.pandas as pd
+
+After you have changed the import statement, you can proceed using the well-known pandas API
+to scale computation. Modin distributes computation across nodes and CPUs available utilizing
+an execution engine it runs on. At the time of Modin 0.27.0 the following execution engines are supported
+in Modin: Ray_, Dask_, `MPI through unidist`_, HDK_. The partitioning schema of a Modin DataFrame partitions it
+along both columns and rows because it gives Modin flexibility and scalability in both the number of columns and
+the number of rows.
+
+For more information refer to `Modin's documentation`_ or the `Modin's tutorials`_.
+
+.. _Modin: https://github.com/modin-project/modin
+.. _`Modin's documentation`: https://modin.readthedocs.io/en/latest
+.. _`Modin's tutorials`: https://github.com/modin-project/modin/tree/master/examples/tutorial/jupyter/execution
+.. _Ray: https://github.com/ray-project/ray
 .. _Dask: https://dask.org
+.. _`MPI through unidist`: https://github.com/modin-project/unidist
+.. _HDK: https://github.com/intel-ai/hdk
 .. _dask.dataframe: https://docs.dask.org/en/latest/dataframe.html

doc/source/whatsnew/index.rst

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ Version 2.2
 .. toctree::
    :maxdepth: 2
 
+   v2.2.2
    v2.2.1
    v2.2.0
 

doc/source/whatsnew/v2.2.2.rst

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+.. _whatsnew_222:
+
+What's new in 2.2.2 (April XX, 2024)
+---------------------------------------
+
+These are the changes in pandas 2.2.2. See :ref:`release` for a full changelog
+including other versions of pandas.
+
+{{ header }}
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_222.regressions:
+
+Fixed regressions
+~~~~~~~~~~~~~~~~~
+-
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_222.bug_fixes:
+
+Bug fixes
+~~~~~~~~~
+-
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_222.other:
+
+Other
+~~~~~
+-
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_222.contributors:
+
+Contributors
+~~~~~~~~~~~~

doc/source/whatsnew/v3.0.0.rst

Lines changed: 9 additions & 0 deletions
@@ -189,7 +189,9 @@ Other Deprecations
 
 Removal of prior version deprecations/changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+- :class:`.DataFrameGroupBy.idxmin`, :class:`.DataFrameGroupBy.idxmax`, :class:`.SeriesGroupBy.idxmin`, and :class:`.SeriesGroupBy.idxmax` will now raise a ``ValueError`` when used with ``skipna=False`` and an NA value is encountered (:issue:`10694`)
 - :func:`read_excel`, :func:`read_json`, :func:`read_html`, and :func:`read_xml` no longer accept raw string or byte representation of the data. That type of data must be wrapped in a :py:class:`StringIO` or :py:class:`BytesIO` (:issue:`53767`)
+- :meth:`DataFrame.groupby` with ``as_index=False`` and aggregation methods will no longer exclude from the result the groupings that do not arise from the input (:issue:`49519`)
 - :meth:`Series.dt.to_pydatetime` now returns a :class:`Series` of :py:class:`datetime.datetime` objects (:issue:`52459`)
 - :meth:`SeriesGroupBy.agg` no longer pins the name of the group to the input passed to the provided ``func`` (:issue:`51703`)
 - All arguments except ``name`` in :meth:`Index.rename` are now keyword only (:issue:`56493`)
@@ -198,11 +200,15 @@ Removal of prior version deprecations/changes
 - All arguments in :meth:`Series.to_dict` are now keyword only (:issue:`56493`)
 - Changed the default value of ``observed`` in :meth:`DataFrame.groupby` and :meth:`Series.groupby` to ``True`` (:issue:`51811`)
 - Enforced deprecation disallowing parsing datetimes with mixed time zones unless user passes ``utc=True`` to :func:`to_datetime` (:issue:`57275`)
+- Enforced deprecation of :meth:`.DataFrameGroupBy.get_group` and :meth:`.SeriesGroupBy.get_group` allowing the ``name`` argument to be a non-tuple when grouping by a list of length 1 (:issue:`54155`)
 - Enforced deprecation of ``axis=None`` acting the same as ``axis=0`` in the DataFrame reductions ``sum``, ``prod``, ``std``, ``var``, and ``sem``, passing ``axis=None`` will now reduce over both axes; this is particularly the case when doing e.g. ``numpy.sum(df)`` (:issue:`21597`)
+- Enforced deprecation of passing a dictionary to :meth:`SeriesGroupBy.agg` (:issue:`52268`)
 - Enforced silent-downcasting deprecation for :ref:`all relevant methods <whatsnew_220.silent_downcasting>` (:issue:`54710`)
 - In :meth:`DataFrame.stack`, the default value of ``future_stack`` is now ``True``; specifying ``False`` will raise a ``FutureWarning`` (:issue:`55448`)
+- Iterating over a :class:`.DataFrameGroupBy` or :class:`.SeriesGroupBy` will return tuples of length 1 for the groups when grouping by ``level`` a list of length 1 (:issue:`50064`)
 - Methods ``apply``, ``agg``, and ``transform`` will no longer replace NumPy functions (e.g. ``np.sum``) and built-in functions (e.g. ``min``) with the equivalent pandas implementation; use string aliases (e.g. ``"sum"`` and ``"min"``) if you desire to use the pandas implementation (:issue:`53974`)
 - Passing both ``freq`` and ``fill_value`` in :meth:`DataFrame.shift` and :meth:`Series.shift` and :meth:`.DataFrameGroupBy.shift` now raises a ``ValueError`` (:issue:`54818`)
+- Removed :meth:`.DataFrameGroupBy.quantile` and :meth:`.SeriesGroupBy.quantile` supporting bool dtype (:issue:`53975`)
 - Removed :meth:`DateOffset.is_anchored` and :meth:`offsets.Tick.is_anchored` (:issue:`56594`)
 - Removed ``DataFrame.applymap``, ``Styler.applymap`` and ``Styler.applymap_index`` (:issue:`52364`)
 - Removed ``DataFrame.bool`` and ``Series.bool`` (:issue:`51756`)
@@ -227,6 +233,7 @@ Removal of prior version deprecations/changes
 - Removed ``read_gbq`` and ``DataFrame.to_gbq``. Use ``pandas_gbq.read_gbq`` and ``pandas_gbq.to_gbq`` instead https://pandas-gbq.readthedocs.io/en/latest/api.html (:issue:`55525`)
 - Removed ``use_nullable_dtypes`` from :func:`read_parquet` (:issue:`51853`)
 - Removed ``year``, ``month``, ``quarter``, ``day``, ``hour``, ``minute``, and ``second`` keywords in the :class:`PeriodIndex` constructor, use :meth:`PeriodIndex.from_fields` instead (:issue:`55960`)
+- Removed argument ``limit`` from :meth:`DataFrame.pct_change`, :meth:`Series.pct_change`, :meth:`.DataFrameGroupBy.pct_change`, and :meth:`.SeriesGroupBy.pct_change`; the argument ``method`` must be set to ``None`` and will be removed in a future version of pandas (:issue:`53520`)
 - Removed deprecated argument ``obj`` in :meth:`.DataFrameGroupBy.get_group` and :meth:`.SeriesGroupBy.get_group` (:issue:`53545`)
 - Removed deprecated behavior of :meth:`Series.agg` using :meth:`Series.apply` (:issue:`53325`)
 - Removed option ``mode.use_inf_as_na``, convert inf entries to ``NaN`` before instead (:issue:`51684`)
@@ -239,6 +246,8 @@ Removal of prior version deprecations/changes
 - Removed the ``ordinal`` keyword in :class:`PeriodIndex`, use :meth:`PeriodIndex.from_ordinals` instead (:issue:`55960`)
 - Removed unused arguments ``*args`` and ``**kwargs`` in :class:`Resampler` methods (:issue:`50977`)
 - Unrecognized timezones when parsing strings to datetimes now raises a ``ValueError`` (:issue:`51477`)
+- Removed the :class:`Grouper` attributes ``ax``, ``groups``, ``indexer``, and ``obj`` (:issue:`51206`, :issue:`51182`)
+- Removed the attribute ``dtypes`` from :class:`.DataFrameGroupBy` (:issue:`51997`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.performance:

pandas/core/generic.py

Lines changed: 8 additions & 48 deletions
@@ -11122,8 +11122,7 @@ def describe(
     def pct_change(
         self,
         periods: int = 1,
-        fill_method: FillnaOptions | None | lib.NoDefault = lib.no_default,
-        limit: int | None | lib.NoDefault = lib.no_default,
+        fill_method: None = None,
         freq=None,
         **kwargs,
     ) -> Self:
@@ -11145,17 +11144,12 @@ def pct_change(
         ----------
         periods : int, default 1
             Periods to shift for forming percent change.
-        fill_method : {'backfill', 'bfill', 'pad', 'ffill', None}, default 'pad'
-            How to handle NAs **before** computing percent changes.
+        fill_method : None
+            Must be None. This argument will be removed in a future version of pandas.
 
             .. deprecated:: 2.1
                 All options of `fill_method` are deprecated except `fill_method=None`.
 
-        limit : int, default None
-            The number of consecutive NAs to fill before stopping.
-
-            .. deprecated:: 2.1
-
         freq : DateOffset, timedelta, or str, optional
             Increment to use from time series API (e.g. 'ME' or BDay()).
         **kwargs
@@ -11262,52 +11256,18 @@ def pct_change(
         APPL -0.252395 -0.011860 NaN
         """
         # GH#53491
-        if fill_method not in (lib.no_default, None) or limit is not lib.no_default:
-            warnings.warn(
-                "The 'fill_method' keyword being not None and the 'limit' keyword in "
-                f"{type(self).__name__}.pct_change are deprecated and will be removed "
-                "in a future version. Either fill in any non-leading NA values prior "
-                "to calling pct_change or specify 'fill_method=None' to not fill NA "
-                "values.",
-                FutureWarning,
-                stacklevel=find_stack_level(),
-            )
-        if fill_method is lib.no_default:
-            if limit is lib.no_default:
-                cols = self.items() if self.ndim == 2 else [(None, self)]
-                for _, col in cols:
-                    if len(col) > 0:
-                        mask = col.isna().values
-                        mask = mask[np.argmax(~mask) :]
-                        if mask.any():
-                            warnings.warn(
-                                "The default fill_method='pad' in "
-                                f"{type(self).__name__}.pct_change is deprecated and "
-                                "will be removed in a future version. Either fill in "
-                                "any non-leading NA values prior to calling pct_change "
-                                "or specify 'fill_method=None' to not fill NA values.",
-                                FutureWarning,
-                                stacklevel=find_stack_level(),
-                            )
-                            break
-            fill_method = "pad"
-        if limit is lib.no_default:
-            limit = None
+        if fill_method is not None:
+            raise ValueError(f"fill_method must be None; got {fill_method=}.")
 
         axis = self._get_axis_number(kwargs.pop("axis", "index"))
-        if fill_method is None:
-            data = self
-        else:
-            data = self._pad_or_backfill(fill_method, axis=axis, limit=limit)
-
-        shifted = data.shift(periods=periods, freq=freq, axis=axis, **kwargs)
+        shifted = self.shift(periods=periods, freq=freq, axis=axis, **kwargs)
         # Unsupported left operand type for / ("Self")
-        rs = data / shifted - 1  # type: ignore[operator]
+        rs = self / shifted - 1  # type: ignore[operator]
         if freq is not None:
             # Shift method is implemented differently when freq is not None
             # We want to restore the original index
             rs = rs.loc[~rs.index.duplicated()]
-            rs = rs.reindex_like(data)
+            rs = rs.reindex_like(self)
         return rs.__finalize__(self, method="pct_change")
 
     @final
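
Reviewer note: to make the effect of the simplified `pct_change` body concrete, here is a minimal pure-Python sketch of `self / shifted - 1` with no NA filling. It is not the pandas implementation. The helper names `ffill` and `pct_change`, the use of `math.nan` as the NA marker, and the restriction to positive `periods` and nonzero values are all assumptions made for illustration.

```python
import math


def ffill(values):
    """Forward-fill NaN entries with the last non-NaN value seen so far."""
    out, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        out.append(last)
    return out


def pct_change(values, periods=1):
    """Percent change against the value `periods` positions earlier.

    No NA filling happens here: a NaN on either side of the division
    simply propagates into the result.
    """
    out = []
    for i, v in enumerate(values):
        if i < periods:
            out.append(math.nan)  # nothing earlier to compare against
        else:
            out.append(v / values[i - periods] - 1)
    return out


prices = [100.0, math.nan, 110.0, 121.0]

# Without filling, the NaN affects its own position and the next one.
unfilled = pct_change(prices)

# Callers who relied on the old padding can fill explicitly beforehand.
filled = pct_change(ffill(prices))
```

With this input, only the last entry of `unfilled` is a number, while `filled` has a number at every position after the first. That mirrors the advice in the removed warning text: fill non-leading NA values before calling `pct_change`.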

pandas/core/groupby/base.py

Lines changed: 0 additions & 1 deletion
@@ -90,7 +90,6 @@ class OutputKey:
         "corr",
         "cov",
         "describe",
-        "dtypes",
         "expanding",
         "ewm",
         "filter",

pandas/core/groupby/generic.py

Lines changed: 11 additions & 51 deletions
@@ -20,7 +20,6 @@
     Union,
     cast,
 )
-import warnings
 
 import numpy as np
 
@@ -32,7 +31,6 @@
     Substitution,
     doc,
 )
-from pandas.util._exceptions import find_stack_level
 
 from pandas.core.dtypes.common import (
     ensure_int64,
@@ -384,23 +382,9 @@ def _python_agg_general(self, func, *args, **kwargs):
 
     def _aggregate_multiple_funcs(self, arg, *args, **kwargs) -> DataFrame:
         if isinstance(arg, dict):
-            if self.as_index:
-                # GH 15931
-                raise SpecificationError("nested renamer is not supported")
-            else:
-                # GH#50684 - This accidentally worked in 1.x
-                msg = (
-                    "Passing a dictionary to SeriesGroupBy.agg is deprecated "
-                    "and will raise in a future version of pandas. Pass a list "
-                    "of aggregations instead."
-                )
-                warnings.warn(
-                    message=msg,
-                    category=FutureWarning,
-                    stacklevel=find_stack_level(),
-                )
-                arg = list(arg.items())
-        elif any(isinstance(x, (tuple, list)) for x in arg):
+            raise SpecificationError("nested renamer is not supported")
+
+        if any(isinstance(x, (tuple, list)) for x in arg):
             arg = [(x, x) if not isinstance(x, (tuple, list)) else x for x in arg]
         else:
             # list of functions / function names
@@ -1179,8 +1163,7 @@ def idxmin(self, skipna: bool = True) -> Series:
         Parameters
         ----------
         skipna : bool, default True
-            Exclude NA/null values. If the entire Series is NA, the result
-            will be NA.
+            Exclude NA values.
 
         Returns
         -------
@@ -1190,7 +1173,7 @@ def idxmin(self, skipna: bool = True) -> Series:
         Raises
         ------
         ValueError
-            If the Series is empty.
+            If the Series is empty or skipna=False and any value is NA.
 
         See Also
         --------
@@ -1233,8 +1216,7 @@ def idxmax(self, skipna: bool = True) -> Series:
         Parameters
         ----------
         skipna : bool, default True
-            Exclude NA/null values. If the entire Series is NA, the result
-            will be NA.
+            Exclude NA values.
 
         Returns
         -------
@@ -1244,7 +1226,7 @@ def idxmax(self, skipna: bool = True) -> Series:
         Raises
         ------
         ValueError
-            If the Series is empty.
+            If the Series is empty or skipna=False and any value is NA.
 
         See Also
         --------
@@ -2165,13 +2147,10 @@ def idxmax(
         """
         Return index of first occurrence of maximum in each group.
 
-        NA/null values are excluded.
-
         Parameters
         ----------
         skipna : bool, default True
-            Exclude NA/null values. If an entire row/column is NA, the result
-            will be NA.
+            Exclude NA values.
         numeric_only : bool, default False
             Include only `float`, `int` or `boolean` data.
 
@@ -2185,7 +2164,7 @@ def idxmax(
         Raises
         ------
         ValueError
-            * If the row/column is empty
+            * If a column is empty or skipna=False and any value is NA.
 
         See Also
         --------
@@ -2230,13 +2209,10 @@ def idxmin(
         """
         Return index of first occurrence of minimum in each group.
 
-        NA/null values are excluded.
-
         Parameters
         ----------
         skipna : bool, default True
-            Exclude NA/null values. If an entire row/column is NA, the result
-            will be NA.
+            Exclude NA values.
         numeric_only : bool, default False
             Include only `float`, `int` or `boolean` data.
 
@@ -2250,7 +2226,7 @@ def idxmin(
         Raises
         ------
         ValueError
-            * If the row/column is empty
+            * If a column is empty or skipna=False and any value is NA.
 
         See Also
         --------
@@ -2728,22 +2704,6 @@ def hist(
         )
         return result
 
-    @property
-    @doc(DataFrame.dtypes.__doc__)
-    def dtypes(self) -> Series:
-        # GH#51045
-        warnings.warn(
-            f"{type(self).__name__}.dtypes is deprecated and will be removed in "
-            "a future version. Check the dtypes on the base object instead",
-            FutureWarning,
-            stacklevel=find_stack_level(),
-        )
-
-        # error: Incompatible return value type (got "DataFrame", expected "Series")
-        return self._python_apply_general(  # type: ignore[return-value]
-            lambda df: df.dtypes, self._selected_obj
-        )
-
     def corrwith(
         self,
         other: DataFrame | Series,
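
Reviewer note: with this change `_aggregate_multiple_funcs` raises `SpecificationError` for any dictionary passed to `SeriesGroupBy.agg`, so callers need a different spelling. The removed warning suggested passing a list of aggregations. The sketch below shows that option plus keyword-style named aggregation on a small made-up frame. The column names, data, and variable names are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical data: two groups, one numeric column.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 3, 5]})
grouped = df.groupby("key")["value"]

# A list of aggregations gives one output column per function name.
by_list = grouped.agg(["min", "max"])

# Named aggregation lets the caller choose the output column names,
# which is what a renaming dictionary was typically used for.
by_name = grouped.agg(lowest="min", highest="max")
```

Both results are DataFrames indexed by `key`; they differ only in how the output columns are labelled.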
