Commit c2846a2

Merge branch 'main' into str.rsplit
2 parents 9519a0b + 8050f17 commit c2846a2

120 files changed, 2160 lines added, 797 lines removed

.github/workflows/wheels.yml

Lines changed: 5 additions & 17 deletions
@@ -45,7 +45,6 @@ jobs:
 (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') && ( ! endsWith(github.ref, 'dev0')))
 runs-on: ubuntu-24.04
 env:
-IS_PUSH: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') }}
 IS_SCHEDULE_DISPATCH: ${{ github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
 outputs:
 sdist_file: ${{ steps.save-path.outputs.sdist_name }}
@@ -118,7 +117,6 @@ jobs:
 python: ["cp313t", "3.13"]
 
 env:
-IS_PUSH: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') }}
 IS_SCHEDULE_DISPATCH: ${{ github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
 steps:
 - name: Checkout pandas
@@ -204,21 +202,11 @@ jobs:
 path: ./wheelhouse/*.whl
 
 - name: Upload wheels & sdist
-if: ${{ success() && (env.IS_SCHEDULE_DISPATCH == 'true' || env.IS_PUSH == 'true') }}
-shell: bash -el {0}
-env:
-PANDAS_STAGING_UPLOAD_TOKEN: ${{ secrets.PANDAS_STAGING_UPLOAD_TOKEN }}
-PANDAS_NIGHTLY_UPLOAD_TOKEN: ${{ secrets.PANDAS_NIGHTLY_UPLOAD_TOKEN }}
-# trigger an upload to
-# https://anaconda.org/scientific-python-nightly-wheels/pandas
-# for cron jobs or "Run workflow" (restricted to main branch).
-# Tags will upload to
-# https://anaconda.org/multibuild-wheels-staging/pandas
-# The tokens were originally generated at anaconda.org
-run: |
-source ci/upload_wheels.sh
-set_upload_vars
-upload_wheels
+if: ${{ success() && env.IS_SCHEDULE_DISPATCH == 'true' }}
+uses: scientific-python/[email protected]
+with:
+artifacts_path: dist
+anaconda_nightly_upload_token: ${{secrets.PANDAS_NIGHTLY_UPLOAD_TOKEN}}
 
 publish:
 if: >

.pre-commit-config.yaml

Lines changed: 4 additions & 4 deletions
@@ -19,7 +19,7 @@ ci:
 skip: [pyright, mypy]
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
-rev: v0.14.3
+rev: v0.14.7
 hooks:
 - id: ruff
 args: [--exit-non-zero-on-fix]
@@ -71,7 +71,7 @@ repos:
 hooks:
 - id: isort
 - repo: https://github.com/asottile/pyupgrade
-rev: v3.21.0
+rev: v3.21.2
 hooks:
 - id: pyupgrade
 args: [--py311-plus]
@@ -87,12 +87,12 @@ repos:
 types: [text] # overwrite types: [rst]
 types_or: [python, rst]
 - repo: https://github.com/sphinx-contrib/sphinx-lint
-rev: v1.0.1
+rev: v1.0.2
 hooks:
 - id: sphinx-lint
 args: ["--enable", "all", "--disable", "line-too-long"]
 - repo: https://github.com/pre-commit/mirrors-clang-format
-rev: v21.1.2
+rev: v21.1.6
 hooks:
 - id: clang-format
 files: ^pandas/_libs/src|^pandas/_libs/include

ci/code_checks.sh

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Series.dt PR01" `# Accessors are implemented as classes, but we do not document the Parameters section` \
 -i "pandas.Period.freq GL08" \
 -i "pandas.Period.ordinal GL08" \
+-i "pandas.errors.ChainedAssignmentError SA01" \
 -i "pandas.errors.IncompatibleFrequency SA01,SS06,EX01" \
 -i "pandas.api.extensions.ExtensionArray.value_counts EX01,RT03,SA01" \
 -i "pandas.api.typing.DataFrameGroupBy.plot PR02" \

ci/upload_wheels.sh

Lines changed: 0 additions & 42 deletions
This file was deleted.

doc/source/development/maintaining.rst

Lines changed: 0 additions & 8 deletions
@@ -433,14 +433,6 @@ which will be triggered when the tag is pushed.
 3. Download the source distribution and wheels from the `wheel staging area <https://anaconda.org/scientific-python-nightly-wheels/pandas>`_.
 Be careful to make sure that no wheels are missing (e.g. due to failed builds).
 
-Running scripts/download_wheels.sh with the version that you want to download wheels/the sdist for should do the trick.
-This script will make a ``dist`` folder inside your clone of pandas and put the downloaded wheels and sdist there::
-
-scripts/download_wheels.sh <VERSION>
-
-ATTENTION: this is currently not downloading *all* wheels, and you have to
-manually download the remainings wheels and sdist!
-
 4. Create a `new GitHub release <https://github.com/pandas-dev/pandas/releases/new>`_:
 
 - Tag: ``<version>``

doc/source/user_guide/copy_on_write.rst

Lines changed: 2 additions & 2 deletions
@@ -249,9 +249,9 @@ two subsequent indexing operations, e.g.
 In [3]: df
 Out[3]:
 foo bar
-0 100 4
+0 1 4
 1 2 5
-2 3 6
+2 100 6
 
 The column ``foo`` was updated where the column ``bar`` is greater than 5.
 This violated the CoW principles though, because it would have to modify the

doc/source/whatsnew/v3.0.0.rst

Lines changed: 56 additions & 11 deletions
@@ -117,6 +117,9 @@ process in more detail.
 
 `PDEP-7: Consistent copy/view semantics in pandas with Copy-on-Write <https://pandas.pydata.org/pdeps/0007-copy-on-write.html>`__
 
+Setting the option ``mode.copy_on_write`` no longer has any impact. The option is deprecated
+and will be removed in pandas 4.0.
+
 .. _whatsnew_300.enhancements.col:
 
 ``pd.col`` syntax can now be used in :meth:`DataFrame.assign` and :meth:`DataFrame.loc`
@@ -381,6 +384,8 @@ In cases with mixed-resolution inputs, the highest resolution is used:
 
 .. warning:: Many users will now get "M8[us]" dtype data in cases when they used to get "M8[ns]". For most use cases they should not notice a difference. One big exception is converting to integers, which will give integers 1000x smaller.
 
+Similarly, the :class:`Timedelta` constructor and :func:`to_timedelta` with a string input now defaults to a microsecond unit, using nanosecond unit only in cases that actually have nanosecond precision.
+
 .. _whatsnew_300.api_breaking.concat_datetime_sorting:
 
 :func:`concat` no longer ignores ``sort`` when all objects have a :class:`DatetimeIndex`
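The warning and the new ``Timedelta`` note in this hunk both depend on the resolution unit. The stdlib-only sketch below illustrates the arithmetic and is not pandas code. The helpers ``infer_unit`` and ``to_integer`` are hypothetical. It also assumes, as a simplification of "only in cases that actually have nanosecond precision", that the unit is chosen purely from the fractional-second digits of the input string.

```python
def infer_unit(fraction: str) -> str:
    """Hypothetical rule: pick "ns" only if digits past the 6th are non-zero."""
    # Digits 1-6 fit in microseconds; any non-zero digit after that needs ns.
    return "ns" if fraction[6:].strip("0") else "us"


def to_integer(whole_seconds: int, fraction: str, unit: str) -> int:
    """Express `whole_seconds.fraction` seconds as an integer count of `unit` ticks."""
    digits = 9 if unit == "ns" else 6
    # Right-pad the fraction with zeros, then keep exactly `digits` digits.
    padded = (fraction + "0" * digits)[:digits]
    return whole_seconds * 10**digits + int(padded)


if __name__ == "__main__":
    # "1.5" seconds has no sub-microsecond digits, so the sketch picks "us".
    unit = infer_unit("5")
    print(unit, to_integer(1, "5", unit))
    # The same duration counted in nanoseconds is a 1000x larger integer.
    print("ns", to_integer(1, "5", "ns"))
    # A non-zero 9th fractional digit keeps nanosecond resolution.
    print(infer_unit("000000001"))
```

The factor of 1000 between the two integer counts is what the warning means by integers coming out "1000x smaller" when data lands at microsecond rather than nanosecond resolution.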
@@ -547,29 +552,55 @@ small behavior differences as collateral:
 Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others. This was done to make adoption easier, but caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether to treat ``NaN`` as equivalent to :class:`NA`.
+Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``),
+``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others.
+This was done to make adoption easier, but caused some confusion (:issue:`32265`).
+In 3.0, this behaviour is made consistent to by default treat ``NaN`` as equivalent
+to :class:`NA` in all cases.
 
-With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__`` and be treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries produce :class:`NA` entries instead:
+By default, ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__``
+and will be treated the same as :class:`NA`. The only change users will see is
+that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN``
+entries produce :class:`NA` entries instead.
 
 *Old behavior:*
 
 .. code-block:: ipython
 
-In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+# NaN in input gets converted to NA
+In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+In [2]: ser
+Out[2]:
+0 0.0
+1 <NA>
+dtype: Float64
+# NaN produced by arithmetic (0/0) remained NaN
 In [3]: ser / 0
 Out[3]:
 0 NaN
 1 <NA>
 dtype: Float64
+# the NaN value is not considered as missing
+In [4]: (ser / 0).isna()
+Out[4]:
+0 False
+1 True
+dtype: bool
 
 *New behavior:*
 
 .. ipython:: python
 
-ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+ser
 ser / 0
+(ser / 0).isna()
 
-By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:
+In the future, the intention is to consider ``NaN`` and :class:`NA` as distinct
+values, and an option to control this behaviour is added in 3.0 through
+``pd.options.future.distinguish_nan_and_na``. When enabled, ``NaN`` is always
+considered distinct and specifically as a floating-point value. As a consequence,
+it cannot be used with integer dtypes.
 
 *Old behavior:*
 
@@ -583,13 +614,21 @@ By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always
 
 .. ipython:: python
 
-pd.set_option("mode.nan_is_na", False)
-ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
-ser[1]
+with pd.option_context("future.distinguish_nan_and_na", True):
+ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
+print(ser[1])
+
+If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in
+the latter example, this would raise, as a float ``NaN`` cannot be held by an
+integer dtype.
 
-If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
+With ``"future.distinguish_nan_and_na"`` enabled, ``ser.to_numpy()`` (and
+``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if
+:class:`NA` entries are present, where before they would coerce to
+``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan``
+to :meth:`Series.to_numpy`.
 
-With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, where before they would coerce to ``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy`.
+Note that the option is experimental and subject to change in future releases.
 
 The ``__module__`` attribute now points to public modules
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
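To make the two NaN/NA modes in the hunks above concrete, here is a minimal pure-Python model. `MaskedFloats`, `from_list` and `div_zero` are hypothetical names, and this is not how pandas implements masked arrays. It models only the cases the whatsnew examples show: construction from a list, and division by zero.

```python
import math
from dataclasses import dataclass


@dataclass
class MaskedFloats:
    """Toy stand-in for a nullable float array: `values` plus an NA `mask`."""

    values: list[float]
    mask: list[bool]  # True marks an NA entry
    distinguish_nan_and_na: bool = False  # hypothetical per-array copy of the option

    @classmethod
    def from_list(cls, data, distinguish_nan_and_na=False):
        values, mask = [], []
        for x in data:
            # None is always NA; NaN is NA only when the two are not distinguished.
            na = x is None or (
                not distinguish_nan_and_na and isinstance(x, float) and math.isnan(x)
            )
            values.append(0.0 if na else float(x))
            mask.append(na)
        return cls(values, mask, distinguish_nan_and_na)

    def div_zero(self):
        """Divide every element by +0.0 using IEEE 754 rules."""
        values, mask = [], list(self.mask)
        for i, v in enumerate(self.values):
            if self.mask[i]:
                values.append(0.0)  # NA stays NA; the stored value is irrelevant
                continue
            # Python raises ZeroDivisionError, so apply IEEE semantics by hand:
            # 0/0 and nan/0 give NaN, any other x/0 gives an infinity with x's sign.
            r = math.nan if v == 0 or math.isnan(v) else math.copysign(math.inf, v)
            if math.isnan(r) and not self.distinguish_nan_and_na:
                mask[i] = True  # default mode: a freshly produced NaN becomes NA
            values.append(r)
        return MaskedFloats(values, mask, self.distinguish_nan_and_na)

    def isna(self):
        return list(self.mask)
```

With the default flag, `MaskedFloats.from_list([0, math.nan]).div_zero().isna()` reports both entries as missing. With `distinguish_nan_and_na=True`, the NaN produced by `0/0` stays a plain float and `isna()` reports it as present, as in the old behaviour shown above.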
@@ -1187,6 +1226,7 @@ MultiIndex
 I/O
 ^^^
 - Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
+- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``timedelta64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`63239`)
 - Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
 ``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
 - Bug in :func:`pandas.json_normalize` inconsistently handling non-dict items in ``data`` when ``max_level`` was set. The function will now raise a ``TypeError`` if ``data`` is a list containing non-dict items (:issue:`62829`)
@@ -1244,6 +1284,7 @@ Plotting
 - Bug in :meth:`Series.plot` preventing a line and bar from being aligned on the same plot (:issue:`61161`)
 - Bug in :meth:`Series.plot` preventing a line and scatter plot from being aligned (:issue:`61005`)
 - Bug in :meth:`Series.plot` with ``kind="pie"`` with :class:`ArrowDtype` (:issue:`59192`)
+- Bug in plotting with a :class:`TimedeltaIndex` with non-nanosecond resolution displaying incorrect labels (:issue:`63237`)
 
 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1274,11 +1315,13 @@ Groupby/resample/rolling
 - Bug in :meth:`Rolling.apply` for ``method="table"`` where column order was not being respected due to the columns getting sorted by default. (:issue:`59666`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_period`` periods if ``method="table"``. (:issue:`58868`)
 - Bug in :meth:`Rolling.sem` computing incorrect results because it divided by ``sqrt((n - 1) * (n - ddof))`` instead of ``sqrt(n * (n - ddof))``. (:issue:`63180`)
-- Bug in :meth:`Rolling.skew` incorrectly computing skewness for windows following outliers due to numerical instability. The calculation now properly handles catastrophic cancellation by recomputing affected windows (:issue:`47461`)
+- Bug in :meth:`Rolling.skew` and in :meth:`Rolling.kurt` incorrectly computing skewness and kurtosis, respectively, for windows following outliers due to numerical instability. The calculation now properly handles catastrophic cancellation by recomputing affected windows (:issue:`47461`, :issue:`61416`)
+- Bug in :meth:`Rolling.skew` and in :meth:`Rolling.kurt` where results varied with input length despite identical data and window contents (:issue:`54380`)
 - Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)
 - Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)
 - Bug in :meth:`Series.rolling.var` and :meth:`Series.rolling.std` computing incorrect results due to numerical instability. (:issue:`47721`, :issue:`52407`, :issue:`54518`, :issue:`55343`)
 - Bug in :meth:`DataFrame.groupby` methods when operating on NumPy-nullable data failing when the NA mask was not C-contiguous (:issue:`61031`)
+- Bug in :meth:`DataFrame.groupby` when grouping by a Series and that Series was modified after calling :meth:`DataFrame.groupby` but prior to the groupby operation (:issue:`63219`)
 
 Reshaping
 ^^^^^^^^^
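The corrected ``Rolling.sem`` denominator in the hunk above can be checked against a brute-force reference. This sketch assumes the quantity being divided is the square root of the window's sum of squared deviations. Under that assumption the result is `std(ddof) / sqrt(n)`, the usual standard error of the mean. `rolling_sem` is a hypothetical helper for checking results, not pandas' rolling kernel.

```python
import math


def rolling_sem(data: list[float], window: int, ddof: int = 1) -> list[float]:
    """Standard error of the mean over each full trailing window.

    Each window is recomputed from scratch (mean first, then squared
    deviations), which sidesteps the catastrophic cancellation that
    running-sum updates can suffer after a large outlier leaves the window.
    """
    out = []
    for end in range(window, len(data) + 1):
        win = data[end - window:end]
        n = len(win)
        mean = sum(win) / n
        ss = sum((x - mean) ** 2 for x in win)  # sum of squared deviations
        # sem = sqrt(ss / (n - ddof)) / sqrt(n) = sqrt(ss) / sqrt(n * (n - ddof))
        out.append(math.sqrt(ss) / math.sqrt(n * (n - ddof)))
    return out


if __name__ == "__main__":
    print(rolling_sem([1.0, 2.0, 4.0, 7.0], window=3))
    # A huge outlier that has left the window does not disturb the last result.
    print(rolling_sem([1e8, 1.0, 2.0, 3.0], window=3)[-1])
```

Recomputing each window costs O(n * window) rather than O(n). That trade-off is acceptable for a reference check, and it mirrors the "recomputing affected windows" idea the skew/kurt entry describes.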
@@ -1303,6 +1346,7 @@ Reshaping
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`)
 - Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)
 - Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`)
+- Bug in :meth:`DataFrame.merge` where specifying both ``right_on`` and ``right_index`` did not raise a ``MergeError`` if ``left_on`` is also specified. Now raises a ``MergeError`` in such cases. (:issue:`63242`)
 - Bug in :meth:`DataFrame.merge` where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a :class:`MergeError` in such cases. (:issue:`61402`)
 - Bug in :meth:`DataFrame.merge` with :class:`CategoricalDtype` columns incorrectly raising ``RecursionError`` (:issue:`56376`)
 - Bug in :meth:`DataFrame.merge` with a ``float32`` index incorrectly casting the index to ``float64`` (:issue:`41626`)
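The new ``right_on``/``right_index`` entry describes a validation rule: the two arguments are mutually exclusive, whatever ``left_on`` is. The sketch below states that rule with stdlib code only. `validate_right_keys` and the local `MergeError` class are hypothetical stand-ins. pandas' real `pandas.errors.MergeError` subclasses `ValueError`, and its exact error message is not reproduced here.

```python
class MergeError(ValueError):
    """Local stand-in for pandas.errors.MergeError (also a ValueError subclass)."""


def validate_right_keys(left_on=None, right_on=None, right_index=False) -> None:
    """Hypothetical check mirroring the fixed behaviour for issue 63242.

    `left_on` is accepted only to show that it no longer changes the outcome:
    passing both `right_on` and `right_index=True` is ambiguous and always raises.
    """
    if right_on is not None and right_index:
        raise MergeError('pass either "right_on" or "right_index=True", not both')


if __name__ == "__main__":
    try:
        validate_right_keys(left_on="key", right_on="key", right_index=True)
    except MergeError as err:
        print("MergeError:", err)
```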
@@ -1312,6 +1356,7 @@ Sparse
 - Bug in :class:`SparseDtype` for equal comparison with na fill value. (:issue:`54770`)
 - Bug in :meth:`DataFrame.sparse.from_spmatrix` which hard coded an invalid ``fill_value`` for certain subtypes. (:issue:`59063`)
 - Bug in :meth:`DataFrame.sparse.to_dense` which ignored subclassing and always returned an instance of :class:`DataFrame` (:issue:`59913`)
+- Bug in :meth:`cumsum` for integer arrays Calling SparseArray.cumsum caused max recursion depth error. (:issue:`62669`)
 
 ExtensionArray
 ^^^^^^^^^^^^^^

pandas/_config/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -36,5 +36,5 @@ def using_string_dtype() -> bool:
 
 
 def is_nan_na() -> bool:
-    _mode_options = _global_config["mode"]
-    return _mode_options["nan_is_na"]
+    _mode_options = _global_config["future"]
+    return not _mode_options["distinguish_nan_and_na"]
