doc/source/whatsnew/v3.0.0.rst
56 additions & 11 deletions
@@ -117,6 +117,9 @@ process in more detail.
 
 `PDEP-7: Consistent copy/view semantics in pandas with Copy-on-Write <https://pandas.pydata.org/pdeps/0007-copy-on-write.html>`__
 
+Setting the option ``mode.copy_on_write`` no longer has any impact. The option is deprecated
+and will be removed in pandas 4.0.
+
 .. _whatsnew_300.enhancements.col:
 
 ``pd.col`` syntax can now be used in :meth:`DataFrame.assign` and :meth:`DataFrame.loc`
@@ -381,6 +384,8 @@ In cases with mixed-resolution inputs, the highest resolution is used:
 
 .. warning:: Many users will now get "M8[us]" dtype data in cases when they used to get "M8[ns]". For most use cases they should not notice a difference. One big exception is converting to integers, which will give integers 1000x smaller.
 
+Similarly, the :class:`Timedelta` constructor and :func:`to_timedelta` with a string input now default to a microsecond unit, using nanosecond unit only in cases that actually have nanosecond precision.
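
The integer-conversion caveat in the warning above can be illustrated with NumPy alone. This is a minimal sketch, not taken from the pandas documentation: the variable names are illustrative, and it assumes only that ``datetime64[ns]`` and ``datetime64[us]`` values count nanoseconds and microseconds since the epoch, respectively.

```python
import numpy as np

# The same instant stored at nanosecond and at microsecond resolution.
stamp_ns = np.datetime64("2024-01-01T00:00:00", "ns")
stamp_us = stamp_ns.astype("datetime64[us]")

# Converting to integers exposes the unit: each value is a count since the epoch.
as_int_ns = int(stamp_ns.astype("int64"))
as_int_us = int(stamp_us.astype("int64"))
print(as_int_ns // as_int_us)  # 1000

# Casting back to nanoseconds first recovers the nanosecond-based integer.
roundtrip_ns = int(stamp_us.astype("datetime64[ns]").astype("int64"))
```

Code that depends on nanosecond integers can make the unit explicit before converting, as the last line does, instead of relying on the inferred resolution.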
-Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others. This was done to make adoption easier, but caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether to treat ``NaN`` as equivalent to :class:`NA`.
+Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``),
+``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others.
+This was done to make adoption easier, but caused some confusion (:issue:`32265`).
+In 3.0, this behaviour is made consistent: by default, ``NaN`` is treated as equivalent
+to :class:`NA` in all cases.
 
-With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__`` and be treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries produce :class:`NA` entries instead:
+By default, ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__``
+and will be treated the same as :class:`NA`. The only change users will see is
+that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN``
+entries produce :class:`NA` entries instead.
 
 *Old behavior:*
 
 .. code-block:: ipython
 
-    In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    # NaN in input gets converted to NA
+    In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    In [2]: ser
+    Out[2]:
+    0     0.0
+    1    <NA>
+    dtype: Float64
+    # NaN produced by arithmetic (0/0) remained NaN
     In [3]: ser / 0
     Out[3]:
     0     NaN
     1    <NA>
     dtype: Float64
+    # the NaN value is not considered as missing
+    In [4]: (ser / 0).isna()
+    Out[4]:
+    0    False
+    1     True
+    dtype: bool
 
 *New behavior:*
 
 .. ipython:: python
 
-    ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    ser
     ser / 0
+    (ser / 0).isna()
 
-By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:
+In the future, the intention is to consider ``NaN`` and :class:`NA` as distinct
+values, and an option to control this behaviour is added in 3.0 through
+``pd.options.future.distinguish_nan_and_na``. When enabled, ``NaN`` is always
+considered distinct and specifically as a floating-point value. As a consequence,
+it cannot be used with integer dtypes.
 
 *Old behavior:*
 
@@ -583,13 +614,21 @@ By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always
 
 .. ipython:: python
 
-    pd.set_option("mode.nan_is_na", False)
-    ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
-    ser[1]
+    with pd.option_context("future.distinguish_nan_and_na", True):
+        ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
+        print(ser[1])
+
+If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in
+the latter example, this would raise, as a float ``NaN`` cannot be held by an
+integer dtype.
 
-If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
+With ``"future.distinguish_nan_and_na"`` enabled, ``ser.to_numpy()`` (and
+``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if
+:class:`NA` entries are present, where before they would coerce to
+``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan``
+to :meth:`Series.to_numpy`.
 
-With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, where before they would coerce to ``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy`.
+Note that the option is experimental and subject to change in future releases.
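
The ``na_value`` advice above can be sketched as follows. Passing ``dtype="float64"`` alongside ``na_value=np.nan`` is an addition made here, so that the requested result does not depend on the state of the option or on the installed pandas version; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# A nullable float Series with one missing entry.
ser = pd.Series([1.5, None], dtype=pd.Float64Dtype())

# Ask for a float64 NumPy array and state how missing entries are filled.
arr = ser.to_numpy(dtype="float64", na_value=np.nan)
print(arr.dtype)
```

Spelling out both arguments keeps downstream NumPy code on a float array, at the cost of no longer distinguishing a missing entry from a genuine ``NaN`` in the result.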
 
 The ``__module__`` attribute now points to public modules
 - Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
+- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``timedelta64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`63239`)
 - Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
   ``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
 - Bug in :func:`pandas.json_normalize` inconsistently handling non-dict items in ``data`` when ``max_level`` was set. The function will now raise a ``TypeError`` if ``data`` is a list containing non-dict items (:issue:`62829`)
@@ -1244,6 +1284,7 @@ Plotting
 - Bug in :meth:`Series.plot` preventing a line and bar from being aligned on the same plot (:issue:`61161`)
 - Bug in :meth:`Series.plot` preventing a line and scatter plot from being aligned (:issue:`61005`)
 - Bug in :meth:`Series.plot` with ``kind="pie"`` with :class:`ArrowDtype` (:issue:`59192`)
+- Bug in plotting with a :class:`TimedeltaIndex` with non-nanosecond resolution displaying incorrect labels (:issue:`63237`)
 
 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1274,11 +1315,13 @@ Groupby/resample/rolling
 - Bug in :meth:`Rolling.apply` for ``method="table"`` where column order was not being respected due to the columns getting sorted by default. (:issue:`59666`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_periods`` periods if ``method="table"``. (:issue:`58868`)
 - Bug in :meth:`Rolling.sem` computing incorrect results because it divided by ``sqrt((n - 1) * (n - ddof))`` instead of ``sqrt(n * (n - ddof))``. (:issue:`63180`)
-- Bug in :meth:`Rolling.skew` incorrectly computing skewness for windows following outliers due to numerical instability. The calculation now properly handles catastrophic cancellation by recomputing affected windows (:issue:`47461`)
+- Bug in :meth:`Rolling.skew` and in :meth:`Rolling.kurt` incorrectly computing skewness and kurtosis, respectively, for windows following outliers due to numerical instability. The calculation now properly handles catastrophic cancellation by recomputing affected windows (:issue:`47461`, :issue:`61416`)
+- Bug in :meth:`Rolling.skew` and in :meth:`Rolling.kurt` where results varied with input length despite identical data and window contents (:issue:`54380`)
 - Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)
 - Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)
 - Bug in :meth:`Series.rolling.var` and :meth:`Series.rolling.std` computing incorrect results due to numerical instability. (:issue:`47721`, :issue:`52407`, :issue:`54518`, :issue:`55343`)
 - Bug in :meth:`DataFrame.groupby` methods when operating on NumPy-nullable data failing when the NA mask was not C-contiguous (:issue:`61031`)
+- Bug in :meth:`DataFrame.groupby` when grouping by a Series and that Series was modified after calling :meth:`DataFrame.groupby` but prior to the groupby operation (:issue:`63219`)
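
The denominator quoted in the :meth:`Rolling.sem` entry above can be checked by hand on a single window. The sketch below uses plain NumPy rather than pandas internals, and the window values are made up for illustration. It compares the corrected expression against the textbook definition (standard deviation divided by ``sqrt(n)``) and against the previously used denominator.

```python
import numpy as np

# One hypothetical window of observations.
window = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
n, ddof = window.size, 1

# Sum of squared deviations from the window mean.
ss = float(np.sum((window - window.mean()) ** 2))

# Corrected form: sqrt(ss) / sqrt(n * (n - ddof)).
sem_corrected = np.sqrt(ss) / np.sqrt(n * (n - ddof))

# Textbook definition: standard deviation divided by sqrt(n).
sem_reference = window.std(ddof=ddof) / np.sqrt(n)

# Form described as the bug: sqrt(ss) / sqrt((n - 1) * (n - ddof)).
sem_buggy = np.sqrt(ss) / np.sqrt((n - 1) * (n - ddof))

print(np.isclose(sem_corrected, sem_reference))
```

Because ``n - 1`` is smaller than ``n``, the old denominator was too small, so the old expression overestimated the standard error for any window with non-zero variance.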
 
 Reshaping
 ^^^^^^^^^
@@ -1303,6 +1346,7 @@ Reshaping
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtensionDtype` (:issue:`59123`)
 - Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)
 - Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`)
+- Bug in :meth:`DataFrame.merge` where specifying both ``right_on`` and ``right_index`` did not raise a ``MergeError`` if ``left_on`` was also specified. Now raises a ``MergeError`` in such cases. (:issue:`63242`)
 - Bug in :meth:`DataFrame.merge` where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a :class:`MergeError` in such cases. (:issue:`61402`)
 - Bug in :meth:`DataFrame.merge` with :class:`CategoricalDtype` columns incorrectly raising ``RecursionError`` (:issue:`56376`)
 - Bug in :meth:`DataFrame.merge` with a ``float32`` index incorrectly casting the index to ``float64`` (:issue:`41626`)
@@ -1312,6 +1356,7 @@ Sparse
 - Bug in :class:`SparseDtype` for equal comparison with na fill value. (:issue:`54770`)
 - Bug in :meth:`DataFrame.sparse.from_spmatrix` which hard coded an invalid ``fill_value`` for certain subtypes. (:issue:`59063`)
 - Bug in :meth:`DataFrame.sparse.to_dense` which ignored subclassing and always returned an instance of :class:`DataFrame` (:issue:`59913`)
+- Bug in :meth:`cumsum` for integer arrays where calling ``SparseArray.cumsum`` caused a maximum recursion depth error. (:issue:`62669`)