-Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others. This was done to make adoption easier, but caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether to treat ``NaN`` as equivalent to :class:`NA`.
+Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``),
+``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others.
+This was done to make adoption easier, but caused some confusion (:issue:`32265`).
+In 3.0, this behaviour is made consistent: by default, ``NaN`` is treated as equivalent
+to :class:`NA` in all cases.

-With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__`` and be treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries produce :class:`NA` entries instead:
+By default, ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__``
+and will be treated the same as :class:`NA`. The only change users will see is
+that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN``
+entries produce :class:`NA` entries instead.

 *Old behavior:*

 .. code-block:: ipython

-    In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    # NaN in input gets converted to NA
+    In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    In [2]: ser
+    Out[2]:
+    0     0.0
+    1    <NA>
+    dtype: Float64
+    # NaN produced by arithmetic (0/0) remained NaN
     In [3]: ser / 0
     Out[3]:
     0     NaN
     1    <NA>
     dtype: Float64
+    # the NaN value is not considered as missing
+    In [4]: (ser / 0).isna()
+    Out[4]:
+    0    False
+    1     True
+    dtype: bool

 *New behavior:*

 .. ipython:: python

-    ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    ser
     ser / 0
+    (ser / 0).isna()
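
The default conversion shown in the examples above can be tried with a minimal sketch, assuming only that pandas and NumPy are installed. It checks the part of the behaviour that the diff presents as unchanged (a ``NaN`` passed to a nullable float constructor is stored as :class:`NA`). It only prints the result of ``ser / 0`` and makes no claim about it, because that is the part the diff says differs between versions.

```python
import numpy as np
import pandas as pd

# NaN passed to a nullable float constructor is stored as NA.
ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
na_flags = ser.isna().tolist()
second_is_na = ser[1] is pd.NA

# 0 / 0: per the diff, NaN before 3.0 and NA from 3.0 on,
# so the result is printed rather than asserted.
divided = ser / 0
print(na_flags, second_is_na)
print(divided.isna().tolist())
```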

-By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:
+In the future, the intention is to consider ``NaN`` and :class:`NA` as distinct
+values, and an option to control this behaviour is added in 3.0 through
+``pd.options.future.distinguish_nan_and_na``. When enabled, ``NaN`` is always
+considered distinct and specifically as a floating-point value. As a consequence,
+it cannot be used with integer dtypes.

 *Old behavior:*

@@ -589,13 +615,21 @@ By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always

 .. ipython:: python

-    pd.set_option("mode.nan_is_na", False)
-    ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
-    ser[1]
+    with pd.option_context("future.distinguish_nan_and_na", True):
+        ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
+        print(ser[1])
+
+If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in
+the latter example, this would raise, as a float ``NaN`` cannot be held by an
+integer dtype.

-If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
+With ``"future.distinguish_nan_and_na"`` enabled, ``ser.to_numpy()`` (and
+``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if
+:class:`NA` entries are present, where before they would coerce to
+``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan``
+to :meth:`Series.to_numpy`.

-With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, where before they would coerce to ``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy`.
+Note that the option is experimental and subject to change in future releases.
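
The ``na_value`` escape hatch mentioned above can be sketched as follows. This is a minimal example, assuming pandas and NumPy are installed. Passing ``dtype="float64"`` explicitly is an addition not found in the text above, made so that the result does not depend on a given version's default dtype inference.

```python
import numpy as np
import pandas as pd

# A nullable float Series with one missing entry.
ser = pd.Series([1.0, None], dtype=pd.Float64Dtype())

# Request a float array and fill NA entries with NaN explicitly.
arr = ser.to_numpy(dtype="float64", na_value=np.nan)
print(arr.dtype, arr)
```

The explicit ``na_value`` keeps the array numeric, so downstream NumPy code does not have to handle ``object`` arrays.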

 The ``__module__`` attribute now points to public modules
 - Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
+- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``timedelta64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`63239`)
 - Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
   ``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
 - Bug in :func:`pandas.json_normalize` inconsistently handling non-dict items in ``data`` when ``max_level`` was set. The function will now raise a ``TypeError`` if ``data`` is a list containing non-dict items (:issue:`62829`)
@@ -1250,6 +1285,7 @@ Plotting
 - Bug in :meth:`Series.plot` preventing a line and bar from being aligned on the same plot (:issue:`61161`)
 - Bug in :meth:`Series.plot` preventing a line and scatter plot from being aligned (:issue:`61005`)
 - Bug in :meth:`Series.plot` with ``kind="pie"`` with :class:`ArrowDtype` (:issue:`59192`)
+- Bug in plotting with a :class:`TimedeltaIndex` with non-nanosecond resolution displaying incorrect labels (:issue:`63237`)

 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1286,6 +1322,7 @@ Groupby/resample/rolling
 - Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)
 - Bug in :meth:`Series.rolling.var` and :meth:`Series.rolling.std` computing incorrect results due to numerical instability. (:issue:`47721`, :issue:`52407`, :issue:`54518`, :issue:`55343`)
 - Bug in :meth:`DataFrame.groupby` methods when operating on NumPy-nullable data failing when the NA mask was not C-contiguous (:issue:`61031`)
+- Bug in :meth:`DataFrame.groupby` when grouping by a Series and that Series was modified after calling :meth:`DataFrame.groupby` but prior to the groupby operation (:issue:`63219`)

 Reshaping
 ^^^^^^^^^
@@ -1310,6 +1347,7 @@ Reshaping
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtensionDtype` (:issue:`59123`)
 - Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)
 - Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`)
+- Bug in :meth:`DataFrame.merge` where specifying both ``right_on`` and ``right_index`` did not raise a ``MergeError`` if ``left_on`` is also specified. Now raises a ``MergeError`` in such cases. (:issue:`63242`)
 - Bug in :meth:`DataFrame.merge` where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a :class:`MergeError` in such cases. (:issue:`61402`)
 - Bug in :meth:`DataFrame.merge` with :class:`CategoricalDtype` columns incorrectly raising ``RecursionError`` (:issue:`56376`)
 - Bug in :meth:`DataFrame.merge` with a ``float32`` index incorrectly casting the index to ``float64`` (:issue:`41626`)