- Adding or subtracting a :class:`Day` with a :class:`Timedelta` is no longer supported.
- Adding a :class:`Day` offset to, or subtracting one from, a timezone-aware :class:`Timestamp` or datetime-like may produce an ambiguous or non-existent time, which will raise.

.. _whatsnew_300.api_breaking.nan_vs_na:

Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not in others. This was done to ease adoption, but it caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether ``NaN`` is treated as equivalent to :class:`NA`.

With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` passed to constructors, ``__setitem__``, or ``__contains__`` is treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries now produce :class:`NA` entries instead:

*Old behavior:*

.. code-block:: ipython

   In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
   In [3]: ser / 0
   Out[3]:
   0     NaN
   1    <NA>
   dtype: Float64

*New behavior:*

.. ipython:: python

   ser = pd.Series([0, None], dtype=pd.Float64Dtype())
   ser / 0

By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always treated as a floating-point value distinct from :class:`NA`, so it cannot be stored in integer dtypes:

*Old behavior:*

.. code-block:: ipython

   In [2]: ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
   In [3]: ser[1]
   Out[3]: <NA>

*New behavior:*

.. ipython:: python

   pd.set_option("mode.nan_is_na", False)
   ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
   ser[1]

If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
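
The two settings can be compared with a minimal, pure-Python sketch of a nullable column stored as values plus a boolean mask. This is a hypothetical toy model, not how pandas implements its nullable arrays: ``construct``, ``divide``, ``ieee_div``, ``render`` and the ``NA = None`` stand-in are illustrative names only, and the ``TypeError`` raised for an integer column is this sketch's choice rather than the exact exception pandas raises. It encodes only the rules stated above: a ``NaN`` input is masked only when ``nan_is_na`` is true, a ``NaN`` produced by arithmetic becomes NA only in that mode, and with ``nan_is_na`` false a ``NaN`` cannot be placed in an integer column.

```python
import math

# Hypothetical toy model of a nullable column, NOT pandas internals.
# A column is a pair (values, mask); mask[i] is True where entry i is NA.
NA = None  # stand-in for pd.NA in this sketch


def is_nan(x) -> bool:
    """True only for a float NaN; None and ints are never NaN."""
    return isinstance(x, float) and math.isnan(x)


def construct(data, nan_is_na: bool, integer: bool = False):
    """Build (values, mask) from a list under the given option setting."""
    values, mask = [], []
    for x in data:
        # None is always NA; a NaN input is NA only when nan_is_na is True.
        missing = x is None or (nan_is_na and is_nan(x))
        if integer and not missing and is_nan(x):
            # With nan_is_na=False, NaN is a float value that an integer
            # column cannot hold.
            raise TypeError("cannot store NaN in an integer column")
        values.append(0 if missing else x)  # placeholder under the mask
        mask.append(missing)
    return values, mask


def ieee_div(a, b) -> float:
    """Division returning inf/nan (IEEE 754) instead of raising on b == 0."""
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)


def divide(values, mask, divisor, nan_is_na: bool):
    """Element-wise division; NA entries stay NA."""
    out_values, out_mask = [], []
    for v, m in zip(values, mask):
        result = 0.0 if m else ieee_div(v, divisor)
        # Under nan_is_na=True, a NaN produced by arithmetic becomes NA.
        missing = m or (nan_is_na and math.isnan(result))
        out_values.append(0.0 if missing else result)
        out_mask.append(missing)
    return out_values, out_mask


def render(values, mask):
    """Show masked slots as the NA stand-in."""
    return [NA if m else v for v, m in zip(values, mask)]


# Arithmetic: 0 / 0 is NaN, which becomes NA only when nan_is_na is True.
for flag in (True, False):
    vals, msk = construct([0.0, None], nan_is_na=flag)
    print(flag, render(*divide(vals, msk, 0.0, nan_is_na=flag)))
# True [None, None]
# False [nan, None]

# Construction: a NaN input is NA in one mode and a float value in the other.
print(render(*construct([1.0, math.nan], nan_is_na=True)))   # [1.0, None]
print(render(*construct([1.0, math.nan], nan_is_na=False)))  # [1.0, nan]

try:
    construct([1, math.nan], nan_is_na=False, integer=True)
except TypeError as exc:
    print("raises:", exc)  # raises: cannot store NaN in an integer column
```

Keeping the mask separate from the values is what allows the ``nan_is_na=False`` mode to hold a genuine floating-point ``NaN`` next to a missing entry in the same column; pandas' numpy-nullable arrays likewise pair a data array with a boolean mask.
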

With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, whereas previously they would be coerced to ``NaN``. To retain a float NumPy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy` (or :meth:`DataFrame.to_numpy`).
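
Code that needs a float NumPy array whatever the option is set to can request one explicitly by passing both ``dtype`` and ``na_value``. A minimal sketch; the data and column names are illustrative only:

```python
import numpy as np
import pandas as pd

# A nullable float Series with one missing entry.
ser = pd.Series([1.5, None], dtype="Float64")

# Ask for float64 explicitly and map NA entries to NaN, so the result does not
# depend on how NA would otherwise be converted (e.g. to object dtype).
arr = ser.to_numpy(dtype="float64", na_value=np.nan)
print(arr.dtype, arr)  # float64 [1.5 nan]

# DataFrame.to_numpy accepts the same keywords, covering the frame.values case.
df = pd.DataFrame(
    {
        "a": ser,
        "b": pd.Series([None, 2.0], dtype="Float64"),
    }
)
values = df.to_numpy(dtype="float64", na_value=np.nan)
print(values.dtype, np.isnan(values).sum())  # float64 2
```

Passing ``dtype="float64"`` in addition to ``na_value`` fixes the result dtype rather than leaving it to inference. ``np.asarray`` has no ``na_value`` parameter, so ``to_numpy`` is the place to specify one.
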