
Commit 0d5beca

Merge branch 'main' into shiny-new-feature
2 parents: b000c45 + e450f0c


55 files changed (+645, -206 lines)

.github/workflows/wheels.yml

Lines changed: 0 additions & 3 deletions

@@ -207,13 +207,10 @@ jobs:
       if: ${{ success() && (env.IS_SCHEDULE_DISPATCH == 'true' || env.IS_PUSH == 'true') }}
       shell: bash -el {0}
       env:
-        PANDAS_STAGING_UPLOAD_TOKEN: ${{ secrets.PANDAS_STAGING_UPLOAD_TOKEN }}
         PANDAS_NIGHTLY_UPLOAD_TOKEN: ${{ secrets.PANDAS_NIGHTLY_UPLOAD_TOKEN }}
         # trigger an upload to
         # https://anaconda.org/scientific-python-nightly-wheels/pandas
         # for cron jobs or "Run workflow" (restricted to main branch).
-        # Tags will upload to
-        # https://anaconda.org/multibuild-wheels-staging/pandas
         # The tokens were originally generated at anaconda.org
       run: |
         source ci/upload_wheels.sh

ci/upload_wheels.sh

Lines changed: 1 addition & 7 deletions

@@ -2,14 +2,8 @@
 # Modified from numpy's https://github.com/numpy/numpy/blob/main/tools/wheels/upload_wheels.sh
 
 set_upload_vars() {
-    echo "IS_PUSH is $IS_PUSH"
     echo "IS_SCHEDULE_DISPATCH is $IS_SCHEDULE_DISPATCH"
-    if [[ "$IS_PUSH" == "true" ]]; then
-        echo push and tag event
-        export ANACONDA_ORG="multibuild-wheels-staging"
-        export TOKEN="$PANDAS_STAGING_UPLOAD_TOKEN"
-        export ANACONDA_UPLOAD="true"
-    elif [[ "$IS_SCHEDULE_DISPATCH" == "true" ]]; then
+    if [[ "$IS_SCHEDULE_DISPATCH" == "true" ]]; then
         echo scheduled or dispatched event
         export ANACONDA_ORG="scientific-python-nightly-wheels"
         export TOKEN="$PANDAS_NIGHTLY_UPLOAD_TOKEN"

doc/source/development/maintaining.rst

Lines changed: 0 additions & 8 deletions

@@ -433,14 +433,6 @@ which will be triggered when the tag is pushed.
 3. Download the source distribution and wheels from the `wheel staging area <https://anaconda.org/scientific-python-nightly-wheels/pandas>`_.
    Be careful to make sure that no wheels are missing (e.g. due to failed builds).
 
-   Running scripts/download_wheels.sh with the version that you want to download wheels/the sdist for should do the trick.
-   This script will make a ``dist`` folder inside your clone of pandas and put the downloaded wheels and sdist there::
-
-       scripts/download_wheels.sh <VERSION>
-
-   ATTENTION: this is currently not downloading *all* wheels, and you have to
-   manually download the remainings wheels and sdist!
-
 4. Create a `new GitHub release <https://github.com/pandas-dev/pandas/releases/new>`_:
 
    - Tag: ``<version>``

doc/source/user_guide/copy_on_write.rst

Lines changed: 2 additions & 2 deletions

@@ -249,9 +249,9 @@ two subsequent indexing operations, e.g.
     In [3]: df
     Out[3]:
        foo  bar
-    0  100    4
+    0    1    4
     1    2    5
-    2    3    6
+    2  100    6
 
 The column ``foo`` was updated where the column ``bar`` is greater than 5.
 This violated the CoW principles though, because it would have to modify the
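
The docs fix above corrects the example output: with ``bar = [4, 5, 6]``, only the last row satisfies ``bar > 5``, so only that row's ``foo`` becomes 100. A minimal sketch (illustrative, not part of the diff) that reproduces the corrected output with the Copy-on-Write-safe ``.loc`` form:

    import pandas as pd

    df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
    # Update "foo" where "bar" is greater than 5 -- only row 2 qualifies.
    df.loc[df["bar"] > 5, "foo"] = 100
    print(df)
    #    foo  bar
    # 0    1    4
    # 1    2    5
    # 2  100    6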

doc/source/whatsnew/v3.0.0.rst

Lines changed: 48 additions & 10 deletions

@@ -553,29 +553,55 @@ small behavior differences as collateral:
 Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``), ``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others. This was done to make adoption easier, but caused some confusion (:issue:`32265`). In 3.0, an option ``"mode.nan_is_na"`` (default ``True``) controls whether to treat ``NaN`` as equivalent to :class:`NA`.
+Previously, when dealing with a nullable dtype (e.g. ``Float64Dtype`` or ``int64[pyarrow]``),
+``NaN`` was treated as interchangeable with :class:`NA` in some circumstances but not others.
+This was done to make adoption easier, but caused some confusion (:issue:`32265`).
+In 3.0, this behaviour is made consistent to by default treat ``NaN`` as equivalent
+to :class:`NA` in all cases.
 
-With ``pd.set_option("mode.nan_is_na", True)`` (again, this is the default), ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__`` and be treated the same as :class:`NA`. The only change users will see is that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN`` entries produce :class:`NA` entries instead:
+By default, ``NaN`` can be passed to constructors, ``__setitem__``, ``__contains__``
+and will be treated the same as :class:`NA`. The only change users will see is
+that arithmetic and ``np.ufunc`` operations that previously introduced ``NaN``
+entries produce :class:`NA` entries instead.
 
 *Old behavior:*
 
 .. code-block:: ipython
 
-    In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    # NaN in input gets converted to NA
+    In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    In [2]: ser
+    Out[2]:
+    0     0.0
+    1    <NA>
+    dtype: Float64
+    # NaN produced by arithmetic (0/0) remained NaN
     In [3]: ser / 0
     Out[3]:
     0     NaN
     1    <NA>
     dtype: Float64
+    # the NaN value is not considered as missing
+    In [4]: (ser / 0).isna()
+    Out[4]:
+    0    False
+    1     True
+    dtype: bool
 
 *New behavior:*
 
 .. ipython:: python
 
-    ser = pd.Series([0, None], dtype=pd.Float64Dtype())
+    ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
+    ser
     ser / 0
+    (ser / 0).isna()
 
-By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:
+In the future, the intention is to consider ``NaN`` and :class:`NA` as distinct
+values, and an option to control this behaviour is added in 3.0 through
+``pd.options.future.distinguish_nan_and_na``. When enabled, ``NaN`` is always
+considered distinct and specifically as a floating-point value. As a consequence,
+it cannot be used with integer dtypes.
 
 *Old behavior:*
 
@@ -589,13 +615,21 @@ By contrast, with ``pd.set_option("mode.nan_is_na", False)``, ``NaN`` is always
 
 .. ipython:: python
 
-    pd.set_option("mode.nan_is_na", False)
-    ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
-    ser[1]
+    with pd.option_context("future.distinguish_nan_and_na", True):
+        ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
+        print(ser[1])
+
+If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in
+the latter example, this would raise, as a float ``NaN`` cannot be held by an
+integer dtype.
 
-If we had passed ``pd.Int64Dtype()`` or ``"int64[pyarrow]"`` for the dtype in the latter example, this would raise, as a float ``NaN`` cannot be held by an integer dtype.
+With ``"future.distinguish_nan_and_na"`` enabled, ``ser.to_numpy()`` (and
+``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if
+:class:`NA` entries are present, where before they would coerce to
+``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan``
+to :meth:`Series.to_numpy`.
 
-With ``"mode.nan_is_na"`` set to ``False``, ``ser.to_numpy()`` (and ``frame.values`` and ``np.asarray(obj)``) will convert to ``object`` dtype if :class:`NA` entries are present, where before they would coerce to ``NaN``. To retain a float numpy dtype, explicitly pass ``na_value=np.nan`` to :meth:`Series.to_numpy`.
+Note that the option is experimental and subject to change in future releases.
 
 The ``__module__`` attribute now points to public modules
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1193,6 +1227,7 @@ MultiIndex
 I/O
 ^^^
 - Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
+- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``timedelta64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`63239`)
 - Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
   ``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
 - Bug in :func:`pandas.json_normalize` inconsistently handling non-dict items in ``data`` when ``max_level`` was set. The function will now raise a ``TypeError`` if ``data`` is a list containing non-dict items (:issue:`62829`)
@@ -1250,6 +1285,7 @@ Plotting
 - Bug in :meth:`Series.plot` preventing a line and bar from being aligned on the same plot (:issue:`61161`)
 - Bug in :meth:`Series.plot` preventing a line and scatter plot from being aligned (:issue:`61005`)
 - Bug in :meth:`Series.plot` with ``kind="pie"`` with :class:`ArrowDtype` (:issue:`59192`)
+- Bug in plotting with a :class:`TimedeltaIndex` with non-nanosecond resolution displaying incorrect labels (:issue:`63237`)
 
 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1286,6 +1322,7 @@ Groupby/resample/rolling
 - Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)
 - Bug in :meth:`Series.rolling.var` and :meth:`Series.rolling.std` computing incorrect results due to numerical instability. (:issue:`47721`, :issue:`52407`, :issue:`54518`, :issue:`55343`)
 - Bug in :meth:`DataFrame.groupby` methods when operating on NumPy-nullable data failing when the NA mask was not C-contiguous (:issue:`61031`)
+- Bug in :meth:`DataFrame.groupby` when grouping by a Series and that Series was modified after calling :meth:`DataFrame.groupby` but prior to the groupby operation (:issue:`63219`)
 
 Reshaping
 ^^^^^^^^^
@@ -1310,6 +1347,7 @@ Reshaping
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`)
 - Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)
 - Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`)
+- Bug in :meth:`DataFrame.merge` where specifying both ``right_on`` and ``right_index`` did not raise a ``MergeError`` if ``left_on`` is also specified. Now raises a ``MergeError`` in such cases. (:issue:`63242`)
 - Bug in :meth:`DataFrame.merge` where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a :class:`MergeError` in such cases. (:issue:`61402`)
 - Bug in :meth:`DataFrame.merge` with :class:`CategoricalDtype` columns incorrectly raising ``RecursionError`` (:issue:`56376`)
 - Bug in :meth:`DataFrame.merge` with a ``float32`` index incorrectly casting the index to ``float64`` (:issue:`41626`)
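
For orientation, a short sketch of the 3.0 default behaviour described in the whatsnew text above (assumes a pandas build with the new NaN-as-NA default; values shown in comments are what the whatsnew describes, not verified output):

    import numpy as np
    import pandas as pd

    ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
    print(ser.isna().tolist())    # [False, True]: NaN in the input is treated as NA
    res = ser / 0                 # 0 / 0 now produces NA instead of NaN
    print(res.isna().tolist())    # [True, True]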

pandas/_config/__init__.py

Lines changed: 2 additions & 2 deletions

@@ -36,5 +36,5 @@ def using_string_dtype() -> bool:
 
 
 def is_nan_na() -> bool:
-    _mode_options = _global_config["mode"]
-    return _mode_options["nan_is_na"]
+    _mode_options = _global_config["future"]
+    return not _mode_options["distinguish_nan_and_na"]
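
The private helper now derives its answer from the new ``future`` option: ``is_nan_na()`` is simply the negation of ``distinguish_nan_and_na``. A usage sketch, assuming the ``future.distinguish_nan_and_na`` option is registered by this commit:

    import pandas as pd
    from pandas._config import is_nan_na  # private helper changed above

    print(is_nan_na())  # True while "future.distinguish_nan_and_na" is False (the default)
    with pd.option_context("future.distinguish_nan_and_na", True):
        print(is_nan_na())  # False while the future option is enabled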

pandas/_libs/tslibs/timedeltas.pyx

Lines changed: 8 additions & 1 deletion

@@ -2114,7 +2114,7 @@ class Timedelta(_Timedelta):
                 int(ns)
                 + int(us * 1_000)
                 + int(ms * 1_000_000)
-                + seconds
+                + seconds, "ns"
             )
         except OverflowError as err:
             # GH#55503
@@ -2124,6 +2124,13 @@ class Timedelta(_Timedelta):
             )
             raise OutOfBoundsTimedelta(msg) from err
 
+        if (
+            "nanoseconds" not in kwargs
+            and cnp.get_timedelta64_value(value) % 1000 == 0
+        ):
+            # If possible, give a microsecond unit
+            value = value.astype("m8[us]")
+
         disallow_ambiguous_unit(unit)
 
         cdef:
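
The intent suggested by this change (and by the ``pandas/conftest.py`` update below) is that a ``Timedelta`` built from keyword components, with no explicit nanoseconds and no sub-microsecond remainder, is stored at microsecond resolution. A speculative sketch of the resulting behaviour, to be verified against an actual 3.0 build:

    import pandas as pd

    # No nanoseconds keyword and the value is a whole number of microseconds,
    # so the unit can be downcast from "ns" to "us" (assumed behaviour).
    print(pd.Timedelta(seconds=500).unit)                 # expected: "us"

    # An explicit nanoseconds component keeps nanosecond resolution.
    print(pd.Timedelta(seconds=500, nanoseconds=1).unit)  # expected: "ns"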

pandas/conftest.py

Lines changed: 2 additions & 2 deletions

@@ -938,7 +938,7 @@ def rand_series_with_duplicate_datetimeindex() -> Series:
             Timestamp("2011-01-01", tz="US/Eastern").as_unit("s"),
             DatetimeTZDtype(unit="s", tz="US/Eastern"),
         ),
-        (Timedelta(seconds=500), "timedelta64[ns]"),
+        (Timedelta(seconds=500), "timedelta64[us]"),
     ]
 )
 def ea_scalar_and_dtype(request):
@@ -2127,5 +2127,5 @@ def monkeysession():
 @pytest.fixture(params=[True, False])
 def using_nan_is_na(request):
     opt = request.param
-    with pd.option_context("mode.nan_is_na", opt):
+    with pd.option_context("future.distinguish_nan_and_na", not opt):
         yield opt
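
The ``using_nan_is_na`` fixture still yields ``True``/``False``, but now drives the new ``future.distinguish_nan_and_na`` option (inverted) instead of the removed ``mode.nan_is_na``. A hypothetical test sketch (not from the diff) showing how the yielded flag relates to the active option state:

    import pandas as pd

    def test_fixture_matches_option_state(using_nan_is_na):
        # Inside the fixture's option_context, the future option is the
        # inverse of the yielded flag.
        assert pd.get_option("future.distinguish_nan_and_na") is (not using_nan_is_na)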

pandas/core/algorithms.py

Lines changed: 1 addition & 1 deletion

@@ -948,7 +948,7 @@ def value_counts_internal(
     result = Series(counts, index=idx, name=name, copy=False)
 
     if sort:
-        result = result.sort_values(ascending=ascending)
+        result = result.sort_values(ascending=ascending, kind="stable")
 
     if normalize:
         result = result / counts.sum()
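
Passing ``kind="stable"`` makes the ordering of tied counts deterministic: values with equal counts keep their prior relative order instead of being permuted by the default sort. A small illustration of the effect this targets (illustrative, not from the diff):

    import pandas as pd

    s = pd.Series(["a", "b", "a", "b", "c"])
    # "a" and "b" both occur twice; with a stable sort their relative order
    # in the result is reproducible across runs and platforms.
    print(s.value_counts())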

pandas/core/arrays/interval.py

Lines changed: 2 additions & 3 deletions

@@ -2128,9 +2128,8 @@ def _combined(self) -> IntervalSide:
             )
             comb = comb.view("complex128")[:, 0]
         else:
-            comb = (np.array(left.ravel(), dtype="complex128")) + (
-                1j * np.array(right.ravel(), dtype="complex128")
-            )
+            comb = np.asarray(left.ravel(), dtype="complex128")
+            comb.imag = right.ravel()
         return comb
 
     def _from_combined(self, combined: np.ndarray) -> IntervalArray:
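
The rewrite packs the left and right interval bounds into one ``complex128`` array without building two temporary complex arrays: the real part holds ``left`` and the imaginary part is filled in place with ``right``. A standalone NumPy sketch of the same trick, with made-up data:

    import numpy as np

    left = np.array([0.0, 1.0, 2.0])
    right = np.array([1.0, 2.0, 3.0])

    comb = np.asarray(left, dtype="complex128")  # copy of left as the real part, imag is 0
    comb.imag = right                            # write right into the imaginary part in place
    print(comb)                                  # [0.+1.j 1.+2.j 2.+3.j]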
