Commit bae4e48

Merge branch 'pandas-dev:main' into main
2 parents 70a8357 + 6cca195 commit bae4e48

49 files changed: +1642 -1162 lines changed

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -162,7 +162,7 @@ jobs:
162162
run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"
163163

164164
- name: Build wheels
165-
uses: pypa/cibuildwheel@v3.1.4
165+
uses: pypa/cibuildwheel@v3.2.0
166166
with:
167167
package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
168168
env:

doc/source/reference/missing_value.rst

Lines changed: 0 additions & 2 deletions
@@ -11,14 +11,12 @@ NA is the way to represent missing values for nullable dtypes (see below):
1111

1212
.. autosummary::
1313
:toctree: api/
14-
:template: autosummary/class_without_autosummary.rst
1514

1615
NA
1716

1817
NaT is the missing value for timedelta and datetime data (see below):
1918

2019
.. autosummary::
2120
:toctree: api/
22-
:template: autosummary/class_without_autosummary.rst
2321

2422
NaT

doc/source/whatsnew/v2.3.2.rst

Lines changed: 1 addition & 3 deletions
@@ -22,8 +22,6 @@ become the default string dtype in pandas 3.0. See
2222

2323
Bug fixes
2424
^^^^^^^^^
25-
- Fix :meth:`~Series.str.isdigit` to correctly recognize unicode superscript
26-
characters as digits for :class:`StringDtype` backed by PyArrow (:issue:`61466`)
2725
- Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
2826
"string" type in the JSON Table Schema for :class:`StringDtype` columns
2927
(:issue:`61889`)
@@ -39,4 +37,4 @@ Bug fixes
3937
Contributors
4038
~~~~~~~~~~~~
4139

42-
.. contributors:: v2.3.1..v2.3.2|HEAD
40+
.. contributors:: v2.3.1..v2.3.2

doc/source/whatsnew/v2.3.3.rst

Lines changed: 19 additions & 14 deletions
@@ -1,14 +1,14 @@
11
.. _whatsnew_233:
22

3-
What's new in 2.3.3 (September XX, 2025)
3+
What's new in 2.3.3 (September 29, 2025)
44
----------------------------------------
55

66
These are the changes in pandas 2.3.3. See :ref:`release` for a full changelog
77
including other versions of pandas.
88

99
{{ header }}
1010

11-
.. _whatsnew_220.py14_compat:
11+
.. _whatsnew_233.py14_compat:
1212

1313
Pandas 2.3.3 is now compatible with Python 3.14
1414
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -37,25 +37,22 @@ Improvements
3737
specifying ``include=["object"]`` for backwards compatibility. In a future
3838
release, this will be deprecated and code for pandas 3+ should be updated to
3939
do ``include=["str"]`` (:issue:`61916`)
40-
40+
- Support the ``/`` operation between a ``pathlib.Path`` object and a :class:`StringDtype`
41+
Series, similarly as it works for object-dtype Series (:issue:`61940`)
4142

4243
.. _whatsnew_233.string_fixes.bugs:
4344

4445
Bug fixes
4546
^^^^^^^^^
4647
- Fix bug in :meth:`Series.str.replace` using named capture groups (e.g., ``\g<name>``) with the Arrow-backed dtype would raise an error (:issue:`57636`)
47-
- Fix regression in ``~Series.str.contains``, ``~Series.str.match`` and ``~Series.str.fullmatch``
48+
- Fix regression in :meth:`Series.str.contains`, :meth:`~Series.str.match` and :meth:`~Series.str.fullmatch`
4849
with a compiled regex and custom flags (:issue:`62240`)
49-
- Fix :meth:`Series.str.match` and :meth:`Series.str.fullmatch` not matching patterns with groups correctly for the Arrow-backed string dtype (:issue:`61072`)
50-
51-
52-
Improvements and fixes for Copy-on-Write
53-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
54-
55-
Bug fixes
56-
^^^^^^^^^
57-
58-
- The :meth:`DataFrame.iloc` now works correctly with ``copy_on_write`` option when assigning values after subsetting the columns of a homogeneous DataFrame (:issue:`60309`)
50+
- Fix :meth:`Series.str.match` and :meth:`~Series.str.fullmatch` not matching patterns with groups correctly for the Arrow-backed string dtype (:issue:`61072`)
51+
- Fix bug in :meth:`~DataFrame.groupby` with ``sum()`` and unobserved categories resulting in ``0`` instead of the empty string ``""`` (:issue:`61909`)
52+
- Fix :meth:`Series.str.isdigit` to correctly recognize unicode superscript
53+
characters as digits for :class:`StringDtype` backed by PyArrow (:issue:`61466`)
54+
- Fix comparing a :class:`StringDtype` Series with mixed objects raising an error (:issue:`60228`)
55+
- Fix error being raised when using a numpy ufunc with a Python-backed string array (:issue:`40800`)
5956

6057
Other changes
6158
~~~~~~~~~~~~~
@@ -65,9 +62,17 @@ Other changes
6562
Resampling with a :class:`PeriodIndex` is supported again, but a subset of
6663
methods that return incorrect results will raise an error in pandas 3.0 (:issue:`57033`)
6764

65+
Other bug fixes
66+
~~~~~~~~~~~~~~~~
67+
68+
- Fix memory leak in :meth:`DataFrame.to_json` with datetime columns (:issue:`62204`)
69+
- Fixed regression in :meth:`DataFrame.from_records` not initializing subclasses properly (:issue:`57008`)
70+
- The :meth:`DataFrame.iloc` now works correctly with ``copy_on_write`` option when assigning values after subsetting the columns of a homogeneous DataFrame (:issue:`60309`)
6871

6972
.. ---------------------------------------------------------------------------
7073
.. _whatsnew_233.contributors:
7174

7275
Contributors
7376
~~~~~~~~~~~~
77+
78+
.. contributors:: v2.3.2..v2.3.3|HEAD

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 2 deletions
@@ -981,7 +981,8 @@ Timezones
981981
^^^^^^^^^
982982
- Bug in :meth:`DatetimeIndex.union`, :meth:`DatetimeIndex.intersection`, and :meth:`DatetimeIndex.symmetric_difference` changing timezone to UTC when merging two DatetimeIndex objects with the same timezone but different units (:issue:`60080`)
983983
- Bug in :meth:`Series.dt.tz_localize` with a timezone-aware :class:`ArrowDtype` incorrectly converting to UTC when ``tz=None`` (:issue:`61780`)
984-
-
984+
- Fixed bug in :func:`date_range` where tz-aware endpoints with calendar offsets (e.g. ``"MS"``) failed on DST fall-back. These now respect ``ambiguous``/ ``nonexistent``. (:issue:`52908`)
985+
985986

986987
Numeric
987988
^^^^^^^
@@ -1054,6 +1055,8 @@ MultiIndex
10541055
I/O
10551056
^^^
10561057
- Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
1058+
- Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
1059+
``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
10571060
- Bug in :meth:`.DataFrame.to_json` when ``"index"`` was a value in the :attr:`DataFrame.column` and :attr:`Index.name` was ``None``. Now, this will fail with a ``ValueError`` (:issue:`58925`)
10581061
- Bug in :meth:`.io.common.is_fsspec_url` not recognizing chained fsspec URLs (:issue:`48978`)
10591062
- Bug in :meth:`DataFrame._repr_html_` which ignored the ``"display.float_format"`` option (:issue:`59876`)
@@ -1217,10 +1220,11 @@ Other
12171220
- Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
12181221
- Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)
12191222
- Deprecated the keyword ``check_datetimelike_compat`` in :meth:`testing.assert_frame_equal` and :meth:`testing.assert_series_equal` (:issue:`55638`)
1223+
- Fixed bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when trying to replace :class:`NA` values in a :class:`Float64Dtype` object with ``np.nan``; this now works with ``pd.set_option("mode.nan_is_na", False)`` and is irrelevant otherwise (:issue:`55127`)
1224+
- Fixed bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when trying to replace :class:`np.nan` values in a :class:`Int64Dtype` object with :class:`NA`; this is now a no-op with ``pd.set_option("mode.nan_is_na", False)`` and is irrelevant otherwise (:issue:`51237`)
12201225
- Fixed bug in the :meth:`Series.rank` with object dtype and extremely small float values (:issue:`62036`)
12211226
- Fixed bug where the :class:`DataFrame` constructor misclassified array-like objects with a ``.name`` attribute as :class:`Series` or :class:`Index` (:issue:`61443`)
12221227
- Fixed regression in :meth:`DataFrame.from_records` not initializing subclasses properly (:issue:`57008`)
1223-
-
12241228

12251229
.. ***DO NOT USE THIS SECTION***
12261230

pandas/_libs/missing.pyx

Lines changed: 1 addition & 0 deletions
@@ -393,6 +393,7 @@ class NAType(C_NAType):
393393
>>> True | pd.NA
394394
True
395395
"""
396+
__module__ = "pandas"
396397

397398
_instance = None
398399

pandas/_libs/tslibs/nattype.pyx

Lines changed: 2 additions & 0 deletions
@@ -372,6 +372,8 @@ class NaTType(_NaT):
372372
1 NaT
373373
"""
374374

375+
__module__ = "pandas"
376+
375377
def __new__(cls):
376378
cdef _NaT base
377379
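
The `__module__ = "pandas"` lines added to `NAType` and `NaTType` change the module path Python reports for those classes. The sketch below shows only that mechanism with the standard library; `PlainSentinel`, `PublicSentinel`, and the package name `mypkg` are hypothetical, and nothing here is taken from the pandas sources.

```python
class PlainSentinel:
    """Class whose __module__ is whichever module happens to define it."""


class PublicSentinel:
    """Class that advertises a public module path instead."""

    # "mypkg" is a made-up package name used only for illustration.
    __module__ = "mypkg"


def qualified_name(cls: type) -> str:
    """Return the dotted path that repr() and documentation tools usually show."""
    return f"{cls.__module__}.{cls.__qualname__}"


print(qualified_name(PlainSentinel))
print(qualified_name(PublicSentinel))
print(repr(PublicSentinel))
```

Because the attribute is just a string on the class, overriding it does not move the class or affect imports; it only changes how the class identifies itself.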

pandas/core/arrays/arrow/array.py

Lines changed: 18 additions & 13 deletions
@@ -883,22 +883,27 @@ def _cmp_method(self, other, op) -> ArrowExtensionArray:
883883
ltype = self._pa_array.type
884884

885885
if isinstance(other, (ExtensionArray, np.ndarray, list)):
886-
boxed = self._box_pa(other)
887-
rtype = boxed.type
888-
if (pa.types.is_timestamp(ltype) and pa.types.is_date(rtype)) or (
889-
pa.types.is_timestamp(rtype) and pa.types.is_date(ltype)
890-
):
891-
# GH#62157 match non-pyarrow behavior
892-
result = ops.invalid_comparison(self, other, op)
893-
result = pa.array(result, type=pa.bool_())
886+
try:
887+
boxed = self._box_pa(other)
888+
except pa.lib.ArrowInvalid:
889+
# e.g. GH#60228 [1, "b"] we have to operate pointwise
890+
res_values = [op(x, y) for x, y in zip(self, other)]
891+
result = pa.array(res_values, type=pa.bool_(), from_pandas=True)
894892
else:
895-
try:
896-
result = pc_func(self._pa_array, boxed)
897-
except pa.ArrowNotImplementedError:
898-
# TODO: could this be wrong if other is object dtype?
899-
# in which case we need to operate pointwise?
893+
rtype = boxed.type
894+
if (pa.types.is_timestamp(ltype) and pa.types.is_date(rtype)) or (
895+
pa.types.is_timestamp(rtype) and pa.types.is_date(ltype)
896+
):
897+
# GH#62157 match non-pyarrow behavior
900898
result = ops.invalid_comparison(self, other, op)
901899
result = pa.array(result, type=pa.bool_())
900+
else:
901+
try:
902+
result = pc_func(self._pa_array, boxed)
903+
except pa.ArrowNotImplementedError:
904+
result = ops.invalid_comparison(self, other, op)
905+
result = pa.array(result, type=pa.bool_())
906+
902907
elif is_scalar(other):
903908
if (isinstance(other, datetime) and pa.types.is_date(ltype)) or (
904909
type(other) is date and pa.types.is_timestamp(ltype)
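
The reworked `_cmp_method` hunk above follows a try/except/else shape: attempt to box the other operand, fall back to a pointwise loop when boxing fails, and take the vectorised path otherwise. A minimal stdlib sketch of that control flow follows; `BoxingError` and `box_homogeneous` are hypothetical stand-ins for `pa.lib.ArrowInvalid` and `_box_pa`, and plain lists replace Arrow arrays.

```python
import operator
from typing import Any, Callable, List, Sequence


class BoxingError(TypeError):
    """Hypothetical stand-in for the error raised when boxing fails."""


def box_homogeneous(values: Sequence[Any]) -> List[str]:
    """Hypothetical stand-in for boxing: accept only all-string input."""
    if not all(isinstance(v, str) for v in values):
        raise BoxingError("cannot convert mixed values to a string array")
    return list(values)


def compare(
    left: Sequence[str], right: Sequence[Any], op: Callable[[Any, Any], bool]
) -> List[bool]:
    """Compare elementwise, falling back to a pointwise loop if boxing fails."""
    try:
        boxed = box_homogeneous(right)
    except BoxingError:
        # Mixed objects such as [1, "b"]: apply the operator one pair at a time.
        return [op(x, y) for x, y in zip(left, right)]
    else:
        # Boxing succeeded. A real implementation would call a vectorised
        # kernel here; this sketch reuses the same loop for simplicity.
        return [op(x, y) for x, y in zip(left, boxed)]


print(compare(["a", "b"], [1, "b"], operator.eq))  # [False, True]
print(compare(["a", "b"], ["a", "c"], operator.eq))  # [True, False]
```

Only equality is demonstrated; ordering operators between a string and an integer raise `TypeError` in plain Python, so the pointwise fallback does not make every comparison valid.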

pandas/core/arrays/base.py

Lines changed: 41 additions & 4 deletions
@@ -30,8 +30,6 @@
3030
from pandas.compat.numpy import function as nv
3131
from pandas.errors import AbstractMethodError
3232
from pandas.util._decorators import (
33-
Appender,
34-
Substitution,
3533
cache_readonly,
3634
)
3735
from pandas.util._validators import (
@@ -1669,9 +1667,48 @@ def factorize(
16691667
Categories (3, str): ['a', 'b', 'c']
16701668
"""
16711669

1672-
@Substitution(klass="ExtensionArray")
1673-
@Appender(_extension_array_shared_docs["repeat"])
16741670
def repeat(self, repeats: int | Sequence[int], axis: AxisInt | None = None) -> Self:
1671+
"""
1672+
Repeat elements of an ExtensionArray.
1673+
1674+
Returns a new ExtensionArray where each element of the current ExtensionArray
1675+
is repeated consecutively a given number of times.
1676+
1677+
Parameters
1678+
----------
1679+
repeats : int or array of ints
1680+
The number of repetitions for each element. This should be a
1681+
non-negative integer. Repeating 0 times will return an empty
1682+
ExtensionArray.
1683+
axis : None
1684+
Must be ``None``. Has no effect but is accepted for compatibility
1685+
with numpy.
1686+
1687+
Returns
1688+
-------
1689+
ExtensionArray
1690+
Newly created ExtensionArray with repeated elements.
1691+
1692+
See Also
1693+
--------
1694+
Series.repeat : Equivalent function for Series.
1695+
Index.repeat : Equivalent function for Index.
1696+
numpy.repeat : Similar method for :class:`numpy.ndarray`.
1697+
ExtensionArray.take : Take arbitrary positions.
1698+
1699+
Examples
1700+
--------
1701+
>>> cat = pd.Categorical(["a", "b", "c"])
1702+
>>> cat
1703+
['a', 'b', 'c']
1704+
Categories (3, str): ['a', 'b', 'c']
1705+
>>> cat.repeat(2)
1706+
['a', 'a', 'b', 'b', 'c', 'c']
1707+
Categories (3, str): ['a', 'b', 'c']
1708+
>>> cat.repeat([1, 2, 3])
1709+
['a', 'b', 'b', 'c', 'c', 'c']
1710+
Categories (3, str): ['a', 'b', 'c']
1711+
"""
16751712
nv.validate_repeat((), {"axis": axis})
16761713
ind = np.arange(len(self)).repeat(repeats)
16771714
return self.take(ind)
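
The body of `repeat` builds an array of positions with `np.arange(len(self)).repeat(repeats)` and then calls `take`. The same idea can be sketched without NumPy, as below; `repeat_indices` and this `repeat` are hypothetical helpers that work on plain sequences, and unlike NumPy they do not broadcast a length-1 `repeats` sequence.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def repeat_indices(length: int, repeats: Union[int, Sequence[int]]) -> List[int]:
    """Return each position i repeated repeats (or repeats[i]) times."""
    if isinstance(repeats, int):
        counts = [repeats] * length
    else:
        counts = list(repeats)
        if len(counts) != length:
            raise ValueError("repeats must have one entry per element")
    if any(c < 0 for c in counts):
        raise ValueError("repeats must be non-negative")
    return [i for i, c in enumerate(counts) for _ in range(c)]


def repeat(values: Sequence[T], repeats: Union[int, Sequence[int]]) -> List[T]:
    """Repeat elements by first computing positions, then taking them."""
    indices = repeat_indices(len(values), repeats)
    return [values[i] for i in indices]


print(repeat(["a", "b", "c"], 2))  # ['a', 'a', 'b', 'b', 'c', 'c']
print(repeat(["a", "b", "c"], [1, 2, 3]))  # ['a', 'b', 'b', 'c', 'c', 'c']
```

Separating index construction from element lookup is what lets any container that implements `take` reuse the same `repeat` logic.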

pandas/core/arrays/datetimes.py

Lines changed: 7 additions & 6 deletions
@@ -456,13 +456,14 @@ def _generate_range(
456456
end = _maybe_localize_point(end, freq, tz, ambiguous, nonexistent)
457457

458458
if freq is not None:
459-
# We break Day arithmetic (fixed 24 hour) here and opt for
460-
# Day to mean calendar day (23/24/25 hour). Therefore, strip
461-
# tz info from start and day to avoid DST arithmetic
462-
if isinstance(freq, Day):
463-
if start is not None:
459+
# Offset handling:
460+
# Ticks (fixed-duration like hours/minutes): keep tz; do absolute-time math.
461+
# Other calendar offsets: drop tz; do naive wall time; localize once later
462+
# so `ambiguous`/`nonexistent` are applied correctly.
463+
if not isinstance(freq, Tick):
464+
if start is not None and start.tz is not None:
464465
start = start.tz_localize(None)
465-
if end is not None:
466+
if end is not None and end.tz is not None:
466467
end = end.tz_localize(None)
467468

468469
if isinstance(freq, (Tick, Day)):
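
The new comment in `_generate_range` describes the approach for non-Tick calendar offsets: drop the timezone, step in naive wall time, and localize once afterwards. Below is a rough stdlib sketch of that idea for month starts. The zone rule is hypothetical (loosely modelled on a UTC-4 to UTC-5 fall-back at 02:00 on 2023-11-05), and `localize` and `month_starts` are illustrative helpers, not pandas functions.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical rule: UTC-4 until the wall clock reaches this instant, UTC-5 after.
FALL_BACK_WALL = datetime(2023, 11, 5, 2, 0)


def localize(wall: datetime) -> datetime:
    """Attach the offset in force at a naive wall-clock time.

    Times in the repeated hour are resolved to the earlier (UTC-4) reading,
    which is one possible policy for ambiguous times.
    """
    hours = -4 if wall < FALL_BACK_WALL else -5
    return wall.replace(tzinfo=timezone(timedelta(hours=hours)))


def month_starts(first: datetime, periods: int) -> List[datetime]:
    """Step month by month in naive wall time, then localize each point once."""
    naive = []
    year, month = first.year, first.month
    for _ in range(periods):
        naive.append(datetime(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return [localize(w) for w in naive]


points = month_starts(datetime(2023, 10, 1), 3)
for p in points:
    print(p.isoformat())
```

Every point lands on local midnight even though the UTC offset changes between November and December, so the elapsed time between those two points is 30 days and 1 hour rather than an exact multiple of 24 hours.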
