Commit da928f8

Merge remote-tracking branch 'upstream/main' into test/sleep
2 parents b2f4839 + d815947 commit da928f8

21 files changed: +645 -479 lines changed

doc/source/whatsnew/v2.3.2.rst

Lines changed: 1 addition & 3 deletions
@@ -22,8 +22,6 @@ become the default string dtype in pandas 3.0. See
 
 Bug fixes
 ^^^^^^^^^
-- Fix :meth:`~Series.str.isdigit` to correctly recognize unicode superscript
-  characters as digits for :class:`StringDtype` backed by PyArrow (:issue:`61466`)
 - Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
   "string" type in the JSON Table Schema for :class:`StringDtype` columns
   (:issue:`61889`)
@@ -39,4 +37,4 @@ Bug fixes
 Contributors
 ~~~~~~~~~~~~
 
-.. contributors:: v2.3.1..v2.3.2|HEAD
+.. contributors:: v2.3.1..v2.3.2

doc/source/whatsnew/v2.3.3.rst

Lines changed: 17 additions & 13 deletions
@@ -1,14 +1,14 @@
 .. _whatsnew_233:
 
-What's new in 2.3.3 (September XX, 2025)
+What's new in 2.3.3 (September 29, 2025)
 ----------------------------------------
 
 These are the changes in pandas 2.3.3. See :ref:`release` for a full changelog
 including other versions of pandas.
 
 {{ header }}
 
-.. _whatsnew_220.py14_compat:
+.. _whatsnew_233.py14_compat:
 
 Pandas 2.3.3 is now compatible with Python 3.14
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -37,27 +37,23 @@ Improvements
   specifying ``include=["object"]`` for backwards compatibility. In a future
   release, this will be deprecated and code for pandas 3+ should be updated to
   do ``include=["str"]`` (:issue:`61916`)
-
+- Support the ``/`` operation between a ``pathlib.Path`` object and a :class:`StringDtype`
+  Series, similarly as it works for object-dtype Series (:issue:`61940`)
 
 .. _whatsnew_233.string_fixes.bugs:
 
 Bug fixes
 ^^^^^^^^^
 - Fix bug in :meth:`Series.str.replace` using named capture groups (e.g., ``\g<name>``) with the Arrow-backed dtype would raise an error (:issue:`57636`)
-- Fix regression in ``~Series.str.contains``, ``~Series.str.match`` and ``~Series.str.fullmatch``
+- Fix regression in :meth:`Series.str.contains`, :meth:`~Series.str.match` and :meth:`~Series.str.fullmatch`
   with a compiled regex and custom flags (:issue:`62240`)
-- Fix :meth:`Series.str.match` and :meth:`Series.str.fullmatch` not matching patterns with groups correctly for the Arrow-backed string dtype (:issue:`61072`)
+- Fix :meth:`Series.str.match` and :meth:`~Series.str.fullmatch` not matching patterns with groups correctly for the Arrow-backed string dtype (:issue:`61072`)
+- Fix bug in :meth:`~DataFrame.groupby` with ``sum()`` and unobserved categories resulting in ``0`` instead of the empty string ``""`` (:issue:`61909`)
+- Fix :meth:`Series.str.isdigit` to correctly recognize unicode superscript
+  characters as digits for :class:`StringDtype` backed by PyArrow (:issue:`61466`)
 - Fix comparing a :class:`StringDtype` Series with mixed objects raising an error (:issue:`60228`)
 - Fix error being raised when using a numpy ufunc with a Python-backed string array (:issue:`40800`)
 
-Improvements and fixes for Copy-on-Write
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Bug fixes
-^^^^^^^^^
-
-- The :meth:`DataFrame.iloc` now works correctly with ``copy_on_write`` option when assigning values after subsetting the columns of a homogeneous DataFrame (:issue:`60309`)
-
 Other changes
 ~~~~~~~~~~~~~
 

@@ -66,9 +62,17 @@ Other changes
   Resampling with a :class:`PeriodIndex` is supported again, but a subset of
   methods that return incorrect results will raise an error in pandas 3.0 (:issue:`57033`)
 
+Other bug fixes
+~~~~~~~~~~~~~~~~
+
+- Fix memory leak in :meth:`DataFrame.to_json` with datetime columns (:issue:`62204`)
+- Fixed regression in :meth:`DataFrame.from_records` not initializing subclasses properly (:issue:`57008`)
+- The :meth:`DataFrame.iloc` now works correctly with ``copy_on_write`` option when assigning values after subsetting the columns of a homogeneous DataFrame (:issue:`60309`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_233.contributors:
 
 Contributors
 ~~~~~~~~~~~~
+
+.. contributors:: v2.3.2..v2.3.3|HEAD
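
Reviewer note: several of the 2.3.3 string fixes listed in this file concern behaviour that the Python standard library already defines for plain strings: ``str.isdigit`` on superscript characters, ``\g<name>`` references in replacements, and compiled patterns that carry their own flags. The sketch below uses only the standard library and does not call pandas; treating these results as the reference the pandas string methods are meant to match is an assumption on my part, not something the diff states.

```python
import re

# str.isdigit() accepts superscript digits, while str.isdecimal() does not.
superscript_two = "\u00b2"  # the character '²'
is_digit = superscript_two.isdigit()
is_decimal = superscript_two.isdecimal()

# Named capture groups are referenced as \g<name> in the replacement string.
swapped = re.sub(
    r"(?P<first>\w+) (?P<second>\w+)",
    r"\g<second> \g<first>",
    "hello world",
)

# A compiled pattern keeps the flags it was compiled with (here IGNORECASE).
pattern = re.compile(r"^ab", flags=re.IGNORECASE)
matches = [bool(pattern.search(s)) for s in ["ABc", "xab", "abz"]]
```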

doc/source/whatsnew/v3.0.0.rst

Lines changed: 4 additions & 1 deletion
@@ -1054,6 +1054,8 @@ MultiIndex
 I/O
 ^^^
 - Bug in :class:`DataFrame` and :class:`Series` ``repr`` of :py:class:`collections.abc.Mapping` elements. (:issue:`57915`)
+- Fix bug in ``on_bad_lines`` callable when returning too many fields: now emits
+  ``ParserWarning`` and truncates extra fields regardless of ``index_col`` (:issue:`61837`)
 - Bug in :meth:`.DataFrame.to_json` when ``"index"`` was a value in the :attr:`DataFrame.column` and :attr:`Index.name` was ``None``. Now, this will fail with a ``ValueError`` (:issue:`58925`)
 - Bug in :meth:`.io.common.is_fsspec_url` not recognizing chained fsspec URLs (:issue:`48978`)
 - Bug in :meth:`DataFrame._repr_html_` which ignored the ``"display.float_format"`` option (:issue:`59876`)
@@ -1217,10 +1219,11 @@ Other
 - Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
 - Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)
 - Deprecated the keyword ``check_datetimelike_compat`` in :meth:`testing.assert_frame_equal` and :meth:`testing.assert_series_equal` (:issue:`55638`)
+- Fixed bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when trying to replace :class:`NA` values in a :class:`Float64Dtype` object with ``np.nan``; this now works with ``pd.set_option("mode.nan_is_na", False)`` and is irrelevant otherwise (:issue:`55127`)
+- Fixed bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when trying to replace :class:`np.nan` values in a :class:`Int64Dtype` object with :class:`NA`; this is now a no-op with ``pd.set_option("mode.nan_is_na", False)`` and is irrelevant otherwise (:issue:`51237`)
 - Fixed bug in the :meth:`Series.rank` with object dtype and extremely small float values (:issue:`62036`)
 - Fixed bug where the :class:`DataFrame` constructor misclassified array-like objects with a ``.name`` attribute as :class:`Series` or :class:`Index` (:issue:`61443`)
 - Fixed regression in :meth:`DataFrame.from_records` not initializing subclasses properly (:issue:`57008`)
--
 
 .. ***DO NOT USE THIS SECTION***
 

pandas/core/arrays/masked.py

Lines changed: 3 additions & 1 deletion
@@ -312,7 +312,9 @@ def __setitem__(self, key, value) -> None:
         key = check_array_indexer(self, key)
 
         if is_scalar(value):
-            if is_valid_na_for_dtype(value, self.dtype):
+            if is_valid_na_for_dtype(value, self.dtype) and not (
+                lib.is_float(value) and not is_nan_na()
+            ):
                 self._mask[key] = True
             else:
                 value = self._validate_setitem_value(value)

pandas/core/missing.py

Lines changed: 32 additions & 1 deletion
@@ -15,6 +15,8 @@
 
 import numpy as np
 
+from pandas._config import is_nan_na
+
 from pandas._libs import (
     NaT,
     algos,
@@ -37,7 +39,11 @@
     is_object_dtype,
     needs_i8_conversion,
 )
-from pandas.core.dtypes.dtypes import DatetimeTZDtype
+from pandas.core.dtypes.dtypes import (
+    ArrowDtype,
+    BaseMaskedDtype,
+    DatetimeTZDtype,
+)
 from pandas.core.dtypes.missing import (
     is_valid_na_for_dtype,
     isna,
@@ -86,6 +92,31 @@ def mask_missing(arr: ArrayLike, value) -> npt.NDArray[np.bool_]:
     """
     dtype, value = infer_dtype_from(value)
 
+    if (
+        isinstance(arr.dtype, (BaseMaskedDtype, ArrowDtype))
+        and lib.is_float(value)
+        and np.isnan(value)
+        and not is_nan_na()
+    ):
+        # TODO: this should be done in an EA method?
+        if arr.dtype.kind == "f":
+            # GH#55127
+            if isinstance(arr.dtype, BaseMaskedDtype):
+                # error: "ExtensionArray" has no attribute "_data" [attr-defined]
+                mask = np.isnan(arr._data) & ~arr.isna()  # type: ignore[attr-defined,operator]
+                return mask
+            else:
+                # error: "ExtensionArray" has no attribute "_pa_array" [attr-defined]
+                import pyarrow.compute as pc
+
+                mask = pc.is_nan(arr._pa_array).fill_null(False).to_numpy()  # type: ignore[attr-defined]
+                return mask
+
+        elif arr.dtype.kind in "iu":
+            # GH#51237
+            mask = np.zeros(arr.shape, dtype=bool)
+            return mask
+
     if isna(value):
         return isna(arr)
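
Reviewer note: the new branch in ``mask_missing`` separates "holds a real NaN" from "is masked as NA" when ``is_nan_na()`` is false. Below is a minimal pure-Python sketch of that idea, assuming a masked float array can be modelled as a list of values plus a parallel list of NA flags. The names ``nan_only_mask``, ``data`` and ``na_mask`` are hypothetical and are not pandas APIs.

```python
import math


def nan_only_mask(data, na_mask):
    """Flag slots that hold a real NaN and are not masked as NA.

    Mirrors the shape of ``np.isnan(arr._data) & ~arr.isna()`` from the
    diff, using plain lists instead of NumPy arrays.
    """
    return [math.isnan(x) and not is_na for x, is_na in zip(data, na_mask)]


# Slot 1 is a genuine NaN; slot 2 is NA (its stored value is irrelevant).
data = [1.0, float("nan"), float("nan"), 4.0]
na_mask = [False, False, True, False]
float_mask = nan_only_mask(data, na_mask)

# Integer-backed data cannot store NaN at all, so nothing is selected,
# matching the all-False mask returned in the "iu" branch.
int_mask = [False] * len(data)
```

Under this model, replacing NaN touches only slot 1 and leaves the NA slot alone, which is the distinction the two changelog entries (GH#55127, GH#51237) describe.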

pandas/io/parsers/python_parser.py

Lines changed: 16 additions & 8 deletions
@@ -21,6 +21,7 @@
 import numpy as np
 
 from pandas._libs import lib
+from pandas._typing import Scalar
 from pandas.errors import (
     EmptyDataError,
     ParserError,
@@ -77,7 +78,6 @@
         ArrayLike,
         DtypeObj,
         ReadCsvBuffer,
-        Scalar,
         T,
     )
 

@@ -954,7 +954,9 @@ def _alert_malformed(self, msg: str, row_num: int) -> None:
         """
         if self.on_bad_lines == self.BadLineHandleMethod.ERROR:
             raise ParserError(msg)
-        if self.on_bad_lines == self.BadLineHandleMethod.WARN:
+        if self.on_bad_lines == self.BadLineHandleMethod.WARN or callable(
+            self.on_bad_lines
+        ):
             warnings.warn(
                 f"Skipping line {row_num}: {msg}\n",
                 ParserWarning,
@@ -1189,29 +1191,35 @@ def _rows_to_cols(self, content: list[list[Scalar]]) -> list[np.ndarray]:
 
             for i, _content in iter_content:
                 actual_len = len(_content)
-
                 if actual_len > col_len:
                     if callable(self.on_bad_lines):
                         new_l = self.on_bad_lines(_content)
                         if new_l is not None:
-                            content.append(new_l)  # pyright: ignore[reportArgumentType]
+                            new_l = cast(list[Scalar], new_l)
+                            if len(new_l) > col_len:
+                                row_num = self.pos - (content_len - i + footers)
+                                bad_lines.append((row_num, len(new_l), "callable"))
+                                new_l = new_l[:col_len]
+                            content.append(new_l)
+
                     elif self.on_bad_lines in (
                         self.BadLineHandleMethod.ERROR,
                         self.BadLineHandleMethod.WARN,
                     ):
                         row_num = self.pos - (content_len - i + footers)
-                        bad_lines.append((row_num, actual_len))
-
+                        bad_lines.append((row_num, actual_len, "normal"))
                         if self.on_bad_lines == self.BadLineHandleMethod.ERROR:
                             break
                 else:
                     content.append(_content)
 
-            for row_num, actual_len in bad_lines:
+            for row_num, actual_len, source in bad_lines:
                 msg = (
                     f"Expected {col_len} fields in line {row_num + 1}, saw {actual_len}"
                 )
-                if (
+                if source == "callable":
+                    msg += " from bad_lines callable"
+                elif (
                     self.delimiter
                     and len(self.delimiter) > 1
                     and self.quoting != csv.QUOTE_NONE
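
Reviewer note: the control flow added to ``_rows_to_cols`` can be hard to follow in diff form, so here is a simplified standalone sketch of it. It assumes rows are plain lists and that the handler either returns a row or ``None``. The function name ``apply_bad_line_handler`` is hypothetical, the row number is just the list index, and ``UserWarning`` stands in for ``ParserWarning``; the real parser computes row numbers from its own position and footer counters, and the hunk shown does not include the call that emits the warning.

```python
import warnings


def apply_bad_line_handler(rows, col_len, handler):
    """Keep well-formed rows; pass over-long rows to ``handler``.

    If the handler itself returns too many fields, the row is truncated
    to ``col_len`` and a warning is recorded for it.
    """
    kept = []
    too_long = []  # (row index, number of fields the handler returned)
    for i, row in enumerate(rows):
        if len(row) > col_len:
            new_row = handler(row)
            if new_row is not None:
                if len(new_row) > col_len:
                    too_long.append((i, len(new_row)))
                    new_row = new_row[:col_len]
                kept.append(new_row)
        else:
            kept.append(row)

    for row_num, seen in too_long:
        warnings.warn(
            f"Expected {col_len} fields in line {row_num + 1}, "
            f"saw {seen} from bad_lines callable",
            UserWarning,
        )
    return kept


rows = [["a", "b"], ["1", "2", "3"], ["4", "5"]]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A handler that returns the row unchanged still yields too many fields.
    result = apply_bad_line_handler(rows, 2, lambda row: row)
    # A handler that returns None drops the bad row entirely.
    dropped = apply_bad_line_handler(rows, 2, lambda row: None)
```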
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+from pathlib import Path
+
+import numpy as np
+import pytest
+
+from pandas.errors import Pandas4Warning
+
+from pandas import (
+    NA,
+    ArrowDtype,
+    Series,
+    StringDtype,
+)
+import pandas._testing as tm
+
+
+def test_reversed_logical_ops(any_string_dtype):
+    # GH#60234
+    dtype = any_string_dtype
+    warn = None if dtype == object else Pandas4Warning
+    left = Series([True, False, False, True])
+    right = Series(["", "", "b", "c"], dtype=dtype)
+
+    msg = "operations between boolean dtype and"
+    with tm.assert_produces_warning(warn, match=msg):
+        result = left | right
+    expected = left | right.astype(bool)
+    tm.assert_series_equal(result, expected)
+
+    with tm.assert_produces_warning(warn, match=msg):
+        result = left & right
+    expected = left & right.astype(bool)
+    tm.assert_series_equal(result, expected)
+
+    with tm.assert_produces_warning(warn, match=msg):
+        result = left ^ right
+    expected = left ^ right.astype(bool)
+    tm.assert_series_equal(result, expected)
+
+
+def test_pathlib_path_division(any_string_dtype, request):
+    # GH#61940
+    if any_string_dtype == object:
+        mark = pytest.mark.xfail(
+            reason="with NA present we go through _masked_arith_op which "
+            "raises TypeError bc Path is not recognized by lib.is_scalar."
+        )
+        request.applymarker(mark)
+
+    item = Path("/Users/Irv/")
+    ser = Series(["A", "B", NA], dtype=any_string_dtype)
+
+    result = item / ser
+    expected = Series([item / "A", item / "B", ser.dtype.na_value], dtype=object)
+    tm.assert_series_equal(result, expected)
+
+    result = ser / item
+    expected = Series(["A" / item, "B" / item, ser.dtype.na_value], dtype=object)
+    tm.assert_series_equal(result, expected)
+
+
+def test_mixed_object_comparison(any_string_dtype):
+    # GH#60228
+    dtype = any_string_dtype
+    ser = Series(["a", "b"], dtype=dtype)
+
+    mixed = Series([1, "b"], dtype=object)
+
+    result = ser == mixed
+    expected = Series([False, True], dtype=bool)
+    if dtype == object:
+        pass
+    elif dtype.storage == "python" and dtype.na_value is NA:
+        expected = expected.astype("boolean")
+    elif dtype.storage == "pyarrow" and dtype.na_value is NA:
+        expected = expected.astype("bool[pyarrow]")
+
+    tm.assert_series_equal(result, expected)
+
+
+def test_pyarrow_numpy_string_invalid():
+    # GH#56008
+    pa = pytest.importorskip("pyarrow")
+    ser = Series([False, True])
+    ser2 = Series(["a", "b"], dtype=StringDtype(na_value=np.nan))
+    result = ser == ser2
+    expected_eq = Series(False, index=ser.index)
+    tm.assert_series_equal(result, expected_eq)
+
+    result = ser != ser2
+    expected_ne = Series(True, index=ser.index)
+    tm.assert_series_equal(result, expected_ne)
+
+    with pytest.raises(TypeError, match="Invalid comparison"):
+        ser > ser2
+
+    # GH#59505
+    ser3 = ser2.astype("string[pyarrow]")
+    result3_eq = ser3 == ser
+    tm.assert_series_equal(result3_eq, expected_eq.astype("bool[pyarrow]"))
+    result3_ne = ser3 != ser
+    tm.assert_series_equal(result3_ne, expected_ne.astype("bool[pyarrow]"))
+
+    with pytest.raises(TypeError, match="Invalid comparison"):
+        ser > ser3
+
+    ser4 = ser2.astype(ArrowDtype(pa.string()))
+    result4_eq = ser4 == ser
+    tm.assert_series_equal(result4_eq, expected_eq.astype("bool[pyarrow]"))
+    result4_ne = ser4 != ser
+    tm.assert_series_equal(result4_ne, expected_ne.astype("bool[pyarrow]"))
+
+    with pytest.raises(TypeError, match="Invalid comparison"):
+        ser > ser4
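
Reviewer note: ``test_pathlib_path_division`` expects ``Path / Series`` and ``Series / Path`` to apply the ``/`` operator element by element and to pass missing values through. Here is a minimal standard-library sketch of that per-element behaviour, assuming missing values are modelled as ``None``. The helper names are hypothetical, and ``PurePosixPath`` is used instead of ``Path`` so that the result does not depend on the operating system.

```python
from pathlib import PurePosixPath


def path_div_each(base, values):
    """Compute ``base / v`` per element, leaving missing values as None."""
    return [None if v is None else base / v for v in values]


def each_div_path(values, base):
    """Compute ``v / base`` per element (uses the path's reflected ``/``)."""
    return [None if v is None else v / base for v in values]


joined = path_div_each(PurePosixPath("/Users/Irv"), ["A", "B", None])
reflected = each_div_path(["A", "B", None], PurePosixPath("Irv"))
```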
