Commit b8ce7b7

Merge branch 'main' into skonda29-issue-61311
2 parents be85d35 + 36b8f20 commit b8ce7b7

41 files changed, +581 -439 lines changed

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ jobs:
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.23.3
+        uses: pypa/cibuildwheel@v3.1.1
         with:
           package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:

README.md

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ details, see the commit logs at https://github.com/pandas-dev/pandas.
 ## Dependencies
 - [NumPy - Adds support for large, multi-dimensional arrays, matrices and high-level mathematical functions to operate on these arrays](https://www.numpy.org)
 - [python-dateutil - Provides powerful extensions to the standard datetime module](https://dateutil.readthedocs.io/en/stable/index.html)
-- [pytz - Brings the Olson tz database into Python which allows accurate and cross platform timezone calculations](https://github.com/stub42/pytz)
+- [tzdata - Provides an IANA time zone database](https://tzdata.readthedocs.io/en/latest/)

 See the [full installation instructions](https://pandas.pydata.org/pandas-docs/stable/install.html#dependencies) for minimum supported versions of required, recommended and optional dependencies.

ci/code_checks.sh

Lines changed: 1 addition & 3 deletions
@@ -58,9 +58,7 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then

     MSG='Python and Cython Doctests' ; echo "$MSG"
     python -c 'import pandas as pd; pd.test(run_doctests=True)'
-    # TEMP don't let doctests fail the build until all string dtype changes are fixed
-    # RET=$(($RET + $?)) ; echo "$MSG" "DONE"
-    echo "$MSG" "DONE"
+    RET=$(($RET + $?)) ; echo "$MSG" "DONE"

 fi

ci/deps/actions-311-downstream_compat.yaml

Lines changed: 1 addition & 2 deletions
@@ -50,8 +50,7 @@ dependencies:
   - pytz>=2023.4
   - pyxlsb>=1.0.10
   - s3fs>=2023.12.2
-  # TEMP upper pin for scipy (https://github.com/statsmodels/statsmodels/issues/9584)
-  - scipy>=1.12.0,<1.16
+  - scipy>=1.12.0
   - sqlalchemy>=2.0.0
   - tabulate>=0.9.0
   - xarray>=2024.1.1

doc/source/user_guide/migration-3-strings.rst

Lines changed: 8 additions & 3 deletions
@@ -118,12 +118,17 @@ through the ``str`` accessor will work the same:
 Overview of behavior differences and how to address them
 ---------------------------------------------------------

-The dtype is no longer object dtype
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The dtype is no longer a numpy "object" dtype
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 When inferring or reading string data, the data type of the resulting DataFrame
 column or Series will silently start being the new ``"str"`` dtype instead of
-``"object"`` dtype, and this can have some impact on your code.
+the numpy ``"object"`` dtype, and this can have some impact on your code.
+
+The new string dtype is a pandas data type ("extension dtype"), and no longer a
+numpy ``np.dtype`` instance. Therefore, passing the dtype of a string column to
+numpy functions will no longer work (e.g. passing it to a ``dtype=`` argument
+of a numpy function, or using ``np.issubdtype`` to check the dtype).

 Checking the dtype
 ^^^^^^^^^^^^^^^^^^
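
To make the consequence described in the added paragraph concrete, here is a minimal sketch. It assumes pandas and NumPy are installed, and it uses `dtype="string"` (the nullable `StringDtype`) as a stand-in for the new default `"str"` dtype, since both are extension dtypes rather than `np.dtype` instances. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumption: dtype="string" stands in for the pandas 3.0 default "str" dtype.
ser = pd.Series(["a", "b", None], dtype="string")

# An extension dtype is not a numpy dtype instance.
is_numpy_dtype = isinstance(ser.dtype, np.dtype)

# numpy cannot interpret the extension dtype, so np.issubdtype raises.
try:
    np.issubdtype(ser.dtype, np.object_)
    numpy_check_failed = False
except TypeError:
    numpy_check_failed = True

# The pandas type-checking helper accepts extension dtypes.
is_string = pd.api.types.is_string_dtype(ser.dtype)

print(is_numpy_dtype, numpy_check_failed, is_string)
```

Code that currently relies on `np.issubdtype` for string columns would likely need to move to a pandas-level check such as the one in the last step.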

doc/source/whatsnew/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Version 2.3
 .. toctree::
    :maxdepth: 2

+   v2.3.2
    v2.3.1
    v2.3.0

doc/source/whatsnew/v2.3.2.rst

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+.. _whatsnew_232:
+
+What's new in 2.3.2 (August XX, 2025)
+-------------------------------------
+
+These are the changes in pandas 2.3.2. See :ref:`release` for a full changelog
+including other versions of pandas.
+
+{{ header }}
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_232.string_fixes:
+
+Improvements and fixes for the StringDtype
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Most changes in this release are related to :class:`StringDtype` which will
+become the default string dtype in pandas 3.0. See
+:ref:`whatsnew_230.upcoming_changes` for more details.
+
+.. _whatsnew_232.string_fixes.bugs:
+
+Bug fixes
+^^^^^^^^^
+- Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
+  "string" type in the JSON Table Schema for :class:`StringDtype` columns
+  (:issue:`61889`)
+- Boolean operations (``|``, ``&``, ``^``) with bool-dtype objects on the left and :class:`StringDtype` objects on the right now cast the string to bool, with a deprecation warning (:issue:`60234`)
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_232.contributors:
+
+Contributors
+~~~~~~~~~~~~

doc/source/whatsnew/v3.0.0.rst

Lines changed: 3 additions & 0 deletions
@@ -731,6 +731,7 @@ Timezones

 Numeric
 ^^^^^^^
+- Bug in :func:`api.types.infer_dtype` returning "mixed" for complex and ``pd.NA`` mix (:issue:`61976`)
 - Bug in :func:`api.types.infer_dtype` returning "mixed-integer-float" for float and ``pd.NA`` mix (:issue:`61621`)
 - Bug in :meth:`DataFrame.corr` where numerical precision errors resulted in correlations above ``1.0`` (:issue:`61120`)
 - Bug in :meth:`DataFrame.cov` raises a ``TypeError`` instead of returning potentially incorrect results or other errors (:issue:`53115`)
@@ -851,6 +852,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
 - Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` were not keeping the index name when the index had :class:`ArrowDtype` timestamp dtype (:issue:`61222`)
 - Bug in :meth:`DataFrame.resample` changing index type to :class:`MultiIndex` when the dataframe is empty and using an upsample method (:issue:`55572`)
+- Bug in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg` that was returning numpy dtype values when input values are pyarrow dtype values, instead of returning pyarrow dtype values. (:issue:`53030`)
 - Bug in :meth:`DataFrameGroupBy.agg` that raises ``AttributeError`` when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (:issue:`55041`)
 - Bug in :meth:`DataFrameGroupBy.agg` where applying a user-defined function to an empty DataFrame returned a Series instead of an empty DataFrame. (:issue:`61503`)
 - Bug in :meth:`DataFrameGroupBy.apply` and :meth:`SeriesGroupBy.apply` for empty data frame with ``group_keys=False`` still creating output index using group keys. (:issue:`60471`)
@@ -940,6 +942,7 @@ Other
 - Bug in Dataframe Interchange Protocol implementation was returning incorrect results for data buffers' associated dtype, for string and datetime columns (:issue:`54781`)
 - Bug in ``Series.list`` methods not preserving the original :class:`Index`. (:issue:`58425`)
 - Bug in ``Series.list`` methods not preserving the original name. (:issue:`60522`)
+- Bug in ``Series.replace`` when the Series was created from an :class:`Index` and Copy-On-Write is enabled (:issue:`61622`)
 - Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
 - Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)
 - Fixed bug where the :class:`DataFrame` constructor misclassified array-like objects with a ``.name`` attribute as :class:`Series` or :class:`Index` (:issue:`61443`)

pandas/_libs/lib.pyx

Lines changed: 4 additions & 2 deletions
@@ -1974,9 +1974,11 @@ cdef class ComplexValidator(Validator):
         return cnp.PyDataType_ISCOMPLEX(self.dtype)


-cdef bint is_complex_array(ndarray values):
+cdef bint is_complex_array(ndarray values, bint skipna=True):
     cdef:
-        ComplexValidator validator = ComplexValidator(values.size, values.dtype)
+        ComplexValidator validator = ComplexValidator(values.size,
+                                                      values.dtype,
+                                                      skipna=skipna)
     return validator.validate(values)

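
The `skipna` flag added to `is_complex_array` is most likely reached from Python through `pandas.api.types.infer_dtype`, which the matching whatsnew entry names. A small sketch of how that path could be exercised, assuming pandas is installed. The result for the mixed complex and `pd.NA` case depends on whether the installed version includes this fix, so no output is claimed for it.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Purely complex input is classified as "complex".
plain = infer_dtype([1 + 2j, 3 - 1j])

# Complex values mixed with pd.NA: per the whatsnew entry, versions without
# the fix for issue 61976 report "mixed"; fixed versions should not.
with_na = infer_dtype([1 + 2j, pd.NA], skipna=True)

print(plain, with_na)
```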

pandas/core/algorithms.py

Lines changed: 5 additions & 5 deletions
@@ -391,11 +391,11 @@ def unique(values):

     >>> pd.unique(pd.Series(pd.Categorical(list("baabc"))))
     ['b', 'a', 'c']
-    Categories (3, object): ['a', 'b', 'c']
+    Categories (3, str): ['a', 'b', 'c']

     >>> pd.unique(pd.Series(pd.Categorical(list("baabc"), categories=list("abc"))))
     ['b', 'a', 'c']
-    Categories (3, object): ['a', 'b', 'c']
+    Categories (3, str): ['a', 'b', 'c']

     An ordered Categorical preserves the category ordering.

@@ -405,7 +405,7 @@ def unique(values):
     ...     )
     ... )
     ['b', 'a', 'c']
-    Categories (3, object): ['a' < 'b' < 'c']
+    Categories (3, str): ['a' < 'b' < 'c']

     An array of tuples

@@ -751,7 +751,7 @@ def factorize(
     array([0, 0, 1])
     >>> uniques
     ['a', 'c']
-    Categories (3, object): ['a', 'b', 'c']
+    Categories (3, str): ['a', 'b', 'c']

     Notice that ``'b'`` is in ``uniques.categories``, despite not being
     present in ``cat.values``.
@@ -764,7 +764,7 @@ def factorize(
     >>> codes
     array([0, 0, 1])
     >>> uniques
-    Index(['a', 'c'], dtype='object')
+    Index(['a', 'c'], dtype='str')

     If NaN is in the values, and we want to include NaN in the uniques of the
     values, it can be achieved by setting ``use_na_sentinel=False``.
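
The doctest updates above change only the displayed dtype of the categories, not the values. A hedged sketch for checking which dtype a given installation reports, assuming pandas is installed. The dtype is expected to print as `object` under pandas 2.x defaults and as `str` once the new string dtype is the default, so it is printed rather than asserted.

```python
import pandas as pd

cat = pd.Categorical(list("baabc"))
uniques = pd.unique(pd.Series(cat))

# Values come back in order of first appearance.
print(list(uniques))  # ['b', 'a', 'c']

# The categories dtype is the part the updated doctests display.
print(uniques.categories.dtype)
```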
