Commit 7281a18

Merge branch 'pandas-dev:main' into no-warning-in-set_index
2 parents fde5ed8 + 57fd502 commit 7281a18

24 files changed, +376 -145 lines changed

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@ jobs:
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.22.0
+        uses: pypa/cibuildwheel@v2.23.0
         with:
           package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:

doc/source/development/contributing_codebase.rst

Lines changed: 1 addition & 1 deletion
@@ -198,7 +198,7 @@ In some cases you may be tempted to use ``cast`` from the typing module when you
            obj = cast(str, obj) # Mypy complains without this!
            return obj.upper()

-The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_. While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable
+The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_). While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable

 .. code-block:: python

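To illustrate the kind of refactor the guide prefers over ``cast``, here is a minimal sketch. The function name `upper_if_text` is hypothetical, and a plain `isinstance` check stands in for pandas' `is_number` helper. Testing for `str` first lets mypy narrow the type in each branch, so no `cast` is needed.

```python
from typing import Union


def upper_if_text(obj: Union[str, int, float]) -> Union[str, int, float]:
    """Upper-case strings and pass numbers through unchanged.

    Hypothetical example: branching on ``isinstance(obj, str)`` lets mypy
    narrow ``obj`` to ``str`` inside the branch, so ``cast`` is unnecessary.
    """
    if isinstance(obj, str):
        # mypy knows ``obj`` is ``str`` here, so ``.upper()`` type-checks.
        return obj.upper()
    # Only ``int`` and ``float`` can reach this point.
    return obj


print(upper_if_text("abc"))  # ABC
print(upper_if_text(3))      # 3
```

The narrowing comes from the `isinstance` check itself, so the code stays correct even if mypy never learns to interpret a helper like `is_number`.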
doc/source/getting_started/intro_tutorials/03_subset_data.rst

Lines changed: 1 addition & 1 deletion
@@ -335,7 +335,7 @@ the name ``anonymous`` to the first 3 elements of the fourth column:
 .. ipython:: python

     titanic.iloc[0:3, 3] = "anonymous"
-    titanic.head()
+    titanic.iloc[:5, 3]


 .. raw:: html

doc/source/reference/arrays.rst

Lines changed: 3 additions & 3 deletions
@@ -61,7 +61,7 @@ is an :class:`ArrowDtype`.
 support as NumPy including first-class nullability support for all data types, immutability and more.

 The table below shows the equivalent pyarrow-backed (``pa``), pandas extension, and numpy (``np``) types that are recognized by pandas.
-Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``
+Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``.

 =============================================== ========================== ===================
 PyArrow type                                    pandas extension type      NumPy type
@@ -114,7 +114,7 @@ values.

    ArrowDtype

-For more information, please see the :ref:`PyArrow user guide <pyarrow>`
+For more information, please see the :ref:`PyArrow user guide <pyarrow>`.

 .. _api.arrays.datetime:

@@ -495,7 +495,7 @@ a :class:`CategoricalDtype`.
    CategoricalDtype.categories
    CategoricalDtype.ordered

-Categorical data can be stored in a :class:`pandas.Categorical`
+Categorical data can be stored in a :class:`pandas.Categorical`:

 .. autosummary::
    :toctree: api/

doc/source/user_guide/text.rst

Lines changed: 6 additions & 6 deletions
@@ -13,7 +13,7 @@ Text data types

 There are two ways to store text data in pandas:

-1. ``object`` -dtype NumPy array.
+1. ``object`` dtype NumPy array.
 2. :class:`StringDtype` extension type.

 We recommend using :class:`StringDtype` to store text data.
@@ -40,20 +40,20 @@ to significantly increase the performance and lower the memory overhead of
    and parts of the API may change without warning.

 For backwards-compatibility, ``object`` dtype remains the default type we
-infer a list of strings to
+infer a list of strings to:

 .. ipython:: python

    pd.Series(["a", "b", "c"])

-To explicitly request ``string`` dtype, specify the ``dtype``
+To explicitly request ``string`` dtype, specify the ``dtype``:

 .. ipython:: python

    pd.Series(["a", "b", "c"], dtype="string")
    pd.Series(["a", "b", "c"], dtype=pd.StringDtype())

-Or ``astype`` after the ``Series`` or ``DataFrame`` is created
+Or ``astype`` after the ``Series`` or ``DataFrame`` is created:

 .. ipython:: python

@@ -88,7 +88,7 @@ Behavior differences
 ^^^^^^^^^^^^^^^^^^^^

 These are places where the behavior of ``StringDtype`` objects differ from
-``object`` dtype
+``object`` dtype:

 l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
    that return **numeric** output will always return a nullable integer dtype,
@@ -102,7 +102,7 @@ l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
       s.str.count("a")
       s.dropna().str.count("a")

-   Both outputs are ``Int64`` dtype. Compare that with object-dtype
+   Both outputs are ``Int64`` dtype. Compare that with object-dtype:

    .. ipython:: python

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 0 deletions
@@ -614,7 +614,9 @@ Performance improvements
 - Performance improvement in :meth:`RangeIndex.take` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57445`, :issue:`57752`)
 - Performance improvement in :func:`merge` if hash-join can be used (:issue:`57970`)
 - Performance improvement in :meth:`CategoricalDtype.update_dtype` when ``dtype`` is a :class:`CategoricalDtype` with non ``None`` categories and ordered (:issue:`59647`)
+- Performance improvement in :meth:`DataFrame.__getitem__` when ``key`` is a :class:`DataFrame` with many columns (:issue:`61010`)
 - Performance improvement in :meth:`DataFrame.astype` when converting to extension floating dtypes, e.g. "Float64" (:issue:`60066`)
+- Performance improvement in :meth:`DataFrame.where` when ``cond`` is a :class:`DataFrame` with many columns (:issue:`61010`)
 - Performance improvement in :meth:`to_hdf` avoid unnecessary reopenings of the HDF5 file to speedup data addition to files with a very large number of groups . (:issue:`58248`)
 - Performance improvement in ``DataFrameGroupBy.__len__`` and ``SeriesGroupBy.__len__`` (:issue:`57595`)
 - Performance improvement in indexing operations for string dtypes (:issue:`56997`)
@@ -647,6 +649,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
 - Bug in :meth:`Series.dt.microsecond` producing incorrect results for pyarrow backed :class:`Series`. (:issue:`59154`)
 - Bug in :meth:`to_datetime` not respecting dayfirst if an uncommon date string was passed. (:issue:`58859`)
+- Bug in :meth:`to_datetime` on float array with missing values throwing ``FloatingPointError`` (:issue:`58419`)
 - Bug in :meth:`to_datetime` on float32 df with year, month, day etc. columns leads to precision issues and incorrect result. (:issue:`60506`)
 - Bug in :meth:`to_datetime` reports incorrect index in case of any failure scenario. (:issue:`58298`)
 - Bug in :meth:`to_datetime` wrongly converts when ``arg`` is a ``np.datetime64`` object with unit of ``ps``. (:issue:`60341`)
@@ -690,6 +693,7 @@ Indexing
 ^^^^^^^^
 - Bug in :meth:`DataFrame.__getitem__` returning modified columns when called with ``slice`` in Python 3.12 (:issue:`57500`)
 - Bug in :meth:`DataFrame.from_records` throwing a ``ValueError`` when passed an empty list in ``index`` (:issue:`58594`)
+- Bug in :meth:`DataFrame.loc` with inconsistent behavior of loc-set with 2 given indexes to Series (:issue:`59933`)
 - Bug in :meth:`MultiIndex.insert` when a new value inserted to a datetime-like level gets cast to ``NaT`` and fails indexing (:issue:`60388`)
 - Bug in printing :attr:`Index.names` and :attr:`MultiIndex.levels` would not escape single quotes (:issue:`60190`)

@@ -705,6 +709,7 @@ MultiIndex
 - :meth:`MultiIndex.insert` would not insert NA value correctly at unified location of index -1 (:issue:`59003`)
 - :func:`MultiIndex.get_level_values` accessing a :class:`DatetimeIndex` does not carry the frequency attribute along (:issue:`58327`, :issue:`57949`)
 - Bug in :class:`DataFrame` arithmetic operations in case of unaligned MultiIndex columns (:issue:`60498`)
+- Bug in :class:`DataFrame` arithmetic operations with :class:`Series` in case of unaligned MultiIndex (:issue:`61009`)
 -

 I/O
@@ -790,6 +795,7 @@ ExtensionArray
 ^^^^^^^^^^^^^^
 - Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
 - Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
+- Bug in :meth:`ArrowExtensionArray.factorize` where NA values were dropped when input was dictionary-encoded even when dropna was set to False(:issue:`60567`)
 - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
 - Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Autogenerated file containing Cython compile-time defines
+
+DEF CYTHON_COMPATIBLE_WITH_FREE_THREADING = @freethreading_compatible@

pandas/_libs/internals.pyx

Lines changed: 56 additions & 14 deletions
@@ -4,10 +4,7 @@ cimport cython
 from cpython.object cimport PyObject
 from cpython.pyport cimport PY_SSIZE_T_MAX
 from cpython.slice cimport PySlice_GetIndicesEx
-from cpython.weakref cimport (
-    PyWeakref_GetObject,
-    PyWeakref_NewRef,
-)
+from cpython.weakref cimport PyWeakref_NewRef
 from cython cimport Py_ssize_t

 import numpy as np
@@ -29,6 +26,14 @@ from pandas._libs.util cimport (
     is_integer_object,
 )

+include "free_threading_config.pxi"
+
+IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+    from cpython.ref cimport Py_DECREF
+    from cpython.weakref cimport PyWeakref_GetRef
+ELSE:
+    from cpython.weakref cimport PyWeakref_GetObject
+

 cdef extern from "Python.h":
     PyObject* Py_None
@@ -908,17 +913,37 @@ cdef class BlockValuesRefs:
         # if force=False. Clearing for every insertion causes slowdowns if
         # all these objects stay alive, e.g. df.items() for wide DataFrames
         # see GH#55245 and GH#55008
+        IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+            cdef PyObject* pobj
+            cdef bint status
+
         if force or len(self.referenced_blocks) > self.clear_counter:
-            self.referenced_blocks = [
-                ref for ref in self.referenced_blocks
-                if PyWeakref_GetObject(ref) != Py_None
-            ]
+            IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+                new_referenced_blocks = []
+                for ref in self.referenced_blocks:
+                    status = PyWeakref_GetRef(ref, &pobj)
+                    if status == -1:
+                        return
+                    elif status == 1:
+                        new_referenced_blocks.append(ref)
+                        Py_DECREF(<object>pobj)
+                self.referenced_blocks = new_referenced_blocks
+            ELSE:
+                self.referenced_blocks = [
+                    ref for ref in self.referenced_blocks
+                    if PyWeakref_GetObject(ref) != Py_None
+                ]
+
             nr_of_refs = len(self.referenced_blocks)
             if nr_of_refs < self.clear_counter // 2:
                 self.clear_counter = max(self.clear_counter // 2, 500)
             elif nr_of_refs > self.clear_counter:
                 self.clear_counter = max(self.clear_counter * 2, nr_of_refs)

+    cpdef _add_reference_maybe_locked(self, Block blk):
+        self._clear_dead_references()
+        self.referenced_blocks.append(PyWeakref_NewRef(blk, None))
+
     cpdef add_reference(self, Block blk):
         """Adds a new reference to our reference collection.

@@ -927,8 +952,15 @@ cdef class BlockValuesRefs:
         blk : Block
             The block that the new references should point to.
         """
+        IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+            with cython.critical_section(self):
+                self._add_reference_maybe_locked(blk)
+        ELSE:
+            self._add_reference_maybe_locked(blk)
+
+    def _add_index_reference_maybe_locked(self, index: object) -> None:
         self._clear_dead_references()
-        self.referenced_blocks.append(PyWeakref_NewRef(blk, None))
+        self.referenced_blocks.append(PyWeakref_NewRef(index, None))

     def add_index_reference(self, index: object) -> None:
         """Adds a new reference to our reference collection when creating an index.
@@ -938,8 +970,16 @@ cdef class BlockValuesRefs:
         index : Index
             The index that the new reference should point to.
         """
-        self._clear_dead_references()
-        self.referenced_blocks.append(PyWeakref_NewRef(index, None))
+        IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+            with cython.critical_section(self):
+                self._add_index_reference_maybe_locked(index)
+        ELSE:
+            self._add_index_reference_maybe_locked(index)
+
+    def _has_reference_maybe_locked(self) -> bool:
+        self._clear_dead_references(force=True)
+        # Checking for more references than block pointing to itself
+        return len(self.referenced_blocks) > 1

     def has_reference(self) -> bool:
         """Checks if block has foreign references.
@@ -951,6 +991,8 @@ cdef class BlockValuesRefs:
         -------
         bool
         """
-        self._clear_dead_references(force=True)
-        # Checking for more references than block pointing to itself
-        return len(self.referenced_blocks) > 1
+        IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+            with cython.critical_section(self):
+                return self._has_reference_maybe_locked()
+        ELSE:
+            return self._has_reference_maybe_locked()

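The `internals.pyx` changes follow one pattern. Each method body moves into a `*_maybe_locked` helper, and the public method calls that helper inside `cython.critical_section(self)` when the build is free-threading compatible, or calls it directly otherwise. The free-threaded branch of `_clear_dead_references` also replaces `PyWeakref_GetObject`, which returns a borrowed reference, with `PyWeakref_GetRef`. That function returns 1 with a new strong reference while the referent is alive (hence the `Py_DECREF`), 0 once it is dead, and -1 on error. The pure-Python sketch below mimics that structure and the exponential-backoff pruning. `Block` and `RefsSketch` are hypothetical stand-ins, not pandas classes, and a `threading.Lock` substitutes for the per-object critical section, which it only approximates.

```python
import threading
import weakref


class Block:
    """Hypothetical stand-in for a pandas Block; it only needs to support weakrefs."""


class RefsSketch:
    """Pure-Python analog of the BlockValuesRefs locking pattern (illustrative only)."""

    def __init__(self, blk: Block) -> None:
        # The owning block references itself, so one entry means "no foreign references".
        self.referenced_blocks = [weakref.ref(blk)]
        self.clear_counter = 500
        # Stand-in for cython.critical_section(self); the two are not equivalent.
        self._lock = threading.Lock()

    def _clear_dead_references(self, force: bool = False) -> None:
        # Exponential backoff: only rescan when forced or when the list has
        # grown past clear_counter, so wide frames do not rescan on every insert.
        if force or len(self.referenced_blocks) > self.clear_counter:
            # Calling a weakref returns a strong reference, or None once the
            # referent is gone (compare PyWeakref_GetRef returning 1 or 0).
            self.referenced_blocks = [
                ref for ref in self.referenced_blocks if ref() is not None
            ]
            nr_of_refs = len(self.referenced_blocks)
            if nr_of_refs < self.clear_counter // 2:
                self.clear_counter = max(self.clear_counter // 2, 500)
            elif nr_of_refs > self.clear_counter:
                self.clear_counter = max(self.clear_counter * 2, nr_of_refs)

    # The "_maybe_locked" helpers assume the caller already holds the lock.
    def _add_reference_maybe_locked(self, blk: Block) -> None:
        self._clear_dead_references()
        self.referenced_blocks.append(weakref.ref(blk))

    def add_reference(self, blk: Block) -> None:
        with self._lock:
            self._add_reference_maybe_locked(blk)

    def _has_reference_maybe_locked(self) -> bool:
        self._clear_dead_references(force=True)
        return len(self.referenced_blocks) > 1

    def has_reference(self) -> bool:
        with self._lock:
            return self._has_reference_maybe_locked()


owner = Block()
refs = RefsSketch(owner)
view = Block()
refs.add_reference(view)
print(refs.has_reference())  # True: `view` is still alive
del view
print(refs.has_reference())  # False on CPython, where `view` is freed immediately
```

Splitting each method into a locked public wrapper and an unlocked helper means the helpers never try to reacquire the lock. This matters for a non-reentrant `threading.Lock`.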
pandas/_libs/meson.build

Lines changed: 13 additions & 0 deletions
@@ -50,6 +50,19 @@ _khash_primitive_helper_dep = declare_dependency(
     sources: _khash_primitive_helper,
 )

+cdata = configuration_data()
+if cy.version().version_compare('>=3.1.0')
+    cdata.set('freethreading_compatible', '1')
+else
+    cdata.set('freethreading_compatible', '0')
+endif
+_free_threading_config = configure_file(
+    input: 'free_threading_config.pxi.in',
+    output: 'free_threading_config.pxi',
+    configuration: cdata,
+    install: false,
+)
+
 subdir('tslibs')

 libs_sources = {

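In the meson snippet, `freethreading_compatible` is set to `'1'` only when `cy.version()` is at least 3.1.0. Here `cy` is presumably the Cython compiler object defined elsewhere in the build files. `configure_file` then writes `free_threading_config.pxi` from the `.pxi.in` template by substituting that value for `@freethreading_compatible@`. The sketch below is only a rough Python analog of those two steps. `version_at_least` and `render_config` are hypothetical helpers, and the numeric-only version parsing is an assumption that ignores pre-release suffixes such as `a1`.

```python
import re


def version_at_least(version: str, minimum: str) -> bool:
    """Naive dotted-version comparison; assumes purely numeric components."""
    def parts(v: str) -> tuple:
        nums = [int(p) for p in v.split(".")]
        # Pad to three components so "3.1" compares equal to "3.1.0".
        return tuple(nums + [0] * (3 - len(nums)))
    return parts(version) >= parts(minimum)


def render_config(template: str, values: dict[str, str]) -> str:
    """Replace @name@ tokens with values, roughly like configure_file does."""
    return re.sub(r"@(\w+)@", lambda m: values[m.group(1)], template)


TEMPLATE = (
    "# Autogenerated file containing Cython compile-time defines\n"
    "\n"
    "DEF CYTHON_COMPATIBLE_WITH_FREE_THREADING = @freethreading_compatible@\n"
)

for cython_version in ("3.0.11", "3.1.0"):
    flag = "1" if version_at_least(cython_version, "3.1.0") else "0"
    rendered = render_config(TEMPLATE, {"freethreading_compatible": flag})
    # Prints the generated DEF line: "= 0" for 3.0.11, "= 1" for 3.1.0.
    print(cython_version, "->", rendered.splitlines()[-1])
```

The build emits a compile-time `DEF` rather than a runtime check. Because of that, the `IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:` blocks in `internals.pyx` are resolved when the extension is compiled, and only one branch ends up in the built module.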
pandas/_libs/tslibs/conversion.pyx

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ def cast_from_unit_vectorized(

     out = np.empty(shape, dtype="i8")
     base = np.empty(shape, dtype="i8")
-    frac = np.empty(shape, dtype="f8")
+    frac = np.zeros(shape, dtype="f8")

     for i in range(len(values)):
         if is_nan(values[i]):

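`np.empty` allocates an array without initializing it. Any `frac` slot the loop never writes therefore keeps whatever bytes happened to be in memory, and the `is_nan` branch suggests NaN inputs are skipped. Later arithmetic on those leftovers can overflow or raise. `np.zeros` makes those slots a well-defined 0.0. This plausibly corresponds to the `FloatingPointError` entry for issue 58419 in the whatsnew diff above, although this page does not state the link. The sketch below is a simplified, hypothetical loop: `split_whole_and_fraction` is not a pandas function, and the sketch assumes NumPy is installed. It shows the skipped slots coming out as zeros.

```python
import numpy as np


def split_whole_and_fraction(values: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split floats into integer and fractional parts, skipping NaN entries.

    Hypothetical, simplified loop: NaN entries are skipped, so their ``frac``
    slot keeps its initial value. Allocating with ``np.zeros`` (not
    ``np.empty``) guarantees that value is 0.0 rather than leftover memory.
    """
    shape = values.shape
    base = np.empty(shape, dtype="i8")  # every slot is written below
    frac = np.zeros(shape, dtype="f8")  # NaN slots are never written
    for i in range(len(values)):
        if np.isnan(values[i]):
            # Missing-value sentinel (NaT is stored as the minimum int64).
            base[i] = np.iinfo(np.int64).min
            continue
        whole = int(values[i])
        base[i] = whole
        frac[i] = values[i] - whole
    return base, frac


base, frac = split_whole_and_fraction(np.array([1.5, np.nan, 2.25]))
print(base, frac)
```

`np.empty` is still fine for `out` and `base` in this pattern, as long as every slot is assigned on every path through the loop.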