Commit 7fea8d7

Merge branch 'main' into fix-difference-between-PeriodIndex-and-Index
2 parents f1a7363 + 5da9eb7 commit 7fea8d7

17 files changed (228 additions, 132 deletions)

doc/source/development/contributing_codebase.rst

Lines changed: 1 addition & 1 deletion
@@ -198,7 +198,7 @@ In some cases you may be tempted to use ``cast`` from the typing module when you
 obj = cast(str, obj) # Mypy complains without this!
 return obj.upper()

-The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_. While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable
+The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_). While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable

 .. code-block:: python
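As a sketch of the kind of refactor the guide prefers over ``cast``: replacing the ``is_number`` helper with a direct ``isinstance`` check lets mypy narrow the ``Union`` on its own. This is an illustration, not code from the pandas guide; the function name ``infer_good`` is hypothetical, and it assumes ``is_number`` only needs to cover ``int`` and ``float`` here.

```python
from typing import Union


def infer_good(obj: Union[str, int, float]) -> str:
    # isinstance() is a check mypy understands, so inside this branch obj is
    # narrowed to int | float without any cast.
    if isinstance(obj, (int, float)):
        return str(obj)
    # After the early return, mypy narrows obj to str, so .upper() type-checks.
    return obj.upper()


print(infer_good(3))      # 3
print(infer_good(2.5))    # 2.5
print(infer_good("abc"))  # ABC
```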

doc/source/getting_started/intro_tutorials/03_subset_data.rst

Lines changed: 1 addition & 1 deletion
@@ -335,7 +335,7 @@ the name ``anonymous`` to the first 3 elements of the fourth column:
 .. ipython:: python

 titanic.iloc[0:3, 3] = "anonymous"
-titanic.head()
+titanic.iloc[:5, 3]

 .. raw:: html

doc/source/reference/arrays.rst

Lines changed: 3 additions & 3 deletions
@@ -61,7 +61,7 @@ is an :class:`ArrowDtype`.
 support as NumPy including first-class nullability support for all data types, immutability and more.

 The table below shows the equivalent pyarrow-backed (``pa``), pandas extension, and numpy (``np``) types that are recognized by pandas.
-Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``
+Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``.

 =============================================== ========================== ===================
 PyArrow type pandas extension type NumPy type
@@ -114,7 +114,7 @@ values.

 ArrowDtype

-For more information, please see the :ref:`PyArrow user guide <pyarrow>`
+For more information, please see the :ref:`PyArrow user guide <pyarrow>`.

 .. _api.arrays.datetime:

@@ -495,7 +495,7 @@ a :class:`CategoricalDtype`.
 CategoricalDtype.categories
 CategoricalDtype.ordered

-Categorical data can be stored in a :class:`pandas.Categorical`
+Categorical data can be stored in a :class:`pandas.Categorical`:

 .. autosummary::
 :toctree: api/

doc/source/user_guide/text.rst

Lines changed: 6 additions & 6 deletions
@@ -13,7 +13,7 @@ Text data types

 There are two ways to store text data in pandas:

-1. ``object`` -dtype NumPy array.
+1. ``object`` dtype NumPy array.
 2. :class:`StringDtype` extension type.

 We recommend using :class:`StringDtype` to store text data.
@@ -40,20 +40,20 @@ to significantly increase the performance and lower the memory overhead of
 and parts of the API may change without warning.

 For backwards-compatibility, ``object`` dtype remains the default type we
-infer a list of strings to
+infer a list of strings to:

 .. ipython:: python

 pd.Series(["a", "b", "c"])

-To explicitly request ``string`` dtype, specify the ``dtype``
+To explicitly request ``string`` dtype, specify the ``dtype``:

 .. ipython:: python

 pd.Series(["a", "b", "c"], dtype="string")
 pd.Series(["a", "b", "c"], dtype=pd.StringDtype())

-Or ``astype`` after the ``Series`` or ``DataFrame`` is created
+Or ``astype`` after the ``Series`` or ``DataFrame`` is created:

 .. ipython:: python

@@ -88,7 +88,7 @@ Behavior differences
 ^^^^^^^^^^^^^^^^^^^^

 These are places where the behavior of ``StringDtype`` objects differ from
-``object`` dtype
+``object`` dtype:

 l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
 that return **numeric** output will always return a nullable integer dtype,
@@ -102,7 +102,7 @@ l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
 s.str.count("a")
 s.dropna().str.count("a")

-Both outputs are ``Int64`` dtype. Compare that with object-dtype
+Both outputs are ``Int64`` dtype. Compare that with object-dtype:

 .. ipython:: python

doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -791,6 +791,7 @@ ExtensionArray
 ^^^^^^^^^^^^^^
 - Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
 - Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
+- Bug in :meth:`ArrowExtensionArray.factorize` where NA values were dropped when input was dictionary-encoded even when dropna was set to False (:issue:`60567`)
 - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
 - Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Autogenerated file containing Cython compile-time defines
+
+DEF CYTHON_COMPATIBLE_WITH_FREE_THREADING = @freethreading_compatible@

pandas/_libs/internals.pyx

Lines changed: 56 additions & 14 deletions
@@ -4,10 +4,7 @@ cimport cython
 from cpython.object cimport PyObject
 from cpython.pyport cimport PY_SSIZE_T_MAX
 from cpython.slice cimport PySlice_GetIndicesEx
-from cpython.weakref cimport (
-    PyWeakref_GetObject,
-    PyWeakref_NewRef,
-)
+from cpython.weakref cimport PyWeakref_NewRef
 from cython cimport Py_ssize_t

 import numpy as np
@@ -29,6 +26,14 @@ from pandas._libs.util cimport (
     is_integer_object,
 )

+include "free_threading_config.pxi"
+
+IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+    from cpython.ref cimport Py_DECREF
+    from cpython.weakref cimport PyWeakref_GetRef
+ELSE:
+    from cpython.weakref cimport PyWeakref_GetObject
+

 cdef extern from "Python.h":
     PyObject* Py_None
@@ -908,17 +913,37 @@ cdef class BlockValuesRefs:
         # if force=False. Clearing for every insertion causes slowdowns if
         # all these objects stay alive, e.g. df.items() for wide DataFrames
         # see GH#55245 and GH#55008
+        IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+            cdef PyObject* pobj
+            cdef bint status
+
         if force or len(self.referenced_blocks) > self.clear_counter:
-            self.referenced_blocks = [
-                ref for ref in self.referenced_blocks
-                if PyWeakref_GetObject(ref) != Py_None
-            ]
+            IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+                new_referenced_blocks = []
+                for ref in self.referenced_blocks:
+                    status = PyWeakref_GetRef(ref, &pobj)
+                    if status == -1:
+                        return
+                    elif status == 1:
+                        new_referenced_blocks.append(ref)
+                        Py_DECREF(<object>pobj)
+                self.referenced_blocks = new_referenced_blocks
+            ELSE:
+                self.referenced_blocks = [
+                    ref for ref in self.referenced_blocks
+                    if PyWeakref_GetObject(ref) != Py_None
+                ]
+
         nr_of_refs = len(self.referenced_blocks)
         if nr_of_refs < self.clear_counter // 2:
             self.clear_counter = max(self.clear_counter // 2, 500)
         elif nr_of_refs > self.clear_counter:
             self.clear_counter = max(self.clear_counter * 2, nr_of_refs)

+    cpdef _add_reference_maybe_locked(self, Block blk):
+        self._clear_dead_references()
+        self.referenced_blocks.append(PyWeakref_NewRef(blk, None))
+
     cpdef add_reference(self, Block blk):
         """Adds a new reference to our reference collection.
@@ -927,8 +952,15 @@ cdef class BlockValuesRefs:
         blk : Block
             The block that the new references should point to.
         """
+        IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+            with cython.critical_section(self):
+                self._add_reference_maybe_locked(blk)
+        ELSE:
+            self._add_reference_maybe_locked(blk)
+
+    def _add_index_reference_maybe_locked(self, index: object) -> None:
         self._clear_dead_references()
-        self.referenced_blocks.append(PyWeakref_NewRef(blk, None))
+        self.referenced_blocks.append(PyWeakref_NewRef(index, None))

     def add_index_reference(self, index: object) -> None:
         """Adds a new reference to our reference collection when creating an index.
@@ -938,8 +970,16 @@ cdef class BlockValuesRefs:
         index : Index
             The index that the new reference should point to.
         """
-        self._clear_dead_references()
-        self.referenced_blocks.append(PyWeakref_NewRef(index, None))
+        IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+            with cython.critical_section(self):
+                self._add_index_reference_maybe_locked(index)
+        ELSE:
+            self._add_index_reference_maybe_locked(index)
+
+    def _has_reference_maybe_locked(self) -> bool:
+        self._clear_dead_references(force=True)
+        # Checking for more references than block pointing to itself
+        return len(self.referenced_blocks) > 1

     def has_reference(self) -> bool:
         """Checks if block has foreign references.
@@ -951,6 +991,8 @@ cdef class BlockValuesRefs:
         -------
         bool
         """
-        self._clear_dead_references(force=True)
-        # Checking for more references than block pointing to itself
-        return len(self.referenced_blocks) > 1
+        IF CYTHON_COMPATIBLE_WITH_FREE_THREADING:
+            with cython.critical_section(self):
+                return self._has_reference_maybe_locked()
+        ELSE:
+            return self._has_reference_maybe_locked()
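For readers less familiar with the Cython, the reference-tracking pattern in this file can be sketched in pure Python. This is an illustrative analog, not pandas code: ``RefTracker`` and ``Block`` are hypothetical names, ``weakref.ref`` stands in for the ``PyWeakref_*`` C API, and an ordinary ``threading.Lock`` is a simplified stand-in for the per-object ``cython.critical_section``. The error path (``status == -1``) has no equivalent here and is omitted.

```python
import gc
import threading
import weakref


class RefTracker:
    """Hypothetical pure-Python analog of BlockValuesRefs (not the pandas class)."""

    def __init__(self) -> None:
        self.referenced_blocks: list = []
        self.clear_counter = 500
        # Simplification: one ordinary lock stands in for cython.critical_section.
        self._lock = threading.Lock()

    def _clear_dead_references(self, force: bool = False) -> None:
        # Prune only when forced or when the list outgrows clear_counter, so
        # that appending many live references does not rescan every time.
        if force or len(self.referenced_blocks) > self.clear_counter:
            # A dead weakref.ref returns None, playing the role of the
            # PyWeakref_GetRef / PyWeakref_GetObject liveness check.
            self.referenced_blocks = [
                ref for ref in self.referenced_blocks if ref() is not None
            ]
        nr_of_refs = len(self.referenced_blocks)
        if nr_of_refs < self.clear_counter // 2:
            self.clear_counter = max(self.clear_counter // 2, 500)
        elif nr_of_refs > self.clear_counter:
            self.clear_counter = max(self.clear_counter * 2, nr_of_refs)

    # Unlocked helpers: callers must already hold self._lock.
    def _add_reference_maybe_locked(self, obj: object) -> None:
        self._clear_dead_references()
        self.referenced_blocks.append(weakref.ref(obj))

    def _has_reference_maybe_locked(self) -> bool:
        self._clear_dead_references(force=True)
        # More than one live reference means something besides the owner.
        return len(self.referenced_blocks) > 1

    # Public methods take the lock, then delegate, mirroring the diff.
    def add_reference(self, obj: object) -> None:
        with self._lock:
            self._add_reference_maybe_locked(obj)

    def has_reference(self) -> bool:
        with self._lock:
            return self._has_reference_maybe_locked()


class Block:
    """Hypothetical stand-in; plain object() instances cannot be weakly referenced."""


tracker = RefTracker()
owner, view = Block(), Block()
tracker.add_reference(owner)
tracker.add_reference(view)
print(tracker.has_reference())  # True
del view
gc.collect()  # only matters on interpreters without reference counting
print(tracker.has_reference())  # False
```

Splitting each public method into a locking wrapper and a ``_maybe_locked`` body keeps a single implementation for both build modes; only the wrapper differs.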

pandas/_libs/meson.build

Lines changed: 13 additions & 0 deletions
@@ -50,6 +50,19 @@ _khash_primitive_helper_dep = declare_dependency(
     sources: _khash_primitive_helper,
 )

+cdata = configuration_data()
+if cy.version().version_compare('>=3.1.0')
+    cdata.set('freethreading_compatible', '1')
+else
+    cdata.set('freethreading_compatible', '0')
+endif
+_free_threading_config = configure_file(
+    input: 'free_threading_config.pxi.in',
+    output: 'free_threading_config.pxi',
+    configuration: cdata,
+    install: false,
+)
+
 subdir('tslibs')

 libs_sources = {
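The build-time switch can be sketched in Python: Meson fills the ``@freethreading_compatible@`` placeholder with ``1`` when Cython is 3.1.0 or newer and ``0`` otherwise, and the generated ``.pxi`` is what the ``IF``/``ELSE`` blocks in ``internals.pyx`` read. The helper names are hypothetical, and the version parsing is a simplification that ignores pre-release suffixes, so it may not match Meson's ``version_compare`` exactly.

```python
import re


def version_at_least(version: str, minimum: tuple) -> bool:
    """Compare the leading numeric parts of a version string with ``minimum``.

    Simplification: pre-release suffixes such as "b1" are ignored, which may
    not match Meson's own version_compare() rules.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that "3.1" compares equal to (3, 1, 0).
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= tuple(minimum)


def render_config(template: str, values: dict) -> str:
    # Replace each @name@ placeholder with its configured value, roughly what
    # configure_file() does with the variables set on configuration_data().
    return re.sub(r"@(\w+)@", lambda m: values[m.group(1)], template)


TEMPLATE = (
    "# Autogenerated file containing Cython compile-time defines\n"
    "\n"
    "DEF CYTHON_COMPATIBLE_WITH_FREE_THREADING = @freethreading_compatible@\n"
)

for cython_version in ("3.0.11", "3.1.0"):
    flag = "1" if version_at_least(cython_version, (3, 1, 0)) else "0"
    rendered = render_config(TEMPLATE, {"freethreading_compatible": flag})
    print(cython_version, "->", rendered.splitlines()[-1])
# 3.0.11 -> DEF CYTHON_COMPATIBLE_WITH_FREE_THREADING = 0
# 3.1.0 -> DEF CYTHON_COMPATIBLE_WITH_FREE_THREADING = 1
```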

pandas/core/arrays/arrow/array.py

Lines changed: 6 additions & 1 deletion
@@ -1208,7 +1208,12 @@ def factorize(
             data = data.cast(pa.int64())

         if pa.types.is_dictionary(data.type):
-            encoded = data
+            if null_encoding == "encode":
+                # dictionary encode does nothing if an already encoded array is given
+                data = data.cast(data.type.value_type)
+                encoded = data.dictionary_encode(null_encoding=null_encoding)
+            else:
+                encoded = data
         else:
             encoded = data.dictionary_encode(null_encoding=null_encoding)
         if encoded.length() == 0:
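The reason for the new ``null_encoding == "encode"`` branch can be sketched without pyarrow. The two functions below are hypothetical pure-Python stand-ins, not the pyarrow or pandas API. They assume, as the comment in the diff says, that dictionary-encoding already-encoded data changes nothing, so a null that was masked in the input would stay masked. Decoding first (the diff does this with ``data.cast(data.type.value_type)``) and then re-encoding gives the null its own code, which matches the whatsnew entry about NA values no longer being dropped.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

Indices = List[Optional[int]]


def dictionary_encode(
    values: Sequence[Optional[Hashable]], null_encoding: str = "mask"
) -> Tuple[Indices, list]:
    """Hypothetical stand-in for Arrow dictionary encoding of a plain list.

    "mask": a None value gets a None (null) index and no dictionary entry.
    "encode": None is added to the dictionary like any other value.
    """
    dictionary: list = []
    positions: dict = {}
    indices: Indices = []
    for value in values:
        if value is None and null_encoding == "mask":
            indices.append(None)
            continue
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return indices, dictionary


def factorize_dictionary(
    indices: Indices, dictionary: list, null_encoding: str
) -> Tuple[Indices, list]:
    """Factorize data that is already dictionary-encoded (hypothetical helper)."""
    if null_encoding == "encode":
        # Mirror the fix: decode back to plain values first, because reusing
        # the existing encoding would leave nulls masked instead of coded.
        values = [None if i is None else dictionary[i] for i in indices]
        return dictionary_encode(values, null_encoding="encode")
    # With "mask", the existing encoding can be reused as-is.
    return indices, dictionary


codes, uniques = dictionary_encode(["a", None, "b", "a"])
print(codes, uniques)  # [0, None, 1, 0] ['a', 'b']
print(factorize_dictionary(codes, uniques, "encode"))  # ([0, 1, 2, 0], ['a', None, 'b'])
```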

pandas/core/dtypes/dtypes.py

Lines changed: 2 additions & 2 deletions
@@ -1156,7 +1156,7 @@ def __ne__(self, other: object) -> bool:
     @classmethod
     def is_dtype(cls, dtype: object) -> bool:
         """
-        Return a boolean if we if the passed type is an actual dtype that we
+        Return a boolean if the passed type is an actual dtype that we
         can match (via string or type)
         """
         if isinstance(dtype, str):
@@ -1436,7 +1436,7 @@ def __setstate__(self, state) -> None:
     @classmethod
     def is_dtype(cls, dtype: object) -> bool:
         """
-        Return a boolean if we if the passed type is an actual dtype that we
+        Return a boolean if the passed type is an actual dtype that we
         can match (via string or type)
         """
         if isinstance(dtype, str):
