Commit 449313e

Merge branch 'pandas-dev:main' into main

2 parents: efbb2ce + a5ac798


49 files changed: +904, -442 lines

.github/workflows/comment-commands.yml

Lines changed: 0 additions & 9 deletions
@@ -9,15 +9,6 @@ permissions:
   pull-requests: write

 jobs:
-  issue_assign:
-    runs-on: ubuntu-24.04
-    if: (!github.event.issue.pull_request) && github.event.comment.body == 'take'
-    concurrency:
-      group: ${{ github.actor }}-issue-assign
-    steps:
-      - run: |
-          echo "Assigning issue ${{ github.event.issue.number }} to ${{ github.event.comment.user.login }}"
-          curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -d '{"assignees": ["${{ github.event.comment.user.login }}"]}' https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.issue.number }}/assignees
   preview_docs:
     runs-on: ubuntu-24.04
     if: github.event.issue.pull_request && github.event.comment.body == '/preview'

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
@@ -19,7 +19,7 @@ ci:
   skip: [pyright, mypy]
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.12.11
+    rev: v0.13.3
     hooks:
       - id: ruff
         args: [--exit-non-zero-on-fix]
@@ -46,7 +46,7 @@ repos:
       - id: codespell
         types_or: [python, rst, markdown, cython, c]
   - repo: https://github.com/MarcoGorelli/cython-lint
-    rev: v0.16.7
+    rev: v0.17.0
     hooks:
       - id: cython-lint
       - id: double-quote-cython-strings
@@ -67,7 +67,7 @@ repos:
       - id: trailing-whitespace
         args: [--markdown-linebreak-ext=md]
   - repo: https://github.com/PyCQA/isort
-    rev: 6.0.1
+    rev: 6.1.0
     hooks:
       - id: isort
   - repo: https://github.com/asottile/pyupgrade
@@ -92,14 +92,14 @@ repos:
       - id: sphinx-lint
         args: ["--enable", "all", "--disable", "line-too-long"]
   - repo: https://github.com/pre-commit/mirrors-clang-format
-    rev: v21.1.0
+    rev: v21.1.2
     hooks:
       - id: clang-format
         files: ^pandas/_libs/src|^pandas/_libs/include
         args: [-i]
         types_or: [c, c++]
   - repo: https://github.com/trim21/pre-commit-mirror-meson
-    rev: v1.9.0
+    rev: v1.9.1
     hooks:
       - id: meson-fmt
         args: ['--inplace']

doc/source/development/contributing.rst

Lines changed: 6 additions & 8 deletions
@@ -36,16 +36,14 @@ and `good first issue
 <https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+sort%3Aupdated-desc+label%3A%22good+first+issue%22+no%3Aassignee>`_
 are typically good for newer contributors.

-Once you've found an interesting issue, it's a good idea to assign the issue to yourself,
-so nobody else duplicates the work on it. On the Github issue, a comment with the exact
-text ``take`` to automatically assign you the issue
-(this will take seconds and may require refreshing the page to see it).
+Once you've found an interesting issue, leave a comment with your intention
+to start working on it. If somebody else has
+already commented on issue but they have shown a lack of activity in the issue
+or a pull request in the past 2-3 weeks, you may take it over.

 If for whatever reason you are not able to continue working with the issue, please
-unassign it, so other people know it's available again. You can check the list of
-assigned issues, since people may not be working in them anymore. If you want to work on one
-that is assigned, feel free to kindly ask the current assignee if you can take it
-(please allow at least a week of inactivity before considering work in the issue discontinued).
+leave a comment on an issue, so other people know it's available again. You can check the list of
+assigned issues, since people may not be working in them anymore.

 We have several :ref:`contributor community <community>` communication channels, which you are
 welcome to join, and ask questions as you figure things out. Among them are regular meetings for

doc/source/whatsnew/v3.0.0.rst

Lines changed: 3 additions & 0 deletions
@@ -948,6 +948,7 @@ Datetimelike
 - Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
 - Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
 - Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
+- Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
 - Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
 - Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
 - Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
@@ -973,6 +974,7 @@ Datetimelike
 - Bug in retaining frequency in :meth:`value_counts` specifically for :meth:`DatetimeIndex` and :meth:`TimedeltaIndex` (:issue:`33830`)
 - Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)

+
 Timedelta
 ^^^^^^^^^
 - Accuracy improvement in :meth:`Timedelta.to_pytimedelta` to round microseconds consistently for large nanosecond based Timedelta (:issue:`57841`)
@@ -1081,6 +1083,7 @@ I/O
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)
 - Bug in :meth:`read_csv` where the order of the ``na_values`` makes an inconsistency when ``na_values`` is a list non-string values. (:issue:`59303`)
 - Bug in :meth:`read_csv` with ``engine="c"`` reading big integers as strings. Now reads them as python integers. (:issue:`51295`)
+- Bug in :meth:`read_csv` with ``engine="c"`` reading large float numbers with preceding integers as strings. Now reads them as floats. (:issue:`51295`)
 - Bug in :meth:`read_csv` with ``engine="pyarrow"`` and ``dtype="Int64"`` losing precision (:issue:`56136`)
 - Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
 - Bug in :meth:`read_html` where ``rowspan`` in header row causes incorrect conversion to ``DataFrame``. (:issue:`60210`)

pandas/_libs/parsers.pyx

Lines changed: 56 additions & 0 deletions
@@ -1070,6 +1070,10 @@ cdef class TextReader:
             else:
                 col_res = None
                 for dt in self.dtype_cast_order:
+                    if (dt.kind in "iu" and
+                            self._column_has_float(i, start, end, na_filter, na_hashset)):
+                        continue
+
                     try:
                         col_res, na_count = self._convert_with_dtype(
                             dt, i, start, end, na_filter, 0, na_hashset, na_fset)
@@ -1347,6 +1351,58 @@ cdef class TextReader:
         else:
             return None

+    cdef bint _column_has_float(self, Py_ssize_t col,
+                                int64_t start, int64_t end,
+                                bint na_filter, kh_str_starts_t *na_hashset):
+        """Check if the column contains any float number."""
+        cdef:
+            Py_ssize_t i, j, lines = end - start
+            coliter_t it
+            const char *word = NULL
+            const char *ignored_chars = " +-"
+            const char *digits = "0123456789"
+            const char *float_indicating_chars = "eE"
+            char null_byte = 0
+
+        coliter_setup(&it, self.parser, col, start)
+
+        for i in range(lines):
+            COLITER_NEXT(it, word)
+
+            if na_filter and kh_get_str_starts_item(na_hashset, word):
+                continue
+
+            found_first_digit = False
+            j = 0
+            while word[j] != null_byte:
+                if word[j] == self.parser.decimal:
+                    return True
+                elif not found_first_digit and word[j] in ignored_chars:
+                    # no-op
+                    pass
+                elif not found_first_digit and word[j] not in digits:
+                    # word isn't numeric
+                    return False
+                elif not found_first_digit and word[j] in digits:
+                    found_first_digit = True
+                elif word[j] in float_indicating_chars:
+                    # preceding chars indicates numeric and
+                    # current char indicates float
+                    return True
+                elif word[j] not in digits:
+                    # previous characters indicates numeric
+                    # current character shows otherwise
+                    return False
+                elif word[j] in digits:
+                    # no-op
+                    pass
+                else:
+                    raise AssertionError(
+                        f"Unhandled case {word[j]=} {found_first_digit=}"
+                    )
+                j += 1
+
+        return False

 # Factor out code common to TextReader.__dealloc__ and TextReader.close
 # It cannot be a class method, since calling self.close() in __dealloc__

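A quick, hedged sketch (not part of the commit) of the user-facing behavior the new _column_has_float pre-check targets: with engine="c", a column holding a large float whose leading digits would overflow the integer parsers should now come back as float64 rather than as strings. The sample value below is made up for illustration.

from io import StringIO

import pandas as pd

# Hypothetical data: the digits before the decimal point alone overflow int64.
csv = StringIO("a\n20241231123456789012345.5\n1.25\n")

df = pd.read_csv(csv, engine="c")
print(df["a"].dtype)    # expected: float64 with the GH#51295 fix, not object
print(df["a"].iloc[0])  # parsed as a Python float, not a str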
pandas/_libs/tslibs/strptime.pyx

Lines changed: 5 additions & 0 deletions
@@ -405,6 +405,11 @@ def array_strptime(
             if len(val) == 0 or val in nat_strings:
                 iresult[i] = NPY_NAT
                 continue
+            elif type(val) is not str:
+                # GH#60933: normalize string subclasses
+                # (e.g. lxml.etree._ElementUnicodeResult). The downstream Cython
+                # path expects an exact `str`, so ensure we pass a plain str
+                val = str(val)
         elif checknull_with_nat_and_na(val):
             iresult[i] = NPY_NAT
             continue

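This pairs with the GH#60933 whatsnew entry above. A hedged sketch of the user-facing path, using a toy str subclass as a stand-in for lxml.etree._ElementUnicodeResult so the snippet has no lxml dependency:

import pandas as pd


class TextNode(str):
    # Toy stand-in for lxml.etree._ElementUnicodeResult; any str subclass
    # should now be normalized to a plain str before strptime parsing.
    pass


ts = pd.to_datetime(TextNode("2024-01-31 12:30:00"), format="%Y-%m-%d %H:%M:%S")
print(ts)  # Timestamp('2024-01-31 12:30:00')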
pandas/core/arrays/arrow/array.py

Lines changed: 2 additions & 2 deletions
@@ -888,7 +888,7 @@ def _cmp_method(self, other, op) -> ArrowExtensionArray:
             boxed = self._box_pa(other)
         except pa.lib.ArrowInvalid:
             # e.g. GH#60228 [1, "b"] we have to operate pointwise
-            res_values = [op(x, y) for x, y in zip(self, other)]
+            res_values = [op(x, y) for x, y in zip(self, other, strict=True)]
             result = pa.array(res_values, type=pa.bool_(), from_pandas=True)
         else:
             rtype = boxed.type
@@ -2713,7 +2713,7 @@ def _str_extract(self, pat: str, flags: int = 0, expand: bool = True):
         if expand:
             return {
                 col: self._from_pyarrow_array(pc.struct_field(result, [i]))
-                for col, i in zip(groups, range(result.type.num_fields))
+                for col, i in zip(groups, range(result.type.num_fields), strict=True)
             }
         else:
             return type(self)(pc.struct_field(result, [0]))

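Several files in this commit add strict=True to zip() calls (available since Python 3.10). A small illustration of the difference it makes when the inputs ever disagree in length:

left = [1, 2, 3]
right = ["a", "b"]

# Default zip() silently truncates to the shorter input.
print(list(zip(left, right)))  # [(1, 'a'), (2, 'b')]

# strict=True turns the silent truncation into an explicit error.
try:
    list(zip(left, right, strict=True))
except ValueError as exc:
    print(exc)  # zip() argument 2 is shorter than argument 1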
pandas/core/arrays/base.py

Lines changed: 2 additions & 2 deletions
@@ -2869,7 +2869,7 @@ def convert_values(param):

         # If the operator is not defined for the underlying objects,
         # a TypeError should be raised
-        res = [op(a, b) for (a, b) in zip(lvalues, rvalues)]
+        res = [op(a, b) for (a, b) in zip(lvalues, rvalues, strict=True)]

         def _maybe_convert(arr):
             if coerce_to_dtype:
@@ -2885,7 +2885,7 @@ def _maybe_convert(arr):
             return res

         if op.__name__ in {"divmod", "rdivmod"}:
-            a, b = zip(*res)
+            a, b = zip(*res, strict=True)
             return _maybe_convert(a), _maybe_convert(b)

         return _maybe_convert(res)

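The second hunk applies the same guard in the unzip direction: res is a list of (quotient, remainder) pairs from divmod, and zip(*res, strict=True) splits it into two tuples, raising if any pair were ragged. A hedged sketch with toy values:

res = [divmod(a, b) for a, b in [(7, 3), (9, 4), (10, 5)]]

quotients, remainders = zip(*res, strict=True)
print(quotients)   # (2, 2, 2)
print(remainders)  # (1, 1, 0)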
pandas/core/arrays/categorical.py

Lines changed: 7 additions & 3 deletions
@@ -2,6 +2,7 @@

 from csv import QUOTE_NONNUMERIC
 from functools import partial
+import itertools
 import operator
 from shutil import get_terminal_size
 from typing import (
@@ -2429,8 +2430,8 @@ def _reverse_indexer(self) -> dict[Hashable, npt.NDArray[np.intp]]:
             ensure_platform_int(self.codes), categories.size
         )
         counts = ensure_int64(counts).cumsum()
-        _result = (r[start:end] for start, end in zip(counts, counts[1:]))
-        return dict(zip(categories, _result))
+        _result = (r[start:end] for start, end in itertools.pairwise(counts))
+        return dict(zip(categories, _result, strict=True))

     # ------------------------------------------------------------------
     # Reductions
@@ -3165,5 +3166,8 @@ def factorize_from_iterables(iterables) -> tuple[list[np.ndarray], list[Index]]:
         # For consistency, it should return two empty lists.
         return [], []

-    codes, categories = zip(*(factorize_from_iterable(it) for it in iterables))
+    codes, categories = zip(
+        *(factorize_from_iterable(it) for it in iterables),
+        strict=True,
+    )
     return list(codes), list(categories)

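The _reverse_indexer hunk swaps zip(counts, counts[1:]) for itertools.pairwise (Python 3.10+), which yields the same consecutive (start, end) pairs without building the counts[1:] slice. A hedged sketch with toy cumulative counts:

import itertools

counts = [0, 2, 5, 9]  # toy cumulative counts, one boundary per category

print(list(zip(counts, counts[1:])))     # [(0, 2), (2, 5), (5, 9)]
print(list(itertools.pairwise(counts)))  # [(0, 2), (2, 5), (5, 9)]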
pandas/core/arrays/datetimelike.py

Lines changed: 1 addition & 1 deletion
@@ -2374,7 +2374,7 @@ def _concat_same_type(
         to_concat = [x for x in to_concat if len(x)]

         if obj.freq is not None and all(x.freq == obj.freq for x in to_concat):
-            pairs = zip(to_concat[:-1], to_concat[1:])
+            pairs = zip(to_concat[:-1], to_concat[1:], strict=True)
             if all(pair[0][-1] + obj.freq == pair[1][0] for pair in pairs):
                 new_freq = obj.freq
                 new_obj._freq = new_freq
