Commit 05d0167

Merge branch 'main' into main
2 parents 55b11ef + 0c24b20 commit 05d0167

68 files changed: +1588 -510 lines changed

.github/actions/build_pandas/action.yml

Lines changed: 7 additions & 0 deletions
@@ -22,6 +22,13 @@ runs:
         fi
       shell: bash -el {0}
 
+    - name: Uninstall nomkl
+      run: |
+        if conda list nomkl | grep nomkl 1>/dev/null; then
+          conda remove nomkl -y
+        fi
+      shell: bash -el {0}
+
     - name: Build Pandas
       run: |
         if [[ ${{ inputs.editable }} == "true" ]]; then

.pre-commit-config.yaml

Lines changed: 3 additions & 1 deletion
@@ -23,6 +23,7 @@ repos:
   hooks:
   - id: ruff
     args: [--exit-non-zero-on-fix]
+    exclude: ^pandas/tests/frame/test_query_eval.py
   - id: ruff
     # TODO: remove autofixe-only rules when they are checked by ruff
     name: ruff-selected-autofixes
@@ -31,7 +32,7 @@ repos:
     exclude: ^pandas/tests
     args: [--select, "ANN001,ANN2", --fix-only, --exit-non-zero-on-fix]
   - id: ruff-format
-    exclude: ^scripts
+    exclude: ^scripts|^pandas/tests/frame/test_query_eval.py
 - repo: https://github.com/jendrikseipp/vulture
   rev: 'v2.11'
   hooks:
@@ -85,6 +86,7 @@ repos:
     types: [text] # overwrite types: [rst]
     types_or: [python, rst]
   - id: rst-inline-touching-normal
+    exclude: ^pandas/tests/frame/test_query_eval.py
     types: [text] # overwrite types: [rst]
     types_or: [python, rst]
 - repo: https://github.com/sphinx-contrib/sphinx-lint
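
Note on the `exclude` patterns above: pre-commit interprets `exclude` as a Python regular expression and applies it with search semantics to each path relative to the repository root. The following is a minimal sketch of how the new `ruff-format` pattern behaves. Every path other than `pandas/tests/frame/test_query_eval.py` is a hypothetical example, and `is_excluded` is an illustrative helper, not part of pre-commit.

```python
import re

# Pattern copied verbatim from the ruff-format hook in this diff.
EXCLUDE = re.compile(r"^scripts|^pandas/tests/frame/test_query_eval.py")


def is_excluded(path: str) -> bool:
    """Return True if the pattern matches `path` under re.search semantics."""
    return EXCLUDE.search(path) is not None


# Hypothetical paths, except for test_query_eval.py, which the diff names.
for path in [
    "scripts/some_script.py",
    "pandas/tests/frame/test_query_eval.py",
    "pandas/tests/frame/test_api.py",
    "pandas/scripts/helper.py",
]:
    print(f"{path}: excluded={is_excluded(path)}")
```

Both alternatives are anchored with `^`, so a `scripts` directory nested below the root is not excluded. The `.` in the file name is unescaped and therefore matches any character, which is harmless here but slightly broader than a literal match.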

ci/code_checks.sh

Lines changed: 1 addition & 129 deletions
Large diffs are not rendered by default.

doc/source/development/contributing_codebase.rst

Lines changed: 1 addition & 2 deletions
@@ -762,8 +762,7 @@ install pandas) by typing::
 your installation is probably fine and you can start contributing!
 
 Often it is worth running only a subset of tests first around your changes before running the
-entire suite (tip: you can use the `pandas-coverage app <https://pandas-coverage-12d2130077bc.herokuapp.com/>`_)
-to find out which tests hit the lines of code you've modified, and then run only those).
+entire suite.
 
 The easiest way to do this is with::
 
doc/source/whatsnew/v3.0.0.rst

Lines changed: 5 additions & 1 deletion
@@ -31,6 +31,7 @@ Other enhancements
 - :class:`pandas.api.typing.FrozenList` is available for typing the outputs of :attr:`MultiIndex.names`, :attr:`MultiIndex.codes` and :attr:`MultiIndex.levels` (:issue:`58237`)
 - :class:`pandas.api.typing.SASReader` is available for typing the output of :func:`read_sas` (:issue:`55689`)
 - :func:`DataFrame.to_excel` now raises an ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
+- :func:`pandas.merge` now validates the ``how`` parameter input (merge type) (:issue:`59435`)
 - :func:`read_stata` now returns ``datetime64`` resolutions better matching those natively stored in the stata format (:issue:`55642`)
 - :meth:`DataFrame.agg` called with ``axis=1`` and a ``func`` which relabels the result index now raises a ``NotImplementedError`` (:issue:`58807`).
 - :meth:`Index.get_loc` now accepts also subclasses of ``tuple`` as keys (:issue:`57922`)
@@ -543,7 +544,7 @@ Datetimelike
 - Bug in :attr:`is_year_start` where a DateTimeIndex constructed via a date_range with frequency 'MS' wouldn't have the correct year or quarter start attributes (:issue:`57377`)
 - Bug in :class:`Timestamp` constructor failing to raise when ``tz=None`` is explicitly specified in conjunction with timezone-aware ``tzinfo`` or data (:issue:`48688`)
 - Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
-- Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56382`)
+- Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
 - Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
 - Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
 - Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
@@ -632,6 +633,7 @@ Period
 Plotting
 ^^^^^^^^
 - Bug in :meth:`.DataFrameGroupBy.boxplot` failed when there were multiple groupings (:issue:`14701`)
+- Bug in :meth:`DataFrame.plot.line` raising ``ValueError`` when set both color and a ``dict`` style (:issue:`59461`)
 - Bug in :meth:`DataFrame.plot` that causes a shift to the right when the frequency multiplier is greater than one. (:issue:`57587`)
 - Bug in :meth:`Series.plot` with ``kind="pie"`` with :class:`ArrowDtype` (:issue:`59192`)
 
@@ -650,6 +652,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupBy.cumsum` where it did not return the correct dtype when the label contained ``None``. (:issue:`58811`)
 - Bug in :meth:`DataFrameGroupby.transform` and :meth:`SeriesGroupby.transform` with a reducer and ``observed=False`` that coerces dtype to float when there are unobserved categories. (:issue:`55326`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_period`` periods if ``method="table"``. (:issue:`58868`)
+- Bug in :meth:`Series.resample` could raise when the the date range ended shortly before a non-existent time. (:issue:`58380`)
 
 Reshaping
 ^^^^^^^^^
@@ -685,6 +688,7 @@ Other
 - Bug in :meth:`DataFrame.apply` where passing ``engine="numba"`` ignored ``args`` passed to the applied function (:issue:`58712`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using NumPy attributes via ``@`` notation, e.g., ``df.eval("@np.floor(a)")``. (:issue:`58041`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which did not allow to use ``tan`` function. (:issue:`55091`)
+- Bug in :meth:`DataFrame.query` which raised an exception or produced incorrect results when expressions contained backtick-quoted column names containing the hash character ``#``, backticks, or characters that fall outside the ASCII range (U+0001..U+007F). (:issue:`59285`) (:issue:`49633`)
 - Bug in :meth:`DataFrame.sort_index` when passing ``axis="columns"`` and ``ignore_index=True`` and ``ascending=False`` not returning a :class:`RangeIndex` columns (:issue:`57293`)
 - Bug in :meth:`DataFrame.transform` that was returning the wrong order unless the index was monotonically increasing. (:issue:`57069`)
 - Bug in :meth:`DataFrame.where` where using a non-bool type array in the function would return a ``ValueError`` instead of a ``TypeError`` (:issue:`56330`)

pandas/_libs/arrays.pyx

Lines changed: 4 additions & 0 deletions
@@ -67,6 +67,10 @@ cdef class NDArrayBacked:
         """
         Construct a new ExtensionArray `new_array` with `arr` as its _ndarray.
 
+        The returned array has the same dtype as self.
+
+        Caller is responsible for ensuring `values.dtype == self._ndarray.dtype`.
+
         This should round-trip:
             self == self._from_backing_data(self._ndarray)
         """

pandas/_libs/hashtable.pyx

Lines changed: 4 additions & 1 deletion
@@ -30,7 +30,10 @@ from pandas._libs.khash cimport (
     kh_python_hash_func,
     khiter_t,
 )
-from pandas._libs.missing cimport checknull
+from pandas._libs.missing cimport (
+    checknull,
+    is_matching_na,
+)
 
 
 def get_hashtable_trace_domain():

pandas/_libs/hashtable_class_helper.pxi.in

Lines changed: 15 additions & 3 deletions
@@ -1171,11 +1171,13 @@ cdef class StringHashTable(HashTable):
             const char **vecs
             khiter_t k
             bint use_na_value
+            bint non_null_na_value
 
         if return_inverse:
             labels = np.zeros(n, dtype=np.intp)
         uindexer = np.empty(n, dtype=np.int64)
         use_na_value = na_value is not None
+        non_null_na_value = not checknull(na_value)
 
         # assign pointers and pre-filter out missing (if ignore_na)
         vecs = <const char **>malloc(n * sizeof(char *))
@@ -1186,7 +1188,12 @@ cdef class StringHashTable(HashTable):
 
             if (ignore_na
                 and (not isinstance(val, str)
-                     or (use_na_value and val == na_value))):
+                     or (use_na_value and (
+                        (non_null_na_value and val == na_value) or
+                        (not non_null_na_value and is_matching_na(val, na_value)))
+                     )
+                )
+            ):
                 # if missing values do not count as unique values (i.e. if
                 # ignore_na is True), we can skip the actual value, and
                 # replace the label with na_sentinel directly
@@ -1452,18 +1459,23 @@ cdef class PyObjectHashTable(HashTable):
             object val
             khiter_t k
             bint use_na_value
-
+            bint non_null_na_value
         if return_inverse:
             labels = np.empty(n, dtype=np.intp)
         use_na_value = na_value is not None
+        non_null_na_value = not checknull(na_value)
 
         for i in range(n):
             val = values[i]
             hash(val)
 
             if ignore_na and (
                 checknull(val)
-                or (use_na_value and val == na_value)
+                or (use_na_value and (
+                    (non_null_na_value and val == na_value) or
+                    (not non_null_na_value and is_matching_na(val, na_value))
+                    )
+                )
             ):
                 # if missing values do not count as unique values (i.e. if
                 # ignore_na is True), skip the hashtable entry for them, and
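
Note on the new `na_value` condition: a plain `val == na_value` comparison can never succeed when `na_value` is NaN, because NaN compares unequal to itself. The hunks above therefore use equality only when `na_value` is an ordinary (non-null) sentinel and fall back to `is_matching_na` otherwise. The sketch below mirrors that condition in pure Python. The `checknull` and `is_matching_na` functions here are simplified stand-ins that only recognise `None` and float NaN; they are assumptions for illustration, not the pandas implementations, and `matches_na_value` is a hypothetical name.

```python
import math


def checknull(obj) -> bool:
    """Simplified stand-in: only None and float NaN count as missing."""
    return obj is None or (isinstance(obj, float) and math.isnan(obj))


def is_matching_na(left, right) -> bool:
    """Simplified stand-in: True when both are the same kind of missing value."""
    if left is None:
        return right is None
    if isinstance(left, float) and math.isnan(left):
        return isinstance(right, float) and math.isnan(right)
    return False


def matches_na_value(val, na_value) -> bool:
    """Mirror of the condition added in the diff (hypothetical helper name)."""
    use_na_value = na_value is not None
    non_null_na_value = not checknull(na_value)
    return use_na_value and (
        # Ordinary sentinel: plain equality is reliable.
        (non_null_na_value and val == na_value)
        # Missing-value sentinel: compare by kind of NA instead of ==.
        or (not non_null_na_value and is_matching_na(val, na_value))
    )


nan = float("nan")
print(nan == nan)                   # False: why equality alone is not enough
print(matches_na_value(nan, nan))   # True
print(matches_na_value("x", "x"))   # True
print(matches_na_value(None, nan))  # False: different kinds of NA
```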

pandas/_libs/lib.pyx

Lines changed: 4 additions & 4 deletions
@@ -2699,16 +2699,16 @@ def maybe_convert_objects(ndarray[object] objects,
         seen.object_ = True
 
     elif seen.str_:
-        if using_string_dtype() and is_string_array(objects, skipna=True):
+        if convert_to_nullable_dtype and is_string_array(objects, skipna=True):
             from pandas.core.arrays.string_ import StringDtype
 
-            dtype = StringDtype(na_value=np.nan)
+            dtype = StringDtype()
             return dtype.construct_array_type()._from_sequence(objects, dtype=dtype)
 
-        elif convert_to_nullable_dtype and is_string_array(objects, skipna=True):
+        elif using_string_dtype() and is_string_array(objects, skipna=True):
             from pandas.core.arrays.string_ import StringDtype
 
-            dtype = StringDtype()
+            dtype = StringDtype(na_value=np.nan)
             return dtype.construct_array_type()._from_sequence(objects, dtype=dtype)
 
         seen.object_ = True
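
Note on the reordering above: the two branches are swapped so that `convert_to_nullable_dtype` is checked before `using_string_dtype()`. The outcome only differs when both are true. A minimal sketch of the resulting precedence follows, assuming the input is an all-string object array (that is, `is_string_array` returns true). The function name and the string labels it returns are illustrative only.

```python
def choose_string_dtype(convert_to_nullable_dtype: bool, using_string_dtype: bool) -> str:
    """Return a label for the dtype the reordered branches would select.

    Assumes an all-string input; labels are illustrative, not real dtype objects.
    """
    if convert_to_nullable_dtype:
        # After this commit, the explicit nullable request is checked first.
        return "StringDtype()"
    elif using_string_dtype:
        # Otherwise the global string-dtype option selects the NaN-backed variant.
        return "StringDtype(na_value=np.nan)"
    # Neither applies: the values stay as object dtype.
    return "object"


for nullable in (True, False):
    for option in (True, False):
        print(nullable, option, choose_string_dtype(nullable, option))
```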

pandas/_libs/src/vendored/numpy/datetime/np_datetime.c

Lines changed: 29 additions & 19 deletions
@@ -20,14 +20,12 @@ This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
 #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
 #endif // NPY_NO_DEPRECATED_API
 
-#include <Python.h>
-
 #include "pandas/vendored/numpy/datetime/np_datetime.h"
-
 #define NO_IMPORT_ARRAY
 #define PY_ARRAY_UNIQUE_SYMBOL PANDAS_DATETIME_NUMPY
 #include <numpy/ndarrayobject.h>
 #include <numpy/npy_common.h>
+#include <stdbool.h>
 
 #if defined(_WIN32)
 #ifndef ENABLE_INTSAFE_SIGNED_FUNCTIONS
@@ -58,12 +56,15 @@ _Static_assert(0, "__has_builtin not detected; please try a newer compiler");
 #endif
 #endif
 
+#define XSTR(a) STR(a)
+#define STR(a) #a
+
 #define PD_CHECK_OVERFLOW(FUNC) \
   do { \
     if ((FUNC) != 0) { \
       PyGILState_STATE gstate = PyGILState_Ensure(); \
       PyErr_SetString(PyExc_OverflowError, \
-                      "Overflow occurred in npy_datetimestruct_to_datetime"); \
+                      "Overflow occurred at " __FILE__ ":" XSTR(__LINE__)); \
       PyGILState_Release(gstate); \
       return -1; \
     } \
@@ -139,53 +140,53 @@ npy_int64 get_datetimestruct_days(const npy_datetimestruct *dts) {
   npy_int64 year, days = 0;
   const int *month_lengths;
 
-  year = dts->year - 1970;
-  days = year * 365;
+  PD_CHECK_OVERFLOW(checked_int64_sub(dts->year, 1970, &year));
+  PD_CHECK_OVERFLOW(checked_int64_mul(year, 365, &days));
 
   /* Adjust for leap years */
   if (days >= 0) {
     /*
      * 1968 is the closest leap year before 1970.
      * Exclude the current year, so add 1.
      */
-    year += 1;
+    PD_CHECK_OVERFLOW(checked_int64_add(year, 1, &year));
     /* Add one day for each 4 years */
-    days += year / 4;
+    PD_CHECK_OVERFLOW(checked_int64_add(days, year / 4, &days));
     /* 1900 is the closest previous year divisible by 100 */
-    year += 68;
+    PD_CHECK_OVERFLOW(checked_int64_add(year, 68, &year));
     /* Subtract one day for each 100 years */
-    days -= year / 100;
+    PD_CHECK_OVERFLOW(checked_int64_sub(days, year / 100, &days));
     /* 1600 is the closest previous year divisible by 400 */
-    year += 300;
+    PD_CHECK_OVERFLOW(checked_int64_add(year, 300, &year));
     /* Add one day for each 400 years */
-    days += year / 400;
+    PD_CHECK_OVERFLOW(checked_int64_add(days, year / 400, &days));
   } else {
     /*
      * 1972 is the closest later year after 1970.
      * Include the current year, so subtract 2.
      */
-    year -= 2;
+    PD_CHECK_OVERFLOW(checked_int64_sub(year, 2, &year));
     /* Subtract one day for each 4 years */
-    days += year / 4;
+    PD_CHECK_OVERFLOW(checked_int64_add(days, year / 4, &days));
     /* 2000 is the closest later year divisible by 100 */
-    year -= 28;
+    PD_CHECK_OVERFLOW(checked_int64_sub(year, 28, &year));
     /* Add one day for each 100 years */
-    days -= year / 100;
+    PD_CHECK_OVERFLOW(checked_int64_sub(days, year / 100, &days));
     /* 2000 is also the closest later year divisible by 400 */
     /* Subtract one day for each 400 years */
-    days += year / 400;
+    PD_CHECK_OVERFLOW(checked_int64_add(days, year / 400, &days));
   }
 
   month_lengths = days_per_month_table[is_leapyear(dts->year)];
   month = dts->month - 1;
 
   /* Add the months */
   for (i = 0; i < month; ++i) {
-    days += month_lengths[i];
+    PD_CHECK_OVERFLOW(checked_int64_add(days, month_lengths[i], &days));
   }
 
   /* Add the days */
-  days += dts->day - 1;
+  PD_CHECK_OVERFLOW(checked_int64_add(days, dts->day - 1, &days));
 
   return days;
 }
@@ -430,6 +431,15 @@ npy_datetime npy_datetimestruct_to_datetime(NPY_DATETIMEUNIT base,
   }
 
   const int64_t days = get_datetimestruct_days(dts);
+  if (days == -1) {
+    PyGILState_STATE gstate = PyGILState_Ensure();
+    bool did_error = PyErr_Occurred() == NULL ? false : true;
+    PyGILState_Release(gstate);
+    if (did_error) {
+      return -1;
+    }
+  }
+
   if (base == NPY_FR_D) {
     return days;
   }
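
Note on the checked arithmetic above: every step of the day count in `get_datetimestruct_days` now goes through a `checked_int64_*` helper, and the caller has to tell a real error apart from a legitimate result of `-1` (the day before the epoch), which is why it consults `PyErr_Occurred()`. The sketch below reproduces the same arithmetic in Python under stated assumptions: `_checked` stands in for the C overflow helpers, `_trunc_div` emulates C integer division (which truncates toward zero), and the names `days_since_epoch`, `_checked`, and `_trunc_div` are hypothetical. An exception replaces the `-1` sentinel, so no separate error check is needed here.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

DAYS_PER_MONTH = (
    (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31),  # common year
    (31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31),  # leap year
)


def _checked(value: int) -> int:
    """Stand-in for checked_int64_*: raise if the result does not fit in int64."""
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("int64 overflow")
    return value


def _trunc_div(a: int, b: int) -> int:
    """C-style integer division for b > 0: truncate toward zero."""
    q = abs(a) // b
    return q if a >= 0 else -q


def is_leapyear(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_since_epoch(year: int, month: int, day: int) -> int:
    """Days between 1970-01-01 and the given date, with int64 overflow checks."""
    y = _checked(year - 1970)
    days = _checked(y * 365)
    if days >= 0:
        y = _checked(y + 1)  # 1968 is the closest leap year before 1970
        days = _checked(days + _trunc_div(y, 4))
        y = _checked(y + 68)  # 1900 is the previous year divisible by 100
        days = _checked(days - _trunc_div(y, 100))
        y = _checked(y + 300)  # 1600 is the previous year divisible by 400
        days = _checked(days + _trunc_div(y, 400))
    else:
        y = _checked(y - 2)  # 1972 is the closest leap year after 1970
        days = _checked(days + _trunc_div(y, 4))
        y = _checked(y - 28)  # 2000 is the next year divisible by 100 and 400
        days = _checked(days - _trunc_div(y, 100))
        days = _checked(days + _trunc_div(y, 400))
    # Add the whole months that precede the given month, then the days.
    for length in DAYS_PER_MONTH[is_leapyear(year)][: month - 1]:
        days = _checked(days + length)
    return _checked(days + day - 1)


print(days_since_epoch(1970, 1, 1))    # 0
print(days_since_epoch(1969, 12, 31))  # -1, a valid result rather than an error
print(days_since_epoch(2000, 1, 1))    # 10957
```

Calling `days_since_epoch(2**63 - 1, 1, 1)` raises `OverflowError` at the multiplication by 365, which corresponds to the point where the C code would set the Python error and return `-1`.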
