Commit ded61b3

Merge remote-tracking branch 'upstream/main' into cln/deps
2 parents f8325a4 + caa58c7 commit ded61b3

35 files changed, with 908 additions and 250 deletions.

.github/workflows/unit-tests.yml

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ jobs:
     strategy:
       matrix:
         platform: [ubuntu-22.04, ubuntu-24.04-arm]
-        env_file: [actions-310.yaml, actions-311.yaml, actions-312.yaml]
+        env_file: [actions-310.yaml, actions-311.yaml, actions-312.yaml, actions-313.yaml]
         # Prevent the include jobs from overriding other jobs
         pattern: [""]
         pandas_future_infer_string: ["0"]
@@ -188,7 +188,7 @@ jobs:
       matrix:
         # Note: Don't use macOS latest since macos 14 appears to be arm64 only
         os: [macos-13, macos-14, windows-latest]
-        env_file: [actions-310.yaml, actions-311.yaml, actions-312.yaml]
+        env_file: [actions-310.yaml, actions-311.yaml, actions-312.yaml, actions-313.yaml]
       fail-fast: false
     runs-on: ${{ matrix.os }}
     name: ${{ format('{0} {1}', matrix.os, matrix.env_file) }}
@@ -316,7 +316,7 @@ jobs:
     # To freeze this file, uncomment out the ``if: false`` condition, and migrate the jobs
     # to the corresponding posix/windows-macos/sdist etc. workflows.
     # Feel free to modify this comment as necessary.
-    # if: false # Uncomment this to freeze the workflow, comment it to unfreeze
+    if: false
     defaults:
       run:
         shell: bash -eou pipefail {0}
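
As a rough illustration of what the extra `env_file` entry means for the first matrix in this workflow, the sketch below expands the two multi-valued axes visible in that hunk into job combinations. The helper `expand_matrix` is hypothetical (it is not part of GitHub Actions or pandas), and any `include` entries or axes outside the hunk are ignored, so the counts cover only the combinations visible here.

```python
from itertools import product


def expand_matrix(**axes):
    """Return one dict per combination of the given matrix axes (hypothetical helper)."""
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*axes.values())]


# Axis values copied from the hunk; single-valued axes are left out because
# they do not change the number of combinations.
platforms = ["ubuntu-22.04", "ubuntu-24.04-arm"]
old_env_files = ["actions-310.yaml", "actions-311.yaml", "actions-312.yaml"]
new_env_files = old_env_files + ["actions-313.yaml"]

old_jobs = expand_matrix(platform=platforms, env_file=old_env_files)
new_jobs = expand_matrix(platform=platforms, env_file=new_env_files)

print(len(old_jobs), len(new_jobs))
```

Under these assumptions, each platform gains one additional job for the Python 3.13 environment file.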

.github/workflows/wheels.yml

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@ jobs:
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

       - name: Build wheels
-        uses: pypa/[email protected].2
+        uses: pypa/[email protected].3
         with:
           package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:

ci/deps/actions-313.yaml

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+name: pandas-dev-313
+channels:
+  - conda-forge
+dependencies:
+  - python=3.13
+
+  # build dependencies
+  - versioneer
+  - cython>=0.29.33
+  - meson=1.2.1
+  - meson-python=0.13.1
+
+  # test dependencies
+  - pytest>=7.3.2
+  - pytest-cov
+  - pytest-xdist>=3.4.0
+  - pytest-localserver>=0.8.1
+  - pytest-qt>=4.4.0
+  - boto3
+
+  # required dependencies
+  - python-dateutil
+  - numpy
+
+  # optional dependencies
+  - beautifulsoup4>=4.12.3
+  - blosc>=1.21.3
+  - bottleneck>=1.3.6
+  - fastparquet>=2024.2.0
+  - fsspec>=2024.2.0
+  - html5lib>=1.1
+  - hypothesis>=6.84.0
+  - gcsfs>=2024.2.0
+  - jinja2>=3.1.3
+  - lxml>=4.9.2
+  - matplotlib>=3.8.3
+  - numba>=0.59.0
+  - numexpr>=2.9.0
+  - odfpy>=1.4.1
+  - qtpy>=2.3.0
+  - pyqt>=5.15.9
+  - openpyxl>=3.1.2
+  - psycopg2>=2.9.6
+  - pyarrow>=10.0.1
+  - pymysql>=1.1.0
+  - pyreadstat>=1.2.6
+  - pytables>=3.8.0
+  - python-calamine>=0.1.7
+  - pytz>=2023.4
+  - pyxlsb>=1.0.10
+  - s3fs>=2024.2.0
+  - scipy>=1.12.0
+  - sqlalchemy>=2.0.0
+  - tabulate>=0.9.0
+  - xarray>=2024.1.1, <=2024.9.0
+  - xlrd>=2.0.1
+  - xlsxwriter>=3.2.0
+  - zstandard>=0.22.0
+
+  - pip:
+    - adbc-driver-postgresql>=0.10.0
+    - adbc-driver-sqlite>=0.8.0
+    - tzdata>=2022.7

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 0 deletions
@@ -710,6 +710,7 @@ Numeric
 ^^^^^^^
 - Bug in :meth:`DataFrame.corr` where numerical precision errors resulted in correlations above ``1.0`` (:issue:`61120`)
 - Bug in :meth:`DataFrame.quantile` where the column type was not preserved when ``numeric_only=True`` with a list-like ``q`` produced an empty result (:issue:`59035`)
+- Bug in :meth:`Series.dot` returning ``object`` dtype for :class:`ArrowDtype` and nullable-dtype data (:issue:`61375`)
 - Bug in ``np.matmul`` with :class:`Index` inputs raising a ``TypeError`` (:issue:`57079`)

 Conversion
@@ -767,6 +768,7 @@ I/O
 - Bug in :meth:`DataFrame.to_dict` raises unnecessary ``UserWarning`` when columns are not unique and ``orient='tight'``. (:issue:`58281`)
 - Bug in :meth:`DataFrame.to_excel` when writing empty :class:`DataFrame` with :class:`MultiIndex` on both axes (:issue:`57696`)
 - Bug in :meth:`DataFrame.to_excel` where the :class:`MultiIndex` index with a period level was not a date (:issue:`60099`)
+- Bug in :meth:`DataFrame.to_stata` when exporting a column containing both long strings (Stata strL) and :class:`pd.NA` values (:issue:`23633`)
 - Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`)
 - Bug in :meth:`DataFrame.to_stata` when writing more than 32,000 value labels. (:issue:`60107`)
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
@@ -794,6 +796,7 @@ Period
 Plotting
 ^^^^^^^^
 - Bug in :meth:`.DataFrameGroupBy.boxplot` failed when there were multiple groupings (:issue:`14701`)
+- Bug in :meth:`DataFrame.plot.bar` when ``subplots`` and ``stacked=True`` are used in conjunction which causes incorrect stacking. (:issue:`61018`)
 - Bug in :meth:`DataFrame.plot.bar` with ``stacked=True`` where labels on stacked bars with zero-height segments were incorrectly positioned at the base instead of the label position of the previous segment (:issue:`59429`)
 - Bug in :meth:`DataFrame.plot.line` raising ``ValueError`` when set both color and a ``dict`` style (:issue:`59461`)
 - Bug in :meth:`DataFrame.plot` that causes a shift to the right when the frequency multiplier is greater than one. (:issue:`57587`)
@@ -805,10 +808,12 @@ Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
 - Bug in :meth:`.DataFrameGroupBy.__len__` and :meth:`.SeriesGroupBy.__len__` would raise when the grouping contained NA values and ``dropna=False`` (:issue:`58644`)
 - Bug in :meth:`.DataFrameGroupBy.any` that returned True for groups where all Timedelta values are NaT. (:issue:`59712`)
+- Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupBy.groups` would fail when the groups were :class:`Categorical` with an NA value (:issue:`61356`)
 - Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupby.groups` that would not respect groupby argument ``dropna`` (:issue:`55919`)
 - Bug in :meth:`.DataFrameGroupBy.median` where nat values gave an incorrect result. (:issue:`57926`)
 - Bug in :meth:`.DataFrameGroupBy.quantile` when ``interpolation="nearest"`` is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
 - Bug in :meth:`.Resampler.interpolate` on a :class:`DataFrame` with non-uniform sampling and/or indices not aligning with the resulting resampled index would result in wrong interpolation (:issue:`21351`)
+- Bug in :meth:`.Series.rolling` when used with a :class:`.BaseIndexer` subclass and computing min/max (:issue:`46726`)
 - Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
 - Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` were not keeping the index name when the index had :class:`ArrowDtype` timestamp dtype (:issue:`61222`)
 - Bug in :meth:`DataFrame.resample` changing index type to :class:`MultiIndex` when the dataframe is empty and using an upsample method (:issue:`55572`)
@@ -834,6 +839,7 @@ Reshaping
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
 - Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)
 - Bug in :meth:`DataFrame.pivot_table` incorrectly subaggregating results when called without an ``index`` argument (:issue:`58722`)
+- Bug in :meth:`DataFrame.pivot_table` incorrectly ignoring the ``values`` argument when also supplied to the ``index`` or ``columns`` parameters (:issue:`57876`, :issue:`61292`)
 - Bug in :meth:`DataFrame.stack` with the new implementation where ``ValueError`` is raised when ``level=[]`` (:issue:`60740`)
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`)
 - Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)

pandas/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 __docformat__ = "restructuredtext"

 # Let users know if they're missing any of our hard dependencies
-_hard_dependencies = ("numpy", "dateutil")
+_hard_dependencies = ("numpy", "dateutil", "tzdata")

 for _dependency in _hard_dependencies:
     try:
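
The hunk above cuts off inside the loop, so the exact handling in pandas is not shown here. As a minimal sketch of the general pattern (try to import each required module and report the ones that fail), assuming a hypothetical helper `find_missing` that collects failures instead of raising:

```python
import importlib


def find_missing(dependencies):
    """Return a message for every module name that cannot be imported.

    Hypothetical helper for illustration; it is not the code in pandas/__init__.py.
    """
    missing = []
    for name in dependencies:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            missing.append(f"{name}: {exc}")
    return missing


# "json" ships with Python; the second name is deliberately bogus.
problems = find_missing(("json", "no_such_module_for_demo_xyz"))
print(problems)
```

With `tzdata` added to the tuple, an installation that lacks it would be flagged at import time by a check of this kind rather than later, when time zone data is first needed.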

pandas/_libs/window/aggregations.pyx

Lines changed: 115 additions & 83 deletions
@@ -6,6 +6,7 @@ from libc.math cimport (
     sqrt,
 )
 from libcpp.deque cimport deque
+from libcpp.stack cimport stack
 from libcpp.unordered_map cimport unordered_map

 from pandas._libs.algos cimport TiebreakEnumType
@@ -988,39 +989,29 @@ def roll_median_c(const float64_t[:] values, ndarray[int64_t] start,

 # ----------------------------------------------------------------------

-# Moving maximum / minimum code taken from Bottleneck
-# Licence at LICENSES/BOTTLENECK_LICENCE
-
-
-cdef float64_t init_mm(float64_t ai, Py_ssize_t *nobs, bint is_max) noexcept nogil:
-
-    if ai == ai:
-        nobs[0] = nobs[0] + 1
-    elif is_max:
-        ai = MINfloat64
-    else:
-        ai = MAXfloat64
-
-    return ai
-
-
-cdef void remove_mm(float64_t aold, Py_ssize_t *nobs) noexcept nogil:
-    """ remove a value from the mm calc """
-    if aold == aold:
-        nobs[0] = nobs[0] - 1
-
-
-cdef float64_t calc_mm(int64_t minp, Py_ssize_t nobs,
-                       float64_t value) noexcept nogil:
-    cdef:
-        float64_t result
+cdef int64_t bisect_left(
+        deque[int64_t]& a,
+        int64_t x,
+        int64_t lo=0,
+        int64_t hi=-1
+) nogil:
+    """Same as https://docs.python.org/3/library/bisect.html."""
+
+    cdef int64_t mid
+    if hi == -1:
+        hi = a.size()
+    while lo < hi:
+        mid = (lo + hi) // 2
+        if a.at(mid) < x:
+            lo = mid + 1
+        else:
+            hi = mid
+    return lo

-    if nobs >= minp:
-        result = value
-    else:
-        result = NaN
+from libc.math cimport isnan

-    return result
+# Prior version of moving maximum / minimum code taken from Bottleneck
+# Licence at LICENSES/BOTTLENECK_LICENCE


 def roll_max(ndarray[float64_t] values, ndarray[int64_t] start,
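
The `bisect_left` helper added in this hunk states that it behaves like Python's `bisect` module. A pure-Python transliteration, given here only as a sketch (the name `bisect_left_sketch` is made up for illustration, and a list stands in for the C++ deque), makes the `hi == -1` sentinel and the loop easier to follow and can be checked against the standard library:

```python
import bisect


def bisect_left_sketch(a, x, lo=0, hi=-1):
    """Leftmost index in sorted `a[lo:hi]` at which `x` could be inserted."""
    if hi == -1:  # sentinel meaning "search up to the end of the container"
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1  # x belongs strictly to the right of mid
        else:
            hi = mid      # a[mid] >= x, so the answer is mid or further left
    return lo


# Increasing indices, like the candidate positions kept by the rolling kernel.
candidates = [2, 5, 9, 14]
print(bisect_left_sketch(candidates, 6, lo=1))   # first index whose entry is >= 6
print(bisect.bisect_left(candidates, 6, 1))      # standard-library reference
```

Both calls locate the first candidate index that is not smaller than the requested window start, which is how the helper is used further down in `_roll_min_max`.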
@@ -1068,69 +1059,110 @@ def roll_min(ndarray[float64_t] values, ndarray[int64_t] start,
     return _roll_min_max(values, start, end, minp, is_max=0)


-cdef _roll_min_max(ndarray[float64_t] values,
-                   ndarray[int64_t] starti,
-                   ndarray[int64_t] endi,
-                   int64_t minp,
-                   bint is_max):
+def _roll_min_max(
+    ndarray[float64_t] values,
+    ndarray[int64_t] start,
+    ndarray[int64_t] end,
+    int64_t minp,
+    bint is_max
+):
     cdef:
-        float64_t ai
-        int64_t curr_win_size, start
-        Py_ssize_t i, k, nobs = 0, N = len(starti)
-        deque Q[int64_t]  # min/max always the front
-        deque W[int64_t]  # track the whole window for nobs compute
+        Py_ssize_t i, i_next, k, valid_start, last_end, last_start, N = len(start)
+        # Indices of bounded extrema in `values`. `candidates[i]` is always increasing.
+        # `values[candidates[i]]` is decreasing for max and increasing for min.
+        deque candidates[int64_t]
+        # Indices of largest windows that "cover" preceding windows.
+        stack dominators[int64_t]
         ndarray[float64_t, ndim=1] output

+        Py_ssize_t this_start, this_end, stash_start
+        int64_t q_idx
+
     output = np.empty(N, dtype=np.float64)
-    Q = deque[int64_t]()
-    W = deque[int64_t]()
+    candidates = deque[int64_t]()
+    dominators = stack[int64_t]()
+
+    # This function was "ported" / translated from sliding_min_max()
+    # in /pandas/core/_numba/kernels/min_max_.py.
+    # (See there for credits and some comments.)
+    # Code translation assumptions/rules:
+    # - min_periods --> minp
+    # - deque[0] --> front()
+    # - deque[-1] --> back()
+    # - stack[-1] --> top()
+    # - bool(stack/deque) --> !empty()
+    # - deque.append() --> push_back()
+    # - stack.append() --> push()
+    # - deque.popleft --> pop_front()
+    # - deque.pop() --> pop_back()

     with nogil:
+        if minp < 1:
+            minp = 1
+
+        if N>2:
+            i_next = N - 1
+            for i in range(N - 2, -1, -1):
+                if start[i_next] < start[i] \
+                    and (
+                        dominators.empty()
+                        or start[dominators.top()] > start[i_next]
+                ):
+                    dominators.push(i_next)
+                i_next = i
+
+        # NaN tracking to guarantee minp
+        valid_start = -minp
+
+        last_end = 0
+        last_start = -1

-        # This is using a modified version of the C++ code in this
-        # SO post: https://stackoverflow.com/a/12239580
-        # The original impl didn't deal with variable window sizes
-        # So the code was optimized for that
-
-        # first window's size
-        curr_win_size = endi[0] - starti[0]
-        # GH 32865
-        # Anchor output index to values index to provide custom
-        # BaseIndexer support
         for i in range(N):
+            this_start = start[i]
+            this_end = end[i]

-            curr_win_size = endi[i] - starti[i]
-            if i == 0:
-                start = starti[i]
-            else:
-                start = endi[i - 1]
-
-            for k in range(start, endi[i]):
-                ai = init_mm(values[k], &nobs, is_max)
-                # Discard previous entries if we find new min or max
-                if is_max:
-                    while not Q.empty() and ((ai >= values[Q.back()]) or
-                                             values[Q.back()] != values[Q.back()]):
-                        Q.pop_back()
-                else:
-                    while not Q.empty() and ((ai <= values[Q.back()]) or
-                                             values[Q.back()] != values[Q.back()]):
-                        Q.pop_back()
-                Q.push_back(k)
-                W.push_back(k)
-
-            # Discard entries outside and left of current window
-            while not Q.empty() and Q.front() <= starti[i] - 1:
-                Q.pop_front()
-            while not W.empty() and W.front() <= starti[i] - 1:
-                remove_mm(values[W.front()], &nobs)
-                W.pop_front()
-
-            # Save output based on index in input value array
-            if not Q.empty() and curr_win_size > 0:
-                output[i] = calc_mm(minp, nobs, values[Q.front()])
+            if (not dominators.empty() and dominators.top() == i):
+                dominators.pop()
+
+            if not (this_end > last_end
+                    or (this_end == last_end and this_start >= last_start)):
+                raise ValueError(
+                    "Start/End ordering requirement is violated at index {}".format(i))
+
+            if dominators.empty():
+                stash_start = this_start
             else:
+                stash_start = min(this_start, start[dominators.top()])
+
+            while not candidates.empty() and candidates.front() < stash_start:
+                candidates.pop_front()
+
+            for k in range(last_end, this_end):
+                if not isnan(values[k]):
+                    valid_start += 1
+                    while valid_start >= 0 and isnan(values[valid_start]):
+                        valid_start += 1
+
+                    if is_max:
+                        while (not candidates.empty()
+                               and values[k] >= values[candidates.back()]):
+                            candidates.pop_back()
+                    else:
+                        while (not candidates.empty()
+                               and values[k] <= values[candidates.back()]):
+                            candidates.pop_back()
+                    candidates.push_back(k)
+
+            if candidates.empty() or this_start > valid_start:
                 output[i] = NaN
+            elif candidates.front() >= this_start:
+                # ^^ This is here to avoid costly bisection for fixed window sizes.
+                output[i] = values[candidates.front()]
+            else:
+                q_idx = bisect_left(candidates, this_start, lo=1)
+                output[i] = values[candidates[q_idx]]
+            last_end = this_end
+            last_start = this_start

     return output
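
The core idea that both the old and the new `_roll_min_max` rely on, a deque of candidate indices whose values stay monotonic, can be sketched in plain Python. This is a deliberately simplified illustration and not the committed algorithm: it assumes `start` and `end` are both non-decreasing, so it needs neither the `dominators` stack nor bisection, and it counts valid observations with a prefix sum rather than the `valid_start` bookkeeping used above. The name `rolling_max_sketch` is hypothetical.

```python
import math
from collections import deque


def rolling_max_sketch(values, start, end, minp=1):
    """Rolling max over half-open windows [start[i], end[i]).

    Assumes start and end are non-decreasing (a simplification of this sketch).
    """
    minp = max(minp, 1)

    # valid_prefix[k] = number of non-NaN entries in values[:k]
    valid_prefix = [0]
    for v in values:
        valid_prefix.append(valid_prefix[-1] + (0 if math.isnan(v) else 1))

    candidates = deque()  # increasing indices; their values are strictly decreasing
    output = []
    last_end = 0
    for s, e in zip(start, end):
        # Feed elements that entered since the previous window.
        for k in range(last_end, e):
            if math.isnan(values[k]):
                continue
            # Older candidates that are not larger can never be the max again.
            while candidates and values[k] >= values[candidates[-1]]:
                candidates.pop()
            candidates.append(k)
        last_end = max(last_end, e)

        # Drop candidates that fell out of the left edge of the window.
        while candidates and candidates[0] < s:
            candidates.popleft()

        nobs = valid_prefix[e] - valid_prefix[s]
        if candidates and nobs >= minp:
            output.append(values[candidates[0]])
        else:
            output.append(math.nan)
    return output


nan = float("nan")
data = [1.0, 3.0, 2.0, nan, 0.0]
# Trailing windows of width 2, truncated at the left edge.
starts = [0, 0, 1, 2, 3]
ends = [1, 2, 3, 4, 5]
result = rolling_max_sketch(data, starts, ends, minp=1)
print(result)
```

Each index is pushed and popped at most once, so the sketch runs in linear time. The committed version generalizes this to windows whose start positions may move backwards, which is where the `dominators` stack and the `bisect_left` lookup come in.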
