Commit 93b5bf3

Author: Kei (committed)

Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type

2 parents: 842f561 + de1131f

File tree: 210 files changed (3112 additions, 2211 deletions)


.github/ISSUE_TEMPLATE/pdep_vote.yaml

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+name: PDEP Vote
+description: Call for a vote on a PDEP
+title: "VOTE: "
+labels: [Vote]
+
+body:
+  - type: markdown
+    attributes:
+      value: >
+        As per [PDEP-1](https://pandas.pydata.org/pdeps/0001-purpose-and-guidelines.html), the following issue template should be used when a
+        maintainer has opened a PDEP discussion and is ready to call for a vote.
+  - type: checkboxes
+    attributes:
+      label: Locked issue
+      options:
+        - label: >
+            I locked this voting issue so that only voting members are able to cast their votes or
+            comment on this issue.
+          required: true
+  - type: input
+    id: PDEP-name
+    attributes:
+      label: PDEP number and title
+      placeholder: >
+        PDEP-1: Purpose and guidelines
+    validations:
+      required: true
+  - type: input
+    id: PDEP-link
+    attributes:
+      label: Pull request with discussion
+      description: e.g. https://github.com/pandas-dev/pandas/pull/47444
+    validations:
+      required: true
+  - type: input
+    id: PDEP-rendered-link
+    attributes:
+      label: Rendered PDEP for easy reading
+      description: e.g. https://github.com/pandas-dev/pandas/pull/47444/files?short_path=7c449e6#diff-7c449e698132205b235c501f7e47ebba38da4d2b7f9492c98f16745dba787041
+    validations:
+      required: true
+  - type: input
+    id: PDEP-number-of-discussion-participants
+    attributes:
+      label: Discussion participants
+      description: >
+        You may find it useful to list or total the number of participating members in the
+        PDEP discussion PR. This would be the maximum possible disapprove votes.
+      placeholder: >
+        14 voting members participated in the PR discussion thus far.
+  - type: input
+    id: PDEP-vote-end
+    attributes:
+      label: Voting will close in 15 days.
+      description: The voting period end date. ('Voting will close in 15 days.' will be automatically written)
+  - type: markdown
+    attributes:
+      value: ---
+  - type: textarea
+    id: Vote
+    attributes:
+      label: Vote
+      value: |
+        Cast your vote in a comment below.
+        * +1: approve.
+        * 0: abstain.
+            * Reason: A one sentence reason is required.
+        * -1: disapprove
+            * Reason: A one sentence reason is required.
+        A disapprove vote requires prior participation in the linked discussion PR.
+
+        @pandas-dev/pandas-core
+    validations:
+      required: true

.github/workflows/code-checks.yml

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ jobs:
         echo "PYTHONPATH=$PYTHONPATH" >> $GITHUB_ENV
       if: ${{ steps.build.outcome == 'success' && always() }}

-    - name: Typing + pylint
+    - name: Typing
      uses: pre-commit/[email protected]
      with:
        extra_args: --verbose --hook-stage manual --all-files

.pre-commit-config.yaml

Lines changed: 1 addition & 26 deletions
@@ -16,7 +16,7 @@ ci:
     autofix_prs: false
     autoupdate_schedule: monthly
     # manual stage hooks
-    skip: [pylint, pyright, mypy]
+    skip: [pyright, mypy]
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
   rev: v0.3.4
@@ -30,12 +30,6 @@ repos:
     files: ^pandas
     exclude: ^pandas/tests
     args: [--select, "ANN001,ANN2", --fix-only, --exit-non-zero-on-fix]
-  - id: ruff
-    name: ruff-use-pd_array-in-core
-    alias: ruff-use-pd_array-in-core
-    files: ^pandas/core/
-    exclude: ^pandas/core/api\.py$
-    args: [--select, "ICN001", --exit-non-zero-on-fix]
   - id: ruff-format
     exclude: ^scripts
 - repo: https://github.com/jendrikseipp/vulture
@@ -73,25 +67,6 @@ repos:
   - id: fix-encoding-pragma
     args: [--remove]
   - id: trailing-whitespace
-- repo: https://github.com/pylint-dev/pylint
-  rev: v3.0.1
-  hooks:
-  - id: pylint
-    stages: [manual]
-    args: [--load-plugins=pylint.extensions.redefined_loop_name, --fail-on=I0021]
-  - id: pylint
-    alias: redefined-outer-name
-    name: Redefining name from outer scope
-    files: ^pandas/
-    exclude: |
-      (?x)
-      ^pandas/tests  # keep excluded
-      |/_testing/  # keep excluded
-      |^pandas/util/_test_decorators\.py  # keep excluded
-      |^pandas/_version\.py  # keep excluded
-      |^pandas/conftest\.py  # keep excluded
-    args: [--disable=all, --enable=redefined-outer-name]
-    stages: [manual]
 - repo: https://github.com/PyCQA/isort
   rev: 5.13.2
   hooks:

asv_bench/asv.conf.json

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@
     // pip (with all the conda available packages installed first,
     // followed by the pip installed packages).
     "matrix": {
+        "pip+build": [],
         "Cython": ["3.0"],
         "matplotlib": [],
         "sqlalchemy": [],

asv_bench/benchmarks/frame_methods.py

Lines changed: 24 additions & 0 deletions
@@ -862,4 +862,28 @@ def time_last_valid_index(self, dtype):
         self.df.last_valid_index()


+class Update:
+    def setup(self):
+        rng = np.random.default_rng()
+        self.df = DataFrame(rng.uniform(size=(1_000_000, 10)))
+
+        idx = rng.choice(range(1_000_000), size=1_000_000, replace=False)
+        self.df_random = DataFrame(self.df, index=idx)
+
+        idx = rng.choice(range(1_000_000), size=100_000, replace=False)
+        cols = rng.choice(range(10), size=2, replace=False)
+        self.df_sample = DataFrame(
+            rng.uniform(size=(100_000, 2)), index=idx, columns=cols
+        )
+
+    def time_to_update_big_frame_small_arg(self):
+        self.df.update(self.df_sample)
+
+    def time_to_update_random_indices(self):
+        self.df_random.update(self.df_sample)
+
+    def time_to_update_small_frame_big_arg(self):
+        self.df_sample.update(self.df)
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip
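
Context, not part of the diff: the new Update benchmarks exercise DataFrame.update, which aligns the argument on index and column labels and overwrites the matching non-NA positions of the caller in place. A minimal sketch of the behavior being timed, with made-up sizes much smaller than the benchmark's:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)

    # Large target frame and a smaller, partially overlapping source frame
    # (sizes here are illustrative only).
    df = pd.DataFrame(rng.uniform(size=(1_000, 4)))
    sample = pd.DataFrame(
        rng.uniform(size=(100, 2)),
        index=rng.choice(range(1_000), size=100, replace=False),
        columns=[0, 2],
    )

    # update() modifies df in place: only positions whose index and column
    # labels appear in `sample` (and are non-NA there) are overwritten.
    df.update(sample)
    print(df.loc[sorted(sample.index), [0, 2]].head())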

ci/code_checks.sh

Lines changed: 3 additions & 38 deletions
@@ -81,23 +81,13 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.CategoricalIndex.ordered SA01" \
         -i "pandas.DataFrame.__dataframe__ SA01" \
         -i "pandas.DataFrame.__iter__ SA01" \
-        -i "pandas.DataFrame.assign SA01" \
         -i "pandas.DataFrame.at_time PR01" \
-        -i "pandas.DataFrame.axes SA01" \
-        -i "pandas.DataFrame.bfill SA01" \
         -i "pandas.DataFrame.columns SA01" \
-        -i "pandas.DataFrame.copy SA01" \
         -i "pandas.DataFrame.droplevel SA01" \
-        -i "pandas.DataFrame.dtypes SA01" \
-        -i "pandas.DataFrame.ffill SA01" \
-        -i "pandas.DataFrame.first_valid_index SA01" \
-        -i "pandas.DataFrame.get SA01" \
         -i "pandas.DataFrame.hist RT03" \
         -i "pandas.DataFrame.infer_objects RT03" \
-        -i "pandas.DataFrame.keys SA01" \
         -i "pandas.DataFrame.kurt RT03,SA01" \
         -i "pandas.DataFrame.kurtosis RT03,SA01" \
-        -i "pandas.DataFrame.last_valid_index SA01" \
         -i "pandas.DataFrame.max RT03" \
         -i "pandas.DataFrame.mean RT03,SA01" \
         -i "pandas.DataFrame.median RT03,SA01" \
@@ -124,24 +114,18 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.DatetimeIndex.ceil SA01" \
         -i "pandas.DatetimeIndex.date SA01" \
         -i "pandas.DatetimeIndex.day SA01" \
-        -i "pandas.DatetimeIndex.day_name SA01" \
         -i "pandas.DatetimeIndex.day_of_year SA01" \
         -i "pandas.DatetimeIndex.dayofyear SA01" \
         -i "pandas.DatetimeIndex.floor SA01" \
         -i "pandas.DatetimeIndex.freqstr SA01" \
-        -i "pandas.DatetimeIndex.hour SA01" \
         -i "pandas.DatetimeIndex.indexer_at_time PR01,RT03" \
         -i "pandas.DatetimeIndex.indexer_between_time RT03" \
         -i "pandas.DatetimeIndex.inferred_freq SA01" \
         -i "pandas.DatetimeIndex.is_leap_year SA01" \
         -i "pandas.DatetimeIndex.microsecond SA01" \
-        -i "pandas.DatetimeIndex.minute SA01" \
-        -i "pandas.DatetimeIndex.month SA01" \
-        -i "pandas.DatetimeIndex.month_name SA01" \
         -i "pandas.DatetimeIndex.nanosecond SA01" \
         -i "pandas.DatetimeIndex.quarter SA01" \
         -i "pandas.DatetimeIndex.round SA01" \
-        -i "pandas.DatetimeIndex.second SA01" \
         -i "pandas.DatetimeIndex.snap PR01,RT03,SA01" \
         -i "pandas.DatetimeIndex.std PR01,RT03" \
         -i "pandas.DatetimeIndex.time SA01" \
@@ -150,16 +134,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.DatetimeIndex.to_pydatetime RT03,SA01" \
         -i "pandas.DatetimeIndex.tz SA01" \
         -i "pandas.DatetimeIndex.tz_convert RT03" \
-        -i "pandas.DatetimeIndex.year SA01" \
         -i "pandas.DatetimeTZDtype SA01" \
         -i "pandas.DatetimeTZDtype.tz SA01" \
         -i "pandas.DatetimeTZDtype.unit SA01" \
-        -i "pandas.ExcelFile PR01,SA01" \
-        -i "pandas.ExcelFile.parse PR01,SA01" \
-        -i "pandas.ExcelWriter SA01" \
-        -i "pandas.Float32Dtype SA01" \
-        -i "pandas.Float64Dtype SA01" \
-        -i "pandas.Grouper PR02,SA01" \
+        -i "pandas.Grouper PR02" \
         -i "pandas.HDFStore.append PR01,SA01" \
         -i "pandas.HDFStore.get SA01" \
         -i "pandas.HDFStore.groups SA01" \
@@ -309,7 +287,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.Series.add PR07" \
         -i "pandas.Series.at_time PR01" \
         -i "pandas.Series.backfill PR01,SA01" \
-        -i "pandas.Series.bfill SA01" \
         -i "pandas.Series.case_when RT03" \
         -i "pandas.Series.cat PR07,SA01" \
         -i "pandas.Series.cat.add_categories PR01,PR02" \
@@ -322,36 +299,31 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.Series.cat.rename_categories PR01,PR02" \
         -i "pandas.Series.cat.reorder_categories PR01,PR02" \
         -i "pandas.Series.cat.set_categories PR01,PR02" \
-        -i "pandas.Series.copy SA01" \
         -i "pandas.Series.div PR07" \
         -i "pandas.Series.droplevel SA01" \
         -i "pandas.Series.dt.as_unit PR01,PR02" \
         -i "pandas.Series.dt.ceil PR01,PR02,SA01" \
         -i "pandas.Series.dt.components SA01" \
         -i "pandas.Series.dt.date SA01" \
         -i "pandas.Series.dt.day SA01" \
-        -i "pandas.Series.dt.day_name PR01,PR02,SA01" \
+        -i "pandas.Series.dt.day_name PR01,PR02" \
         -i "pandas.Series.dt.day_of_year SA01" \
         -i "pandas.Series.dt.dayofyear SA01" \
         -i "pandas.Series.dt.days SA01" \
         -i "pandas.Series.dt.days_in_month SA01" \
         -i "pandas.Series.dt.daysinmonth SA01" \
         -i "pandas.Series.dt.floor PR01,PR02,SA01" \
         -i "pandas.Series.dt.freq GL08" \
-        -i "pandas.Series.dt.hour SA01" \
         -i "pandas.Series.dt.is_leap_year SA01" \
         -i "pandas.Series.dt.microsecond SA01" \
         -i "pandas.Series.dt.microseconds SA01" \
-        -i "pandas.Series.dt.minute SA01" \
-        -i "pandas.Series.dt.month SA01" \
-        -i "pandas.Series.dt.month_name PR01,PR02,SA01" \
+        -i "pandas.Series.dt.month_name PR01,PR02" \
         -i "pandas.Series.dt.nanosecond SA01" \
         -i "pandas.Series.dt.nanoseconds SA01" \
         -i "pandas.Series.dt.normalize PR01" \
         -i "pandas.Series.dt.quarter SA01" \
         -i "pandas.Series.dt.qyear GL08" \
         -i "pandas.Series.dt.round PR01,PR02,SA01" \
-        -i "pandas.Series.dt.second SA01" \
         -i "pandas.Series.dt.seconds SA01" \
         -i "pandas.Series.dt.strftime PR01,PR02" \
         -i "pandas.Series.dt.time SA01" \
@@ -362,27 +334,20 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.Series.dt.tz_convert PR01,PR02,RT03" \
         -i "pandas.Series.dt.tz_localize PR01,PR02" \
         -i "pandas.Series.dt.unit GL08" \
-        -i "pandas.Series.dt.year SA01" \
         -i "pandas.Series.dtype SA01" \
-        -i "pandas.Series.dtypes SA01" \
         -i "pandas.Series.empty GL08" \
         -i "pandas.Series.eq PR07,SA01" \
-        -i "pandas.Series.ffill SA01" \
-        -i "pandas.Series.first_valid_index SA01" \
         -i "pandas.Series.floordiv PR07" \
         -i "pandas.Series.ge PR07,SA01" \
-        -i "pandas.Series.get SA01" \
         -i "pandas.Series.gt PR07,SA01" \
         -i "pandas.Series.hasnans SA01" \
         -i "pandas.Series.infer_objects RT03" \
         -i "pandas.Series.is_monotonic_decreasing SA01" \
         -i "pandas.Series.is_monotonic_increasing SA01" \
         -i "pandas.Series.is_unique SA01" \
         -i "pandas.Series.item SA01" \
-        -i "pandas.Series.keys SA01" \
         -i "pandas.Series.kurt RT03,SA01" \
         -i "pandas.Series.kurtosis RT03,SA01" \
-        -i "pandas.Series.last_valid_index SA01" \
         -i "pandas.Series.le PR07,SA01" \
         -i "pandas.Series.list.__getitem__ SA01" \
         -i "pandas.Series.list.flatten SA01" \

doc/source/getting_started/install.rst

Lines changed: 2 additions & 0 deletions
@@ -269,6 +269,8 @@ SciPy 1.10.0 computation Miscellaneous stati
 xarray                    2022.12.0          computation     pandas-like API for N-dimensional data
 ========================= ================== =============== =============================================================

+.. _install.excel_dependencies:
+
 Excel files
 ^^^^^^^^^^^

doc/source/getting_started/intro_tutorials/02_read_write.rst

Lines changed: 6 additions & 0 deletions
@@ -111,6 +111,12 @@ strings (``object``).

 My colleague requested the Titanic data as a spreadsheet.

+.. note::
+    If you want to use :func:`~pandas.to_excel` and :func:`~pandas.read_excel`,
+    you need to install an Excel reader as outlined in the
+    :ref:`Excel files <install.excel_dependencies>` section of the
+    installation documentation.
+
 .. ipython:: python

     titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)
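
Not part of the commit, but a quick illustration of the round trip the new note is about; it assumes an Excel engine such as openpyxl is installed, as the linked installation section describes:

    import pandas as pd

    # Writing .xlsx files needs an optional Excel engine (e.g. openpyxl);
    # without one, to_excel/read_excel raise an ImportError.
    df = pd.DataFrame({"name": ["Allen", "Bonnell"], "survived": [0, 1]})
    df.to_excel("passengers_sample.xlsx", sheet_name="passengers", index=False)

    # Reading the sheet back relies on the same optional dependency.
    back = pd.read_excel("passengers_sample.xlsx", sheet_name="passengers")
    print(back)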

doc/source/user_guide/basics.rst

Lines changed: 5 additions & 5 deletions
@@ -476,15 +476,15 @@ For example:
 .. ipython:: python

     df
-    df.mean(0)
-    df.mean(1)
+    df.mean(axis=0)
+    df.mean(axis=1)

 All such methods have a ``skipna`` option signaling whether to exclude missing
 data (``True`` by default):

 .. ipython:: python

-    df.sum(0, skipna=False)
+    df.sum(axis=0, skipna=False)
     df.sum(axis=1, skipna=True)

 Combined with the broadcasting / arithmetic behavior, one can describe various
@@ -495,8 +495,8 @@ standard deviation of 1), very concisely:

     ts_stand = (df - df.mean()) / df.std()
     ts_stand.std()
-    xs_stand = df.sub(df.mean(1), axis=0).div(df.std(1), axis=0)
-    xs_stand.std(1)
+    xs_stand = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)
+    xs_stand.std(axis=1)

 Note that methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod`
 preserve the location of ``NaN`` values. This is somewhat different from
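
An aside, not from the commit: these doc edits only spell out the axis argument by keyword; the positional and keyword forms compute the same thing. A small sketch, assuming NumPy and pandas:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["a", "b"])

    # axis=0 reduces over rows (one value per column); axis=1 reduces over
    # columns (one value per row). The keyword form is simply more explicit.
    col_means = df.mean(axis=0)
    row_means = df.mean(axis=1)
    assert col_means.equals(df.mean(0))  # same result as the old positional call
    print(col_means, row_means, sep="\n")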

doc/source/user_guide/indexing.rst

Lines changed: 1 addition & 1 deletion
@@ -952,7 +952,7 @@ To select a row where each column meets its own criterion:

     values = {'ids': ['a', 'b'], 'ids2': ['a', 'c'], 'vals': [1, 3]}

-    row_mask = df.isin(values).all(1)
+    row_mask = df.isin(values).all(axis=1)

     df[row_mask]
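
For readers skimming the diff, a small sketch (not from the commit) of what this documentation example computes, using a made-up frame with the same column names:

    import pandas as pd

    df = pd.DataFrame(
        {"ids": ["a", "b", "f"], "ids2": ["a", "n", "c"], "vals": [1, 2, 3]}
    )
    values = {"ids": ["a", "b"], "ids2": ["a", "c"], "vals": [1, 3]}

    # isin() builds a boolean frame; all(axis=1) keeps only the rows where
    # every column meets its own criterion.
    row_mask = df.isin(values).all(axis=1)
    print(df[row_mask])  # only the first row satisfies all three conditions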
