
Commit 0ce083d

Merge branch 'main' into fix/group_by_agg_pyarrow_bool_numpy_same_type
2 parents e1ccef6 + 41f3c2e commit 0ce083d


53 files changed: +1095 −391 lines

.pre-commit-config.yaml

Lines changed: 4 additions & 4 deletions
@@ -19,7 +19,7 @@ ci:
     skip: [pyright, mypy]
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
-  rev: v0.3.4
+  rev: v0.4.3
   hooks:
   - id: ruff
     args: [--exit-non-zero-on-fix]
@@ -46,12 +46,12 @@ repos:
     types_or: [python, rst, markdown, cython, c]
     additional_dependencies: [tomli]
 - repo: https://github.com/MarcoGorelli/cython-lint
-  rev: v0.16.0
+  rev: v0.16.2
   hooks:
   - id: cython-lint
   - id: double-quote-cython-strings
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v4.5.0
+  rev: v4.6.0
   hooks:
   - id: check-case-conflict
   - id: check-toml
@@ -91,7 +91,7 @@ repos:
   hooks:
   - id: sphinx-lint
 - repo: https://github.com/pre-commit/mirrors-clang-format
-  rev: v18.1.2
+  rev: v18.1.4
   hooks:
   - id: clang-format
     files: ^pandas/_libs/src|^pandas/_libs/include

ci/code_checks.sh

Lines changed: 3 additions & 48 deletions
@@ -75,24 +75,9 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.DataFrame.median RT03,SA01" \
 -i "pandas.DataFrame.min RT03" \
 -i "pandas.DataFrame.plot PR02,SA01" \
--i "pandas.DataFrame.std PR01,RT03,SA01" \
--i "pandas.DataFrame.sum RT03" \
--i "pandas.DataFrame.swaplevel SA01" \
--i "pandas.DataFrame.to_markdown SA01" \
--i "pandas.DataFrame.var PR01,RT03,SA01" \
 -i "pandas.Grouper PR02" \
 -i "pandas.Index PR07" \
--i "pandas.Index.join PR07,RT03,SA01" \
--i "pandas.Index.names GL08" \
--i "pandas.Index.ravel PR01,RT03" \
--i "pandas.Index.str PR01,SA01" \
 -i "pandas.Interval PR02" \
--i "pandas.Interval.closed SA01" \
--i "pandas.Interval.left SA01" \
--i "pandas.Interval.mid SA01" \
--i "pandas.Interval.right SA01" \
--i "pandas.IntervalDtype PR01,SA01" \
--i "pandas.IntervalDtype.subtype SA01" \
 -i "pandas.IntervalIndex.closed SA01" \
 -i "pandas.IntervalIndex.contains RT03" \
 -i "pandas.IntervalIndex.get_loc PR07,RT03,SA01" \
@@ -165,16 +150,12 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.RangeIndex.start SA01" \
 -i "pandas.RangeIndex.step SA01" \
 -i "pandas.RangeIndex.stop SA01" \
--i "pandas.Series SA01" \
--i "pandas.Series.__iter__ RT03,SA01" \
 -i "pandas.Series.add PR07" \
--i "pandas.Series.backfill PR01,SA01" \
 -i "pandas.Series.case_when RT03" \
--i "pandas.Series.cat PR07,SA01" \
+-i "pandas.Series.cat PR07" \
 -i "pandas.Series.cat.add_categories PR01,PR02" \
 -i "pandas.Series.cat.as_ordered PR01" \
 -i "pandas.Series.cat.as_unordered PR01" \
--i "pandas.Series.cat.codes SA01" \
 -i "pandas.Series.cat.remove_categories PR01,PR02" \
 -i "pandas.Series.cat.remove_unused_categories PR01" \
 -i "pandas.Series.cat.rename_categories PR01,PR02" \
@@ -185,7 +166,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Series.dt.ceil PR01,PR02" \
 -i "pandas.Series.dt.components SA01" \
 -i "pandas.Series.dt.day_name PR01,PR02" \
--i "pandas.Series.dt.days SA01" \
 -i "pandas.Series.dt.days_in_month SA01" \
 -i "pandas.Series.dt.daysinmonth SA01" \
 -i "pandas.Series.dt.floor PR01,PR02" \
@@ -203,29 +183,20 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Series.dt.tz_convert PR01,PR02" \
 -i "pandas.Series.dt.tz_localize PR01,PR02" \
 -i "pandas.Series.dt.unit GL08" \
--i "pandas.Series.dtype SA01" \
 -i "pandas.Series.eq PR07,SA01" \
 -i "pandas.Series.floordiv PR07" \
 -i "pandas.Series.ge PR07,SA01" \
 -i "pandas.Series.gt PR07,SA01" \
--i "pandas.Series.hasnans SA01" \
--i "pandas.Series.is_monotonic_decreasing SA01" \
--i "pandas.Series.is_monotonic_increasing SA01" \
--i "pandas.Series.is_unique SA01" \
 -i "pandas.Series.kurt RT03,SA01" \
 -i "pandas.Series.kurtosis RT03,SA01" \
 -i "pandas.Series.le PR07,SA01" \
 -i "pandas.Series.list.__getitem__ SA01" \
 -i "pandas.Series.list.flatten SA01" \
 -i "pandas.Series.list.len SA01" \
 -i "pandas.Series.lt PR07,SA01" \
--i "pandas.Series.max RT03" \
--i "pandas.Series.mean RT03,SA01" \
--i "pandas.Series.median RT03,SA01" \
 -i "pandas.Series.min RT03" \
 -i "pandas.Series.mod PR07" \
 -i "pandas.Series.mode SA01" \
--i "pandas.Series.mul PR07" \
 -i "pandas.Series.ne PR07,SA01" \
 -i "pandas.Series.pad PR01,SA01" \
 -i "pandas.Series.plot PR02,SA01" \
@@ -243,7 +214,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Series.rsub PR07" \
 -i "pandas.Series.rtruediv PR07" \
 -i "pandas.Series.sem PR01,RT03,SA01" \
--i "pandas.Series.shape SA01" \
 -i "pandas.Series.skew RT03,SA01" \
 -i "pandas.Series.sparse PR01,SA01" \
 -i "pandas.Series.sparse.density SA01" \
@@ -253,7 +223,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Series.sparse.sp_values SA01" \
 -i "pandas.Series.sparse.to_coo PR07,RT03,SA01" \
 -i "pandas.Series.std PR01,RT03,SA01" \
--i "pandas.Series.str PR01,SA01" \
 -i "pandas.Series.str.capitalize RT03" \
 -i "pandas.Series.str.casefold RT03" \
 -i "pandas.Series.str.center RT03,SA01" \
@@ -312,12 +281,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Timedelta.view SA01" \
 -i "pandas.TimedeltaIndex.as_unit RT03,SA01" \
 -i "pandas.TimedeltaIndex.components SA01" \
--i "pandas.TimedeltaIndex.days SA01" \
 -i "pandas.TimedeltaIndex.microseconds SA01" \
 -i "pandas.TimedeltaIndex.nanoseconds SA01" \
 -i "pandas.TimedeltaIndex.seconds SA01" \
 -i "pandas.TimedeltaIndex.to_pytimedelta RT03,SA01" \
--i "pandas.Timestamp PR07,SA01" \
 -i "pandas.Timestamp.as_unit SA01" \
 -i "pandas.Timestamp.asm8 SA01" \
 -i "pandas.Timestamp.astimezone SA01" \
@@ -326,13 +293,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Timestamp.ctime SA01" \
 -i "pandas.Timestamp.date SA01" \
 -i "pandas.Timestamp.day GL08" \
--i "pandas.Timestamp.day_name SA01" \
--i "pandas.Timestamp.day_of_week SA01" \
--i "pandas.Timestamp.day_of_year SA01" \
--i "pandas.Timestamp.dayofweek SA01" \
--i "pandas.Timestamp.dayofyear SA01" \
--i "pandas.Timestamp.days_in_month SA01" \
--i "pandas.Timestamp.daysinmonth SA01" \
 -i "pandas.Timestamp.dst SA01" \
 -i "pandas.Timestamp.floor SA01" \
 -i "pandas.Timestamp.fold GL08" \
@@ -343,9 +303,9 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Timestamp.isocalendar SA01" \
 -i "pandas.Timestamp.isoformat SA01" \
 -i "pandas.Timestamp.isoweekday SA01" \
--i "pandas.Timestamp.max PR02,PR07,SA01" \
+-i "pandas.Timestamp.max PR02" \
 -i "pandas.Timestamp.microsecond GL08" \
--i "pandas.Timestamp.min PR02,PR07,SA01" \
+-i "pandas.Timestamp.min PR02" \
 -i "pandas.Timestamp.minute GL08" \
 -i "pandas.Timestamp.month GL08" \
 -i "pandas.Timestamp.month_name SA01" \
@@ -385,11 +345,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 -i "pandas.Timestamp.weekday SA01" \
 -i "pandas.Timestamp.weekofyear SA01" \
 -i "pandas.Timestamp.year GL08" \
--i "pandas.api.extensions.ExtensionArray SA01" \
--i "pandas.api.extensions.ExtensionArray._accumulate RT03,SA01" \
--i "pandas.api.extensions.ExtensionArray._concat_same_type PR07,SA01" \
--i "pandas.api.extensions.ExtensionArray._formatter SA01" \
--i "pandas.api.extensions.ExtensionArray._from_sequence SA01" \
 -i "pandas.api.extensions.ExtensionArray._from_sequence_of_strings SA01" \
 -i "pandas.api.extensions.ExtensionArray._hash_pandas_object RT03,SA01" \
 -i "pandas.api.extensions.ExtensionArray._pad_or_backfill PR01,RT03,SA01" \
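Each `-i` flag above pairs a public API name with the numpydoc error codes still being ignored for its docstring; this merge deletes entries whose docstrings now validate clean. A minimal sketch of how such a flag list can be parsed into a mapping (the helper name and sample snippet are hypothetical, not part of the actual script):

```python
import re

def parse_ignore_flags(script_text):
    """Collect ignored numpydoc error codes per API name from -i "name CODES" flags."""
    ignores = {}
    for match in re.finditer(r'-i "([^" ]+) ([^"]+)"', script_text):
        name, codes = match.groups()
        ignores[name] = codes.split(",")
    return ignores

snippet = '''
    -i "pandas.Series.cat PR07" \\
    -i "pandas.Timestamp.max PR02" \\
'''
print(parse_ignore_flags(snippet))
```

Removing a line from the real script therefore re-enables all listed checks for that API at once.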

doc/source/user_guide/basics.rst

Lines changed: 3 additions & 4 deletions
@@ -160,11 +160,10 @@ Here is a sample (using 100 column x 100,000 row ``DataFrames``):
 .. csv-table::
    :header: "Operation", "0.11.0 (ms)", "Prior Version (ms)", "Ratio to Prior"
    :widths: 25, 25, 25, 25
-   :delim: ;

-   ``df1 > df2``; 13.32; 125.35; 0.1063
-   ``df1 * df2``; 21.71; 36.63; 0.5928
-   ``df1 + df2``; 22.04; 36.50; 0.6039
+   ``df1 > df2``, 13.32, 125.35, 0.1063
+   ``df1 * df2``, 21.71, 36.63, 0.5928
+   ``df1 + df2``, 22.04, 36.50, 0.6039

 You are highly encouraged to install both libraries. See the section
 :ref:`Recommended Dependencies <install.recommended_dependencies>` for more installation info.
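The table in this hunk reports 0.11.0-era timings for elementwise DataFrame arithmetic (accelerated by numexpr when installed). A small sketch of the same three operations, using an assumed 1000x10 shape rather than the 100 column x 100,000 row frames from the benchmark:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((1000, 10)))
df2 = pd.DataFrame(rng.standard_normal((1000, 10)))

gt = df1 > df2      # elementwise boolean comparison
prod = df1 * df2    # elementwise multiplication
total = df1 + df2   # elementwise addition
print(gt.shape, prod.shape, total.shape)
```

With numexpr available, pandas dispatches these expressions to it transparently; the user-facing API is unchanged.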

doc/source/user_guide/gotchas.rst

Lines changed: 2 additions & 13 deletions
@@ -315,19 +315,8 @@ Why not make NumPy like R?

 Many people have suggested that NumPy should simply emulate the ``NA`` support
 present in the more domain-specific statistical programming language `R
-<https://www.r-project.org/>`__. Part of the reason is the NumPy type hierarchy:
-
-.. csv-table::
-   :header: "Typeclass","Dtypes"
-   :widths: 30,70
-   :delim: |
-
-   ``numpy.floating`` | ``float16, float32, float64, float128``
-   ``numpy.integer`` | ``int8, int16, int32, int64``
-   ``numpy.unsignedinteger`` | ``uint8, uint16, uint32, uint64``
-   ``numpy.object_`` | ``object_``
-   ``numpy.bool_`` | ``bool_``
-   ``numpy.character`` | ``bytes_, str_``
+<https://www.r-project.org/>`__. Part of the reason is the
+`NumPy type hierarchy <https://numpy.org/doc/stable/user/basics.types.html>`__.

 The R language, by contrast, only has a handful of built-in data types:
 ``integer``, ``numeric`` (floating-point), ``character``, and
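The deleted table listed NumPy typeclasses (``numpy.floating``, ``numpy.integer``, and so on) and their concrete dtypes. The same hierarchy can be queried programmatically, which also illustrates why integer arrays cannot hold ``NA`` without upcasting — a sketch, not taken from the docs:

```python
import numpy as np

# Concrete dtypes belong to abstract typeclasses in NumPy's hierarchy.
assert np.issubdtype(np.float32, np.floating)
assert np.issubdtype(np.int64, np.integer)
assert np.issubdtype(np.uint8, np.unsignedinteger)
assert not np.issubdtype(np.int64, np.floating)

# Integer arrays have no native NA; introducing NaN forces a float dtype.
print(np.array([1, 2, 3]).dtype)
print(np.array([1, 2, np.nan]).dtype)
```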

doc/source/user_guide/groupby.rst

Lines changed: 37 additions & 40 deletions
@@ -506,29 +506,28 @@ listed below, those with a ``*`` do *not* have an efficient, GroupBy-specific, i
 .. csv-table::
    :header: "Method", "Description"
    :widths: 20, 80
-   :delim: ;
-
-   :meth:`~.DataFrameGroupBy.any`;Compute whether any of the values in the groups are truthy
-   :meth:`~.DataFrameGroupBy.all`;Compute whether all of the values in the groups are truthy
-   :meth:`~.DataFrameGroupBy.count`;Compute the number of non-NA values in the groups
-   :meth:`~.DataFrameGroupBy.cov` * ;Compute the covariance of the groups
-   :meth:`~.DataFrameGroupBy.first`;Compute the first occurring value in each group
-   :meth:`~.DataFrameGroupBy.idxmax`;Compute the index of the maximum value in each group
-   :meth:`~.DataFrameGroupBy.idxmin`;Compute the index of the minimum value in each group
-   :meth:`~.DataFrameGroupBy.last`;Compute the last occurring value in each group
-   :meth:`~.DataFrameGroupBy.max`;Compute the maximum value in each group
-   :meth:`~.DataFrameGroupBy.mean`;Compute the mean of each group
-   :meth:`~.DataFrameGroupBy.median`;Compute the median of each group
-   :meth:`~.DataFrameGroupBy.min`;Compute the minimum value in each group
-   :meth:`~.DataFrameGroupBy.nunique`;Compute the number of unique values in each group
-   :meth:`~.DataFrameGroupBy.prod`;Compute the product of the values in each group
-   :meth:`~.DataFrameGroupBy.quantile`;Compute a given quantile of the values in each group
-   :meth:`~.DataFrameGroupBy.sem`;Compute the standard error of the mean of the values in each group
-   :meth:`~.DataFrameGroupBy.size`;Compute the number of values in each group
-   :meth:`~.DataFrameGroupBy.skew` *;Compute the skew of the values in each group
-   :meth:`~.DataFrameGroupBy.std`;Compute the standard deviation of the values in each group
-   :meth:`~.DataFrameGroupBy.sum`;Compute the sum of the values in each group
-   :meth:`~.DataFrameGroupBy.var`;Compute the variance of the values in each group
+
+   :meth:`~.DataFrameGroupBy.any`,Compute whether any of the values in the groups are truthy
+   :meth:`~.DataFrameGroupBy.all`,Compute whether all of the values in the groups are truthy
+   :meth:`~.DataFrameGroupBy.count`,Compute the number of non-NA values in the groups
+   :meth:`~.DataFrameGroupBy.cov` * ,Compute the covariance of the groups
+   :meth:`~.DataFrameGroupBy.first`,Compute the first occurring value in each group
+   :meth:`~.DataFrameGroupBy.idxmax`,Compute the index of the maximum value in each group
+   :meth:`~.DataFrameGroupBy.idxmin`,Compute the index of the minimum value in each group
+   :meth:`~.DataFrameGroupBy.last`,Compute the last occurring value in each group
+   :meth:`~.DataFrameGroupBy.max`,Compute the maximum value in each group
+   :meth:`~.DataFrameGroupBy.mean`,Compute the mean of each group
+   :meth:`~.DataFrameGroupBy.median`,Compute the median of each group
+   :meth:`~.DataFrameGroupBy.min`,Compute the minimum value in each group
+   :meth:`~.DataFrameGroupBy.nunique`,Compute the number of unique values in each group
+   :meth:`~.DataFrameGroupBy.prod`,Compute the product of the values in each group
+   :meth:`~.DataFrameGroupBy.quantile`,Compute a given quantile of the values in each group
+   :meth:`~.DataFrameGroupBy.sem`,Compute the standard error of the mean of the values in each group
+   :meth:`~.DataFrameGroupBy.size`,Compute the number of values in each group
+   :meth:`~.DataFrameGroupBy.skew` * ,Compute the skew of the values in each group
+   :meth:`~.DataFrameGroupBy.std`,Compute the standard deviation of the values in each group
+   :meth:`~.DataFrameGroupBy.sum`,Compute the sum of the values in each group
+   :meth:`~.DataFrameGroupBy.var`,Compute the variance of the values in each group

 Some examples:
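The aggregation table in this hunk pairs each GroupBy method with its reduction. A minimal runnable sketch (toy data, not from the docs) of a few of them; the merged branch name suggests related work on boolean aggregation with PyArrow-backed dtypes, but plain NumPy-backed columns are used here:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "flag": [True, False, True, True],
        "val": [1.0, 2.0, 3.0, 4.0],
    }
)
g = df.groupby("key")

print(g.sum())           # per-group sums; bools add as integers
print(g["flag"].any())   # whether any value in each group is truthy
print(g["flag"].all())   # whether all values in each group are truthy
print(g["val"].mean())   # per-group mean
```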

@@ -832,19 +831,18 @@ The following methods on GroupBy act as transformations.
 .. csv-table::
    :header: "Method", "Description"
    :widths: 20, 80
-   :delim: ;
-
-   :meth:`~.DataFrameGroupBy.bfill`;Back fill NA values within each group
-   :meth:`~.DataFrameGroupBy.cumcount`;Compute the cumulative count within each group
-   :meth:`~.DataFrameGroupBy.cummax`;Compute the cumulative max within each group
-   :meth:`~.DataFrameGroupBy.cummin`;Compute the cumulative min within each group
-   :meth:`~.DataFrameGroupBy.cumprod`;Compute the cumulative product within each group
-   :meth:`~.DataFrameGroupBy.cumsum`;Compute the cumulative sum within each group
-   :meth:`~.DataFrameGroupBy.diff`;Compute the difference between adjacent values within each group
-   :meth:`~.DataFrameGroupBy.ffill`;Forward fill NA values within each group
-   :meth:`~.DataFrameGroupBy.pct_change`;Compute the percent change between adjacent values within each group
-   :meth:`~.DataFrameGroupBy.rank`;Compute the rank of each value within each group
-   :meth:`~.DataFrameGroupBy.shift`;Shift values up or down within each group
+
+   :meth:`~.DataFrameGroupBy.bfill`,Back fill NA values within each group
+   :meth:`~.DataFrameGroupBy.cumcount`,Compute the cumulative count within each group
+   :meth:`~.DataFrameGroupBy.cummax`,Compute the cumulative max within each group
+   :meth:`~.DataFrameGroupBy.cummin`,Compute the cumulative min within each group
+   :meth:`~.DataFrameGroupBy.cumprod`,Compute the cumulative product within each group
+   :meth:`~.DataFrameGroupBy.cumsum`,Compute the cumulative sum within each group
+   :meth:`~.DataFrameGroupBy.diff`,Compute the difference between adjacent values within each group
+   :meth:`~.DataFrameGroupBy.ffill`,Forward fill NA values within each group
+   :meth:`~.DataFrameGroupBy.pct_change`,Compute the percent change between adjacent values within each group
+   :meth:`~.DataFrameGroupBy.rank`,Compute the rank of each value within each group
+   :meth:`~.DataFrameGroupBy.shift`,Shift values up or down within each group

 In addition, passing any built-in aggregation method as a string to
 :meth:`~.DataFrameGroupBy.transform` (see the next section) will broadcast the result
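Unlike aggregations, the transformation methods in this hunk's table return an object indexed the same way as the original. A short sketch (invented toy frame) showing three of them restarting at each group boundary:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})
g = df.groupby("key")["val"]

print(g.cumsum().tolist())  # cumulative sum restarts within each group
print(g.rank().tolist())    # rank of each value within its group
print(g.shift(1).tolist())  # shift within each group; first row per group becomes NaN
```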
@@ -1092,11 +1090,10 @@ efficient, GroupBy-specific, implementation.
 .. csv-table::
    :header: "Method", "Description"
    :widths: 20, 80
-   :delim: ;

-   :meth:`~.DataFrameGroupBy.head`;Select the top row(s) of each group
-   :meth:`~.DataFrameGroupBy.nth`;Select the nth row(s) of each group
-   :meth:`~.DataFrameGroupBy.tail`;Select the bottom row(s) of each group
+   :meth:`~.DataFrameGroupBy.head`,Select the top row(s) of each group
+   :meth:`~.DataFrameGroupBy.nth`,Select the nth row(s) of each group
+   :meth:`~.DataFrameGroupBy.tail`,Select the bottom row(s) of each group

 Users can also use transformations along with Boolean indexing to construct complex
 filtrations within groups. For example, suppose we are given groups of products and
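The ``head``/``nth``/``tail`` methods in this hunk select rows per group rather than reduce them. A quick sketch on a toy frame (an assumption for illustration, not from the docs):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "a", "b", "b"], "val": [1, 2, 3, 4, 5]})
g = df.groupby("key")

print(g.head(1))  # first row of each group, original index preserved
print(g.nth(1))   # second row of each group, where one exists
print(g.tail(1))  # last row of each group
```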

doc/source/user_guide/indexing.rst

Lines changed: 9 additions & 9 deletions
@@ -94,13 +94,14 @@ well). Any of the axes accessors may be the null slice ``:``. Axes left out of
 the specification are assumed to be ``:``, e.g. ``p.loc['a']`` is equivalent to
 ``p.loc['a', :]``.

-.. csv-table::
-   :header: "Object Type", "Indexers"
-   :widths: 30, 50
-   :delim: ;

-   Series; ``s.loc[indexer]``
-   DataFrame; ``df.loc[row_indexer,column_indexer]``
+.. ipython:: python
+
+   ser = pd.Series(range(5), index=list("abcde"))
+   ser.loc[["a", "c", "e"]]
+
+   df = pd.DataFrame(np.arange(25).reshape(5, 5), index=list("abcde"), columns=list("abcde"))
+   df.loc[["a", "c", "e"], ["b", "d"]]

 .. _indexing.basics:

@@ -116,10 +117,9 @@ indexing pandas objects with ``[]``:
 .. csv-table::
    :header: "Object Type", "Selection", "Return Value Type"
    :widths: 30, 30, 60
-   :delim: ;

-   Series; ``series[label]``; scalar value
-   DataFrame; ``frame[colname]``; ``Series`` corresponding to colname
+   Series, ``series[label]``, scalar value
+   DataFrame, ``frame[colname]``, ``Series`` corresponding to colname

 Here we construct a simple time series data set to use for illustrating the
 indexing functionality:
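The ``[]`` selection table retained in the second hunk maps object type to return type. A short sketch (hypothetical frame, not the docs' data set) confirming both rows of that table:

```python
import pandas as pd

dates = pd.date_range("2000-01-01", periods=4)
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}, index=dates)

s = df["A"]        # frame[colname] -> Series corresponding to colname
print(s[dates[1]]) # series[label]  -> scalar value
print(df["B"])
```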
