Commit 1617a59

DOC: Replace @appender decorator with inline docstrings for groupby method
- Removes @appender decorator from DataFrame.groupby in core/frame.py
- Removes @appender decorator from Series.groupby in core/series.py
- Replaces with inline docstrings for both methods
- Addresses issue #62437
1 parent 3085f9f commit 1617a59
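
For readers unfamiliar with the pattern being removed, the following is a minimal sketch of the difference between a templated docstring and an inline one. The `appender` helper and `SHARED_DOC` template here are simplified, hypothetical stand-ins written for illustration; they are not pandas' actual `Appender` class or `_shared_docs` entry.

```python
import inspect

# Hypothetical shared template, loosely modelled on a shared docstring entry.
SHARED_DOC = """
Group %(klass)s using a mapper or by a Series of columns.

Returns
-------
%(klass)sGroupBy
"""


def appender(text):
    """Return a decorator that appends ``text`` to a function's docstring.

    Simplified stand-in for illustration only, not pandas' implementation.
    """
    def decorate(func):
        func.__doc__ = (func.__doc__ or "") + text
        return func
    return decorate


# Before: the docstring is assembled at import time from the shared template.
@appender(SHARED_DOC % {"klass": "DataFrame"})
def groupby_templated(by=None):
    return by


# After: the same text is written directly in the function body.
def groupby_inline(by=None):
    """
    Group DataFrame using a mapper or by a Series of columns.

    Returns
    -------
    DataFrameGroupBy
    """
    return by


# Both approaches yield the same documentation once whitespace is normalized.
same_text = inspect.cleandoc(groupby_templated.__doc__) == inspect.cleandoc(
    groupby_inline.__doc__
)
print(same_text)
```

The trade-off is that the inline form is readable in place and visible to static tools, while the templated form keeps one copy of the text for both `DataFrame` and `Series`.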

2 files changed

+355
-2
lines changed

pandas/core/frame.py

Lines changed: 187 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -9392,7 +9392,6 @@ def update(
9392 9392
"""
9393 9393
)
9394 9394
)
9395-
@Appender(_shared_docs["groupby"] % _shared_doc_kwargs)
9396 9395
@deprecate_nonkeyword_arguments(
9397 9396
Pandas4Warning, allowed_args=["self", "by", "level"], name="groupby"
9398 9397
)
@@ -9406,6 +9405,193 @@ def groupby(
9406 9405
observed: bool = True,
9407 9406
dropna: bool = True,
9408 9407
) -> DataFrameGroupBy:
9408+
"""
9409+
Group DataFrame using a mapper or by a Series of columns.
9410+
9411+
A groupby operation involves some combination of splitting the
9412+
object, applying a function, and combining the results. This can be
9413+
used to group large amounts of data and compute operations on these
9414+
groups.
9415+
9416+
Parameters
9417+
----------
9418+
by : mapping, function, label, pd.Grouper or list of such
9419+
Used to determine the groups for the groupby.
9420+
If ``by`` is a function, it's called on each value of the object's
9421+
index. If a dict or Series is passed, the Series or dict VALUES
9422+
will be used to determine the groups (the Series' values are first
9423+
aligned; see ``.align()`` method). If a list or ndarray of length
9424+
equal to the selected axis is passed (see the `groupby user guide
9425+
<https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#splitting-an-object-into-groups>`_),
9426+
the values are used as-is to determine the groups. A label or list
9427+
of labels may be passed to group by the columns in ``self``.
9428+
Notice that a tuple is interpreted as a (single) key.
9429+
axis : {0 or 'index', 1 or 'columns'}, default 0
9430+
Split along rows (0) or columns (1). For `Series` this parameter
9431+
is unused and defaults to 0.
9432+
level : int, level name, or sequence of such, default None
9433+
If the axis is a MultiIndex (hierarchical), group by a particular
9434+
level or levels. Do not specify both ``by`` and ``level``.
9435+
as_index : bool, default True
9436+
For aggregated output, return object with group labels as the
9437+
index. Only relevant for DataFrame input. as_index=False is
9438+
effectively "SQL-style" grouped output.
9439+
sort : bool, default True
9440+
Sort group keys. Get better performance by turning this off.
9441+
Note this does not influence the order of observations within each
9442+
group. Groupby preserves the order of rows within each group.
9443+
9444+
.. versionchanged:: 2.0.0
9445+
9446+
Specifying ``sort=False`` with an ordered categorical grouper will no
9447+
longer sort the values.
9448+
9449+
group_keys : bool, default True
9450+
When calling apply and the ``by`` argument produces a like-indexed
9451+
(i.e. :ref:`a transform <groupby.transform>`) result, add group keys to
9452+
index to identify pieces. By default group keys are not included
9453+
when the result's index (and column) labels match the inputs, and
9454+
are included otherwise.
9455+
9456+
.. versionchanged:: 1.5.0
9457+
9458+
Warns that ``group_keys`` will no longer be ignored when the
9459+
result from ``apply`` is a like-indexed Series or DataFrame.
9460+
Specify ``group_keys`` explicitly to include the group keys or
9461+
not.
9462+
9463+
.. versionchanged:: 2.0.0
9464+
9465+
``group_keys`` now defaults to ``True``.
9466+
9467+
observed : bool, default True
9468+
This only applies if any of the groupers are Categoricals.
9469+
If True: only show observed values for categorical groupers.
9470+
If False: show all values for categorical groupers.
9471+
dropna : bool, default True
9472+
If True, and if group keys contain NA values, NA values together
9473+
with row/column will be dropped.
9474+
If False, NA values will also be treated as the key in groups.
9475+
9476+
.. versionadded:: 1.1.0
9477+
9478+
Returns
9479+
-------
9480+
DataFrameGroupBy
9481+
Returns a groupby object that contains information about the groups.
9482+
9483+
See Also
9484+
--------
9485+
resample : Convenience method for frequency conversion and resampling
9486+
of time series.
9487+
9488+
Notes
9489+
-----
9490+
See the `user guide
9491+
<https://pandas.pydata.org/pandas-docs/stable/groupby.html>`__ for more
9492+
detailed usage and examples, including splitting an object into groups,
9493+
iterating through groups, selecting a group, aggregation, and more.
9494+
9495+
Examples
9496+
--------
9497+
>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
9498+
... 'Parrot', 'Parrot'],
9499+
... 'Max Speed': [380., 370., 24., 26.]})
9500+
>>> df
9501+
Animal Max Speed
9502+
0 Falcon 380.0
9503+
1 Falcon 370.0
9504+
2 Parrot 24.0
9505+
3 Parrot 26.0
9506+
>>> df.groupby(['Animal']).mean()
9507+
Max Speed
9508+
Animal
9509+
Falcon 375.0
9510+
Parrot 25.0
9511+
9512+
**Hierarchical Indexes**
9513+
9514+
We can groupby different levels of a hierarchical index
9515+
using the `level` parameter:
9516+
9517+
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
9518+
... ['Captive', 'Wild', 'Captive', 'Wild']]
9519+
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
9520+
>>> df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]},
9521+
... index=index)
9522+
>>> df
9523+
Max Speed
9524+
Animal Type
9525+
Falcon Captive 390.0
9526+
Wild 350.0
9527+
Parrot Captive 30.0
9528+
Wild 20.0
9529+
>>> df.groupby(level=0).mean()
9530+
Max Speed
9531+
Animal
9532+
Falcon 370.0
9533+
Parrot 25.0
9534+
>>> df.groupby(level="Type").mean()
9535+
Max Speed
9536+
Type
9537+
Captive 210.0
9538+
Wild 185.0
9539+
9540+
We can also choose whether to include NA in group keys by setting the
9541+
`dropna` parameter; the default is `True`.
9542+
9543+
>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
9544+
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
9545+
9546+
>>> df.groupby(by=["b"]).sum()
9547+
a c
9548+
b
9549+
1.0 2 3
9550+
2.0 2 5
9551+
9552+
>>> df.groupby(by=["b"], dropna=False).sum()
9553+
a c
9554+
b
9555+
1.0 2 3
9556+
2.0 2 5
9557+
NaN 1 4
9558+
9559+
>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
9560+
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
9561+
9562+
>>> df.groupby(by="a").sum()
9563+
b c
9564+
a
9565+
a 13.0 13.0
9566+
b 12.3 123.0
9567+
9568+
>>> df.groupby(by="a", dropna=False).sum()
9569+
b c
9570+
a
9571+
a 13.0 13.0
9572+
b 12.3 123.0
9573+
NaN 12.3 33.0
9574+
9575+
When using ``.apply()``, use ``group_keys`` to include or exclude the group keys.
9576+
The ``group_keys`` argument defaults to ``True`` (include).
9577+
9578+
>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
9579+
... 'Parrot', 'Parrot'],
9580+
... 'Max Speed': [380., 370., 24., 26.]})
9581+
>>> df.groupby("Animal", group_keys=True).apply(lambda x: x)
9582+
Animal Max Speed
9583+
Animal
9584+
Falcon 0 Falcon 380.0
9585+
1 Falcon 370.0
9586+
Parrot 2 Parrot 24.0
9587+
3 Parrot 26.0
9588+
9589+
>>> df.groupby("Animal", group_keys=False).apply(lambda x: x)
9590+
Animal Max Speed
9591+
0 Falcon 380.0
9592+
1 Falcon 370.0
9593+
2 Parrot 24.0
9594+
3 Parrot 26.0"""
9409 9595
from pandas.core.groupby.generic import DataFrameGroupBy
9410 9596

9411 9597
if level is None and by is None:

pandas/core/series.py

Lines changed: 168 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1963,7 +1963,7 @@ def _set_name(
1963 1963
"""
1964 1964
)
1965 1965
)
1966-
@Appender(_shared_docs["groupby"] % _shared_doc_kwargs)
1966+
1967 1967
@deprecate_nonkeyword_arguments(
1968 1968
Pandas4Warning, allowed_args=["self", "by", "level"], name="groupby"
1969 1969
)
@@ -1977,6 +1977,173 @@ def groupby(
1977 1977
observed: bool = True,
1978 1978
dropna: bool = True,
1979 1979
) -> SeriesGroupBy:
1980+
1981+
"""
1982+
Group Series using a mapper or by a Series of columns.
1983+
1984+
A groupby operation involves some combination of splitting the
1985+
object, applying a function, and combining the results. This can be
1986+
used to group large amounts of data and compute operations on these
1987+
groups.
1988+
1989+
Parameters
1990+
----------
1991+
by : mapping, function, label, pd.Grouper or list of such
1992+
Used to determine the groups for the groupby.
1993+
If ``by`` is a function, it's called on each value of the object's
1994+
index. If a dict or Series is passed, the Series or dict VALUES
1995+
will be used to determine the groups (the Series' values are first
1996+
aligned; see ``.align()`` method). If a list or ndarray of length
1997+
equal to the selected axis is passed (see the `groupby user guide
1998+
<https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#splitting-an-object-into-groups>`_),
1999+
the values are used as-is to determine the groups. A label or list
2000+
of labels may be passed to group by the columns in ``self``.
2001+
Notice that a tuple is interpreted as a (single) key.
2002+
axis : {0 or 'index', 1 or 'columns'}, default 0
2003+
Split along rows (0) or columns (1). For `Series` this parameter
2004+
is unused and defaults to 0.
2005+
level : int, level name, or sequence of such, default None
2006+
If the axis is a MultiIndex (hierarchical), group by a particular
2007+
level or levels. Do not specify both ``by`` and ``level``.
2008+
as_index : bool, default True
2009+
For aggregated output, return object with group labels as the
2010+
index. Only relevant for DataFrame input. as_index=False is
2011+
effectively "SQL-style" grouped output.
2012+
sort : bool, default True
2013+
Sort group keys. Get better performance by turning this off.
2014+
Note this does not influence the order of observations within each
2015+
group. Groupby preserves the order of rows within each group.
2016+
2017+
.. versionchanged:: 2.0.0
2018+
2019+
Specifying ``sort=False`` with an ordered categorical grouper will no
2020+
longer sort the values.
2021+
2022+
group_keys : bool, default True
2023+
When calling apply and the ``by`` argument produces a like-indexed
2024+
(i.e. :ref:`a transform <groupby.transform>`) result, add group keys to
2025+
index to identify pieces. By default group keys are not included
2026+
when the result's index (and column) labels match the inputs, and
2027+
are included otherwise.
2028+
2029+
.. versionchanged:: 1.5.0
2030+
2031+
Warns that ``group_keys`` will no longer be ignored when the
2032+
result from ``apply`` is a like-indexed Series or DataFrame.
2033+
Specify ``group_keys`` explicitly to include the group keys or
2034+
not.
2035+
2036+
.. versionchanged:: 2.0.0
2037+
2038+
``group_keys`` now defaults to ``True``.
2039+
2040+
observed : bool, default True
2041+
This only applies if any of the groupers are Categoricals.
2042+
If True: only show observed values for categorical groupers.
2043+
If False: show all values for categorical groupers.
2044+
dropna : bool, default True
2045+
If True, and if group keys contain NA values, NA values together
2046+
with row/column will be dropped.
2047+
If False, NA values will also be treated as the key in groups.
2048+
2049+
.. versionadded:: 1.1.0
2050+
2051+
Returns
2052+
-------
2053+
SeriesGroupBy
2054+
Returns a groupby object that contains information about the groups.
2055+
2056+
See Also
2057+
--------
2058+
resample : Convenience method for frequency conversion and resampling
2059+
of time series.
2060+
2061+
Notes
2062+
-----
2063+
See the `user guide
2064+
<https://pandas.pydata.org/pandas-docs/stable/groupby.html>`__ for more
2065+
detailed usage and examples, including splitting an object into groups,
2066+
iterating through groups, selecting a group, aggregation, and more.
2067+
2068+
Examples
2069+
--------
2070+
>>> ser = pd.Series([390., 350., 30., 20.],
2071+
... index=['Falcon', 'Falcon', 'Parrot', 'Parrot'], name="Max Speed")
2072+
>>> ser
2073+
Falcon 390.0
2074+
Falcon 350.0
2075+
Parrot 30.0
2076+
Parrot 20.0
2077+
Name: Max Speed, dtype: float64
2078+
>>> ser.groupby(["a", "b", "a", "b"]).mean()
2079+
a 210.0
2080+
b 185.0
2081+
Name: Max Speed, dtype: float64
2082+
>>> ser.groupby(level=0).mean()
2083+
Falcon 370.0
2084+
Parrot 25.0
2085+
Name: Max Speed, dtype: float64
2086+
>>> ser.groupby(ser > 100).mean()
2087+
Max Speed
2088+
False 25.0
2089+
True 370.0
2090+
Name: Max Speed, dtype: float64
2091+
2092+
**Grouping by Indexes**
2093+
2094+
We can groupby different levels of a hierarchical index
2095+
using the `level` parameter:
2096+
2097+
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
2098+
... ['Captive', 'Wild', 'Captive', 'Wild']]
2099+
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
2100+
>>> ser = pd.Series([390., 350., 30., 20.], index=index, name="Max Speed")
2101+
>>> ser
2102+
Animal Type
2103+
Falcon Captive 390.0
2104+
Wild 350.0
2105+
Parrot Captive 30.0
2106+
Wild 20.0
2107+
Name: Max Speed, dtype: float64
2108+
>>> ser.groupby(level=0).mean()
2109+
Animal
2110+
Falcon 370.0
2111+
Parrot 25.0
2112+
Name: Max Speed, dtype: float64
2113+
>>> ser.groupby(level="Type").mean()
2114+
Type
2115+
Captive 210.0
2116+
Wild 185.0
2117+
Name: Max Speed, dtype: float64
2118+
2119+
We can also choose whether to include `NA` in group keys by setting the
2120+
`dropna` parameter; the default is `True`.
2121+
2122+
>>> ser = pd.Series([1, 2, 3, 3], index=["a", 'a', 'b', np.nan])
2123+
>>> ser.groupby(level=0).sum()
2124+
a 3
2125+
b 3
2126+
dtype: int64
2127+
2128+
>>> ser.groupby(level=0, dropna=False).sum()
2129+
a 3
2130+
b 3
2131+
NaN 3
2132+
dtype: int64
2133+
2134+
>>> arrays = ['Falcon', 'Falcon', 'Parrot', 'Parrot']
2135+
>>> ser = pd.Series([390., 350., 30., 20.], index=arrays, name="Max Speed")
2136+
>>> ser.groupby(["a", "b", "a", np.nan]).mean()
2137+
a 210.0
2138+
b 350.0
2139+
Name: Max Speed, dtype: float64
2140+
2141+
>>> ser.groupby(["a", "b", "a", np.nan], dropna=False).mean()
2142+
a 210.0
2143+
b 350.0
2144+
NaN 20.0
2145+
Name: Max Speed, dtype: float64"""
2146+
1980 2147
from pandas.core.groupby.generic import SeriesGroupBy
1981 2148

1982 2149
if level is None and by is None:
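
Hand-written docstrings can drift from the signatures they describe, so a reviewer may want to compare each documented `default` against the real one. The following is a rough sketch of such a check using only the standard library. The helpers `documented_defaults` and `mismatched_defaults` and the `example` function are hypothetical names introduced here, not part of pandas or its doc-validation tooling, and the string comparison only handles simple literal defaults such as `True`, `None`, or `0`.

```python
import inspect
import re

# Matches numpydoc-style parameter lines such as "observed : bool, default False".
_PARAM_DEFAULT = re.compile(
    r"^[ \t]*(\w+)[ \t]*:.*,[ \t]*default[ \t]+(\S+)[ \t]*$", re.MULTILINE
)


def documented_defaults(doc):
    """Map parameter name -> documented default text found in ``doc``."""
    return dict(_PARAM_DEFAULT.findall(doc or ""))


def mismatched_defaults(func):
    """Return {param: (documented, actual)} where docstring and signature disagree."""
    documented = documented_defaults(func.__doc__)
    mismatches = {}
    for name, param in inspect.signature(func).parameters.items():
        # Skip parameters without a default or without a documented default.
        if param.default is inspect.Parameter.empty or name not in documented:
            continue
        actual = repr(param.default)
        if documented[name] != actual:
            mismatches[name] = (documented[name], actual)
    return mismatches


# Hypothetical stand-in for a method whose docstring was written by hand.
def example(by=None, observed=True, dropna=True):
    """
    Parameters
    ----------
    observed : bool, default False
        Only applies to categorical groupers.
    dropna : bool, default True
        Drop NA group keys.
    """


print(mismatched_defaults(example))
```

Run against the stand-in, the check reports `observed` because its documented default differs from the signature, and stays silent for `dropna`.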
