DOC: DataFrameGroupBy.filter documentation is misleading #61300

@adamreeve

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.filter.html and https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.SeriesGroupBy.filter.html

Documentation problem

Both DataFrameGroupBy.filter and SeriesGroupBy.filter state that they "filter elements from groups".

This is not true: these methods filter whole groups. If you attempt to filter individual elements within a group by returning a boolean Series, you get an error:

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})
# The lambda returns a boolean Series per group instead of a single bool
df.groupby("A").filter(lambda x: x['B'] > 1).sum()
TypeError: filter function returned a Series, but expected a scalar bool
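
For contrast, here is a minimal sketch of how individual elements can be filtered without going through filter. It reuses the frame from the report. The variable names rows and above_group_mean are only illustrative, and the group-mean comparison is an assumed example of a per-element condition that depends on the group.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": [1, 2, 3, 4, 5, 6],
    "C": [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
})

# A plain element-wise condition needs no groupby at all:
# a boolean mask on the frame selects the matching rows.
rows = df[df["B"] > 1]

# If the condition depends on the group (here: B above its group's mean),
# transform broadcasts the per-group value back to the original rows,
# so the comparison again yields a row-aligned boolean mask.
group_mean = df.groupby("A")["B"].transform("mean")
above_group_mean = df[df["B"] > group_mean]
```

Both selections return individual rows, which is what the current wording "filter elements from groups" seems to promise but filter itself does not do.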

Suggested fix for documentation

Suggested documentation:

Filter groups that don’t satisfy a criterion.

Groups are filtered out if they do not satisfy the boolean criterion specified by func.
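
A short sketch of the behaviour this wording describes, using the same data as the report. The threshold on the sum of B is an arbitrary choice for illustration. func is called once per group and returns a single bool, and every row of a group is either kept or dropped together.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": [1, 2, 3, 4, 5, 6],
    "C": [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
})

# The function receives each group as a DataFrame and returns one scalar bool.
# "foo" has B summing to 9 and is dropped; "bar" sums to 12 and is kept whole.
kept = df.groupby("A").filter(lambda g: g["B"].sum() > 9)
```

The result contains only the rows of "bar", with their original index labels, including the row where B is 2, which an element-wise condition such as B > 3 would have excluded.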
