
ENH: pd.DataFrame.describe(): rename top to mode #62056

@Krisselack

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

In pd.DataFrame.describe(), the most frequent value of a categorical or object column is labelled 'top'.

However, statistics already has an established term for the most frequent value: the mode (https://en.wikipedia.org/wiki/Mode_(statistics)).
To reduce ambiguity, I propose renaming top to mode, both in the docs and in the function's printed output.
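To illustrate the overlap, here is a small sketch using only public pandas API (the sample data is made up). On an object Series, the value that describe() reports as top is the same value that Series.mode() returns. One difference worth noting: describe() reports a single top even when several values tie for the highest count, whereas Series.mode() returns all of them.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "a"])

desc = s.describe()   # for object data the index is: count, unique, top, freq
modes = s.mode()      # every most-frequent value, returned as a Series

print(desc["top"], desc["freq"])  # a 3
print(modes.tolist())             # ['a']

# With a tie, mode() returns every tied value, while describe() picks one
tied = pd.Series(["x", "y", "x", "y"])
print(tied.mode().tolist())       # ['x', 'y']
print(tied.describe()["top"])     # one of the tied values
```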

Feature Description

The change would presumably start in describe_categorical_1d, replacing top with mode:

from __future__ import annotations

from collections.abc import Sequence

import numpy as np


def describe_categorical_1d(
    data: Series,
    percentiles_ignored: Sequence[float],
) -> Series:
    """Describe series containing categorical data.

    Parameters
    ----------
    data : Series
        Series to be described.
    percentiles_ignored : list-like of numbers
        Ignored, but in place to unify interface.
    """
    names = ["count", "unique", "mode", "freq"]
    objcounts = data.value_counts()
    count_unique = len(objcounts[objcounts != 0])
    if count_unique > 0:
        mode, freq = objcounts.index[0], objcounts.iloc[0]
        dtype = None
    else:
        # If the Series has no values to count, set 'mode' and 'freq' to NaN
        # to keep the output shape consistent
        mode, freq = np.nan, np.nan
        dtype = "object"

    result = [data.count(), count_unique, mode, freq]

    from pandas import Series

    return Series(result, index=names, name=data.name, dtype=dtype)
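Without patching pandas, the proposed labelling can be previewed on the existing output by relabelling the index of the describe() result. This is a minimal sketch with made-up data; it only renames the row and does not change how the value is computed.

```python
import pandas as pd

df = pd.DataFrame({"fruit": ["apple", "pear", "apple", "fig"],
                   "n": [1, 2, 2, 3]})

# describe() on object columns reports count, unique, top, freq
current = df.describe(include="object")

# Preview the proposed output by renaming the "top" row to "mode"
proposed = current.rename(index={"top": "mode"})
print(proposed)
```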

Alternative Solutions

Leave it as it is.

Additional Context

No response

Metadata


Assignees

No one assigned

    Labels

    Enhancement, Needs Triage (Issue that has not been reviewed by a pandas team member)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
