Skip to content
Closed
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v2.3.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,8 @@ Other enhancements
when using ``np.array()`` or ``np.asarray()`` on pandas objects) has been
updated to work correctly with NumPy >= 2 (:issue:`57739`)
- The :meth:`~Series.sum` reduction is now implemented for ``StringDtype`` columns (:issue:`59853`)
-
- Median percentile is only included in :meth:`~Series.describe` when a blank
list is passed (:issue:`60550`).

.. ---------------------------------------------------------------------------
.. _whatsnew_230.notable_bug_fixes:
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -10795,7 +10795,8 @@ def describe(
The percentiles to include in the output. All should
fall between 0 and 1. The default is
``[.25, .5, .75]``, which returns the 25th, 50th, and
75th percentiles.
75th percentiles. If a blank list is passed, then returns
only the 50th percentile value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this should be the behavior. If I pass an empty list, I should not get any percentiles.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to preserve the default percentile behavior. I will change this so that percentiles are omitted if a blank list is passed.

Copy link
Member

@asishm asishm Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should remove the 50th percentile without at least a deprecation warning. Reporting the median by default is incredibly helpful (by default)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed we should not be changing the default behavior. Recommend setting no_default as the default value and maintaining the default behavior when it is not changed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so we will add an argument no_default : bool = False as the argument and preserve the behavior with control.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhshadrach I got confused. the default for percentiles is already None. not sure if it's worth changing to lib.no_default.

@ZenithClown I believe what @rhshadrach means is to change the behavior so that the 50th percentile is removed if a user explicitly passes in percentiles=[] instead of the default None (or the suggestion to change to no_default)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asishm - me too! Agreed leaving the default as None is fine.

include : 'all', list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored
for ``Series``. Here are the options:
Expand Down
11 changes: 6 additions & 5 deletions pandas/core/methods/describe.py
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,8 @@ def describe_ndframe(
percentiles : list-like of numbers, optional
The percentiles to include in the output. All should fall between 0 and 1.
The default is ``[.25, .5, .75]``, which returns the 25th, 50th, and
75th percentiles.
75th percentiles. If a blank list is passed, then returns only the
50th percentile value.

Returns
-------
Expand Down Expand Up @@ -351,13 +352,13 @@ def _refine_percentiles(
# explicit conversion of `percentiles` to list
percentiles = list(percentiles)

# median should be included only if blank iterable is passed
if len(percentiles) == 0:
return np.array([0.5])

# get them all to be in [0, 1]
validate_percentile(percentiles)

# median should always be included
if 0.5 not in percentiles:
percentiles.append(0.5)

percentiles = np.asarray(percentiles)

# sort and check for duplicates
Expand Down
25 changes: 25 additions & 0 deletions pandas/tests/reductions/test_describe_ndframe.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# -*- encoding: utf-8 -*-

"""
We test the describe_ndframe function.
"""

import numpy as np
import pytest

from pandas.core.methods.describe import _refine_percentiles

@pytest.mark.parametrize(
"percentiles_, expected", [
(None, np.array([0.25, 0.5, 0.75])),
([], np.array([0.5])),
([0.3, 0.6], np.array([0.3, 0.6])),
]
)
def test_refine_percentiles(percentiles_, expected):
"""
Check the performance of the _refine_percentiles when multiple
values are passed.
"""

assert np.array_equal(_refine_percentiles(percentiles_), expected)
Loading