BUG: .describe() doesn't work for EAs #61707 #61760

Closed
52 commits
9b3c6ac
Fix describe() for ExtensionArrays with multiple internal dtypes
kernelism Jul 2, 2025
3550556
chore: remove redundant words in comment (#61759)
ianlv Jul 2, 2025
22f12fc
DEPS: bump pyarrow minimum version from 10.0 to 12.0 (#61723)
jorisvandenbossche Jul 3, 2025
b91fa1d
DEPR: object inference in to_stata (#56536)
jbrockmendel Jul 3, 2025
9dcce63
ENH: Allow third-party packages to register IO engines (#61642)
datapythonista Jul 3, 2025
391107a
Revert "ENH: Allow third-party packages to register IO engines" (#61767)
jbrockmendel Jul 3, 2025
51763f9
BUG: NA.__and__, __or__, __xor__ with np.bool_ objects (#61768)
jbrockmendel Jul 3, 2025
e5a1c10
BUG: Fix unpickling of string dtypes of legacy pandas versions (#61770)
Liam3851 Jul 7, 2025
2b471c8
DOC: add pandas 3.0 migration guide for the string dtype (#61705)
jorisvandenbossche Jul 7, 2025
0faaf5c
DOC: add section about upcoming pandas 3.0 changes (string dtype, CoW…
jorisvandenbossche Jul 7, 2025
cf1a11c
BUG[string]: incorrect index downcast in DataFrame.join (#61771)
jbrockmendel Jul 7, 2025
ebca3c5
TST: update expected dtype for sum of decimals with pyarrow 21+ (#61799)
jorisvandenbossche Jul 7, 2025
b9d5732
DOC: Add link to WebGL in pandas ecosystem (#61790)
star1327p Jul 7, 2025
be2cb8c
CLN: remove and udpate for outdated _item_cache (#61789)
chilin0525 Jul 7, 2025
ff8a607
DOC: prepare 2.3.1 whatsnew notes for release (#61794)
jorisvandenbossche Jul 7, 2025
d21ad1a
PERF: avoid object-dtype path in ArrowEA._explode (#61786)
jbrockmendel Jul 7, 2025
16fd208
TST: option_context bug on Mac GH#58055 (#61779)
jbrockmendel Jul 7, 2025
b5e441e
BUG: Decimal(NaN) incorrectly allowed in ArrowEA constructor with tim…
jbrockmendel Jul 7, 2025
fea4f5b
REF: remove unreachable, stronger typing in parsers.pyx (#61785)
jbrockmendel Jul 7, 2025
7c2796d
[pre-commit.ci] pre-commit autoupdate (#61802)
pre-commit-ci[bot] Jul 7, 2025
d1a245c
DEPS: Bump NumPy and tzdata (#61806)
mroeschke Jul 8, 2025
d5f97ed
feature #49580: support new-style float_format string in to_csv (#61650)
pedromfdiogo Jul 8, 2025
f94b430
CI: Remove PyPy references in CI testing (#61814)
mroeschke Jul 9, 2025
e635c3e
TST[string]: update expecteds for using_string_dtype to fix xfails (#…
jbrockmendel Jul 10, 2025
b876c67
BUG: Fix Index.equals between object and string (#61541)
sanggon6107 Jul 10, 2025
9da2c8f
BUG: Require sample weights to sum to less than 1 when replace = True…
microslaw Jul 11, 2025
d785a3d
DOC: Update link to pytz documentation (#61821)
star1327p Jul 11, 2025
337d5fe
REF: separate out helpers in libparser (#61832)
jbrockmendel Jul 11, 2025
688e2a0
TST: Fix `test_mask_stringdtype` (#61830)
arthurlw Jul 11, 2025
e1328fc
TST: enable 2D tests for MaskedArrays, fix+test shift (#61826)
jbrockmendel Jul 11, 2025
fd7bfaa
BUG: Fix infer_dtype result for float with embedded pd.NA (#61624)
heoh Jul 11, 2025
e83b820
DOC: Correct error message in AbstractMethodError for methodtype argu…
Maaz-319 Jul 11, 2025
da7f2be
DOC: rm excessive backtick (#61839)
mattwang44 Jul 12, 2025
4f2aa4d
DOC: Update README.md to reference issues related to 'good first issu…
sivasweatha Jul 12, 2025
a2315af
BUG: Fix pivot_table margins to include NaN groups when dropna=False …
iabhi4 Jul 13, 2025
bc6ad14
Remove incorrect line in Series init docstring (#61849)
petern48 Jul 14, 2025
1d153bb
TST(string dtype): Resolve xfails in test_from_dummies (#60694)
rhshadrach Jul 15, 2025
43711d5
API: np.isinf on Index return Index[bool] (#61874)
jbrockmendel Jul 16, 2025
2c89a91
DOC: Add Raises section to to_numeric docstring (#61868)
tisjayy Jul 16, 2025
13bba34
String dtype: turn on by default (#61722)
jorisvandenbossche Jul 16, 2025
598b7d1
DOC: show Parquet examples with default engine (without explicit pyar…
jorisvandenbossche Jul 16, 2025
88cb152
DOC: update Parquet IO user guide on index handling and type support …
jorisvandenbossche Jul 16, 2025
042ac78
ERR: improve exception message from timedelta64-datetime64 (#61876)
jbrockmendel Jul 16, 2025
3e9237c
BUG: Timedelta with invalid keyword (#61883)
jbrockmendel Jul 16, 2025
d5eab1b
API: Index.__cmp__(Series) return NotImplemented (#61884)
jbrockmendel Jul 16, 2025
90b1c5d
DOC: make doc build run with string dtype enabled (#61864)
jorisvandenbossche Jul 17, 2025
6537afe
DOC: fix doctests for string dtype changes (top-level) (#61887)
jorisvandenbossche Jul 17, 2025
6fca116
BUG: disallow exotic np.datetime64 unit (#61882)
jbrockmendel Jul 17, 2025
4b18266
API: IncompatibleFrequency subclass TypeError (#61875)
jbrockmendel Jul 18, 2025
6a6a1ba
BUG: If both index and axis are passed to DataFrame.drop, raise a cle…
khemkaran10 Jul 18, 2025
8de38e8
BUG: fix padding for string categories in CategoricalIndex repr (#61894)
jorisvandenbossche Jul 19, 2025
9edf890
61760: merge with main
kernelism Jul 20, 2025
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -909,6 +909,7 @@ Other
- Bug in :meth:`Index.sort_values` when passing a key function that turns values into tuples, e.g. ``key=natsort.natsort_key``, would raise ``TypeError`` (:issue:`56081`)
- Bug in :meth:`MultiIndex.fillna` error message was referring to ``isna`` instead of ``fillna`` (:issue:`60974`)
- Bug in :meth:`Series.describe` where median percentile was always included when the ``percentiles`` argument was passed (:issue:`60550`).
- Bug in :meth:`Series.describe` where statistics with multiple dtypes for ExtensionArrays were coerced to ``float64`` which raised a ``DimensionalityError`` (:issue:`61707`)
- Bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
- Bug in :meth:`Series.dt` methods in :class:`ArrowDtype` that were returning incorrect values. (:issue:`57355`)
- Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`)
13 changes: 13 additions & 0 deletions pandas/core/methods/describe.py
@@ -12,6 +12,7 @@
)
from typing import (
    TYPE_CHECKING,
    Any,
    cast,
)

@@ -215,6 +216,14 @@ def reorder_columns(ldesc: Sequence[Series]) -> list[Hashable]:
    return names


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
[Review comment, Member] I think this can be inlined since it is only used once.

    """Check if the sequence has multiple internal dtypes."""
    if not d:
        return False

    return any(type(item) != type(d[0]) for item in d)
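
To make the helper's behavior concrete, here is a minimal standalone sketch. The helper body is copied from the diff so that the snippet runs without pandas; the input lists are illustrative assumptions and are not taken from the PR.

```python
from decimal import Decimal
from typing import Any


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
    """Copy of the helper from this diff, reproduced so the sketch runs standalone."""
    if not d:
        return False
    return any(type(item) != type(d[0]) for item in d)


# Illustrative inputs (assumptions, not from the PR's test suite).
mixed_stats = [3, Decimal("2.5"), Decimal("1")]  # int count next to Decimal statistics
uniform_stats = [1.0, 2.5, 3.0]                  # every entry is a float

print(has_multiple_internal_dtypes(mixed_stats))    # True
print(has_multiple_internal_dtypes(uniform_stats))  # False
print(has_multiple_internal_dtypes([]))             # False
print(has_multiple_internal_dtypes([1, True]))      # True: bool is a distinct type from int
```

Because the check compares exact types, it also reports `True` for values that would convert to float without trouble, such as `[1, True]`.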


def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
    """Describe series containing numerical data.

@@ -251,6 +260,10 @@ def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
                import pyarrow as pa

                dtype = ArrowDtype(pa.float64())
        elif has_multiple_internal_dtypes(d):
            # GH61707: describe() doesn't work on EAs
            # with multiple internal dtypes, so return object dtype
[Review comment, Member] Is the relevant characteristic "multiple internal dtypes" or "entries that can't be cast to Float64"?

[Reply, Author] The latter makes more sense.

            dtype = None
        else:
            dtype = Float64Dtype()
    elif series.dtype.kind in "iufb":
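
The review thread in this hunk suggests that the deciding property may be "entries that can't be cast to Float64" rather than "multiple internal dtypes". A rough sketch of such a check follows, using only the standard library. The `Quantity` class and the `all_castable_to_float` helper are hypothetical names invented for this illustration; neither is part of pandas or of this PR, and a real extension array may raise other exception types than the ones caught here.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any


@dataclass(frozen=True)
class Quantity:
    """Hypothetical unit-bearing scalar, standing in for an EA element."""

    magnitude: float
    unit: str

    def __float__(self) -> float:
        # Refuse to drop the unit silently, as a unit-aware library might.
        raise TypeError(f"cannot convert quantity with unit {self.unit!r} to float")


def all_castable_to_float(values: list[Any]) -> bool:
    """Hypothetical check: True if every entry converts via float()."""
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            return False
    return True


print(all_castable_to_float([3, 2.5, Decimal("1")]))   # True
print(all_castable_to_float([3, Quantity(2.0, "m")]))  # False
```

Under this criterion a list mixing `int` and `Decimal` still counts as castable, which is where it differs from the exact-type comparison in the diff.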
26 changes: 26 additions & 0 deletions pandas/tests/series/methods/test_describe.py
@@ -95,6 +95,32 @@ def test_describe_empty_object(self):
        assert np.isnan(result.iloc[2])
        assert np.isnan(result.iloc[3])

    def test_describe_multiple_dtypes(self):
        """
        GH61707: describe() doesn't work on EAs which generate
        statistics with multiple dtypes.
[Review comment, Member] Nitpick: can this be a comment instead of a docstring?

        """
        from decimal import Decimal

        from pandas.tests.extension.decimal import to_decimal

        s = Series(to_decimal([1, 2.5, 3]), dtype="decimal")

        expected = Series(
            [
                3,
                Decimal("2.166666666666666666666666667"),
                Decimal("0.8498365855987974716713706849"),
                Decimal("1"),
                Decimal("3"),
            ],
            index=["count", "mean", "std", "min", "max"],
            dtype="object",
        )

        result = s.describe(percentiles=[])
        tm.assert_series_equal(result, expected)
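
The expected result in the test above mixes an `int` count with `Decimal` statistics, which is why it uses object dtype. The following standard-library sketch shows how such a mix can arise for the same three values. It is an illustration only and does not reproduce how pandas computes these statistics internally.

```python
from decimal import Decimal

# Same values as in the test, written directly as Decimal objects.
values = [Decimal("1"), Decimal("2.5"), Decimal("3")]

count = len(values)         # a plain int
mean = sum(values) / count  # Decimal arithmetic yields a Decimal
minimum = min(values)       # Decimal
maximum = max(values)       # Decimal

stats = [count, mean, minimum, maximum]
print([type(x).__name__ for x in stats])  # ['int', 'Decimal', 'Decimal', 'Decimal']
```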

    def test_describe_with_tz(self, tz_naive_fixture):
        # GH 21332
        tz = tz_naive_fixture