Skip to content

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Sep 7, 2024

Context is here: #59307 (comment)

@rhshadrach rhshadrach added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff API - Consistency Internal Consistency of API/Behavior labels Sep 7, 2024
@rhshadrach rhshadrach added this to the 3.0 milestone Sep 7, 2024
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this change. LGTM

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just checked the whatsnew note)

@mroeschke mroeschke merged commit c8813ae into pandas-dev:main Oct 6, 2024
51 checks passed
@mroeschke
Copy link
Member

Thanks @rhshadrach

@phofl
Copy link
Member

phofl commented Oct 7, 2024

This broke the following case:

df = pd.DataFrame({"a": [1, 2, 1, 2], "b": np.nan})
df.groupby("a").value_counts()

with

Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2024.1/scratches/scratch_1.py", line 100, in <module>
    df.groupby("a").value_counts()
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 2723, in value_counts
    return self._value_counts(subset, normalize, sort, ascending, dropna)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 2535, in _value_counts
    result_series = cast(Series, gb.size())
                                 ^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 2747, in size
    result = self._grouper.size()
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 696, in size
    ids = self.ids
          ^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 750, in ids
    return self.result_index_and_ids[1]
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 769, in result_index_and_ids
    result_index, ids = self._ob_index_and_ids(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 884, in _ob_index_and_ids
    ob_ids = np.where(ob_ids == -1, -1, index.take(ob_ids))
                                        ^^^^^^^^^^^^^^^^^^
IndexError: cannot do a non-empty take from an empty axes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff API - Consistency Internal Consistency of API/Behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: GroupBy.value_counts doesn't preserve original order for non-grouping rows

4 participants