ENH: Generalize groupby to better support ExtensionArray

### Feature Type

- [ ] Adding new functionality to pandas

- [X] Changing existing functionality in pandas

- [ ] Removing existing functionality in pandas


### Problem Description

I have written changes that add `uncertainties` support to Pint (https://github.com/hgrecco/pint/pull/1615) and Pint-Pandas (https://github.com/hgrecco/pint-pandas/pull/140). Recent development in Pint and Pint-Pandas deeply embraces the ExtensionArray API (a move I also encouraged), but that is now causing problems for my changes.

The uncertainties package uses wrapping functions to interoperate with floats and NumPy (https://pythonhosted.org/uncertainties/index.html). The uncertainty datatype is `<class 'uncertainties.core.AffineScalarFunc'>`, which is not hashable. I have largely been able to work around this within the EA framework, but I'm stuck on how to make these values work with `groupby` and related operations. I wonder whether the groupby functionality can be generalized to work better with unhashable EA types.
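To make the hashability problem concrete, here is a minimal sketch in plain Python. `FakeUFloat` and `group_by_dict` are hypothetical stand-ins invented for illustration; they are not the real `uncertainties` class or any pandas internals. The sketch only shows why grouping logic that uses values as dictionary keys breaks for a type that defines `__eq__` without `__hash__`.

```python
class FakeUFloat:
    """Hypothetical stand-in for an uncertainty value (not the real class)."""

    def __init__(self, nominal, std_dev):
        self.nominal = nominal
        self.std_dev = std_dev

    # Defining __eq__ without __hash__ makes Python set __hash__ to None,
    # so instances of this class are unhashable.
    def __eq__(self, other):
        return (
            isinstance(other, FakeUFloat)
            and self.nominal == other.nominal
            and self.std_dev == other.std_dev
        )


def group_by_dict(keys, values):
    """Naive grouping that uses each key as a dict key (requires hashable keys)."""
    groups = {}
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    return groups


# Plain floats are hashable, so this works.
float_groups = group_by_dict([1.0, 2.0, 1.0], ["a", "b", "c"])

# The stand-in type is not hashable, so the same code raises TypeError.
try:
    group_by_dict([FakeUFloat(1.0, 0.1), FakeUFloat(1.0, 0.1)], ["a", "b"])
    unhashable_error = None
except TypeError as exc:
    unhashable_error = exc
```
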

### Feature Description

Here's an example of a small change that allows my EA type to interoperate with `groupby`. Specifically, it no longer assumes that the NA value is `np.nan`; instead, it returns whatever value `isna` identifies as NA. In the case of uncertainties, that is typically `ufloat(np.nan, 0)`, but it could be a `UFloat` with a `np.nan` nominal value, a `np.nan` error value, or both.

```diff
diff --git a/pandas/core/groupby/groupby.py b/pandas/core/groupby/groupby.py
index 1a17fef071..98e9c53c37 100644
--- a/pandas/core/groupby/groupby.py
+++ b/pandas/core/groupby/groupby.py
@@ -3080,7 +3080,10 @@ class GroupBy(BaseGroupBy[NDFrameT]):
                 """Helper function for first item that isn't NA."""
                 arr = x.array[notna(x.array)]
                 if not len(arr):
-                    return np.nan
+                    nan_arr = x.array[isna(x.array)]
+                    if not len(nan_arr):
+                        return np.nan
+                    return nan_arr[0]
                 return arr[0]
 
             if isinstance(obj, DataFrame):
```
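The fallback order in the patch above can be sketched outside pandas as follows. This is only an illustration under stated assumptions: `FakeUFloat`, `is_na`, and `first_compat` are hypothetical names, and `is_na` stands in for the role `isna` plays in the diff. It treats a value as NA when either its nominal value or its error is NaN, as described above.

```python
import math


class FakeUFloat:
    """Hypothetical stand-in for an uncertainty value (not the real class)."""

    def __init__(self, nominal, std_dev):
        self.nominal = nominal
        self.std_dev = std_dev


def is_na(value):
    """Hypothetical NA predicate playing the role of isna in the patch."""
    if isinstance(value, FakeUFloat):
        return math.isnan(value.nominal) or math.isnan(value.std_dev)
    return isinstance(value, float) and math.isnan(value)


def first_compat(values):
    """Return the first non-NA item, else the data's own NA value, else NaN."""
    present = [v for v in values if not is_na(v)]
    if present:
        return present[0]
    # No valid values: prefer an NA value taken from the data itself,
    # so the result keeps the element type of the array.
    missing = [v for v in values if is_na(v)]
    if missing:
        return missing[0]
    # Empty input: fall back to a plain float NaN, as the original code did.
    return float("nan")


mixed = first_compat([FakeUFloat(float("nan"), 0.0), FakeUFloat(2.0, 0.1)])
all_missing = first_compat([FakeUFloat(float("nan"), 0.0)])
empty = first_compat([])
```

With this ordering, an all-NA group yields an NA value of the array's own element type rather than a bare float NaN.
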

But here's the really sticky problem:

```diff
diff --git a/pandas/core/groupby/ops.py b/pandas/core/groupby/ops.py
index f0e4484f69..8b7f8e1aee 100644
--- a/pandas/core/groupby/ops.py
+++ b/pandas/core/groupby/ops.py
@@ -587,7 +587,7 @@ class BaseGrouper:
 
     def get_iterator(
         self, data: NDFrameT, axis: AxisInt = 0
-    ) -> Iterator[tuple[Hashable, NDFrameT]]:
+    ) -> Iterator[tuple[Hashable, NDFrameT]]:  # Does not work with non-hashable EA types
         """
         Groupby iterator
```

In PintArray (the ExtensionArray implemented in Pint-Pandas) I've been able to make `factorize` work without any pandas changes, but the factorized results don't survive the subsequent groupby steps that come from splitting. That's where I'm stuck.
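One possible direction, sketched here in plain Python, is to keep the integer codes from factorization as the group keys during iteration and look up the original value only on the side. This is not how pandas or Pint-Pandas currently works; `factorize_by_equality` and `iter_groups` are hypothetical helpers, lists are used as a simple example of unhashable keys, and NaN keys are ignored (NaN never compares equal to itself, so each would get its own code).

```python
def factorize_by_equality(values):
    """Assign integer codes using == only, so values need not be hashable.

    This is O(n * k) for n values and k unique values.
    """
    uniques = []
    codes = []
    for value in values:
        for code, seen in enumerate(uniques):
            if seen == value:
                codes.append(code)
                break
        else:
            # No equal value seen so far: register a new unique.
            uniques.append(value)
            codes.append(len(uniques) - 1)
    return codes, uniques


def iter_groups(keys, rows):
    """Yield (code, original_key, rows) with the hashable code as group key."""
    codes, uniques = factorize_by_equality(keys)
    buckets = {}
    for code, row in zip(codes, rows):
        buckets.setdefault(code, []).append(row)
    for code, members in buckets.items():
        yield code, uniques[code], members


# Lists are unhashable, like the uncertainty values discussed above.
keys = [[1.0, 0.1], [2.0, 0.2], [1.0, 0.1]]
groups = list(iter_groups(keys, ["a", "b", "c"]))
```

The iterator never hashes the original keys; only the integer codes are used as dictionary keys.
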

@andrewgsavage @rhshadrach @lebigot @hgrecco

### Alternative Solutions

If the pandas test framework could xfail unhashable EA types for groupby tests, that might be an acceptable workaround (I need to check with the Pint and Pint-Pandas maintainers).
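As a rough sketch of how such a check might look, the helper below probes whether every element of a sample can be hashed. `all_hashable` is a hypothetical name, not an existing pandas test utility; in a pytest-based suite, a groupby test could call `pytest.xfail(...)` when the probe returns `False`, but that wiring is an assumption and is not shown here.

```python
def all_hashable(values):
    """Return True if hash() succeeds for every element, False otherwise."""
    try:
        for value in values:
            hash(value)
    except TypeError:
        return False
    return True


floats_ok = all_hashable([1.0, 2.0])        # floats are hashable
lists_ok = all_hashable([[1.0], [2.0]])     # lists are not hashable
```
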

### Additional Context

_No response_

ENH: Generalize groupby to better support ExtensionArray #53904