ENH: Allow custom aggregation functions with multiple return values.

### Feature Type

- [ ] Adding new functionality to pandas

- [X] Changing existing functionality in pandas

- [ ] Removing existing functionality in pandas


### Problem Description

I want to aggregate multiple columns with bootstrapping (`scipy.stats.bootstrap`). However, this aggregation produces multiple scalar outputs, e.g., the lower and upper bounds of the confidence interval and the mean (i.e., at least 3 columns). I want to apply this aggregation to multiple columns independently (hence `apply` is not really suitable). However, `agg` can only handle aggregation functions with a single scalar output. Bootstrapping is a 101 task in data analysis, and I don't understand why this is so complicated to implement in pandas. I know that, in theory, I could write a separate aggregation function for the lower bound, the upper bound, and the mean, but that would 1. take three times longer, and 2. be mathematically questionable, because the bounds and the mean might come from different random processes. Fixing the random seed would solve the second problem, but taking three times longer is still not good.
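To illustrate why one call with several outputs matters: a bootstrap yields the confidence bounds and the point estimate from a single set of resamples. The sketch below is NumPy-only and swaps in a plain percentile bootstrap as a stand-in for `scipy.stats.bootstrap` (whose default method is BCa). The function name `bootstrap_mean_ci` and its parameters are hypothetical.

```python
import numpy as np


def bootstrap_mean_ci(values, n_resamples=1000, confidence_level=0.95, seed=0):
    """Percentile bootstrap CI for the mean, plus the sample mean.

    All three numbers come from one call, so both bounds are computed
    from the same set of resamples instead of three separate runs.
    """
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    x = np.asarray(values, dtype=float)
    # Draw every resample at once: row i holds the indices of resample i.
    idx = rng.integers(0, len(x), size=(n_resamples, len(x)))
    boot_means = x[idx].mean(axis=1)
    alpha = 1.0 - confidence_level
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper), float(x.mean())


print(bootstrap_mean_ci([0.496070, 0.1]))
```

Returning a tuple like this is exactly what `agg` cannot currently spread across several output columns.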

### Feature Description

Here is some demo code:

```python
import pandas as pd
import numpy as np

def custom_aggregate(data):
    # return 1  # Works
    return pd.Series({
        'bs_mean_ci_lower': 0.3,  # hardcoded for demo
        'bs_mean_ci_upper': 0.4,  # hardcoded for demo
        'real_mean': np.mean(data),
    })

def main():
    data = {
        'acctrain': [0.496070, 0.579231, 0.1, 0.3],
        'acctest':  [0.455256, 0.147513, 0.1, 0.5],
        'experimentname': ['experimentA', 'experimentB', 'experimentA', 'experimentB'],
    }

    df = pd.DataFrame(data)
    print(df)
    print("Aggregated: ")
    df2 = df.groupby(["experimentname"]).agg({
        "acctrain": custom_aggregate,
        "acctest": custom_aggregate,
    }).reset_index()
    print(df2)

if __name__ == '__main__':
    main()
```

I would expect to get a DataFrame with MultiIndex columns like this:
```
                acctrain                                               acctest                                                   
                bs_mean_ci_lower    bs_mean_ci_upper    real_mean      bs_mean_ci_lower    bs_mean_ci_upper    real_mean    
experimentname                                                                                                              
experimentA     0.3                 0.4                 0.298035       0.3                 0.4                 0.277628     
experimentB     0.3                 0.4                 0.439616       0.3                 0.4                 0.323757     
```
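Until `agg` supports this directly, the layout above can be approximated with `SeriesGroupBy.apply`, which accepts a function returning a `pd.Series`. This is a sketch under stated assumptions: `summarize` is a hypothetical stand-in for the bootstrap function with placeholder bounds, and each value column gets its own `apply` call. The function is still called only once per group and column, so all three values come from the same call.

```python
import numpy as np
import pandas as pd


def summarize(values: pd.Series) -> pd.Series:
    # Hypothetical multi-output aggregation; the CI bounds are placeholders.
    return pd.Series({
        "bs_mean_ci_lower": 0.3,
        "bs_mean_ci_upper": 0.4,
        "real_mean": float(np.mean(values)),
    })


df = pd.DataFrame({
    "acctrain": [0.496070, 0.579231, 0.1, 0.3],
    "acctest": [0.455256, 0.147513, 0.1, 0.5],
    "experimentname": ["experimentA", "experimentB", "experimentA", "experimentB"],
})

value_cols = ["acctrain", "acctest"]
grouped = df.groupby("experimentname")

# SeriesGroupBy.apply stacks each returned Series under its group key,
# unstack() moves the statistic names into columns, and concat with a
# dict adds the source column name as the outer column level.
result = pd.concat(
    {col: grouped[col].apply(summarize).unstack() for col in value_cols},
    axis=1,
)
print(result)
```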



### Alternative Solutions

The hacky workaround I use right now is something along the lines of:
```python
def custom_aggregate(data):
    return (1,2)
```
and later
```python
df2[['acctrain_ci_lower', 'acctrain_ci_upper']] = pd.DataFrame(df2['acctrain'].tolist(), index=df2.index)
df2[['acctest_ci_lower', 'acctest_ci_upper']] = pd.DataFrame(df2['acctest'].tolist(), index=df2.index)
```

So I return a tuple and later expand it into multiple columns. This requires a lot of hardcoding.
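The tuple workaround can be generalized so the output column names are built in a loop rather than written out per column. This is a minimal sketch, assuming every aggregation returns a fixed-length tuple. `ci_bounds` (which returns min and max purely as a stand-in for the real bounds) and the `suffixes` list are hypothetical.

```python
import pandas as pd


def ci_bounds(values):
    # Hypothetical stand-in returning a tuple of two scalars (here min and max).
    return (float(values.min()), float(values.max()))


df = pd.DataFrame({
    "acctrain": [0.496070, 0.579231, 0.1, 0.3],
    "acctest": [0.455256, 0.147513, 0.1, 0.5],
    "experimentname": ["experimentA", "experimentB", "experimentA", "experimentB"],
})

value_cols = ["acctrain", "acctest"]
suffixes = ["ci_lower", "ci_upper"]  # one name per tuple element, in order

# Each cell of df2 holds the tuple returned by ci_bounds for that group.
df2 = df.groupby("experimentname").agg({col: ci_bounds for col in value_cols})

for col in value_cols:
    # Split the column of tuples into one column per tuple element.
    expanded = pd.DataFrame(
        df2[col].tolist(),
        index=df2.index,
        columns=[f"{col}_{suffix}" for suffix in suffixes],
    )
    df2 = df2.drop(columns=col).join(expanded)

print(df2)
```

This still relies on the tuple order matching `suffixes`, which is the kind of bookkeeping a native multi-output `agg` would remove.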


### Additional Context

_No response_

ENH: Allow custom aggregation functions with multiple return values. #59781
