Description
Research
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/questions/77969964/deprecation-warning-with-groupby-apply
Question about pandas
I find it strange that the docs state that the include_groups option will no longer accept True starting in 3.0. I use groupby(...).apply(...) all the time when working with tabular data, and I fully expect the group-by columns to be included in the result. I feel like I must be misunderstanding something about the warning, because I can't fathom having to explicitly select every column I need from the group by each time I use the operation. This is a very common pattern for me:
    summarize_df = (
        df
        .groupby(['user_id'])
        .apply(lambda df: pd.Series(dict(
            num_orders=len(df),
            aov=np.mean(df['amount']),
            first_order=np.min(df['order_date']),
        )))
        .reset_index()
    )
This gives me another dataframe that I can use and that has user_id as a column. The Stack Overflow question I linked seems to suggest that the only ways to get user_id as a column in the future will be to 1.) add it to the index first, or 2.) select the group-by column after the apply call, but then I would also have to copy and re-select every column I created in the apply statement. The documentation for DataFrameGroupBy.apply doesn't make it clear how you'd keep the group-by column.
Is this the intended behavior of this change? Is it assumed that every user doing what I am doing is using agg instead of apply? Thanks!
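For concreteness, here is a minimal sketch of two ways the summary above might be written so that user_id still ends up as a column. The toy data and the names via_apply and via_agg are made up for illustration and are not from the pandas docs. The first variant selects the non-key columns before apply, so the function never sees the grouping column; the second uses named aggregation with agg.

```python
import pandas as pd

# Hypothetical toy data standing in for the real orders table.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "amount": [10.0, 20.0, 5.0],
    "order_date": pd.to_datetime(["2024-01-02", "2024-01-01", "2024-02-01"]),
})

# Variant 1: select the non-key columns explicitly, then apply.
# The group keys land in the result index, so reset_index() turns
# user_id back into a regular column.
via_apply = (
    df
    .groupby("user_id")[["amount", "order_date"]]
    .apply(lambda g: pd.Series({
        "num_orders": len(g),
        "aov": g["amount"].mean(),
        "first_order": g["order_date"].min(),
    }))
    .reset_index()
)

# Variant 2: named aggregation; as_index=False keeps user_id as a column.
via_agg = df.groupby("user_id", as_index=False).agg(
    num_orders=("amount", "size"),
    aov=("amount", "mean"),
    first_order=("order_date", "min"),
)

print(via_apply)
print(via_agg)
```

As far as I can tell, passing include_groups=False to apply (pandas 2.2 and later) behaves like the first variant: the grouping column is dropped from the frame handed to the function, but the group keys remain in the result index, so reset_index() should still recover user_id.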