Closed
Labels
Apply, Arrow, Bug, Dtype Conversions, Groupby, pyarrow dtype retention
Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd

# Opt in to the pyarrow dtype backend
pd.options.mode.dtype_backend = 'pyarrow'

df = pd.DataFrame({
    'tags': pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5], dtype='int64[pyarrow]'),
    'value': pd.Series(np.random.rand(15), dtype='double[pyarrow]'),
})

# transform with a lambda
result1 = df.groupby('tags')['value'].transform(lambda x: x.sum())
print(result1.dtype)

# transform with the string alias
result2 = df.groupby('tags')['value'].transform('sum')
print(result2.dtype)

# apply with a lambda
result3 = df.groupby('tags')['value'].apply(lambda x: x.sum())
print(result3.dtype)

# apply with the string alias
result4 = df.groupby('tags')['value'].apply('sum')
print(result4.dtype)
Issue Description
I am currently testing the release candidate (RC) on my work project. I noticed that when a lambda function is used with groupby together with apply or transform, the dtype changes from double[pyarrow] back to float64.
Expected Behavior
I'd expect to get the same dtype consistently, regardless of whether the function is passed as a lambda or as a string.
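One way to check this expectation is to collect the result dtype of each of the four call styles and compare them. The sketch below is only an illustration: `groupby_result_dtypes` and `dtypes_are_consistent` are hypothetical helper names, not part of pandas, and the helpers assume a frame with one key column and one value column.

```python
import pandas as pd


def groupby_result_dtypes(df: pd.DataFrame, key: str, column: str) -> dict[str, str]:
    """Return the result dtype (as a string) for each of the four call styles."""
    grouped = df.groupby(key)[column]
    return {
        "transform(lambda)": str(grouped.transform(lambda x: x.sum()).dtype),
        "transform('sum')": str(grouped.transform("sum").dtype),
        "apply(lambda)": str(grouped.apply(lambda x: x.sum()).dtype),
        "apply('sum')": str(grouped.apply("sum").dtype),
    }


def dtypes_are_consistent(dtypes: dict[str, str]) -> bool:
    """True when every call style produced the same dtype."""
    return len(set(dtypes.values())) == 1
```

Calling `groupby_result_dtypes(df, 'tags', 'value')` on the frame from the reproducible example puts the four dtypes in one dict, and `dtypes_are_consistent` on that dict expresses the expected behavior as a single boolean.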
Installed Versions
2.0.0rc0