[SPARK-55296][PS][FOLLOW-UP] Fix CoW mode not to break groupby #54392

Closed
ueshin wants to merge 1 commit into apache:master from ueshin:issues/SPARK-55296/fix_groupby

ueshin commented Feb 20, 2026

What changes were proposed in this pull request?

This is a follow-up of #54375.

Fixes CoW mode so that it no longer breaks groupby.
Delays disconnecting the anchor until the object is actually updated.
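
The deferred disconnection described above can be pictured with a small toy model. This is a sketch only, not the pandas-on-Spark implementation: `ToyFrame`, `ToySeries`, `column`, `value_columns`, and `set` are hypothetical names, and the idea that groupby recognizes its own key columns by checking the anchor is an assumption made for illustration.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; stores columns as lists."""

    def __init__(self, data):
        self.data = {name: list(values) for name, values in data.items()}

    def column(self, name):
        # Hand out a series that stays anchored to this frame.
        return ToySeries(name, anchor=self)

    def value_columns(self, keys):
        # Assumption: keys still anchored to this frame are treated as
        # grouping columns and excluded from the aggregated columns.
        own_keys = {k.name for k in keys if k.anchor is self}
        return [c for c in self.data if c not in own_keys]


class ToySeries:
    """Hypothetical series that detaches from its anchor only on write."""

    def __init__(self, name, anchor):
        self.name = name
        self.anchor = anchor
        self._values = None  # private copy, created on the first write

    @property
    def values(self):
        if self._values is not None:
            return list(self._values)
        return list(self.anchor.data[self.name])

    def set(self, index, value):
        # Copy-on-write: copy the data and drop the anchor only now.
        if self._values is None:
            self._values = list(self.anchor.data[self.name])
            self.anchor = None
        self._values[index] = value


frame = ToyFrame({"C": [0.362, 0.227], "B": [1, 2]})
key = frame.column("C")
print(frame.value_columns([key]))  # ['B']: key is still anchored

key.set(0, 9.9)                    # first write detaches the key
print(frame.data["C"][0])          # 0.362: the frame is unchanged
print(frame.value_columns([key]))  # ['C', 'B']: key is now external
```

In this toy model, detaching at `column()` time instead of at `set()` time would make the first `value_columns` call return `['C', 'B']`, which mirrors the extra `C` column in the groupby example.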

Why are the changes needed?

CoW mode support was added in #54375, but it disconnected the anchor too early, which broke groupby.

In the example below, pandas aggregates only B, while pandas-on-Spark also returns the grouping key C as a data column:

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>>
>>> pdf1 = pd.DataFrame({"C": [0.362, 0.227, 1.267, -0.562], "B": [1, 2, 3, 4]})
>>> pdf2 = pd.DataFrame({"A": [1, 1, 2, 2]})
>>>
>>> psdf1 = ps.from_pandas(pdf1)
>>> psdf2 = ps.from_pandas(pdf2)
>>>
>>> pdf1.groupby([pdf1.C, pdf2.A]).agg("sum").sort_index()
          B
C      A
-0.562 2  4
 0.227 1  2
 0.362 1  1
 1.267 2  3
>>> psdf1.groupby([psdf1.C, psdf2.A]).agg("sum").sort_index()
              C  B
C      A
-0.562 2 -0.562  4
 0.227 1  0.227  2
 0.362 1  0.362  1
 1.267 2  1.267  3

Does this PR introduce any user-facing change?

Yes, it will behave more like pandas 3.

How was this patch tested?

The existing tests should pass.

Was this patch authored or co-authored using generative AI tooling?

Codex (GPT-5.3-Codex)

ueshin commented Feb 20, 2026

@HyukjinKwon

Merged to master.
