@Matt711
Contributor

@Matt711 Matt711 commented Jan 26, 2026

Description

Contributes to #18659

xref #19827. The majority of xarray tests should be fixed in pandas 3; see #19827 (comment).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@Matt711 Matt711 added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Jan 26, 2026
@copy-pr-bot

copy-pr-bot bot commented Jan 26, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Python (Affects Python cuDF API) and cudf.pandas (Issues specific to cudf.pandas) labels Jan 26, 2026
@Matt711
Contributor Author

Matt711 commented Jan 26, 2026

/ok to test

@GPUtester GPUtester moved this to In Progress in cuDF Python Jan 26, 2026
@Matt711
Contributor Author

Matt711 commented Jan 26, 2026

/ok to test 585504c

@Matt711 Matt711 marked this pull request as ready for review January 27, 2026 00:05
@Matt711 Matt711 requested a review from a team as a code owner January 27, 2026 00:05
Comment on lines +319 to +330
    def _to_xarray(self):
        # Call xarray conversion functions directly with self (the proxy object).
        # We must pass the proxy (self), not the slow pandas object, because xarray
        # does isinstance checks against pd.MultiIndex and pd.api.extensions.ExtensionArray.
        # After cudf.pandas.install(), these refer to proxy classes. The slow object
        # contains real pandas types that don't pass isinstance checks against the proxy
        # classes.
        xr = import_optional_dependency("xarray")
        if self.ndim == 1:
            return xr.DataArray.from_series(self)
        else:
            return xr.Dataset.from_dataframe(self)
Contributor

Got it, but can't we get around this problem by implementing to_xarray in cudf.DataFrame and cudf.Series and raising NotImplementedError() there, instead of this approach?

Contributor Author

I tried that at first, but I think I ran into the problem of xarray running an isinstance check between a real and proxy MultiIndex. Let me try it again.

Contributor Author

No, I think this is correct. If we go with your approach:

DataFrame.to_xarray --> cudf.DataFrame.to_xarray --> fails and falls back to slow --> module acceleration is disabled and pd.DataFrame.to_xarray is called --> this fails in xarray because of the isinstance checks
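
A minimal sketch of that failure mode, using toy classes rather than the real cudf.pandas machinery. Every name here (RealMultiIndex, ProxyMultiIndex, library_check, call_with_fallback) is hypothetical, and the sketch assumes the third-party library resolves pd.MultiIndex to the proxy class:

```python
from types import SimpleNamespace


class RealMultiIndex:
    """Stand-in for the real pandas type held by the slow object."""


class ProxyMultiIndex:
    """Stand-in for the proxy class that the public name resolves to."""

    def __init__(self, slow):
        self._slow = slow


# Assumption: after install(), pd.MultiIndex refers to the proxy class.
pd = SimpleNamespace(MultiIndex=ProxyMultiIndex)


def library_check(index):
    # Mimics a library doing isinstance(x, pd.MultiIndex).
    return isinstance(index, pd.MultiIndex)


def fast_to_xarray(proxy):
    # Hypothetical fast path that is not implemented.
    raise NotImplementedError


def slow_to_xarray(slow):
    # The slow path hands the real (unwrapped) object to the library.
    if not library_check(slow):
        raise TypeError("expected pd.MultiIndex")
    return "converted"


def call_with_fallback(proxy):
    # Try the fast path first, then fall back to the slow object.
    try:
        return fast_to_xarray(proxy)
    except NotImplementedError:
        return slow_to_xarray(proxy._slow)


proxy = ProxyMultiIndex(RealMultiIndex())

# Passing the proxy straight to the library satisfies the check.
direct_ok = library_check(proxy)

# Going through the fallback passes the real object, which does not.
try:
    call_with_fallback(proxy)
    fallback_ok = True
except TypeError:
    fallback_ok = False
```

In this toy setup the direct call passes the check while the fallback route raises, which is the shape of the problem described above. The real dispatch in cudf.pandas is more involved.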

Contributor Author

Alternatively, we could probably patch xarray somehow to be more cudf.pandas friendly.

Contributor

> No, I think this is correct. If we go with your approach:
>
> DataFrame.to_xarray --> cudf.DataFrame.to_xarray --> fails and falls back to slow --> module acceleration is disabled and pd.DataFrame.to_xarray is called --> this fails in xarray because of the isinstance checks

I see, got it. Can you link me to one such isinstance check? The current changes look good to me. We can revisit whether patching xarray is needed if more issues come up.

Contributor Author

@Matt711 Matt711 changed the title from "Call to_xarray on the slow object" to "Vendor Pandas' to_xarray in cudf.pandas" Jan 27, 2026
pd.arrays.IntegerArray,
fast_to_slow=_Unusable(),
slow_to_fast=_Unusable(),
bases=(pd.api.extensions.ExtensionArray,),
Contributor

@Matt711 Can you see if we can proxy the ExtensionArray and then use that proxy class as base here?
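
To illustrate what the choice of base class changes, here is a toy sketch of building a class dynamically with a bases argument. The factory make_proxy_type and both base classes are hypothetical stand-ins, not the cudf.pandas API:

```python
class ExtensionArrayBase:
    """Hypothetical stand-in for the real base class."""


class ProxyExtensionArray:
    """Hypothetical stand-in for a proxy of that base class."""


def make_proxy_type(name, bases=()):
    # Toy factory: create a new class with the requested bases.
    return type(name, tuple(bases), {})


# Variant A: the proxy type inherits from the real base.
IntegerArrayA = make_proxy_type("IntegerArray", bases=(ExtensionArrayBase,))
# Variant B: the proxy type inherits from the proxied base.
IntegerArrayB = make_proxy_type("IntegerArray", bases=(ProxyExtensionArray,))

a, b = IntegerArrayA(), IntegerArrayB()

# isinstance follows whichever base was supplied at creation time.
a_matches_real = isinstance(a, ExtensionArrayBase)
a_matches_proxy = isinstance(a, ProxyExtensionArray)
b_matches_proxy = isinstance(b, ProxyExtensionArray)
```

Which variant works in practice depends on which class the calling library ends up checking against, so this only shows the mechanism.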

Contributor Author

Yes

Contributor Author

I think this will work (going to let CI run). It also exposed another bug: __array_ufunc__ is missing on datetime arrays.

Contributor Author

Proxying ExtensionArray seems to break a lot of things. My guess is that we run into #19823. Going to revert for now.

@Matt711
Contributor Author

Matt711 commented Jan 27, 2026

/ok to test 3562282

@Matt711
Contributor Author

Matt711 commented Jan 28, 2026

/ok to test efb849b

Labels

  • bug: Something isn't working
  • cudf.pandas: Issues specific to cudf.pandas
  • non-breaking: Non-breaking change
  • Python: Affects Python cuDF API.

Projects

Status: In Progress

2 participants