@aryansri05

Description

Closes #20746.

This PR fixes an inconsistency between cuDF and Pandas when casting floating-point columns containing NaN values to boolean while mode.pandas_compatible is enabled.

The Issue:

  • Pandas: bool(float('nan')) evaluates to True. Casting a Series [1.0, NaN] to bool results in [True, True].
  • cuDF: Previously, NaN values in float columns were treated as nulls, which propagated as nulls after casting to bool ([True, <NA>]).
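
The truthiness rule behind the Pandas result can be checked in plain Python. This is a minimal sketch that uses only the standard library and ordinary floats; it does not call Pandas or cuDF, and the variable names are illustrative:

```python
nan = float("nan")

# NaN is not equal to zero, and Python treats every non-zero float as truthy.
assert nan != 0.0
assert bool(nan) is True

# Element-wise bool() over plain floats mirrors the [1.0, NaN] example,
# with 0.0 added to show the only float that maps to False.
values = [1.0, nan, 0.0]
as_bool = [bool(v) for v in values]
print(as_bool)  # [True, True, False]
```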

The Fix:
Updated as_numerical_column in numerical.py. When mode.pandas_compatible is enabled and a float column that contains nulls is cast to bool, the nulls are now explicitly filled with np.nan before the cast. The underlying cast logic then evaluates them as True, matching Pandas behavior.
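
As a rough illustration of the fill-then-cast idea, here is a toy model in plain Python. It is not the cuDF implementation: `cast_float_to_bool` is a hypothetical helper, `None` stands in for a cuDF null, and the `pandas_compatible` argument stands in for the mode.pandas_compatible option.

```python
import math
from typing import List, Optional


def cast_float_to_bool(
    values: List[Optional[float]], pandas_compatible: bool
) -> List[Optional[bool]]:
    """Toy model of the cast; hypothetical helper, not cuDF code."""
    if pandas_compatible:
        # Fill nulls with NaN first so the cast sees a truthy float.
        values = [math.nan if v is None else v for v in values]
    # Any nulls that remain propagate through the cast unchanged.
    return [None if v is None else bool(v) for v in values]


column = [1.0, None, 0.0]
default_result = cast_float_to_bool(column, pandas_compatible=False)
compat_result = cast_float_to_bool(column, pandas_compatible=True)
print(default_result)  # [True, None, False]
print(compat_result)   # [True, True, False]
```

In this model the null survives the cast only when the compatibility flag is off, which corresponds to the [True, <NA>] versus [True, True] difference described above.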

Checklist

  • I am adding a new test (see tests/test_issue_20746.py)
  • I have signed off my commits

@aryansri05 requested a review from a team as a code owner on December 1, 2025, 14:22

copy-pr-bot bot commented Dec 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions bot added the Python label (Affects Python cuDF API.) on Dec 1, 2025
@GPUtester moved this to In Progress in cuDF Python on Dec 1, 2025

Development

Successfully merging this pull request may close these issues.

[BUG] cudf and cudf.pandas casts empty rows from csv files into the bool type differs from Pandas
