Copilot AI commented Nov 5, 2025

Fix ak.to_dataframe() converting None to string "nan" in string columns

Problem

When converting an Awkward Array with None values in string columns to a DataFrame, None was converted to the literal string "nan", making it indistinguishable from actual "nan" string values.

Solution

Modified ak_to_dataframe.py to convert masked string/bytestring arrays to object arrays with None values instead of filling with the string "nan". This allows pandas to properly handle missing values while preserving the distinction between None and the string "nan".

Changes Made

  • Modified /src/awkward/operations/ak_to_dataframe.py:
    • Replaced numpy.ma.filled(column, "nan") with numpy.where(column.mask, None, column.data).astype(object) for string/bytestring columns
    • Removed the dtype width checking logic that was only needed for the "nan" fill value
  • Added comprehensive tests in test_3713_to_dataframe_none_vs_nan_string.py (following CONTRIBUTING.md naming convention)
  • Updated existing tests in test_3692_to_dataframe_masked_string_dtype_resize.py and test_0331_pandas_indexedarray.py to expect None instead of "nan"
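
The replacement described above can be illustrated outside Awkward Array with plain NumPy. This is a minimal sketch, not the actual code in ak_to_dataframe.py: the variable names and the sample column are assumptions made for illustration, and only the two NumPy expressions quoted in the list are taken from the PR.

```python
import numpy as np

# Hypothetical masked string column: the first entry is missing,
# the second is the genuine string "nan".
column = np.ma.MaskedArray(
    np.array(["x", "nan", "abc"]),
    mask=[True, False, False],
)

# Old behavior: masked entries are filled with the string "nan",
# so the missing value and the real "nan" become identical.
old = np.ma.filled(column, "nan")

# New behavior: masked entries become None in an object array,
# so the missing value stays distinct from the string "nan".
new = np.where(column.mask, None, column.data).astype(object)

print(old.tolist())
print(new.tolist())
```

Because the result is an object array rather than a fixed-width string array, the fill value no longer has to fit into the column's string dtype, which is presumably why the width check could be dropped.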

Verification

  • Original issue resolved: None and "nan" are now distinguishable
  • Bytestrings work correctly: None vs b"nan" are distinguishable
  • Numeric arrays unchanged: None still shows as NaN for numeric types
  • All edge cases tested: nested lists, records, union arrays, short strings
  • Code review completed and feedback addressed
  • Security check passed (CodeQL: 0 alerts)
  • Code formatted with ruff
  • Test file renamed to follow CONTRIBUTING.md convention
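
The first and third items above can be checked from the pandas side. The sketch below builds the columns directly in pandas rather than through ak.to_dataframe(), so it only illustrates how pandas treats None in an object column versus a numeric column; the variable names and sample values are assumptions.

```python
import pandas as pd

# Object column holding a missing value and the genuine string "nan".
strings = pd.Series([None, "nan"], dtype=object)

# Numeric column: pandas represents the missing value as NaN.
numbers = pd.Series([1.0, None])

# isna() flags only the missing entry, not the string "nan".
string_missing = strings.isna().tolist()
number_missing = numbers.isna().tolist()

print(string_missing)
print(number_missing)
```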

Security Summary

No security vulnerabilities introduced. CodeQL analysis found 0 alerts.

Fixes #3713

Original prompt

This section details the original issue you should resolve

<issue_title>ak.to_dataframe() turns None into the string "nan"</issue_title>
<issue_description>### Version of Awkward Array

2.8.10

### Description and code to reproduce

ak.to_dataframe() turns None in the string column into the string "nan", which is indistinguishable from the actual string value "nan".

For example, starting from a list of two elements: [None, "nan"], create an Awkward Array and convert it to a DataFrame:

>>> import awkward as ak
>>> m = [None, "nan"]
>>> a = ak.Array(m)
>>> d = ak.to_dataframe(a)
>>> d.to_dict()
{'values': {0: 'nan', 1: 'nan'}}

The result is two values, both the string "nan".

This issue is related to #3694
</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title [WIP] Fix ak.to_dataframe() handling of None values Fix ak.to_dataframe() converting None to string "nan" in string columns Nov 5, 2025
Copilot AI requested a review from TaiSakuma November 5, 2025 19:19
Copilot finished work on behalf of TaiSakuma November 5, 2025 19:19
@TaiSakuma
Collaborator

Can you follow CONTRIBUTING.md in the repo?

@TaiSakuma
Collaborator

@copilot Please update this PR to follow CONTRIBUTING.md.

Author

Copilot AI commented Nov 5, 2025

@copilot Please update this PR to follow CONTRIBUTING.md.

Renamed test file to follow CONTRIBUTING.md naming convention. Test file is now test_3713_to_dataframe_none_vs_nan_string.py where 3713 is the issue number. (commit 80b57c5)

Copilot finished work on behalf of TaiSakuma November 5, 2025 20:23
@ianna ianna changed the title Fix ak.to_dataframe() converting None to string "nan" in string columns fix: ak.to_dataframe() converting None to string "nan" in string columns Nov 5, 2025
Development

Successfully merging this pull request may close these issues.

ak.to_dataframe() turns None into the string "nan"

2 participants