fix: ak.to_dataframe() converting None to string "nan" in string columns
#3716
+98
−32
Fix ak.to_dataframe() converting None to string "nan" in string columns

Problem
When converting an Awkward Array with None values in string columns to a DataFrame, None was converted to the literal string "nan", making it indistinguishable from actual "nan" string values.

Solution
Modified ak_to_dataframe.py to convert masked string/bytestring arrays to object arrays with None values instead of filling with the string "nan". This allows pandas to properly handle missing values while preserving the distinction between None and the string "nan".

Changes Made
- /src/awkward/operations/ak_to_dataframe.py: replaced numpy.ma.filled(column, "nan") with numpy.where(column.mask, None, column.data).astype(object) for string/bytestring columns
- Added test_3713_to_dataframe_none_vs_nan_string.py (following CONTRIBUTING.md naming convention)
- Updated test_3692_to_dataframe_masked_string_dtype_resize.py and test_0331_pandas_indexedarray.py to expect None instead of "nan"

Verification
- None and "nan" are now distinguishable
- None vs b"nan" are distinguishable
- None still shows as NaN for numeric types

Security Summary
No security vulnerabilities introduced. CodeQL analysis found 0 alerts.
Fixes #3713
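To illustrate the change in isolation, here is a minimal NumPy-only sketch that compares the two expressions quoted in the description on a small masked array. It is not the code in ak_to_dataframe.py: the helper names fill_old and fill_new and the sample values are hypothetical, chosen only for this example.

```python
import numpy as np


def fill_old(column):
    # Expression the description says was replaced: masked slots become "nan".
    return np.ma.filled(column, "nan")


def fill_new(column):
    # Expression the description says replaced it: masked slots become None
    # in an object array. (fill_old/fill_new are hypothetical helper names.)
    return np.where(column.mask, None, column.data).astype(object)


# Sample data (assumed for illustration): one real "nan" string, one masked slot.
mask = [False, False, True]
strings = np.ma.MaskedArray(["apple", "nan", "pear"], mask=mask)
raw = np.ma.MaskedArray([b"apple", b"nan", b"pear"], mask=mask)

old_strings = fill_old(strings).tolist()
new_strings = fill_new(strings).tolist()
new_bytes = fill_new(raw).tolist()

print(old_strings)  # ['apple', 'nan', 'nan']
print(new_strings)  # ['apple', 'nan', None]
print(new_bytes)    # [b'apple', b'nan', None]
```

With the old expression the masked slot and the genuine "nan" string come out identical. With the new one the masked slot is None, so a consumer such as pandas can treat it as missing.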
Original prompt
ak.to_dataframe() turns None into the string "nan" #3713