BUG: fix writing some non-string object columns with arrow #630

Open
theroggy wants to merge 26 commits into geopandas:main from theroggy:ENH-also-try-to-convert-object-columns-to-string-for-use_arrow=True

Member

@theroggy theroggy commented Jan 19, 2026

In write_dataframe with use_arrow=False, object dtype columns are implicitly serialised to string to be able to write them to the output file.

With use_arrow=True, the pyarrow to_table function handles some datatypes (e.g. lists), but for other object columns it raises an error. For datatypes that pyarrow does not support, a good default behaviour would be to convert them to string, as is done without arrow.

This PR explicitly converts object columns to string when pyarrow cannot interpret them.

resolves #631

@theroggy theroggy changed the title ENH: in write_dataframe, convert object columns to string with arrow ENH: improve support of writing object columns with arrow Jan 19, 2026
@theroggy theroggy self-assigned this Jan 19, 2026
@theroggy theroggy changed the title ENH: improve support of writing object columns with arrow BUG: fix writing non-string object columns with arrow Jan 19, 2026
@theroggy theroggy marked this pull request as ready for review January 19, 2026 23:31
@theroggy theroggy closed this Feb 19, 2026
@theroggy theroggy reopened this Feb 19, 2026
@theroggy theroggy added this to the 0.12.2 milestone Feb 19, 2026
@theroggy theroggy changed the title BUG: fix writing non-string object columns with arrow BUG: fix writing some non-string object columns with arrow Feb 20, 2026
Comment on lines +2562 to +2566
# Verify that object_col is actually inferred as object dtype for this test.
str_dtype = (
"str"
if PANDAS_GE_30 or (PANDAS_GE_23 and pd.options.future.infer_string)
else "object"
Member

Instead of verifying this here, I would specify dtype=object in the construction above, to ensure the input data always uses object dtype (also for pure strings).

Strings with the str dtype are already covered by other tests (and also do not go through the code path you changed), so I would have this test focus only on object dtype (and we should cover the case of all strings as object dtype anyway).
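
As a rough sketch of that suggestion (the data values are made up; only the `object_col` naming follows the test under review), forcing the dtype at construction could look like:

```python
import pandas as pd

# Hypothetical test data: pure strings, which newer pandas versions may
# otherwise infer as the str dtype instead of object.
object_col_data = ["a", "b", "c"]

# Passing dtype=object explicitly keeps the column on the object code path,
# independent of the pandas version or the infer_string option.
df = pd.DataFrame({"object_col": pd.Series(object_col_data, dtype=object)})
```

With the dtype pinned like this, the test no longer needs to compute which dtype pandas would have inferred.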

]
elif isinstance(object_col_data[0], bytes):
# byte objects are read back as byte objects with arrow
expected_dtype = "object"
Member

Maybe you can move the expected_dtype as an additional value in the parametrization to avoid most of this whole if/else block?

Member Author

The expected dtype as well as the expected data depends on use_arrow as well as the pandas version, so I don't think it will become more readable that way...

Development

Successfully merging this pull request may close these issues.

BUG: writing a column with Path objects gives an error with use_arrow
