BUG: fix writing some non-string object columns with arrow #630
theroggy wants to merge 26 commits into geopandas:main from …vert-object-columns-to-string-for-use_arrow=True
pyogrio/tests/test_geopandas_io.py
Outdated
    # Verify that object_col is actually inferred as object dtype for this test.
    str_dtype = (
        "str"
        if PANDAS_GE_30 or (PANDAS_GE_23 and pd.options.future.infer_string)
        else "object"
Instead of verifying this here, I would specify `dtype=object` in the construction above, so that the input data always uses object dtype (also for pure strings).
Strings with the `str` dtype are already covered by other tests (and do not go through the code path you changed), so I would have this test focus only on object dtype. We should cover the case of all strings as object dtype anyway.
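To illustrate the suggestion, here is a minimal sketch of forcing object dtype at construction time. The column name `object_col` is taken from the test comment above; the sample values are made up for illustration and are not the data used in the actual test.

```python
import pandas as pd

# Passing dtype=object bypasses string inference, so the column stays
# object dtype regardless of pandas version or the infer_string option.
df = pd.DataFrame(
    {"object_col": pd.Series(["a", "b", None], dtype=object)}
)

print(df["object_col"].dtype)
```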
        ]
    elif isinstance(object_col_data[0], bytes):
        # byte objects are read back as byte objects with arrow
        expected_dtype = "object"
Maybe you can move the `expected_dtype` into the parametrization as an additional value, to avoid most of this whole if/else block?
The expected dtype and the expected data both depend on `use_arrow` as well as on the pandas version, so I don't think it would become more readable that way...
In `write_dataframe` with `use_arrow=False`, object dtype columns are implicitly serialised to string so they can be written to the output file.

With `use_arrow=True`, the pyarrow `to_table` function supports some datatypes (e.g. lists, ...), but for other cases such columns give an error. For datatypes that are not supported by pyarrow, a good default behaviour would be to convert them to string, as is done without arrow.

This PR explicitly converts object columns to string when pyarrow cannot interpret them.
resolves #631