Description
Hello,
When I include an optional field in a pyarrow.struct inside a Schema and call aggregate_polars_all, the latest version (>=1.6.0) throws the following error:
ArrowInvalid: Mismatching child array lengths
For reference, the code that produces the error is below. field2 is a list of structs, and value2 is an optional field that may be absent from some of those structs.
pa_struct = {
"value1": pyarrow.float64(),
"value2": pyarrow.float64()
}
mongo_projection = {
"field1": "$field1",
"field2": "$field2",
}
mongo_schema = {
"field1": pyarrow.string(),
"field2": pyarrow.list_(pyarrow.struct(pa_struct))
}
df = col.aggregate_polars_all([
{"$match": mongo_query},
{"$project": mongo_projection}
], schema=Schema(mongo_schema))
Earlier versions (<=1.5.2) handle this correctly and conveniently, assigning null wherever the value is missing.
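To make the expected behavior concrete, here is a plain-Python sketch of what <=1.5.2 effectively produced for field2: every struct gets each key declared in pa_struct, with None where the document omits it. The helper normalize_structs is hypothetical and is not part of PyMongoArrow. It only illustrates the desired result, not how the library implements it.

```python
from typing import Any, Dict, Iterable, List, Optional


def normalize_structs(
    items: Optional[Iterable[Dict[str, Any]]], keys: Iterable[str]
) -> Optional[List[Dict[str, Any]]]:
    """Hypothetical helper: pad each struct (dict) so it has every declared key,
    using None where the source document omits that key."""
    if items is None:
        # A missing or null list stays null instead of becoming an empty list.
        return None
    key_list = list(keys)
    return [{key: item.get(key) for key in key_list} for item in items]


# Struct field names as declared in pa_struct above.
STRUCT_KEYS = ["value1", "value2"]

doc = {"field1": "a", "field2": [{"value1": 1.0, "value2": 2.0}, {"value1": 3.0}]}
print(normalize_structs(doc["field2"], STRUCT_KEYS))
# [{'value1': 1.0, 'value2': 2.0}, {'value1': 3.0, 'value2': None}]
```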
If I instead aggregate into a pandas DataFrame with aggregate_pandas_all, I get the following error regardless of which version I use:
TypeError: Cannot convert numpy.ndarray to numpy.ndarray
Is this the intended behavior? If so, is there a better, supported way to achieve this in the latest version?
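One possible workaround, offered only as an untested assumption, is to have MongoDB emit an explicit null for the missing field in the $project stage, so every struct in field2 carries both keys before PyMongoArrow sees it. $map and $ifNull are standard aggregation operators; whether explicit nulls avoid the "Mismatching child array lengths" error in >=1.6.0 is not confirmed. The empty mongo_query below is a placeholder for the real filter.

```python
# Placeholder filter (assumption): replace with the real $match criteria.
mongo_query = {}

# Rebuild each element of field2 so value2 is always present.
# Assumption (untested): explicit nulls avoid the length mismatch seen in >=1.6.0.
mongo_projection = {
    "field1": "$field1",
    "field2": {
        "$map": {
            "input": "$field2",  # the array of structs in each document
            "as": "item",        # variable bound to the current element
            "in": {
                # Kept as-is; wrap in $ifNull too if value1 can also be missing.
                "value1": "$$item.value1",
                # $ifNull yields the replacement (null) when value2 is missing or null.
                "value2": {"$ifNull": ["$$item.value2", None]},
            },
        }
    },
}

pipeline = [
    {"$match": mongo_query},
    {"$project": mongo_projection},
]
```

The pipeline would then be passed exactly as in the original snippet, e.g. col.aggregate_polars_all(pipeline, schema=Schema(mongo_schema)).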
Let me know if I can provide any more information.
Thanks!
