Closed
Description
Hey,
A schema with a nested struct inside a list causes an error. I assume this is related to #257 (thanks for the quick fix on that one!).
Here is a minimal example:
```python
import bson
import pyarrow
import pymongoarrow.api

# temp_collection comes from the reporter's test setup (not shown here).
schema = pymongoarrow.api.Schema({
    '_id': bson.ObjectId,
    'test_list_struct': [{
        'field1': {
            'sub_field1': pyarrow.string(),
            'sub_field2': pyarrow.string(),
        }
    }],
})

temp_collection.insert_one({
    "_id": bson.objectid.ObjectId("000000000000000000000001"),
    "test_list_struct": [
        {"field1": {
            "sub_field1": "test_data",
            # "sub_field2": "test_data",  # missing: this causes the error; uncommenting it makes it work
        }},
        {"field1": "test_data"},
    ],
})

df = pymongoarrow.api.aggregate_polars_all(
    temp_collection._collection,
    schema=schema,
    pipeline=[],
)
```

Produces:
```
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/gugleta/.cache/pypoetry/virtualenvs/cxbi-LF4XS2gW-py3.12/lib/python3.12/site-packages/pymongoarrow/api.py:368: in aggregate_polars_all
    return _arrow_to_polars(aggregate_arrow_all(collection, pipeline, schema=schema, **kwargs))
/home/gugleta/.cache/pypoetry/virtualenvs/cxbi-LF4XS2gW-py3.12/lib/python3.12/site-packages/pymongoarrow/api.py:152: in aggregate_arrow_all
    return context.finish()
/home/gugleta/.cache/pypoetry/virtualenvs/cxbi-LF4XS2gW-py3.12/lib/python3.12/site-packages/pymongoarrow/context.py:50: in finish
    array_map = _parse_builder_map(self.manager.finish())
/home/gugleta/.cache/pypoetry/virtualenvs/cxbi-LF4XS2gW-py3.12/lib/python3.12/site-packages/pymongoarrow/context.py:73: in _parse_builder_map
    builder_map[key] = StructArray.from_arrays(arrs, names=names)
pyarrow/array.pxi:4108: in pyarrow.lib.StructArray.from_arrays
    ???
pyarrow/error.pxi:155: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   pyarrow.lib.ArrowInvalid: Mismatching child array lengths
pyarrow/error.pxi:92: ArrowInvalid
```

Tested with version 1.6.3.
Thanks in advance!