Describe the bug
Recently, I encountered an assertion failure when passing a sliced RecordBatch to the Python FFI. The problem seems to appear only when slicing RecordBatches that contain complex data: nested lists and/or structs.
To Reproduce
Add the following test case to arrow-pyarrow-integration-testing/tests/test_sql.py:
def test_nested_struct_with_list_slice():
    """
    Test round-tripping sliced record batches with deeply nested struct types.
    This tests struct<struct<list<struct>>> with variable-length lists,
    ensuring that slicing at different row offsets works correctly.
    """
    # Build the nested type: struct<struct<list<struct>>>
    item_type = pa.struct([("x", pa.int64())])
    inner_struct_type = pa.struct([("items", pa.list_(item_type))])
    outer_struct_type = pa.struct([("inner", inner_struct_type)])

    # Key: variable-length inner lists (1, 2, 1 items)
    batch = pa.record_batch(
        [
            pa.array([1, 2, 3], type=pa.int64()),
            pa.array([
                {"inner": {"items": [{"x": 1}]}},
                {"inner": {"items": [{"x": 2}, {"x": 3}]}},
                {"inner": {"items": [{"x": 4}]}},
            ], type=outer_struct_type),
        ],
        names=["id", "outer"]
    )

    # Test round-trip of each sliced row
    for i in range(batch.num_rows):
        print(i)
        sliced = batch.slice(i, 1)
        result = rust.round_trip_record_batch(sliced)
        result.validate(full=True)
        assert result.to_pydict() == sliced.to_pydict()
        assert result.schema == sliced.schema

When I run pytest -v .:
        # Test round-trip of each sliced row
        for i in range(batch.num_rows):
            print(i)
            sliced = batch.slice(i, 1)
>           result = rust.round_trip_record_batch(sliced)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       pyo3_runtime.PanicException: assertion failed: (offset + length) <= self.len()

tests\test_sql.py:757: PanicException
------------------------------------------- Captured stdout call --------------------------------------------
0
1
------------------------------------------- Captured stderr call --------------------------------------------
thread '<unnamed>' (12544) panicked at C:\Code\arrow-rs\arrow-data\src\data.rs:581:9:
assertion failed: (offset + length) <= self.len()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Expected behavior
The provided test should pass.
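For triage, it may help to see which child values each one-row slice should cover. The captured stdout above stops at 1, which suggests the panic occurs on the first slice with a non-zero row offset. The following is a minimal pure-Python sketch of the Arrow variable-size list layout for the list lengths used in the test (1, 2, 1). It is not arrow-rs code and does not claim to identify the root cause; `list_offsets` and `child_range` are hypothetical helper names introduced only for illustration.

```python
# Pure-Python model of Arrow's variable-size list layout (no pyarrow needed).
# list_offsets and child_range are hypothetical helpers, not part of any library.

def list_offsets(lengths):
    """Build the offsets buffer for lists with the given lengths."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def child_range(offsets, row_offset, row_count):
    """Return (start, length) of the child values covered by a row slice."""
    start = offsets[row_offset]
    end = offsets[row_offset + row_count]
    return start, end - start


# List lengths from the test case: 1, 2 and 1 items per row.
offsets = list_offsets([1, 2, 1])
child_len = offsets[-1]  # total number of child structs

for i in range(3):
    start, length = child_range(offsets, i, 1)
    # Same shape of bound as the failing assertion: offset + length <= len
    assert start + length <= child_len
    print(i, start, length)
```

In this model every one-row slice stays within the child array, so a correct import of the sliced batch should be able to satisfy the `(offset + length) <= self.len()` bound for each row.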