
Commit d76a509

Update test description for arrow_c_stream_large_dataset to clarify streaming method and usage of public API
1 parent d66d496 commit d76a509

File tree

1 file changed (+6, -4 lines)

python/tests/test_io.py

Lines changed: 6 additions & 4 deletions
@@ -99,12 +99,14 @@ def test_read_avro():
 
 
 def test_arrow_c_stream_large_dataset(ctx):
-    """DataFrame.__arrow_c_stream__ yields batches incrementally.
+    """DataFrame streaming yields batches incrementally using Arrow APIs.
 
     This test constructs a DataFrame that would be far larger than available
-    memory if materialized. The ``__arrow_c_stream__`` method should expose a
-    stream of record batches without collecting the full dataset, so reading a
-    handful of batches should not exhaust process memory.
+    memory if materialized. Use the public API
+    ``pa.RecordBatchReader.from_stream(df)`` (which is the same as
+    ``pa.RecordBatchReader._import_from_c_capsule(df.__arrow_c_stream__())``)
+    to read record batches incrementally without collecting the full dataset,
+    so reading a handful of batches should not exhaust process memory.
     """
     # Create a very large DataFrame using range; this would be terabytes if collected
     df = range_table(ctx, 0, 1 << 40)
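
For context, here is a minimal sketch of the streaming pattern the updated docstring describes. It assumes the datafusion Python package and pyarrow 14+ (where ``RecordBatchReader.from_stream`` accepts any object implementing ``__arrow_c_stream__``); the ``from_pydict`` data and column name are illustrative, not taken from the test module:

    import pyarrow as pa
    from datafusion import SessionContext

    ctx = SessionContext()
    # Any object implementing __arrow_c_stream__ works as a source; a small
    # DataFrame stands in here for the huge range_table used in the test.
    df = ctx.from_pydict({"v": list(range(1_000_000))})

    # Public API: wraps the Arrow C stream without collecting the whole
    # DataFrame first; per the docstring, this is equivalent to the private
    # pa.RecordBatchReader._import_from_c_capsule(df.__arrow_c_stream__()).
    reader = pa.RecordBatchReader.from_stream(df)

    # Read only a handful of batches; the rest of the stream is never
    # materialized, so memory use stays bounded.
    for batch, _ in zip(reader, range(3)):
        print(batch.num_rows)

Reading from the reader drives execution incrementally, which is what lets the test construct a range of 1 << 40 rows and still read a few batches safely.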
