1 file changed: +6 −4 lines changed

@@ -99,12 +99,14 @@ def test_read_avro():
 
 
 def test_arrow_c_stream_large_dataset(ctx):
-    """DataFrame.__arrow_c_stream__ yields batches incrementally.
+    """DataFrame streaming yields batches incrementally using Arrow APIs.
 
     This test constructs a DataFrame that would be far larger than available
-    memory if materialized. The ``__arrow_c_stream__`` method should expose a
-    stream of record batches without collecting the full dataset, so reading a
-    handful of batches should not exhaust process memory.
+    memory if materialized. Use the public API
+    ``pa.RecordBatchReader.from_stream(df)`` (which is the same as
+    ``pa.RecordBatchReader._import_from_c_capsule(df.__arrow_c_stream__())``)
+    to read record batches incrementally without collecting the full dataset,
+    so reading a handful of batches should not exhaust process memory.
     """
     # Create a very large DataFrame using range; this would be terabytes if collected
     df = range_table(ctx, 0, 1 << 40)
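The pattern the new docstring recommends can be sketched in a few lines of PyArrow. This is a minimal illustration, not the repository's test code: a small in-memory ``pa.Table`` stands in for the huge DataFrame, since any object implementing the Arrow PyCapsule streaming protocol (``__arrow_c_stream__``) can be handed to ``pa.RecordBatchReader.from_stream`` the same way (pyarrow >= 14).

```python
import pyarrow as pa

# Stand-in source: pa.Table implements __arrow_c_stream__, so a small
# in-memory table substitutes here for the test's huge lazy DataFrame.
source = pa.table({"a": list(range(100_000))})

# from_stream imports the stream capsule and returns a RecordBatchReader;
# with a lazy producer (like the DataFrame in the test), batches are
# generated on demand rather than collected up front.
reader = pa.RecordBatchReader.from_stream(source)

# Read just a handful of batches instead of draining the whole stream.
for _, batch in zip(range(3), reader):
    print(batch.num_rows)
```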