@@ -149,15 +149,26 @@ To materialize the results of your DataFrame operations:
     # Count rows
     count = df.count()

-PyArrow Streaming
+Zero-copy streaming to Arrow-based Python libraries
------------------
+---------------------------------------------------

 DataFusion DataFrames implement the ``__arrow_c_stream__`` protocol, enabling
-zero-copy streaming into libraries like `PyArrow <https://arrow.apache.org/>`_.
-Earlier versions eagerly converted the entire DataFrame when exporting to
-PyArrow, which could exhaust memory on large datasets. With streaming, batches
-are produced lazily so you can process arbitrarily large results without
-out-of-memory errors.
+zero-copy, lazy streaming into Arrow-based Python libraries. Earlier versions
+eagerly converted the entire DataFrame when exporting to Python Arrow APIs,
+which could exhaust memory on large results. With the streaming protocol,
+batches are produced on demand so you can process arbitrarily large results
+without out-of-memory errors.
+
+.. note::
+
+   The protocol is implementation-agnostic and works with any Python library
+   that understands the Arrow C streaming interface (for example, PyArrow
+   or other Arrow-compatible implementations). The sections below provide a
+   short PyArrow-specific example and general guidance for other
+   implementations.
+
+PyArrow
+-------

 .. code-block:: python

@@ -170,7 +181,7 @@ out-of-memory errors.

 DataFrames are also iterable, yielding :class:`datafusion.RecordBatch`
 objects lazily so you can loop over results directly without importing
 PyArrow:

 .. code-block:: python

@@ -179,24 +190,23 @@ PyArrow:

 Each batch exposes ``to_pyarrow()``, allowing conversion to a PyArrow
 table. ``pa.table(df)`` collects the entire DataFrame eagerly into a
 PyArrow table:

 .. code-block:: python

     import pyarrow as pa
     table = pa.table(df)

 Asynchronous iteration is supported as well, allowing integration with
 ``asyncio`` event loops:

 .. code-block:: python

     async for batch in df:
         ...  # process each batch as it is produced

-To work with the stream directly, use
-``execute_stream()``, which returns a
-:class:`~datafusion.RecordBatchStream`:
+To work with the stream directly, use ``execute_stream()``, which returns a
+:class:`~datafusion.RecordBatchStream`:

 .. code-block:: python
