Merged
20 changes: 18 additions & 2 deletions doc/source/data/transforming-data.rst
@@ -212,7 +212,7 @@ In this case, your function would look like:
        # yield the same batch multiple times
        for _ in range(10):
            yield batch
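
The fragment above is only the tail of the function. As a rough, self-contained sketch of the same pattern, the generator below yields one batch several times. It uses a plain Python dict in place of a real Ray Data batch, and the names ``repeat_batch`` and ``times`` are hypothetical, chosen for illustration only:

```python
from typing import Dict, Iterator, List

# Stand-in for a Ray Data batch: column name -> list of values (assumption).
Batch = Dict[str, List[int]]


def repeat_batch(batch: Batch, times: int = 10) -> Iterator[Batch]:
    """Yield the same batch `times` times rather than returning one large batch."""
    for _ in range(times):
        yield batch


batch = {"id": [0, 1, 2]}
outputs = list(repeat_batch(batch))
print(len(outputs))
```

Because the function is a generator, each yielded batch can be handed downstream as soon as it's produced, without first building one large combined output in memory.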

Choosing the right batch format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -249,6 +249,22 @@ program might run into out-of-memory (OOM) errors.

If you encounter OOM errors, try decreasing your ``batch_size``.

Enabling ``Polars`` operations

Review comment (Member):

Nit: Here and elsewhere -- I think it makes more sense for Polars to not be code text, especially since it's not part of the glossary or anything.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can enable ``Polars`` globally to optimize certain Ray Data operations. Ray Data uses ``Polars`` internally for better performance when processing data.

To enable ``Polars`` operations, configure the :class:`~ray.data.DataContext`:

.. testcode::

    import ray

    ctx = ray.data.DataContext.get_current()
    ctx.use_polars_sort = True

When you enable this flag, Ray Data automatically uses ``Polars`` for tabular dataset sorting operations, which can significantly improve performance for certain workloads. This doesn't affect your UDF code; you can still use any batch format in :meth:`~ray.data.Dataset.map_batches`.

Review comment (Member):

What're the user-facing Ray Data APIs that benefit from the polars feature?

IIUC it doesn't improve performance for most UDFs except for map_groups, and that's because of an implementation detail where we perform a sort.

Would this information be more appropriate in a different user guide(s)?

Review comment (Member):

I moved it to performance-tips, WDYT?



.. _stateful_transforms:

@@ -365,7 +381,7 @@ You can read more about resources in Ray here: :ref:`resource-requirements`.
    :hide:

    import ray

    ds = ray.data.range(1)

.. testcode::