[Data] Add polars usage instruction to docs #60029
Changes from 8 commits
5cbd0ef
1b7503d
81ae02c
09d992f
a578b1c
078e2fc
0daf051
6de8f94
2c4e464
222d468
86082dc
21da8d0
390f150
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -212,7 +212,7 @@ In this case, your function would look like: | |
| # yield the same batch multiple times | ||
| for _ in range(10): | ||
| yield batch | ||
|
|
||
| Choosing the right batch format | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
|
|
@@ -249,6 +249,22 @@ program might run into out-of-memory (OOM) errors. | |
|
|
||
| If you encounter OOM errors, try decreasing your ``batch_size``. | ||
|
|
||
| Enabling ``Polars`` operations | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| You can enable ``Polars`` globally to optimize certain Ray Data operations, such as sorting tabular datasets. When enabled, Ray Data uses ``Polars`` internally to run those operations. | ||
|
|
||
| To enable ``Polars`` operations, configure the :class:`~ray.data.DataContext`: | ||
|
||
|
|
||
| .. testcode:: | ||
|
|
||
|
||
| import ray | ||
|
|
||
| ctx = ray.data.DataContext.get_current() | ||
| ctx.use_polars_sort = True | ||
|
|
||
| When you enable this flag, Ray Data automatically uses ``Polars`` to sort tabular datasets, which can significantly improve performance for certain workloads. The flag doesn't affect your UDF code: you can still use any batch format in :meth:`~ray.data.Dataset.map_batches`. | ||
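Because ``use_polars_sort`` is set on a shared context object, it can be useful to enable it only around the code that sorts and then restore the previous value. The sketch below is one hedged way to do that. The helper name `polars_sort_enabled` is hypothetical and isn't part of Ray, and a `SimpleNamespace` stands in for the real context so the snippet runs without Ray installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def polars_sort_enabled(ctx):
    """Hypothetical helper: set ``ctx.use_polars_sort = True`` for the
    duration of the block, then restore whatever value it had before."""
    previous = getattr(ctx, "use_polars_sort", False)
    ctx.use_polars_sort = True
    try:
        yield ctx
    finally:
        # Restore the earlier setting even if the block raises.
        ctx.use_polars_sort = previous


# Stand-in object so this sketch runs without Ray installed.
# In real code, pass ray.data.DataContext.get_current() instead.
fake_ctx = SimpleNamespace(use_polars_sort=False)

with polars_sort_enabled(fake_ctx):
    inside = fake_ctx.use_polars_sort  # flag is on inside the block

after = fake_ctx.use_polars_sort  # flag is back to its previous value
```

With Ray installed, the same helper would be called as `with polars_sort_enabled(ray.data.DataContext.get_current()):`. The diff doesn't say whether Ray Data reads the flag when a dataset is created or when it executes, so the cautious choice is to both create and consume the sorted dataset inside the `with` block.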
|
||
|
|
||
|
|
||
|
||
| .. _stateful_transforms: | ||
|
|
||
|
|
@@ -365,7 +381,7 @@ You can read more about resources in Ray here: :ref:`resource-requirements`. | |
| :hide: | ||
|
|
||
| import ray | ||
|
|
||
| ds = ray.data.range(1) | ||
|
|
||
| .. testcode:: | ||
|
|
||
Nit: Here and elsewhere -- I think it makes more sense for Polars to not be code text, especially since it's not part of the glossary or anything.