
[Data] Add polars usage instruction to docs#60029

Open
peterxcli wants to merge 12 commits into ray-project:master from peterxcli:docs/add-polars-detail-in-transforming-data

Conversation

@peterxcli

Description

We can use polars to make operations more efficient by

Related issues

Closes: #59224

Additional information

No

@peterxcli peterxcli requested a review from a team as a code owner January 10, 2026 16:52
… section

Signed-off-by: peterxcli <peterxcli@gmail.com>
@peterxcli peterxcli force-pushed the docs/add-polars-detail-in-transforming-data branch from f3b3875 to 5cbd0ef Compare January 10, 2026 16:52
@peterxcli
Author

cc @owenowenisme

@owenowenisme owenowenisme added data Ray Data-related issues community-contribution Contributed by the community labels Jan 10, 2026
@owenowenisme
Member

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds documentation on how to enable Polars operations in Ray Data via DataContext. The added section is clear and helpful. I've found a minor issue in one sentence which contains a typo and a grammatical error. I've provided a suggestion to fix it for better clarity.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds documentation on how to enable Polars-based operations in Ray Data, specifically for sorting, by setting use_polars_sort in the DataContext. The change is clear and useful. I've found a minor grammatical issue in the new documentation and suggested a correction to improve clarity.

@owenowenisme
Member

@peterxcli please fix the suggestion from gemini, thanks

@owenowenisme
Member

owenowenisme commented Jan 10, 2026

Also, if we want to use a glossary term like "Polars", please wrap it in backticks, like ``Polars``, or Vale will raise an error in CI.

@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 25, 2026
peterxcli and others added 3 commits January 26, 2026 00:38
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Peter Lee <peterxcli@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
@peterxcli
Author

@owenowenisme please take another look. thanks!

@ryankert01 ryankert01 removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 25, 2026
owenowenisme and others added 3 commits January 26, 2026 14:12
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
@owenowenisme owenowenisme added the go add ONLY when ready to merge, run all tests label Jan 26, 2026
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
@peterxcli
Author

@owenowenisme Thanks for the review🙏


If you encounter OOM errors, try decreasing your ``batch_size``.

Enabling ``Polars`` operations
Member


Nit: Here and elsewhere -- I think it makes more sense for Polars to not be code text, especially since it's not part of the glossary or anything.

ctx = ray.data.DataContext.get_current()
ctx.use_polars_sort = True

When you enable this flag, Ray Data automatically uses ``Polars`` for tabular dataset sorting operations, which can significantly improve performance for certain workloads. This doesn't affect your UDF code; you can still use any batch format in :meth:`~ray.data.Dataset.map_batches`.
Member


What're the user-facing Ray Data APIs that benefit from the polars feature?

IIUC it doesn't improve performance for most UDFs except for map_groups, and that's because of an implementation detail where we perform a sort.

Would this information be more appropriate in a different user guide(s)?
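
For readers following this thread, here is a plain-Python sketch of why a group-wise UDF can depend on sort speed. It's only a conceptual toy under stated assumptions, not Ray Data's actual implementation: ``sort_based_map_groups`` is a hypothetical helper, and the rows are made-up sample data. The idea is that grouping by sorting makes rows with equal keys adjacent, so a faster sort backend speeds up the whole group-wise step.

```python
from itertools import groupby
from operator import itemgetter


def sort_based_map_groups(rows, key, fn):
    """Apply fn to each group of rows that share the same value for key.

    Hypothetical helper for illustration only; not part of Ray Data.
    """
    # Sorting makes rows with equal keys adjacent; this is the step
    # whose cost a faster sort backend would reduce.
    ordered = sorted(rows, key=itemgetter(key))
    out = []
    # groupby only merges adjacent equal keys, hence the sort above.
    for _, group in groupby(ordered, key=itemgetter(key)):
        out.extend(fn(list(group)))
    return out


# Made-up sample rows.
sample_rows = [{"k": "b", "v": 2}, {"k": "a", "v": 1}, {"k": "b", "v": 5}]

totals = sort_based_map_groups(
    sample_rows,
    "k",
    lambda g: [{"k": g[0]["k"], "total": sum(r["v"] for r in g)}],
)
print(totals)  # [{'k': 'a', 'total': 1}, {'k': 'b', 'total': 7}]
```

The UDF passed as ``fn`` never sees how the grouping was done, which matches the point that the flag doesn't change UDF code.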

Member


I moved it to performance-tips, WDYT?

@iamjustinhsu
Contributor

Hi @peterxcli, are you still working on this?

Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 9, 2026 03:31
Contributor

Copilot AI left a comment


Pull request overview

Adds documentation to the Ray Data “Transforming data” guide describing how to enable polars-backed optimizations via DataContext, in response to #59224.

Changes:

  • Add a new “Enabling Polars operations” subsection explaining DataContext.use_polars_sort.
  • Minor whitespace cleanup in existing .. testcode:: blocks.


Comment on lines 255 to 266
You can enable ``Polars`` globally to optimize certain Ray Data operations. Ray Data uses ``Polars`` internally for better performance when processing data.

To enable ``Polars`` operations, configure the :class:`~ray.data.DataContext`:

.. testcode::

    import ray

    ctx = ray.data.DataContext.get_current()
    ctx.use_polars_sort = True

When you enable this flag, Ray Data automatically uses ``Polars`` for tabular dataset sorting operations, which can significantly improve performance for certain workloads. This doesn't affect your UDF code. You can still use any batch format in :meth:`~ray.data.Dataset.map_batches`.

Copilot AI Feb 9, 2026


The text implies Ray Data generally uses Polars internally, but the code path is only selected when DataContext.use_polars_sort (or deprecated use_polars) is enabled, and it currently applies to sort-related transforms. Consider rewording to clarify the scope (sorting only) and that it’s opt-in via this flag.

Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
@owenowenisme owenowenisme force-pushed the docs/add-polars-detail-in-transforming-data branch from 2a239fa to 86082dc Compare February 9, 2026 04:29
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>

Labels

community-contribution Contributed by the community data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Data] Add polars usage instruction to docs

5 participants