Skip to content

[Enh]: add LazyFrame.collect_batches #3360

@danielgafni

Description

@danielgafni

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

The user should be given an option to ORDER BY deterministically first, while the default behavior may produce batches in non-deterministic ordering (including rows within batches of course).

Please describe the purpose of the new feature or describe the problem to solve.

The corresponding Polars functionality is incredibly useful for parallel batch processing. Often there is a main entrypoint that can collect these batches and send them to (remote) parallel workers. I'm typically doing this kind of work with Ray.

Suggest a solution if possible.

For backends that do not have a corresponding method natively, I assume it's reasonable to implement it via with_row_index.

If you have tried alternatives, please describe them below.

No response

Additional information that may help us understand your needs.

Happy to implement this feature as my first Narwhals contribution given it makes sense to have it

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions