Description
The current collect_fold()/collect_scan() functions are named collect_* because they don't return an Expr or LazyFrame; they run the calculation immediately. As a result, using them inside a lazy query can cause the same work to be computed more than once. On the plus side, they support streaming.
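The trade-off can be sketched in plain Python. This is not plumba's or Polars' API: `read_batches` and `collect_fold_sketch` are hypothetical stand-ins for a streaming source and an eager fold. The fold keeps only the current batch and the accumulator in memory, but because it runs immediately, a pipeline that needs its result in two places has to scan the source twice.

```python
from typing import Callable, Iterable, Iterator, List

scans = 0  # how many times the hypothetical source has been read


def read_batches(data: List[int], batch_size: int) -> Iterator[List[int]]:
    """Stand-in for a streaming source: yields fixed-size batches and counts full scans."""
    global scans
    scans += 1
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def collect_fold_sketch(batches: Iterable[List[int]], func: Callable[[int, int], int], init: int) -> int:
    """Eager fold: runs immediately, holding only the current batch plus the accumulator."""
    acc = init
    for batch in batches:
        for x in batch:
            acc = func(acc, x)
    return acc


data = list(range(1, 11))
add = lambda acc, x: acc + x

# Needing the same result twice means calling the eager function twice:
# there is no lazy plan in which the repeated work could be detected and shared.
total = collect_fold_sketch(read_batches(data, 3), add, 0)
total_again = collect_fold_sketch(read_batches(data, 3), add, 0)
print(total, total_again, scans)  # 55 55 2
```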
On the other hand, plumba.fold()/scan() operate on Expr, so they work in contexts such as group_by() and benefit from common subexpression elimination. However, they don't support streaming because of limitations in Expr.map_batches().
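A minimal pure-Python sketch of the lazy side, assuming a toy expression model rather than Polars' real one: `FoldExpr` and `evaluate` are hypothetical names. Each node only describes a fold; the evaluator caches structurally equal nodes, so a repeated fold is computed once (common subexpression elimination). The source, however, is a fully materialized tuple, mirroring the lack of streaming.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Binary operations a fold node may use (hypothetical, for illustration).
OPS: Dict[str, Callable[[int, int], int]] = {"sum": lambda a, x: a + x, "max": max}


@dataclass(frozen=True)
class FoldExpr:
    """Hypothetical lazy fold node: describes a fold without running it."""
    source: Tuple[int, ...]  # assumed non-empty
    op: str


def evaluate(plan: List[FoldExpr]) -> Tuple[List[int], int]:
    """Evaluate a plan, computing structurally equal nodes once (common subexpression elimination)."""
    cache: Dict[FoldExpr, int] = {}
    evaluations = 0
    results: List[int] = []
    for node in plan:
        if node not in cache:
            evaluations += 1
            # The whole source is already materialized in memory: no streaming here.
            acc = node.source[0]
            for x in node.source[1:]:
                acc = OPS[node.op](acc, x)
            cache[node] = acc
        results.append(cache[node])
    return results, evaluations


data = tuple(range(1, 11))
plan = [FoldExpr(data, "sum"), FoldExpr(data, "max"), FoldExpr(data, "sum")]
results, evaluations = evaluate(plan)
print(results, evaluations)  # [55, 10, 55] 2
```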
With some changes to Polars' APIs, plumba.fold()/scan() might be able to offer both bounded memory usage (streaming) and laziness, along with the reduction in duplicate calculations that laziness brings.
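What "both" could look like, again as a hypothetical pure-Python model and not a proposal for a specific Polars API: `StreamingFoldExpr` refers to a source by name, the evaluator still shares equal nodes, and each source is a factory that yields batches, so only one batch is held at a time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple

OPS: Dict[str, Callable[[float, float], float]] = {"sum": lambda a, x: a + x, "max": max}
INIT: Dict[str, float] = {"sum": 0, "max": float("-inf")}


@dataclass(frozen=True)
class StreamingFoldExpr:
    """Hypothetical lazy node that refers to a source by name instead of holding its data."""
    source: str
    op: str


def evaluate(
    plan: List[StreamingFoldExpr],
    sources: Dict[str, Callable[[], Iterator[List[int]]]],
) -> Tuple[List[float], int]:
    """Share equal nodes (laziness + CSE) while reading each source batch by batch."""
    cache: Dict[StreamingFoldExpr, float] = {}
    scans = 0
    results: List[float] = []
    for node in plan:
        if node not in cache:
            scans += 1
            acc = INIT[node.op]
            for batch in sources[node.source]():  # only one batch is held at a time
                for x in batch:
                    acc = OPS[node.op](acc, x)
            cache[node] = acc
        results.append(cache[node])
    return results, scans


def xs_batches() -> Iterator[List[int]]:
    """Stand-in streaming source yielding 1..10 in batches of 3."""
    for start in range(1, 11, 3):
        yield list(range(start, min(start + 3, 11)))


plan = [StreamingFoldExpr("xs", "sum"), StreamingFoldExpr("xs", "max"), StreamingFoldExpr("xs", "sum")]
results, scans = evaluate(plan, {"xs": xs_batches})
print(results, scans)  # [55, 10, 55] 2
```

This sketch still scans the source once per distinct node; a real engine could go further and fuse the sum and max into a single pass.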