Require using rapidsmpf Stream Pool with rapidsmpf runtime #20756

@TomAugspurger

Spotted in #20662 (comment): rapidsmpf's native read_parquet node produces data that is stream-ordered on a CUDA stream from rapidsmpf's stream pool. It's not clear to me how this interacts with cudf-polars' non-pool CUDAStreamPolicy options ("new" or "default"): we would need to ensure that the data from rapidsmpf's native nodes is synchronized with the stream we attach to the dataframe.

I'd recommend just requiring that the rapidsmpf runtime use the rapidsmpf stream pool, and raising an error when the ConfigOptions is created from the polars engine if it doesn't.
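A minimal sketch of what that check could look like. Everything here is a hypothetical stand-in, not the actual cudf-polars code: `SketchOptions` is a simplified placeholder for ConfigOptions, `validate_stream_policy` is an invented helper, and the string `"pool"` for the pool policy and `"tasks"` for a non-rapidsmpf runtime are assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical stand-in for ConfigOptions; the real class has more fields
# and may represent the runtime and stream policy differently.
@dataclass(frozen=True)
class SketchOptions:
    runtime: str             # "rapidsmpf", or another runtime such as "tasks" (assumed name)
    cuda_stream_policy: str  # "default", "new", or "pool" ("pool" is an assumed name)


def validate_stream_policy(options: SketchOptions) -> SketchOptions:
    """Reject non-pool stream policies when the rapidsmpf runtime is selected."""
    if options.runtime == "rapidsmpf" and options.cuda_stream_policy != "pool":
        raise ValueError(
            "The rapidsmpf runtime requires the rapidsmpf stream pool; "
            f"got cuda_stream_policy={options.cuda_stream_policy!r}."
        )
    # Other runtimes are left unconstrained in this sketch.
    return options
```

Failing at configuration time, rather than when the first native node runs, would surface the mismatch before any stream-ordered data is produced.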

Labels: cudf-polars (Issues specific to cudf-polars)

Status: In Progress