Commit af4604e

minor
1 parent 31ed970 commit af4604e

1 file changed

+2
-2
lines changed

docs/hub/datasets-dask.md

Lines changed: 2 additions & 2 deletions
@@ -89,9 +89,9 @@ Note that you also need to provide `meta` which is the type of the pandas Series
 This is needed because Dask DataFrame is a lazy API. Since Dask will only run the data processing once `.compute()` is called, it needs
 the `meta` argument to know the type of the new column in the meantime.

-# Column and Filter Pushdown
+# Predicate and Projection Pushdown

-When reading Parquet data from Hugging Face, Dask automatically leverages the metadata in Parquet files to skip entire files or row groups if they are not needed. For example if you apply a filter on a Hugging Face Dataset in Parquet format or if you select a subset of the columns, Dask will read the metadata of the Paquet files to discard the parts that are not needed without downloading them.
+When reading Parquet data from Hugging Face, Dask automatically leverages the metadata in Parquet files to skip entire files or row groups if they are not needed. For example if you apply a filter (predicate) on a Hugging Face Dataset in Parquet format or if you select a subset of the columns (projection), Dask will read the metadata of the Paquet files to discard the parts that are not needed without downloading them.

 This is possible thanks to the `dask-expr` package which is generally installed by default with Dask.

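The pushdown behavior described in the changed paragraph can be illustrated with a small, self-contained toy model. This is only a sketch of the idea, not Dask's or Parquet's actual implementation: `RowGroup` and `scan` are hypothetical names invented here, and the min/max statistics are computed on the fly, whereas a real Parquet file stores them in its footer metadata so that they can be read without downloading the data.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group (column name -> values)."""
    columns: dict

    def stats(self, name):
        # Real Parquet keeps min/max per column chunk in the file footer;
        # here they are derived from the values purely for illustration.
        values = self.columns[name]
        return min(values), max(values)


def scan(row_groups, columns, filter_column, lower_bound):
    """Return rows with filter_column >= lower_bound, keeping only `columns`."""
    rows, fetched = [], 0
    for group in row_groups:
        _, col_max = group.stats(filter_column)
        if col_max < lower_bound:
            # Predicate pushdown: metadata proves no row can match, so skip.
            continue
        fetched += 1
        # Projection pushdown: touch only the columns that are needed.
        needed = set(columns) | {filter_column}
        data = {name: group.columns[name] for name in needed}
        for i, value in enumerate(data[filter_column]):
            if value >= lower_bound:
                rows.append({name: data[name][i] for name in columns})
    return rows, fetched


row_groups = [
    RowGroup({"year": [2019, 2020], "text": ["a", "b"], "score": [0.1, 0.2]}),
    RowGroup({"year": [2021, 2022], "text": ["c", "d"], "score": [0.3, 0.4]}),
    RowGroup({"year": [2023, 2024], "text": ["e", "f"], "score": [0.5, 0.6]}),
]

rows, fetched = scan(row_groups, columns=["text"], filter_column="year", lower_bound=2022)
print(rows)     # matching rows, restricted to the "text" column
print(fetched)  # number of row groups whose data was actually read
```

In this toy run the first row group is skipped from its statistics alone, and the `score` column is never read. With Dask itself, the equivalent requests are typically expressed by selecting columns and filtering the DataFrame, or through the `columns` and `filters` arguments of `dd.read_parquet`.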