Skip to content

Commit 5ea62a6

Browse files
authored
Update datasets-dask.md
1 parent 19452ea commit 5ea62a6

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

docs/hub/datasets-dask.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -93,7 +93,7 @@ the `meta` argument to know the type of the new column in the meantime.
9393

9494
When reading Parquet data from Hugging Face, Dask automatically leverages the metadata in Parquet files to skip entire files or row groups if they are not needed. For example if you apply a filter (predicate) on a Hugging Face Dataset in Parquet format or if you select a subset of the columns (projection), Dask will read the metadata of the Parquet files to discard the parts that are not needed without downloading them.
9595

96-
This is possible thanks to a [reimplmentation of the Dask DataFrame API](https://docs.coiled.io/blog/dask-dataframe-is-fast.html?utm_source=hf-docs) to support query optimization, which makes Dask faster and more robust.
96+
This is possible thanks to a [reimplementation of the Dask DataFrame API](https://docs.coiled.io/blog/dask-dataframe-is-fast.html?utm_source=hf-docs) to support query optimization, which makes Dask faster and more robust.
9797

9898
For example this subset of FineWeb-Edu contains many Parquet files. If you can filter the dataset to keep the text from recent CC dumps, Dask will skip most of the files and only download the data that match the filter:
9999

0 commit comments

Comments
 (0)