Commit ba4745c: update wording
1 parent eeda69d commit ba4745c

3 files changed: +9 −7 lines changed

docs/hub/_toctree.yml
Lines changed: 2 additions & 0 deletions

@@ -175,6 +175,8 @@
   sections:
   - local: datasets-argilla
     title: Argilla
+  - local: datasets-daft
+    title: Daft
   - local: datasets-dask
     title: Dask
   - local: datasets-usage

docs/hub/datasets-daft.md
Lines changed: 6 additions & 6 deletions

@@ -17,11 +17,11 @@ pip install 'daft[huggingface]'
 
 ## Read
 
-Daft is able to read datasets directly from Hugging Face using the [`daft.read_huggingface()`](https://docs.daft.ai/en/stable/api/io/#daft.read_huggingface) function or via the `hf://datasets/` protocol.
+Daft is able to read datasets directly from the Hugging Face Hub using the [`daft.read_huggingface()`](https://docs.daft.ai/en/stable/api/io/#daft.read_huggingface) function or via the `hf://datasets/` protocol.
 
 ### Reading an Entire Dataset
 
-Using [`daft.read_huggingface()`](https://docs.daft.ai/en/stable/api/io/#daft.read_huggingface), you can easily read a Hugging Face dataset.
+Using [`daft.read_huggingface()`](https://docs.daft.ai/en/stable/api/io/#daft.read_huggingface), you can easily load a dataset.
 
 ```python
@@ -34,7 +34,7 @@ This will read the entire dataset into a DataFrame.
 
 ### Reading Specific Files
 
-Not only can you read entire datasets, but you can also read individual files from a dataset. Using a read function that takes in a path (such as [`daft.read_parquet()`](https://docs.daft.ai/en/stable/api/io/#daft.read_parquet), [`daft.read_csv()`](https://docs.daft.ai/en/stable/api/io/#daft.read_csv), or [`daft.read_json()`](https://docs.daft.ai/en/stable/api/io/#daft.read_json)), specify a Hugging Face dataset path via the `hf://datasets/` prefix:
+Not only can you read entire datasets, but you can also read individual files from a dataset repository. Using a read function that takes in a path (such as [`daft.read_parquet()`](https://docs.daft.ai/en/stable/api/io/#daft.read_parquet), [`daft.read_csv()`](https://docs.daft.ai/en/stable/api/io/#daft.read_csv), or [`daft.read_json()`](https://docs.daft.ai/en/stable/api/io/#daft.read_json)), specify a Hugging Face dataset path via the `hf://datasets/` prefix:
 
 ```python
 import daft
@@ -51,7 +51,7 @@ df = daft.read_parquet("hf://datasets/username/dataset_name/**/*.parquet")
 
 ## Write
 
-Daft is able to write Parquet files to Hugging Face datasets using [`daft.DataFrame.write_huggingface`](https://docs.daft.ai/en/stable/api/dataframe/#daft.DataFrame.write_huggingface). Daft supports [Content-Defined Chunking](https://huggingface.co/blog/parquet-cdc) and [Xet](https://huggingface.co/blog/xet-on-the-hub) for faster, deduplicated writes.
+Daft is able to write Parquet files to a Hugging Face dataset repository using [`daft.DataFrame.write_huggingface`](https://docs.daft.ai/en/stable/api/dataframe/#daft.DataFrame.write_huggingface). Daft supports [Content-Defined Chunking](https://huggingface.co/blog/parquet-cdc) and [Xet](https://huggingface.co/blog/xet-on-the-hub) for faster, deduplicated writes.
 
 Basic usage:
 
@@ -67,9 +67,9 @@ See the [`DataFrame.write_huggingface`](https://docs.daft.ai/en/stable/api/dataf
 
 ## Authentication
 
-The `token` parameter in [`daft.io.HuggingFaceConfig`](https://docs.daft.ai/en/stable/api/config/#daft.io.HuggingFaceConfig) can be used to specify a Hugging Face access token for requests that require authentication (e.g. reading private datasets or writing to a dataset).
+The `token` parameter in [`daft.io.HuggingFaceConfig`](https://docs.daft.ai/en/stable/api/config/#daft.io.HuggingFaceConfig) can be used to specify a Hugging Face access token for requests that require authentication (e.g. reading private dataset repositories or writing to a dataset repository).
 
-Example of reading a dataset with a specified token:
+Example of loading a dataset with a specified token:
 
 ```python
 from daft.io import IOConfig, HuggingFaceConfig

docs/hub/datasets-libraries.md
Lines changed: 1 addition & 1 deletion

@@ -88,7 +88,7 @@ Examples of this kind of integration:
 
 #### Rely on an existing library's integration with the Hub
 
-Polars, Pandas, Dask, Spark, DuckDB, and Daft all can write to a Hugging Face Hub repository. See [datasets libraries](https://huggingface.co/docs/hub/datasets-libraries) for more details.
+Polars, Pandas, Dask, Spark, DuckDB, and Daft can all write to a Hugging Face Hub repository. See [datasets libraries](https://huggingface.co/docs/hub/datasets-libraries) for more details.
 
 If you are already using one of these libraries in your code, adding the ability to push to the Hub is straightforward. For example, if you have a synthetic data generation library that can return a Pandas DataFrame, here is the code you would need to write to the Hub:
 