
Commit c6064b4

Update ecosystem.md
1 parent a40a166 commit c6064b4


web/pandas/community/ecosystem.md

Lines changed: 23 additions & 0 deletions
@@ -468,6 +468,29 @@ df.dtypes

ArcticDB also supports appending, updating, and querying data from storage to a pandas DataFrame. Please find more information [here](https://docs.arcticdb.io/latest/api/query_builder/).
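
As a rough sketch of what such a query might look like via the linked `QueryBuilder` API (the storage URI, library name `example`, symbol `my_symbol`, and column `score` are illustrative, not taken from this document):

```python
import arcticdb as adb

# Connect to a local LMDB-backed ArcticDB instance (path is illustrative)
ac = adb.Arctic("lmdb://./arcticdb_demo")
lib = ac.get_library("example")  # assumes this library and symbol already exist

# Filter rows in storage before materializing a pandas DataFrame
q = adb.QueryBuilder()
q = q[q["score"] > 0.5]
df = lib.read("my_symbol", query_builder=q).data
```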

### [Hugging Face](https://huggingface.co/datasets)

The Hugging Face Dataset Hub provides a large collection of ready-to-use datasets for machine learning shared by the community. The platform offers a user-friendly interface to explore, discover, and visualize datasets, and provides tools to easily load and work with these datasets in Python thanks to the [huggingface_hub](https://github.com/huggingface/huggingface_hub) library.

You can access datasets on Hugging Face using `hf://` paths in pandas, in the form `hf://datasets/username/dataset_name/...`.

For example, here is how to load the [stanfordnlp/imdb dataset](https://huggingface.co/datasets/stanfordnlp/imdb):

```python
import pandas as pd

# Load the IMDB dataset
df = pd.read_parquet("hf://datasets/stanfordnlp/imdb/plain_text/train-00000-of-00001.parquet")
```

Tip: on a dataset page, click on "Use this dataset" to get the code to load it in pandas.

To save a dataset on Hugging Face, you need to [create a public or private dataset](https://huggingface.co/new-dataset) and [log in](https://huggingface.co/docs/huggingface_hub/quick-start#login-command); then you can use `df.to_csv`, `df.to_json`, or `df.to_parquet`:

```python
# Save the dataset to my Hugging Face account
df.to_parquet("hf://datasets/username/dataset_name/train.parquet")
```
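
The same `hf://` path scheme works with the other writers mentioned above. A brief sketch, assuming you have already authenticated (for example with `huggingface-cli login`) and replaced `username/dataset_name` with your own dataset repository:

```python
# Alternative formats: the CSV and JSON writers accept the same hf:// paths
df.to_csv("hf://datasets/username/dataset_name/train.csv", index=False)
df.to_json("hf://datasets/username/dataset_name/train.jsonl", orient="records", lines=True)
```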

## Out-of-core
