[Pandas](https://github.com/pandas-dev/pandas) is a widely used Python data analysis toolkit.
Since it uses [fsspec](https://filesystem-spec.readthedocs.io) to read and write remote data, you can use the Hugging Face paths ([`hf://`](/docs/huggingface_hub/guides/hf_file_system#integrations)) to read and write data on the Hub.

## Load a DataFrame

You can load data from local files or from remote storage such as Hugging Face Datasets. Pandas supports many formats, including CSV, JSON, and Parquet:

```python
>>> import pandas as pd
>>> df = pd.read_csv("path/to/data.csv")
```
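
As a minimal sketch of the same idea for more than one format, the snippet below reads CSV and JSON Lines data with `pd.read_csv` and `pd.read_json`. The sample rows and variable names are made up for illustration, and the data is held in memory with `io.StringIO` so that the example runs without any files on disk; with real data you would pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical sample data, kept in memory so the sketch needs no files on disk.
csv_text = "text,label\ngreat movie,1\nterrible plot,0\n"
json_lines_text = (
    '{"text": "great movie", "label": 1}\n'
    '{"text": "terrible plot", "label": 0}\n'
)

# read_csv and read_json accept file paths as well as file-like objects.
df_csv = pd.read_csv(io.StringIO(csv_text))

# lines=True tells pandas that each line is a separate JSON record (JSON Lines).
df_json = pd.read_json(io.StringIO(json_lines_text), lines=True)

print(df_csv.shape, df_json.shape)
```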

To load a file from Hugging Face, the path needs to start with `hf://`. For example, the path to the [stanfordnlp/imdb](https://huggingface.co/datasets/stanfordnlp/imdb) dataset repository is `hf://datasets/stanfordnlp/imdb`. The dataset on Hugging Face contains multiple Parquet files. The Parquet file format is designed to make reading and writing data frames efficient, and to make sharing data across data analysis languages easy. Here is how to load the file `plain_text/train-00000-of-00001.parquet`:

```python
>>> import pandas as pd
>>> df = pd.read_parquet("hf://datasets/stanfordnlp/imdb/plain_text/train-00000-of-00001.parquet")
>>> df
                                                    text  label
0      I rented I AM CURIOUS-YELLOW from my video sto...      0
1      "I Am Curious: Yellow" is a risible and preten...      0
2      If only to avoid making this type of film in t...      0
3      This film was probably inspired by Godard's Ma...      0
4      Oh, brother...after hearing about this ridicul...      0
...                                                  ...    ...
24995  A hit at the time but now better categorised a...      1
24996  I love this movie like no other. Another time ...      1
24997  This film and it's sequel Barry Mckenzie holds...      1
24998  'The Adventures Of Barry McKenzie' started lif...      1
24999  The story centers around Barry McKenzie who mu...      1
```
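
The path used above is the repository path followed by the file's location inside the repository. A small sketch of that composition is shown below; `hf_dataset_path` is a hypothetical helper written for this illustration, not a function from pandas or `huggingface_hub`.

```python
def hf_dataset_path(repo_id: str, path_in_repo: str) -> str:
    """Build an hf:// path for a file in a dataset repository (hypothetical helper)."""
    return f"hf://datasets/{repo_id}/{path_in_repo.lstrip('/')}"


path = hf_dataset_path("stanfordnlp/imdb", "plain_text/train-00000-of-00001.parquet")
print(path)

# The resulting string can be passed to pandas, for example pd.read_parquet(path).
# Doing so needs network access and the huggingface_hub package installed.
```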

For more information on the Hugging Face paths and how they are implemented, please refer to [the client library's documentation on the HfFileSystem](/docs/huggingface_hub/guides/hf_file_system).

## Save a DataFrame

You can save a pandas DataFrame using `to_csv`, `to_json`, or `to_parquet`, either to a local file or directly to Hugging Face.
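
For the local case, a minimal sketch might look like the following. The DataFrame contents, file names, and the use of a temporary directory are assumptions made for illustration. `to_parquet` is left out because it needs an extra Parquet engine such as `pyarrow` or `fastparquet`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"text": ["great movie", "terrible plot"], "label": [1, 0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "data.csv"
    json_path = Path(tmp_dir) / "data.jsonl"

    # index=False leaves the row index out of the CSV file.
    df.to_csv(csv_path, index=False)

    # orient="records" with lines=True writes one JSON object per line.
    df.to_json(json_path, orient="records", lines=True)

    # Read the files back to check that the contents survive the round trip.
    csv_round_trip = pd.read_csv(csv_path)
    json_round_trip = pd.read_json(json_path, lines=True)

print(csv_round_trip.shape, json_round_trip.shape)
```
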

To save the DataFrame on Hugging Face, you first need to [log in with your Hugging Face account](/docs/huggingface_hub/quick-start#login), for example using:

```
huggingface-cli login
```
@@ -22,26 +57,93 @@ Finally, you can use [Hugging Face paths](/docs/huggingface_hub/guides/hf_file_s
Since the dataset is in a supported structure, you can save it to Hugging Face, and the Dataset Viewer shows both the metadata and the images.

```python
from huggingface_hub import HfApi

api = HfApi()

# Upload the local dataset folder to a dataset repository on the Hub.
# `folder_path` is expected to be defined beforehand.
api.upload_folder(
    folder_path=folder_path,
    repo_id="username/my_image_dataset",
    repo_type="dataset",
)
```

Using [pandas-image-methods](https://github.com/lhoestq/pandas-image-methods), you can enable `PIL.Image` methods on an image column. It also enables saving the dataset as a single Parquet file containing both the images and the metadata: