
Commit 170848f

minor changes
1 parent f63a5c6 commit 170848f

File tree

1 file changed: +16 −6 lines changed

docs/hub/datasets-pandas.md

Lines changed: 16 additions & 6 deletions
@@ -5,7 +5,7 @@ Since it uses [fsspec](https://filesystem-spec.readthedocs.io) to read and write
 
 ## Load a DataFrame
 
-You can load data from local files or from remote storage like Hugging Face Datasets. Pandas supports many formats including CSV, JSON and Paequet:
+You can load data from local files or from remote storage like Hugging Face Datasets. Pandas supports many formats including CSV, JSON and Parquet:
 
 ```python
 >>> import pandas as pd
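The hunk above fixes a typo in the sentence about reading CSV, JSON and Parquet with pandas. As a reference for reviewers, a minimal self-contained sketch of the same pandas call follows; the in-memory CSV stands in for a remote file, and the `hf://datasets/username/my_dataset/train.csv` path mentioned in the comment is a hypothetical repo:

```python
import io

import pandas as pd

# Stand-in for a remote file; on the Hub the path would look like
# "hf://datasets/username/my_dataset/train.csv" (hypothetical repo).
csv_file = io.StringIO("text,label\nhello,0\nworld,1\n")

df = pd.read_csv(csv_file)
print(df.shape)  # (2, 2)
```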
@@ -67,7 +67,7 @@ df_test .to_parquet("hf://datasets/username/my_dataset/test.parquet")
 
 ## Use Images
 
-From a folder with a metadata file containing a "file_name" field for the names or paths to the images:
+You can load a folder with a metadata file containing a field for the names or paths to the images, structured like this:
 
 ```
 Example 1: Example 2:
@@ -79,6 +79,8 @@ folder/ folder/
 └── imgNNN.png └── imgNNN.png
 ```
 
+You can iterate on the image paths like this:
+
 ```python
 import pandas as pd

@@ -88,7 +90,7 @@ for image_path in (folder_path + df["file_name"]):
     ...
 ```
 
-Since the dataset is in a supported structure, you can save this dataset to Hugging Face and the Dataset Viewer shows both the metadata and images on Hugging Face.
+Since the dataset is in a supported structure ("metadata.csv" file with "file_name" field), you can save this dataset to Hugging Face and the Dataset Viewer shows both the metadata and images on Hugging Face.
 
 ```python
 from huggingface_hub import HfApi
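The `folder_path + df["file_name"]` loop referenced in the hunk above relies on pandas broadcasting a scalar string over a column. A small self-contained sketch of that expression, with illustrative metadata values matching the "metadata.csv" layout shown in the docs:

```python
import pandas as pd

# Illustrative metadata with the "file_name" field described in the docs
df = pd.DataFrame({"file_name": ["img000.png", "img001.png"], "label": [0, 1]})
folder_path = "folder/"

# String concatenation broadcasts over the Series, yielding one path per row
image_paths = (folder_path + df["file_name"]).tolist()
print(image_paths)  # ['folder/img000.png', 'folder/img001.png']
```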
@@ -123,7 +125,7 @@ df["image"] = df["image"].pil.rotate(90)
 
 ## Use Audios
 
-From a folder with a metadata file containing a "file_name" field for the names or paths to the audios:
+You can load a folder with a metadata file containing a field for the names or paths to the audios, structured like this:
 
 ```
 Example 1: Example 2:
@@ -135,6 +137,8 @@ folder/ folder/
 └── recNNN.wav └── recNNN.wav
 ```
 
+You can iterate on the audio paths like this:
+
 ```python
 import pandas as pd

@@ -144,7 +148,7 @@ for audio_path in (folder_path + df["file_name"]):
     ...
 ```
 
-Since the dataset is in a supported structure, you can save this dataset to Hugging Face and the Dataset Viewer shows both the metadata and audios on Hugging Face.
+Since the dataset is in a supported structure ("metadata.csv" file with "file_name" field), you can save this dataset to Hugging Face and the Dataset Viewer shows both the metadata and audios on Hugging Face.
 
 ```python
 from huggingface_hub import HfApi
@@ -181,7 +185,13 @@ df["audio"] = df["audio"].sf.write()
 ## Use Transformers
 
 You can use `transformers` pipelines on pandas DataFrames to classify, generate text, images, etc.
-This section shows a few examples.
+This section shows a few examples with `tqdm` for progress bars.
+
+<Tip>
+
+Pipelines don't accept a `tqdm` object as input but you can use a Python generator instead, in the form `x for x in tqdm(...)`.
+
+</Tip>
 
 ### Text Classification
 
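The tip added in the commit (wrap `tqdm` in a generator expression before passing it to a pipeline) can be illustrated with a short sketch. The list of texts is made up, and the actual pipeline call is left as a comment since it requires a model download:

```python
from tqdm import tqdm

texts = ["I love this", "I hate this", "It was fine"]

# A tqdm object itself is not valid pipeline input, but a generator
# expression wrapping it is:
inputs = (text for text in tqdm(texts))

# With transformers installed you could then run, e.g.:
#   from transformers import pipeline
#   pipe = pipeline("text-classification")
#   results = [out for out in pipe(inputs)]
results = list(inputs)  # stand-in so the sketch runs without a model
print(len(results))  # 3
```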
