From a folder with a metadata file containing a "file_name" field for the names or paths to the images:

```
Example 1:            Example 2:
folder/               folder/
├── metadata.csv      ├── metadata.csv
├── img000.png        └── images
├── img001.png            ├── img000.png
...                       ...
```

```python
import pandas as pd

folder_path = "path/to/folder/"
# "file_name" holds paths relative to folder_path, e.g. "images/img000.png" in Example 2
df = pd.read_csv(folder_path + "metadata.csv")
for image_path in (folder_path + df["file_name"]):
    ...  # open or process each image here
```
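
If the folder does not have a metadata file yet, you can generate one. The following is a minimal sketch that uses only the standard library. It assumes the Example 2 layout (images in an `images/` subfolder) and a metadata file whose only column is `file_name`, with paths relative to the folder. `write_metadata` is a hypothetical helper, and real metadata files usually carry extra columns such as labels or captions.

```python
import csv
from pathlib import Path


def write_metadata(folder: str, subdir: str = "images", pattern: str = "*.png") -> Path:
    """Hypothetical helper: write folder/metadata.csv with one "file_name" row per image.

    Paths are stored relative to `folder` with forward slashes,
    e.g. "images/img000.png" for the Example 2 layout.
    """
    root = Path(folder)
    files = sorted((root / subdir).glob(pattern))  # sorted for a stable row order
    metadata_path = root / "metadata.csv"
    with metadata_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name"])
        for file in files:
            writer.writerow([file.relative_to(root).as_posix()])
    return metadata_path
```

For the Example 1 layout, pass `subdir=""` so the images are looked up directly in the folder.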

Since the dataset is in a supported structure, you can upload it to Hugging Face, and the Dataset Viewer will show both the metadata and the images:

```python
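from huggingface_hub import HfApi

api = HfApi()

api.upload_folder(
    folder_path=folder_path,
    repo_id="username/my_image_dataset",  # replace with your own namespace and dataset name
    repo_type="dataset",
)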
```
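
Before calling `upload_folder`, it can help to check that every `file_name` entry resolves to an actual file, because a missing or misspelled path is easy to overlook in a large folder. The following is a minimal sketch. It assumes a CSV metadata file with a `file_name` column whose paths are relative to the folder. `missing_files` is a hypothetical helper and is not part of `huggingface_hub`.

```python
import csv
from pathlib import Path


def missing_files(folder: str) -> list[str]:
    """Hypothetical pre-upload check: return the "file_name" entries of
    folder/metadata.csv that do not point to an existing file."""
    root = Path(folder)
    with (root / "metadata.csv").open(newline="") as f:
        reader = csv.DictReader(f)  # other columns (labels, captions, ...) are ignored
        return [
            row["file_name"]
            for row in reader
            if not (root / row["file_name"]).is_file()
        ]
```

An empty list means every row of the metadata file has a matching file. In that case, the folder can be uploaded as shown above.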
### Image methods and Parquet

Using [pandas-image-methods](https://github.com/lhoestq/pandas-image-methods), you can enable `PIL.Image` methods on an image column. It also lets you save the dataset as a single Parquet file containing both the images and the metadata:

```python
...
```

All the `PIL.Image` methods are available, e.g.

```python
df["image"] = df["image"].pil.rotate(90)
```
## Use Audios

From a folder with a metadata file containing a "file_name" field for the names or paths to the audio files:

```
Example 1:            Example 2:
folder/               folder/
├── metadata.csv      ├── metadata.csv
├── rec000.wav        └── audios
├── rec001.wav            ├── rec000.wav
...                       ...
└── recNNN.wav            └── recNNN.wav
```
```python
import pandas as pd

folder_path = "path/to/folder/"
# "file_name" holds paths relative to folder_path, e.g. "audios/rec000.wav" in Example 2
df = pd.read_csv(folder_path + "metadata.csv")
for audio_path in (folder_path + df["file_name"]):
    ...  # load or process each audio file here
```
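
Because the `file_name` values are plain relative paths, the same loop works with any audio reader. The following is a minimal sketch that uses only the standard library, which is an assumption and not the approach used elsewhere on this page. It relies on the `wave` module to read the duration of each uncompressed WAV file listed in `metadata.csv`. `wav_durations` is a hypothetical helper. Compressed formats such as FLAC or MP3 would need a third-party decoder.

```python
import csv
import wave
from pathlib import Path


def wav_durations(folder: str) -> dict[str, float]:
    """Hypothetical helper: map each "file_name" in folder/metadata.csv
    to the duration of that WAV file, in seconds."""
    root = Path(folder)
    durations = {}
    with (root / "metadata.csv").open(newline="") as f:
        for row in csv.DictReader(f):
            # wave only handles uncompressed PCM WAV files
            with wave.open(str(root / row["file_name"]), "rb") as w:
                durations[row["file_name"]] = w.getnframes() / w.getframerate()
    return durations
```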

Since the dataset is in a supported structure, you can upload it to Hugging Face, and the Dataset Viewer will show both the metadata and the audio files:

```python
from huggingface_hub import HfApi

api = HfApi()

api.upload_folder(
    folder_path=folder_path,
    repo_id="username/my_audio_dataset",
    repo_type="dataset",
)
```
### Audio methods and Parquet

Using [pandas-audio-methods](https://github.com/lhoestq/pandas-audio-methods), you can enable `soundfile` methods on an audio column. It also lets you save the dataset as a single Parquet file containing both the audio data and the metadata: