docs/hub/datasets-pyarrow.md
5 additions & 5 deletions
@@ -58,7 +58,7 @@ To save the Table on Hugging Face, you first need to [Login with your Hugging Fa
 huggingface-cli login
 ```
 
-Then you can [Create a dataset repository](/docs/huggingface_hub/quick-start#create-a-repository), for example using:
+Then you can [create a dataset repository](/docs/huggingface_hub/quick-start#create-a-repository), for example using:
 
 ```python
 from huggingface_hub import HfApi
@@ -118,7 +118,7 @@ for file_name in table["file_name"].to_pylist():
 ...
 ```
 
-Since the dataset is in a [supported structure](https://huggingface.co/docs/hub/en/datasets-image#additional-columns) (a `metadata.parquet` file with a `file_name` field), you can save this dataset to Hugging Face and the Dataset Viewer shows both the metadata and images on Hugging Face.
+Since the dataset is in a [supported structure](https://huggingface.co/docs/hub/en/datasets-image#additional-columns) (a `metadata.parquet` file with a `file_name` field), you can save this dataset to Hugging Face and the Dataset Viewer shows both the metadata and images.
 
 ```python
 from huggingface_hub import HfApi
@@ -202,9 +202,9 @@ api.upload_folder(
 )
 ```
 
-### Embed Audios inside Parquet
+### Embed Audio inside Parquet
 
-PyArrow has a binary type which allows to have the audios bytes in Arrow tables. Therefore it enables saving the dataset as one single Parquet file containing both the audios (bytes and path) and the samples metadata:
+PyArrow has a binary type which allows for having audio bytes in Arrow tables. Therefore, it enables saving the dataset as one single Parquet file containing both the audio (bytes and path) and the samples metadata:
-Setting the Audio type in the Arrow schema metadata allows other libraries and the Hugging Face Dataset Viewer to know that "audio" contains audios and not just binary data.
+Setting the Audio type in the Arrow schema metadata enables other libraries and the Hugging Face Dataset Viewer to recognise that "audio" contains audio data, not just binary data.