Commit 7d367b1

lhoestq, davanstrien and julien-c authored
Apply suggestions from code review
Co-authored-by: Daniel van Strien <[email protected]>
Co-authored-by: Julien Chaumond <[email protected]>
1 parent 9396eb7 commit 7d367b1

1 file changed (+5 -5 lines)

docs/hub/datasets-pyarrow.md

Lines changed: 5 additions & 5 deletions
@@ -58,7 +58,7 @@ To save the Table on Hugging Face, you first need to [Login with your Hugging Fa
 huggingface-cli login
 ```

-Then you can [Create a dataset repository](/docs/huggingface_hub/quick-start#create-a-repository), for example using:
+Then you can [create a dataset repository](/docs/huggingface_hub/quick-start#create-a-repository), for example using:

 ```python
 from huggingface_hub import HfApi
@@ -118,7 +118,7 @@ for file_name in table["file_name"].to_pylist():
 ...
 ```

-Since the dataset is in a [supported structure](https://huggingface.co/docs/hub/en/datasets-image#additional-columns) (a `metadata.parquet` file with a `file_name` field), you can save this dataset to Hugging Face and the Dataset Viewer shows both the metadata and images on Hugging Face.
+Since the dataset is in a [supported structure](https://huggingface.co/docs/hub/en/datasets-image#additional-columns) (a `metadata.parquet` file with a `file_name` field), you can save this dataset to Hugging Face and the Dataset Viewer shows both the metadata and images.

 ```python
 from huggingface_hub import HfApi
@@ -202,9 +202,9 @@ api.upload_folder(
 )
 ```

-### Embed Audios inside Parquet
+### Embed Audio inside Parquet

-PyArrow has a binary type which allows to have the audios bytes in Arrow tables. Therefore it enables saving the dataset as one single Parquet file containing both the audios (bytes and path) and the samples metadata:
+PyArrow has a binary type which allows for having audio bytes in Arrow tables. Therefore, it enables saving the dataset as one single Parquet file containing both the audio (bytes and path) and the samples metadata:

 ```python
 import pyarrow as pa
@@ -231,4 +231,4 @@ table = table.replace_schema_metadata(schema_metadata)
 pq.write_table(table, "data.parquet", use_content_defined_chunking=True, row_group_size=100)
 ```

-Setting the Audio type in the Arrow schema metadata allows other libraries and the Hugging Face Dataset Viewer to know that "audio" contains audios and not just binary data.
+Setting the Audio type in the Arrow schema metadata enables other libraries and the Hugging Face Dataset Viewer to recognise that "audio" contains audio data, not just binary data.
