Commit 23fcf55

don't fix row group size
1 parent 14af0ae commit 23fcf55

1 file changed: +2 -4 lines changed

docs/hub/datasets-pyarrow.md

Lines changed: 2 additions & 4 deletions
@@ -156,8 +156,7 @@ table = table.replace_schema_metadata(schema_metadata)
 
 # Save to Parquet
 # (Optional) with use_content_defined_chunking for faster uploads and downloads
-# (Optional) with row_group_size to allow loading 100 images at a time
-pq.write_table(table, "data.parquet", use_content_defined_chunking=True, row_group_size=100)
+pq.write_table(table, "data.parquet", use_content_defined_chunking=True)
 ```
 
 Setting the Image type in the Arrow schema metadata allows other libraries and the Hugging Face Dataset Viewer to know that "image" contains images and not just binary data.
@@ -227,8 +226,7 @@ table = table.replace_schema_metadata(schema_metadata)
 
 # Save to Parquet
 # (Optional) with use_content_defined_chunking for faster uploads and downloads
-# (Optional) with row_group_size to allow loading 100 audios at a time
-pq.write_table(table, "data.parquet", use_content_defined_chunking=True, row_group_size=100)
+pq.write_table(table, "data.parquet", use_content_defined_chunking=True)
 ```
 
 Setting the Audio type in the Arrow schema metadata enables other libraries and the Hugging Face Dataset Viewer to recognise that "audio" contains audio data, not just binary data.
