
Commit f9cba5a

fix broken link
1 parent 0c8e857 commit f9cba5a

File tree

1 file changed: +3 -1 lines changed


articles/machine-learning/how-to-read-write-data-v2.md

Lines changed: 3 additions & 1 deletion
@@ -968,7 +968,9 @@ Files are read in *blocks* of 1-4 MB in size. Files smaller than a block are rea
 
 For small files, the latency interval mostly involves handling the requests to storage, instead of data transfers. Therefore, we offer these recommendations to increase the file size:
 
-- For unstructured data (images, video, etc.), archive (zip/tar) small files together, to store them as a larger file that can be read in multiple chunks. These larger archived files can be opened in the compute resource, and [PyTorch Archive DataPipes](https://pytorch.org/data/0.9/dp_tutorial.html) can extract the smaller files.
+- For unstructured data (images, video, etc.), archive (zip/tar) small files together, to store them as a larger file that can be read in multiple chunks. These larger archived files can be opened in the compute resource, and [PyTorch Archive DataPipes](https://meta-pytorch.org/data/0.9/dp_tutorial.html)
+
+can extract the smaller files.
 - For structured data (CSV, parquet, etc.), examine your ETL process, to make sure that it coalesces files to increase size. Spark has `repartition()` and `coalesce()` methods to help increase file sizes.
 
 If you can't increase your file sizes, explore your [Azure Storage options](#azure-storage-options).
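
For context on the link target fixed above: the unstructured-data recommendation can be exercised with torchdata's archive DataPipes. A minimal sketch, assuming the 0.9-era `torchdata` DataPipes API is installed and that a hypothetical folder `data/archives` holds the tar files produced by bundling the small files:

```python
from torchdata.datapipes.iter import FileLister, FileOpener

# List the tar archives (hypothetical folder), open them as binary streams,
# then extract the member files from each archive.
dp = FileLister(root="data/archives", masks="*.tar")
dp = FileOpener(dp, mode="b")
dp = dp.load_from_tar()  # yields (member path, file stream) pairs

for path, stream in dp:
    payload = stream.read()  # bytes of one small file from inside the archive
    print(path, len(payload))
```

Reading a few large archives this way replaces many per-file storage requests with a handful of sequential block reads, which is the latency win the article describes.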
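Similarly, for the structured-data item, here is a minimal PySpark sketch of the `repartition()` / `coalesce()` advice; the input and output paths are placeholders, not from the original article:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("coalesce-small-files").getOrCreate()

# Read a folder containing many small parquet files (placeholder path).
df = spark.read.parquet("/data/input/")

# coalesce(n) merges existing partitions without a full shuffle, so it is
# the cheaper way to reduce the number of output files.
df.coalesce(16).write.mode("overwrite").parquet("/data/output_coalesced/")

# repartition(n) triggers a full shuffle but balances rows evenly across
# the n output files; prefer it when partition sizes are badly skewed.
df.repartition(16).write.mode("overwrite").parquet("/data/output_repartitioned/")
```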

0 commit comments