
Commit 8b5bbd7

Merge pull request #1551 from fbsolo-ms1/ai-studio-UUF-repair-branch
Fix detected bugs reported in a UUF DevOps item . . .
2 parents: 56ea0a3 + 33cce2f

2 files changed: +4 -4 lines changed


articles/machine-learning/how-to-create-data-assets.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -427,7 +427,7 @@ environment: azureml://registries/azureml/environments/sklearn-1.1/versions/4
 inputs:
   input_data:
     mode: ro_mount
-    path: azureml:wasbs://[email protected]/titanic.csv
+    path: wasbs://[email protected]/titanic.csv
     type: uri_file
 outputs:
   output_data:
```
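Both changed files receive the same correction: a `wasbs://` storage URI is passed to the job input as-is, without an `azureml:` prefix (that prefix belongs to registered data assets and `azureml://` datastore or registry URIs). As a point of comparison, here is a minimal sketch of the same kind of input expressed with the Azure Machine Learning Python SDK (v2); the workspace details, compute target, script name, and storage account are illustrative placeholders, not values from the changed articles.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient, command, Input

# Placeholder workspace details -- substitute your own values.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

job = command(
    code="./src",  # hypothetical folder containing read_data.py
    command="python read_data.py --input_data ${{inputs.input_data}}",
    environment="azureml://registries/azureml/environments/sklearn-1.1/versions/4",
    compute="cpu-cluster",  # hypothetical compute target
    inputs={
        "input_data": Input(
            type="uri_file",
            mode="ro_mount",
            # A direct storage URI is passed as-is, with no azureml: prefix.
            path="wasbs://<container>@<account>.blob.core.windows.net/titanic.csv",
        )
    },
)

returned_job = ml_client.create_or_update(job)
print(returned_job.studio_url)
```

The YAML and SDK forms are interchangeable here; the key point in both is that the `path` value is the bare `wasbs://` URI.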

articles/machine-learning/how-to-read-write-data-v2.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -186,7 +186,7 @@ environment: azureml://registries/azureml/environments/sklearn-1.1/versions/4
 inputs:
   input_data:
     mode: ro_mount
-    path: azureml:wasbs://[email protected]/titanic.csv
+    path: wasbs://[email protected]/titanic.csv
     type: uri_file
 ```
 
@@ -321,7 +321,7 @@ environment: azureml://registries/azureml/environments/sklearn-1.1/versions/4
 inputs:
   input_data:
     mode: ro_mount
-    path: azureml:wasbs://[email protected]/titanic.csv
+    path: wasbs://[email protected]/titanic.csv
     type: uri_file
 outputs:
   output_data:
@@ -967,7 +967,7 @@ Files are usually read in *blocks* of 1-4 MB in size. Files smaller than a block
 
 For small files, the latency interval mostly involves handling the requests to storage, instead of data transfers. Therefore, we offer these recommendations to increase the file size:
 
-- For unstructured data (images, text, video, etc.), archive (zip/tar) small files together, to store them as a larger file that can be read in multiple chunks. These larger archived files can be opened in the compute resource, and [PyTorch Archive DataPipes](https://pytorch.org/data/main/torchdata.datapipes.iter.html#archive-datapipes) can extract the smaller files.
+- For unstructured data (images, text, video, etc.), archive (zip/tar) small files together, to store them as a larger file that can be read in multiple chunks. These larger archived files can be opened in the compute resource, and [PyTorch Archive DataPipes](https://pytorch.org/data/0.9/dp_tutorial.html) can extract the smaller files.
 - For structured data (CSV, parquet, etc.), examine your ETL process, to make sure that it coalesces files to increase size. Spark has `repartition()` and `coalesce()` methods to help increase file sizes.
 
 If you can't increase your file sizes, explore your [Azure Storage options](#azure-storage-options).
````
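The updated bullet keeps the same guidance: fewer, larger files reduce per-request latency overhead. A minimal sketch of that guidance in Python follows, assuming hypothetical local directories (`small_images/`, `input_parts/`, `coalesced_parts/`); it uses the standard-library `tarfile` module for the unstructured case and PySpark's `coalesce()` for the structured case, rather than the PyTorch Archive DataPipes route linked in the text.

```python
import glob
import tarfile

from pyspark.sql import SparkSession

# Unstructured data: pack many small files (e.g. JPEGs) into one tar archive
# so reads from storage become a few large sequential requests.
with tarfile.open("images_shard_000.tar", "w") as archive:
    for path in sorted(glob.glob("small_images/*.jpg")):
        archive.add(path, arcname=path)

# Structured data: coalesce many small parquet parts into fewer, larger files
# before a training job reads them.
spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("input_parts/")
df.coalesce(8).write.mode("overwrite").parquet("coalesced_parts/")
```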
