Skip to content

Commit 3833df7

Browse files
authored
Merge pull request #105912 from nibaccam/mount-dset
Dataset | mount code sample
2 parents e7fcafc + 66ecd55 commit 3833df7

File tree

1 file changed

+20
-3
lines changed

1 file changed

+20
-3
lines changed

articles/machine-learning/how-to-train-with-datasets.md

Lines changed: 20 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -104,12 +104,29 @@ experiment_run.wait_for_completion(show_output=True)
104104
If you want to make your data files available on the compute target for training, use [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) to mount or download files referred by it.
105105

106106
### Mount vs. Download
107-
When you mount a dataset, you attach the files referenced by the dataset to a directory (mount point) and make it available on the compute target. Mounting is supported for Linux-based computes, including Azure Machine Learning Compute, virtual machines, and HDInsight. If your data size exceeds the compute disk size, or you are only loading part of dataset in your script, mounting is recommended. Because downloading a dataset bigger than the disk size will fail, and mounting will only load the part of data used by your script at the time of processing.
108-
109-
When you download a dataset, all the files referenced by the dataset will be downloaded to the compute target. Downloading is supported for all compute types. If your script process all files referenced by the dataset, and your compute disk can fit in your full dataset, downloading is recommended to avoid the overhead of streaming data from storage services.
110107

111108
Mounting or downloading files of any format are supported for datasets created from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL.
112109

110+
When you mount a dataset, you attach the files referenced by the dataset to a directory (mount point) and make it available on the compute target. Mounting is supported for Linux-based computes, including Azure Machine Learning Compute, virtual machines, and HDInsight. When you download a dataset, all the files referenced by the dataset will be downloaded to the compute target. Downloading is supported for all compute types.
111+
112+
If your script processes all files referenced by the dataset, and your compute disk can fit your full dataset, downloading is recommended to avoid the overhead of streaming data from storage services. If your data size exceeds the compute disk size, downloading is not possible. For this scenario, we recommend mounting since only the data files used by your script are loaded at the time of processing.
113+
114+
The following code mounts `dataset` to the temp directory at `mounted_path`
115+
116+
```python
117+
import tempfile
118+
mounted_path = tempfile.mkdtemp()
119+
120+
# mount dataset onto the mounted_path of a Linux-based compute
121+
mount_context = dataset.mount(mounted_path)
122+
123+
mount_context.start()
124+
125+
import os
126+
print(os.listdir(mounted_path))
127+
print (mounted_path)
128+
```
129+
113130
### Create a FileDataset
114131

115132
The following example creates an unregistered FileDataset from web urls. Learn more about [how to create datasets](https://aka.ms/azureml/howto/createdatasets) from other sources.

0 commit comments

Comments
 (0)