You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/machine-learning/v1/how-to-train-with-datasets.md
+4-4Lines changed: 4 additions & 4 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -235,7 +235,7 @@ When you **mount** a dataset, you attach the files referenced by the dataset to
235
235
When you **download** a dataset, all the files referenced by the dataset are downloaded to the compute target. Downloading is supported for all compute types. If your script processes all files referenced by the dataset, and your compute disk can fit your full dataset, downloading is recommended to avoid the overhead of streaming data from storage services. For multi-node downloads, see [how to avoid throttling](#troubleshooting).
236
236
237
237
> [!NOTE]
238
-
> The download path name should not be longer than 255 alpha-numeric characters for Windows OS. For Linux OS, the download path name should not be longer than 4,096 alpha-numeric characters. Also, for Linux OS the file name (which is the last segment of the download path `/path/to/file/{filename}`) should not be longer than 255 alpha-numeric characters.
238
+
> The download path name shouldn't be longer than 255 alpha-numeric characters for Windows OS. For Linux OS, the download path name shouldn't be longer than 4,096 alpha-numeric characters. Also, for Linux OS the file name (which is the last segment of the download path `/path/to/file/{filename}`) shouldn't be longer than 255 alpha-numeric characters.
239
239
240
240
The following code mounts `dataset` to the temp directory at `mounted_path`
241
241
@@ -255,7 +255,7 @@ print (mounted_path)
255
255
256
256
## Get datasets in machine learning scripts
257
257
258
-
Registered datasets are accessible both locally and remotely on compute clusters like the Azure Machine Learning compute. To access your registered dataset across experiments, use the following code to access your workspace and get the dataset that was used in your previously submitted run. By default, the [`get_by_name()`](/python/api/azureml-core/azureml.core.dataset.dataset#get-by-name-workspace--name--version--latest--) method on the `Dataset` class returns the latest version of the dataset that's registered with the workspace.
258
+
Registered datasets are accessible both locally and remotely on compute clusters like the Azure Machine Learning compute. To access your registered dataset across experiments, use the following code to access your workspace and get the dataset that was used in your previously submitted run. By default, the [`get_by_name()`](/python/api/azureml-core/azureml.core.dataset.dataset#get-by-name-workspace--name--version--latest--) method on the `Dataset` class returns the latest version of the dataset registered with the workspace.
Azure Blob storage has higher throughput speeds than an Azure file share, and will scale to large numbers of jobs started in parallel. For this reason, we recommend configuring your runs to use Blob storage for transferring source code files.
279
+
Azure Blob storage has higher throughput speeds than an Azure file share, and scales to large numbers of jobs started in parallel. For this reason, we recommend configuring your runs to use Blob storage for transferring source code files.
280
280
281
281
The following code example specifies in the run configuration which blob datastore to use for source code transfers.
**Unable to upload project files to working directory in AzureFile because the storage is overloaded**:
315
315
316
-
* If you use file share for other workloads, such as data transfer, the recommendation is to use blobs so that file share is free to be used for submitting runs.
316
+
* If you use file share for other workloads, such as data transfer, we recommend that you use blobs so that file share is free to be used for submitting runs.
317
317
318
318
* You can also split the workload between two different workspaces.
0 commit comments