
Commit f4671ee

Merge pull request #3757 from fbsolo-ms1/freshness-updates
Update ms.date for this V1 document. See the relevant user story for more information about this update.
2 parents 85e1401 + 44aacc2 commit f4671ee

File tree

1 file changed (+5 −5 lines)

articles/machine-learning/v1/how-to-train-with-datasets.md

Lines changed: 5 additions & 5 deletions
@@ -8,7 +8,7 @@ ms.subservice: mldata
 ms.author: yogipandey
 author: ynpandey
 ms.reviewer: ssalgado
-ms.date: 10/21/2021
+ms.date: 03/26/2025
 ms.topic: how-to
 ms.custom: UpdateFrequency5, data4ml, sdkv1
 #Customer intent: As an experienced Python developer, I need to make my data available to my local or remote compute target to train my machine learning models.
@@ -235,7 +235,7 @@ When you **mount** a dataset, you attach the files referenced by the dataset to
 When you **download** a dataset, all the files referenced by the dataset are downloaded to the compute target. Downloading is supported for all compute types. If your script processes all files referenced by the dataset, and your compute disk can fit your full dataset, downloading is recommended to avoid the overhead of streaming data from storage services. For multi-node downloads, see [how to avoid throttling](#troubleshooting).
 
 > [!NOTE]
-> The download path name should not be longer than 255 alpha-numeric characters for Windows OS. For Linux OS, the download path name should not be longer than 4,096 alpha-numeric characters. Also, for Linux OS the file name (which is the last segment of the download path `/path/to/file/(unknown)`) should not be longer than 255 alpha-numeric characters.
+> The download path name shouldn't be longer than 255 alpha-numeric characters for Windows OS. For Linux OS, the download path name shouldn't be longer than 4,096 alpha-numeric characters. Also, for Linux OS the file name (which is the last segment of the download path `/path/to/file/(unknown)`) shouldn't be longer than 255 alpha-numeric characters.
 
 The following code mounts `dataset` to the temp directory at `mounted_path`
 
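The path-length limits in the note above can be validated before a download starts. A minimal sketch (a hypothetical helper, not part of the article, assuming POSIX-style `/` separators for the Linux case):

```python
# Limits from the note: 255 characters for a full Windows download path,
# 4,096 for a full Linux path, 255 for the last path segment on Linux.
WINDOWS_MAX_PATH = 255
LINUX_MAX_PATH = 4096
LINUX_MAX_NAME = 255

def download_path_ok(path: str, target_os: str = "linux") -> bool:
    """Return True if `path` fits the documented length limits."""
    if target_os == "windows":
        return len(path) <= WINDOWS_MAX_PATH
    # Linux: check the whole path and the final segment separately.
    name = path.rsplit("/", 1)[-1]
    return len(path) <= LINUX_MAX_PATH and len(name) <= LINUX_MAX_NAME
```

For example, `download_path_ok("/tmp/data/file.csv")` passes, while a 300-character file name fails the Linux check even though the full path is well under 4,096 characters.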
@@ -255,7 +255,7 @@ print (mounted_path)
 
 ## Get datasets in machine learning scripts
 
-Registered datasets are accessible both locally and remotely on compute clusters like the Azure Machine Learning compute. To access your registered dataset across experiments, use the following code to access your workspace and get the dataset that was used in your previously submitted run. By default, the [`get_by_name()`](/python/api/azureml-core/azureml.core.dataset.dataset#get-by-name-workspace--name--version--latest--) method on the `Dataset` class returns the latest version of the dataset that's registered with the workspace.
+Registered datasets are accessible both locally and remotely on compute clusters like the Azure Machine Learning compute. To access your registered dataset across experiments, use the following code to access your workspace and get the dataset that was used in your previously submitted run. By default, the [`get_by_name()`](/python/api/azureml-core/azureml.core.dataset.dataset#get-by-name-workspace--name--version--latest--) method on the `Dataset` class returns the latest version of the dataset registered with the workspace.
 
 ```Python
 %%writefile $script_folder/train.py
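The training script referenced above is truncated in this diff; the retrieval pattern the paragraph describes might look like the following sketch. It assumes a configured Azure ML workspace (`config.json` present) and a previously registered dataset named `titanic_ds`, and can't run outside that environment:

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()  # loads workspace details from config.json

# get_by_name() returns the latest registered version by default;
# pass version= to pin a specific one.
titanic_ds = Dataset.get_by_name(ws, name='titanic_ds')
titanic_v1 = Dataset.get_by_name(ws, name='titanic_ds', version=1)
```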
@@ -276,7 +276,7 @@ df = titanic_ds.to_pandas_dataframe()
 
 ## Access source code during training
 
-Azure Blob storage has higher throughput speeds than an Azure file share, and will scale to large numbers of jobs started in parallel. For this reason, we recommend configuring your runs to use Blob storage for transferring source code files.
+Azure Blob storage has higher throughput speeds than an Azure file share, and scales to large numbers of jobs started in parallel. For this reason, we recommend configuring your runs to use Blob storage for transferring source code files.
 
 The following code example specifies in the run configuration which blob datastore to use for source code transfers.
 
@@ -313,7 +313,7 @@ myenv.environment_variables = {"AZUREML_DOWNLOAD_CONCURRENCY":64}
 
 **Unable to upload project files to working directory in AzureFile because the storage is overloaded**:
 
-* If you use file share for other workloads, such as data transfer, the recommendation is to use blobs so that file share is free to be used for submitting runs.
+* If you use file share for other workloads, such as data transfer, we recommend that you use blobs so that file share is free to be used for submitting runs.
 
 * You can also split the workload between two different workspaces.
 