
Commit be003cb

Merge pull request #276848 from Blackmist/208351-fresh
freshness
2 parents 2031c56 + 9150093

articles/machine-learning/v1/how-to-save-write-experiment-files.md

Lines changed: 11 additions & 11 deletions
@@ -9,27 +9,27 @@ manager: danielsc
ms.service: machine-learning
ms.subservice: core
ms.topic: how-to
ms.date: 05/31/2024

---

# Where to save and write files for Azure Machine Learning experiments

[!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)]

In this article, you learn where to save input files, and where to write output files from your experiments to prevent storage limit errors and experiment latency.

When you run training jobs on a [compute target](../concept-compute-target.md), they're isolated from outside environments. The purpose of this design is to ensure reproducibility and portability of the experiment. If you run the same script twice, on the same or another compute target, you receive the same results. With this design, you can treat compute targets as stateless computation resources, each with no affinity to the jobs it runs after those jobs finish.

## Where to save input files

Before you can initiate an experiment on a compute target or your local machine, you must ensure that the necessary files are available to that compute target. These files include dependency files and any data files your code needs to run.

Azure Machine Learning runs training scripts by copying the entire source directory. If you have sensitive data that you don't want to upload, use an [ignore file (`.amlignore` or `.gitignore`)](how-to-save-write-experiment-files.md#storage-limits-of-experiment-snapshots) or don't include it in the source directory. Instead, access your data using a [datastore](/python/api/azureml-core/azureml.data).

The storage limit for experiment snapshots is 300 MB and/or 2,000 files.

For this reason, we recommend:

* **Storing your files in an Azure Machine Learning [dataset](/python/api/azureml-core/azureml.data).** Using datasets prevents experiment latency issues and has the advantage of accessing data from a remote compute target. Azure Machine Learning handles authentication and mounting of the dataset. Learn more about how to specify a dataset as your input data source in your training script with [Train with datasets](how-to-train-with-datasets.md).

* **If you only need a couple of data files and dependency scripts and can't use a datastore,** place the files in the same folder as your training script. Specify this folder as your `source_directory` directly in your training script, or in the code that calls your training script. A minimal submission sketch covering both approaches follows this list.

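The following minimal sketch shows both recommendations together: a run whose `source_directory` is the script folder, with a registered dataset mounted as input. The workspace config file, the dataset name `my-training-data` (assumed to be a registered `FileDataset`), the compute target `cpu-cluster`, and the script names are assumptions for illustration, not values from this article.

```python
from azureml.core import Workspace, Dataset, Experiment, ScriptRunConfig

ws = Workspace.from_config()  # assumes a config.json for your workspace

# Assumed name of a FileDataset already registered in the workspace.
dataset = Dataset.get_by_name(ws, name='my-training-data')

src = ScriptRunConfig(
    source_directory='./src',     # folder holding train.py and its dependencies
    script='train.py',
    arguments=['--data', dataset.as_named_input('input_data').as_mount()],
    compute_target='cpu-cluster', # assumed existing compute target
)

run = Experiment(ws, 'save-files-demo').submit(src)
```
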
@@ -39,7 +39,7 @@ For this reason, we recommend:

For experiments, Azure Machine Learning automatically makes an experiment snapshot of your code based on the directory you suggest when you configure the job. For a pipeline, the directory is configured for each step.

The snapshot has a total limit of 300 MB and/or 2,000 files. If you exceed this limit, you see the following error:

```Python
While attempting to take snapshot of .
```
@@ -50,23 +50,23 @@ To resolve this error, store your experiment files on a datastore. If you can't

Experiment description|Storage limit solution
---|---
Less than 2,000 files & can't use a datastore| Override the snapshot size limit with <br> `azureml._restclient.snapshots_client.SNAPSHOT_MAX_SIZE_BYTES = 'insert_desired_size'` and `azureml._restclient.constants.SNAPSHOT_MAX_SIZE_BYTES = 'insert_desired_size'`<br> This might take several minutes depending on the number and size of files. See the sketch after this table.
Must use specific script directory| [!INCLUDE [amlinclude-info](../includes/machine-learning-amlignore-gitignore.md)]
Pipeline|Use a different subdirectory for each step
Jupyter notebooks| Create a `.amlignore` file, or move your notebook into a new, empty subdirectory and run your code again.

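As a sketch of the first row's workaround: the override is two module-level assignments, made before you submit the run. The 1 GB value here is an assumed example; substitute your own limit in bytes.

```python
import azureml._restclient.snapshots_client
import azureml._restclient.constants

# Assumed example limit of 1 GB (value is in bytes); pick a size that fits your files.
azureml._restclient.snapshots_client.SNAPSHOT_MAX_SIZE_BYTES = 1_000_000_000
azureml._restclient.constants.SNAPSHOT_MAX_SIZE_BYTES = 1_000_000_000
```
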
## Where to write files

Due to the isolation of training experiments, the changes to files that happen during jobs aren't necessarily persisted outside of your environment. If your script modifies the files local to compute, the changes aren't persisted for your next experiment job, and they're not propagated back to the client machine automatically. Therefore, the changes made during the first experiment job don't and shouldn't affect those in the second.

When writing changes, we recommend writing files to storage via an Azure Machine Learning dataset with an [OutputFileDatasetConfig object](/python/api/azureml-core/azureml.data.output_dataset_config.outputfiledatasetconfig). See [how to create an OutputFileDatasetConfig](how-to-train-with-datasets.md#where-to-write-training-output).

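A minimal sketch of that pattern, assuming the workspace's default datastore, a folder layout of `outputs/{run-id}`, and a `train.py` that accepts the output location as an argument (all assumptions for illustration):

```python
from azureml.core import Workspace, Experiment, ScriptRunConfig
from azureml.data import OutputFileDatasetConfig

ws = Workspace.from_config()

# Write run output to the default datastore under an assumed folder layout.
output = OutputFileDatasetConfig(
    name='processed_data',
    destination=(ws.get_default_datastore(), 'outputs/{run-id}'),
)

src = ScriptRunConfig(
    source_directory='./src',
    script='train.py',
    arguments=['--output', output],  # the script receives a writable mount path
    compute_target='cpu-cluster',    # assumed existing compute target
)

Experiment(ws, 'write-files-demo').submit(src)
```
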
Otherwise, write files to the `./outputs` and/or `./logs` folder.

> [!IMPORTANT]
> Two folders, *outputs* and *logs*, receive special treatment by Azure Machine Learning. During training, when you write files to the `./outputs` and `./logs` folders, the files automatically upload to your job history, so that you have access to them after your job finishes.

* **For output such as status messages or scoring results,** write files to the `./outputs` folder, so they're persisted as artifacts in job history. Be mindful of the number and size of files written to this folder, as latency might occur when the contents are uploaded to job history. If latency is a concern, writing files to a datastore is recommended.

* **To save written files as logs in job history,** write files to the `./logs` folder. The logs are uploaded in real time, so this method is suitable for streaming live updates from a remote job. Both patterns are sketched below.

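Inside the training script itself, the write pattern is plain file I/O against those two folders. The file names and contents here are assumed examples:

```python
import os

# Azure Machine Learning provides these folders in the job's working directory,
# but creating them defensively keeps the script runnable locally too.
os.makedirs('./outputs', exist_ok=True)
os.makedirs('./logs', exist_ok=True)

# Uploaded to job history as artifacts once the job finishes.
with open('./outputs/scores.txt', 'w') as f:
    f.write('accuracy: 0.92\n')  # assumed example metric

# Streamed to job history in near real time while the job runs.
with open('./logs/progress.log', 'a') as f:
    f.write('epoch 1 complete\n')
```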