
Commit a67b4c8

Merge pull request #211338 from lgayhardt/patch-98
Deleting intermediate data update
2 parents fe0b91e + dbec787 commit a67b4c8

1 file changed: +11 -3 lines changed


articles/machine-learning/v1/how-to-move-data-in-out-of-pipelines.md

Lines changed: 11 additions & 3 deletions
@@ -235,9 +235,17 @@ step1_output_ds = step1_output_data.register_on_complete(name='processed_data',
 
 Azure does not automatically delete intermediate data written with `OutputFileDatasetConfig`. To avoid storage charges for large amounts of unneeded data, you should either:
 
-* Programmatically delete intermediate data at the end of a pipeline job, when it is no longer needed
-* Use blob storage with a short-term storage policy for intermediate data (see [Optimize costs by automating Azure Blob Storage access tiers](/azure/storage/blobs/lifecycle-management-overview))
-* Regularly review and delete no-longer-needed data
+> [!CAUTION]
+> Only delete intermediate data after 30 days from the last change date of the data. Deleting the data earlier could cause the pipeline run to fail because the pipeline will assume the intermediate data exists within 30 day period for reuse.
+
+* Programmatically delete intermediate data at the end of a pipeline job, when it is no longer needed.
+* Use blob storage with a short-term storage policy for intermediate data (see [Optimize costs by automating Azure Blob Storage access tiers](/azure/storage/blobs/lifecycle-management-overview)). This policy can only be set to a workspace's non-default datastore. Use `OutputFileDatasetConfig` to export intermediate data to another datastore that isn't the default.
+```Python
+# Get adls gen 2 datastore already registered with the workspace
+datastore = workspace.datastores['my_adlsgen2']
+step1_output_data = OutputFileDatasetConfig(name="processed_data", destination=(datastore, "mypath/{run-id}/{output-name}")).as_upload()
+```
+* Regularly review and delete no-longer-needed data.
 
 For more information, see [Plan and manage costs for Azure Machine Learning](../concept-plan-manage-cost.md).
 
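Note on the change: the new caution (wait 30 days after the last change) and the first bullet (delete programmatically) can be combined. The following is a minimal sketch, not part of the commit, of how a cleanup script might pick which intermediate files are old enough to delete. `BlobInfo` and `blobs_safe_to_delete` are hypothetical names introduced here for illustration, and the `mypath/` prefix is borrowed from the `destination` in the diff's example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Waiting period taken from the caution note in the diff.
RETENTION = timedelta(days=30)


@dataclass
class BlobInfo:
    """Hypothetical stand-in for one storage listing entry."""
    name: str
    last_modified: datetime  # timezone-aware timestamp of the last change


def blobs_safe_to_delete(
    blobs: Iterable[BlobInfo],
    prefix: str,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return names under `prefix` last changed more than 30 days before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return [
        b.name
        for b in blobs
        if b.name.startswith(prefix) and b.last_modified < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2023, 6, 1, tzinfo=timezone.utc)
    listing = [
        BlobInfo("mypath/run1/processed_data/part0.csv", now - timedelta(days=45)),
        BlobInfo("mypath/run2/processed_data/part0.csv", now - timedelta(days=5)),
    ]
    # Only the entry from run1 is older than the 30-day window.
    print(blobs_safe_to_delete(listing, "mypath/", now=now))
```

The selection logic is kept separate from any storage client so it can be checked without touching real data. Assuming the intermediate data sits in a blob container reachable through the `azure-storage-blob` package, the listing could come from `ContainerClient.list_blobs(name_starts_with="mypath/")`, whose items expose `name` and `last_modified`, and each returned name could then be passed to `ContainerClient.delete_blob`.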
