articles/machine-learning/v1/how-to-move-data-in-out-of-pipelines.md (5 additions, 5 deletions)

@@ -24,7 +24,7 @@ This article provides code for importing data, transforming data, and moving dat
This article shows how to:

- - Use `Dataset` objects for pre-existing data
+ - Use `Dataset` objects for preexisting data
- Access data within your steps
- Split `Dataset` data into subsets, such as training and validation subsets
- Create `OutputFileDatasetConfig` objects to transfer data to the next pipeline step
@@ -48,11 +48,11 @@ This article shows how to:
```Python
ws = Workspace.from_config()
```

- - Some pre-existing data. This article briefly shows the use of an [Azure blob container](/azure/storage/blobs/storage-blobs-overview).
+ - Some preexisting data. This article briefly shows the use of an [Azure blob container](/azure/storage/blobs/storage-blobs-overview).
- Optional: An existing machine learning pipeline, such as the one described in [Create and run machine learning pipelines with Azure Machine Learning SDK](./how-to-create-machine-learning-pipelines.md).

- ## Use `Dataset` objects for pre-existing data
+ ## Use `Dataset` objects for preexisting data

The preferred way to ingest data into a pipeline is to use a [Dataset](/python/api/azureml-core/azureml.core.dataset%28class%29) object. `Dataset` objects represent persistent data that's available throughout a workspace.
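
A minimal sketch of that pattern, not taken from the article and using placeholder names (the `iris/*.csv` path, the `iris_data` dataset name, `train.py`, the `scripts` folder, and an existing `cpu-cluster` compute target are assumptions), might look like this:

```Python
from azureml.core import Dataset, Datastore, Workspace
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
blob_datastore = Datastore.get(ws, "workspaceblobstore")  # default blob datastore
compute_target = ws.compute_targets["cpu-cluster"]  # assumed: an existing cluster

# Wrap preexisting CSV files in the blob container as a TabularDataset and register it
iris_ds = Dataset.Tabular.from_delimited_files(path=[(blob_datastore, "iris/*.csv")])
iris_ds = iris_ds.register(workspace=ws, name="iris_data", create_new_version=True)

# Split the dataset into training and validation subsets
train_ds, val_ds = iris_ds.random_split(percentage=0.8, seed=123)

# Pass the subsets to a step as named inputs; inside train.py they can be read back
# through Run.get_context().input_datasets["train"] and ["validation"]
train_step = PythonScriptStep(
    name="train",
    script_name="train.py",
    source_directory="scripts",
    compute_target=compute_target,
    arguments=[
        "--train", train_ds.as_named_input("train"),
        "--validation", val_ds.as_named_input("validation"),
    ],
)
```

Passing the results of `as_named_input` rather than concrete paths keeps the step definition independent of where the underlying data is stored.
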
Azure doesn't automatically delete intermediate data that's written with `OutputFileDatasetConfig`. To avoid storage charges for large amounts of unneeded data, you should take one of the following actions:
* Programmatically delete intermediate data at the end of a pipeline job, when it's no longer needed.
- * Use blob storage with a short-term storage policy for intermediate data. (See [Optimize costs by automating Azure Blob Storage access tiers](/azure/storage/blobs/lifecycle-management-overview).) This policy can be set only on a workspace's non-default datastore. Use `OutputFileDatasetConfig` to export intermediate data to another datastore that isn't the default.
+ * Use blob storage with a short-term storage policy for intermediate data. (See [Optimize costs by automating Azure Blob Storage access tiers](/azure/storage/blobs/lifecycle-management-overview).) This policy can be set only on a workspace's nondefault datastore. Use `OutputFileDatasetConfig` to export intermediate data to another datastore that isn't the default.
```Python
# Get Data Lake Storage Gen2 datastore that's already registered with the workspace
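
# Illustrative continuation (sketch): the datastore name "intermediate_adls2" and the
# destination path below are assumptions, not values from the original example.
# Assumed imports: from azureml.core import Datastore; from azureml.data import OutputFileDatasetConfig
intermediate_datastore = Datastore.get(ws, "intermediate_adls2")

# Route the step output to the nondefault datastore instead of the workspace default
step_output = OutputFileDatasetConfig(
    name="processed_data",
    destination=(intermediate_datastore, "intermediate/{run-id}/{output-name}")
)
```
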
@@ -263,7 +263,7 @@ Azure doesn't automatically delete intermediate data that's written with `Output
> [!CAUTION]
> Only delete intermediate data after 30 days from the data's last change date. Deleting it earlier could cause the pipeline run to fail, because the pipeline assumes the data exists for a 30-day reuse period.
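
As a hedged illustration of the first option above (programmatic deletion), the following sketch assumes the intermediate data lives under a known prefix in a blob container you control; the connection string, container name, and prefix are placeholders, and the cutoff honors the 30-day window from the caution:

```Python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import ContainerClient

# Placeholders: supply your own connection string, container, and intermediate-data prefix
container = ContainerClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="intermediate-data",
)

# Delete only intermediate blobs whose last change is more than 30 days old
cutoff = datetime.now(timezone.utc) - timedelta(days=30)
for blob in container.list_blobs(name_starts_with="azureml/intermediate/"):
    if blob.last_modified < cutoff:
        container.delete_blob(blob.name)
```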

- For more information, see [Plan and manage costs for Azure Machine Learning](../concept-plan-manage-cost.md).
+ For more information, see [Plan to manage costs for Azure Machine Learning](../concept-plan-manage-cost.md).