
Commit f2cfa64

Author: Larry O'Brien
Commit message: Added reference doc links
1 parent 2a6e229 commit f2cfa64

File tree

1 file changed (+5 −3 lines)


articles/machine-learning/how-to-move-data-in-and-out-of-pipelines.md

Lines changed: 5 additions & 3 deletions
@@ -51,7 +51,7 @@ You'll need:
 
 ## Use `Dataset` objects for pre-existing data
 
-The preferred way to ingest data into a pipeline is to use a `Dataset` object. `Dataset` objects represent persistent data available throughout a workspace.
+The preferred way to ingest data into a pipeline is to use a [Dataset](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset%28class%29?view=azure-ml-py) object. `Dataset` objects represent persistent data available throughout a workspace.
 
 There are many ways to create and register `Dataset` objects. Tabular datasets are for delimited data available in one or more files. File datasets are for binary data (such as images) or for data that you'll parse. The simplest programmatic ways to create `Dataset` objects are to use existing blobs in workspace storage or public URLs:

@@ -65,7 +65,7 @@ cats_dogs_dataset = Dataset.File.from_files(
 )
 ```
 
-For more options on creating datasets from different sources, registering them and reviewing them in the Azure Machine Learning UI, understanding how data size interacts with compute capacity, and versioning them, see [Create Azure Machine Learning datasets](how-to-create-register-datasets.md).
+For more options on creating datasets from different sources, registering them and reviewing them in the Azure Machine Learning UI, understanding how data size interacts with compute capacity, and versioning them, see [Create Azure Machine Learning datasets](how-to-create-register-datasets.md).
 
 ### Pass a dataset to your script
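As context for the `Dataset` discussion in the hunk above, here is a minimal sketch of the two creation paths the article mentions: a tabular dataset from a workspace datastore, and a file dataset from a public URL. The blob path, URL, and function name are assumptions for illustration, not taken from the article, and the azureml-core calls are deferred inside a function so the sketch only runs against a real Azure ML workspace.

```python
# Sketch only: assumes azureml-core is installed and a Workspace handle exists.
# The blob path and URL below are hypothetical examples.
WEATHER_PATH = 'weather/2018/11.csv'  # hypothetical delimited file in the datastore
MNIST_URL = ('https://azureopendatastorage.blob.core.windows.net/'
             'mnist/train-images-idx3-ubyte.gz')

def create_example_datasets(workspace):
    from azureml.core import Dataset  # deferred so the sketch imports cleanly

    datastore = workspace.get_default_datastore()
    # Tabular dataset: delimited data available in one or more files
    weather_ds = Dataset.Tabular.from_delimited_files(
        path=[(datastore, WEATHER_PATH)])
    # File dataset: binary data, or data you'll parse yourself
    mnist_ds = Dataset.File.from_files(path=MNIST_URL)
    return weather_ds, mnist_ds
```

Either object can then be registered by name with its `register()` method, so that later runs can retrieve it with `Dataset.get_by_name`, as shown elsewhere in the article.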

@@ -142,7 +142,7 @@ ds = Dataset.get_by_name(workspace=ws, name='mnist_opendataset')
 
 ## Use `PipelineData` for intermediate data
 
-While `Dataset` objects represent persistent data, `PipelineData` objects are used for temporary data that is output from pipeline steps. Because the lifespan of a `PipelineData` object is longer than a single pipeline step, you define it in the pipeline definition script. When you create a `PipelineData` object, you must provide a name and a datastore at which the data will reside. Pass your `PipelineData` object(s) to your `PythonScriptStep` using _both_ the `arguments` and the `outputs` arguments:
+While `Dataset` objects represent persistent data, [PipelineData](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?view=azure-ml-py) objects are used for temporary data that is output from pipeline steps. Because the lifespan of a `PipelineData` object is longer than a single pipeline step, you define it in the pipeline definition script. When you create a `PipelineData` object, you must provide a name and a datastore at which the data will reside. Pass your `PipelineData` object(s) to your `PythonScriptStep` using _both_ the `arguments` and the `outputs` arguments:
 
 ```python
 default_datastore = workspace.get_default_datastore()
@@ -179,6 +179,8 @@ with open(args.output_path, 'w') as f:
     f.write("Step 1's output")
 ```
 
+If you created your `PipelineData` with the `is_directory` argument set to `True`, it would be enough to perform the `os.makedirs()` call, and then you would be free to write whatever files you wished to the path. For more details and other options, see the [PipelineData](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?view=azure-ml-py) reference documentation.
+
 ### Read `PipelineData` as an input to non-initial steps
 
 After the initial pipeline step writes some data to the `PipelineData` path and it becomes an output of that initial step, it can be used as an input to a later step:
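The hunk above shows step 1 writing to its `PipelineData` path; the mirror image on the consuming side can be sketched as follows. The `--input_path` argument name is an assumption for illustration and must match whatever the pipeline definition passes in the later step's `arguments`; at runtime the `PipelineData` object resolves to a concrete filesystem path.

```python
# Sketch of a non-initial step's script reading the previous step's output.
# The argument name --input_path is hypothetical; it must match the name used
# in the pipeline definition's `arguments` for this step.
import argparse

def read_step_input(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_path', dest='input_path', required=True)
    args = parser.parse_args(argv)
    # args.input_path is the resolved PipelineData location on this compute
    with open(args.input_path) as f:
        return f.read()
```

A step script would typically call `read_step_input()` with no arguments so that `argparse` reads the command line the pipeline constructed.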
