Commit d528ccb

Author: Larry O'Brien

Parallel verbs in h2/h3s

1 parent: 95fab50

File tree

1 file changed: 7 additions, 7 deletions

1 file changed

+7
-7
lines changed

articles/machine-learning/how-to-move-data-in-and-out-of-pipelines.md

Lines changed: 7 additions & 7 deletions
@@ -23,9 +23,9 @@ This article will show you how to:
 - Use `Dataset` objects for pre-existing data
 - Access data within your steps
 - Split `Dataset` data into subsets, such as training and validation subsets
-- Create a `PipelineData` object to transfer data to the next pipeline step
+- Create `PipelineData` objects to transfer data to the next pipeline step
 - Use `PipelineData` objects as input to pipeline steps
-- Create a new `Dataset` object from `PipelineData` you wish to persist
+- Create new `Dataset` objects from `PipelineData` you wish to persist
 
 ## Prerequisites
 
@@ -68,7 +68,7 @@ cats_dogs_dataset = Dataset.File.from_files(
 
 For more options on creating datasets with different options and from different sources, registering them and reviewing them in the Azure Machine Learning UI, understanding how data size interacts with compute capacity, and versioning them, see [Create Azure Machine Learning datasets](how-to-create-register-datasets.md).
 
-### Pass a dataset to your script
+### Pass datasets to your script
 
 To pass the dataset's path to your script, use the `Dataset` object's `as_named_input()` method. You can either pass the resulting `DatasetConsumptionConfig` object to your script as an argument or, by using the `inputs` argument to your pipeline script, you can retrieve the dataset using `Run.get_context().input_datasets[]`.
 
@@ -107,7 +107,7 @@ train_step = PythonScriptStep(
 )
 ```
 
-### Access a dataset within your script
+### Access datasets within your script
 
 Named inputs to your pipeline step script are available as a dictionary within the `Run` object. Retrieve the active `Run` object using `Run.get_context()` and then retrieve the dictionary of named inputs using `input_datasets`. If you passed the `DatasetConsumptionConfig` object using the `arguments` argument rather than the `inputs` argument, access the data using `ArgParser` code. Both techniques are demonstrated in the following snippet.
 
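The `arguments` route described in the hunk above can be sketched with the standard library alone, since the step script just receives the mounted path as a command-line flag. This is a minimal sketch, not the SDK itself; the `--data-path` flag name and the mount path are hypothetical:

```python
import argparse

# A pipeline step script typically receives its input dataset's mount point
# as a plain command-line argument. The flag name --data-path is hypothetical.
parser = argparse.ArgumentParser()
parser.add_argument("--data-path", type=str, help="mount point of the input dataset")

# Simulate the arguments Azure ML would pass when launching the step.
args = parser.parse_args(["--data-path", "/mnt/datasets/cats_dogs"])

print(args.data_path)  # the script can now read files under this path
```

Inside a real step, the same `args.data_path` value would point at the mounted (or downloaded) dataset files.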
@@ -165,7 +165,7 @@ You may choose to create your `PipelineData` object using an access mode that pr
 PipelineData("clean_data", datastore=def_blob_store, output_mode="upload", output_path_on_compute="clean_data_output/")
 ```
 
-### Use `PipelineData` as an output of a training step
+### Use `PipelineData` as outputs of a training step
 
 Within your pipeline's `PythonScriptStep`, you can retrieve the available output paths using the program's arguments. If this step is the first and will initialize the output data, you must create the directory at the specified path. You can then write whatever files you wish to be contained in the `PipelineData`.
 
@@ -182,7 +182,7 @@ with open(args.output_path, 'w') as f:
 
 If you created your `PipelineData` with the `is_directory` argument set to `True`, it would be enough to just perform the `os.makedirs()` call and then you would be free to write whatever files you wished to the path. For more details, see the [PipelineData](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?view=azure-ml-py) reference documentation.
 
-### Read `PipelineData` as an input to non-initial steps
+### Read `PipelineData` as inputs to non-initial steps
 
 After the initial pipeline step writes some data to the `PipelineData` path and it becomes an output of that initial step, it can be used as an input to a later step:
 
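The write-then-read handoff these two sections describe can be exercised locally with nothing but the standard library. In this sketch a temporary directory stands in for the path Azure ML would pass to each step's script; the directory and file names are hypothetical:

```python
import os
import tempfile

# Stand-in for the PipelineData path Azure ML would hand the first step.
output_path = os.path.join(tempfile.mkdtemp(), "clean_data")

# First step: create the directory at the specified path, then write
# whatever files should be contained in the PipelineData.
os.makedirs(output_path, exist_ok=True)
with open(os.path.join(output_path, "processed.txt"), "w") as f:
    f.write("cleaned rows: 42")

# Later step: receives the same path as an input argument and reads the files
# the initial step produced.
with open(os.path.join(output_path, "processed.txt")) as f:
    contents = f.read()

print(contents)  # -> cleaned rows: 42
```

In a real pipeline the two halves live in separate step scripts, and Azure ML supplies the shared path through each script's arguments.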
@@ -221,7 +221,7 @@ with open(args.pd) as f:
     print(f.read())
 ```
 
-## Convert a `PipelineData` object into a registered `Dataset` for further processing
+## Convert `PipelineData` objects into registered `Dataset`s for further processing
 
 If you'd like to make your `PipelineData` available for longer than the duration of a run, use its `as_dataset()` function to convert it to a `Dataset`. You may then register the `Dataset`, making it a first-class citizen in your workspace. Since your `PipelineData` object will have a different path every time the pipeline runs, it's highly recommended that you set `create_new_version` to `True` when registering a `Dataset` created from a `PipelineData` object.
 
