Commit d94b46d

Update how-to-use-automlstep-in-pipelines.md
1 parent 6432ae3 commit d94b46d

1 file changed

+13
-15
lines changed

articles/machine-learning/how-to-use-automlstep-in-pipelines.md

Lines changed: 13 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -13,8 +13,6 @@ ms.topic: how-to
1313
ms.custom: devx-track-python, automl, sdkv1, event-tier1-build-2022
1414
---
1515

16-
[//]: # (needs PM review; is stepjob correct?; jobconfiguration?)
17-
1816
# Use automated ML in an Azure Machine Learning pipeline in Python
1917

2018
[!INCLUDE [sdk v1](../../includes/machine-learning-sdk-v1.md)]
@@ -35,11 +33,11 @@ Automated ML in a pipeline is represented by an `AutoMLStep` object. The `AutoML
3533

3634
There are several subclasses of `PipelineStep`. In addition to the `AutoMLStep`, this article will show a `PythonScriptStep` for data preparation and another for registering the model.
3735

38-
The preferred way to initially move data _into_ an ML pipeline is with `Dataset` objects. To move data _between_ steps and possibly save data output from jobs, the preferred way is with [`OutputFileDatasetConfig`](/python/api/azureml-core/azureml.data.outputfiledatasetconfig) and [`OutputTabularDatasetConfig`](/python/api/azureml-core/azureml.data.output_dataset_config.outputtabulardatasetconfig) objects. To be used with `AutoMLStep`, the `PipelineData` object must be transformed into a `PipelineOutputTabularDataset` object. For more information, see [Input and output data from ML pipelines](how-to-move-data-in-out-of-pipelines.md).
36+
The preferred way to initially move data _into_ an ML pipeline is with `Dataset` objects. To move data _between_ steps and possibly save data output from runs, the preferred way is with [`OutputFileDatasetConfig`](/python/api/azureml-core/azureml.data.outputfiledatasetconfig) and [`OutputTabularDatasetConfig`](/python/api/azureml-core/azureml.data.output_dataset_config.outputtabulardatasetconfig) objects. To be used with `AutoMLStep`, the `PipelineData` object must be transformed into a `PipelineOutputTabularDataset` object. For more information, see [Input and output data from ML pipelines](how-to-move-data-in-out-of-pipelines.md).
3937

4038
The `AutoMLStep` is configured via an `AutoMLConfig` object. `AutoMLConfig` is a flexible class, as discussed in [Configure automated ML experiments in Python](./how-to-configure-auto-train.md#configure-your-experiment-settings).
4139

42-
A `Pipeline` runs in an `Experiment`. The pipeline `Job` has, for each step, a child `StepJob`. The outputs of the automated ML `StepJob` are the training metrics and highest-performing model.
40+
A `Pipeline` runs in an `Experiment`. The pipeline `Run` has, for each step, a child `StepRun`. The outputs of the automated ML `StepRun` are the training metrics and highest-performing model.
4341

4442
To make things concrete, this article creates a simple pipeline for a classification task. The task is predicting Titanic survival, but we won't be discussing the data or task except in passing.
4543

@@ -104,9 +102,9 @@ After that, the code checks if the AML compute target `'cpu-cluster'` already ex
104102

105103
The code blocks until the target is provisioned and then prints some details of the just-created compute target. Finally, the named compute target is retrieved from the workspace and assigned to `compute_target`.
106104

107-
### Configure the training job
105+
### Configure the training run
108106

109-
The runtime context is set by creating and configuring a `JobConfiguration` object. Here we set the compute target.
107+
The runtime context is set by creating and configuring a `RunConfiguration` object. Here we set the compute target.
110108

111109
```python
112110
from azureml.core.runconfig import RunConfiguration
@@ -251,7 +249,7 @@ Comparing the two techniques:
251249
|-|-|
252250
|`OutputTabularDatasetConfig`| Higher performance |
253251
|| Natural route from `OutputFileDatasetConfig` |
254-
|| Data isn't persisted after pipeline job |
252+
|| Data isn't persisted after pipeline run |
255253
|| |
256254
| Registered `Dataset` | Lower performance |
257255
| | Can be generated in many ways |
@@ -313,7 +311,7 @@ train_step = AutoMLStep(name='AutoML_Classification',
313311
enable_default_metrics_output=False,
314312
allow_reuse=True)
315313
```
316-
The snippet shows an idiom commonly used with `AutoMLConfig`. Arguments that are more fluid (hyperparameter-ish) are specified in a separate dictionary while the values less likely to change are specified directly in the `AutoMLConfig` constructor. In this case, the `automl_settings` specify a brief job: the job will stop after only 2 iterations or 15 minutes, whichever comes first.
314+
The snippet shows an idiom commonly used with `AutoMLConfig`. Arguments that are more fluid (hyperparameter-ish) are specified in a separate dictionary while the values less likely to change are specified directly in the `AutoMLConfig` constructor. In this case, the `automl_settings` specify a brief run: the run will stop after only 2 iterations or 15 minutes, whichever comes first.
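
The settings-dictionary idiom described in this paragraph can be sketched without the SDK. This is only an illustration under stated assumptions: `build_config` is a hypothetical stand-in for a constructor that accepts extra keyword arguments, the label name `"Survived"` is made up for the example, and the two dictionary keys are chosen to match the "2 iterations or 15 minutes" description rather than copied from the article's actual `automl_settings`.

```python
def build_config(task, label_column_name, **kwargs):
    """Hypothetical stand-in for a constructor that takes stable arguments
    directly and accepts the more fluid settings as keyword arguments."""
    config = {"task": task, "label_column_name": label_column_name}
    config.update(kwargs)  # fluid settings are merged in last
    return config


# Fluid, hyperparameter-like values live in their own dictionary
# (key names are assumptions for this sketch).
automl_settings = {
    "iterations": 2,                   # stop after 2 iterations...
    "experiment_timeout_hours": 0.25,  # ...or after a quarter of an hour
}

# Values less likely to change are passed directly; the dictionary is unpacked.
config = build_config(
    task="classification",
    label_column_name="Survived",
    **automl_settings,
)

timeout_minutes = config["experiment_timeout_hours"] * 60
print(config)
print(timeout_minutes)
```

Keeping the fluid values in one dictionary means a longer or shorter run can be tried by editing that dictionary alone, leaving the constructor call untouched.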
317315

318316
The `automl_settings` dictionary is passed to the `AutoMLConfig` constructor as kwargs. The other parameters aren't complex:
319317

@@ -402,7 +400,7 @@ run = experiment.submit(pipeline, show_output=True)
402400
run.wait_for_completion()
403401
```
404402

405-
The code above combines the data preparation, automated ML, and model-registering steps into a `Pipeline` object. It then creates an `Experiment` object. The `Experiment` constructor will retrieve the named experiment if it exists or create it if necessary. It submits the `Pipeline` to the `Experiment`, creating a `Run` object that will asynchronously run the pipeline. The `wait_for_completion()` function blocks until the job completes.
403+
The code above combines the data preparation, automated ML, and model-registering steps into a `Pipeline` object. It then creates an `Experiment` object. The `Experiment` constructor will retrieve the named experiment if it exists or create it if necessary. It submits the `Pipeline` to the `Experiment`, creating a `Run` object that will asynchronously run the pipeline. The `wait_for_completion()` function blocks until the run completes.
406404

407405
### Examine pipeline results
408406

@@ -456,11 +454,11 @@ with open(model_filename, "rb" ) as f:
456454

457455
For more information on loading and working with existing models, see [Use an existing model with Azure Machine Learning](how-to-deploy-and-where.md).
458456

459-
### Download the results of an automated ML job
457+
### Download the results of an automated ML run
460458

461-
If you've been following along with the article, you'll have an instantiated `job` object. But you can also retrieve completed `Job` objects from the `Workspace` by way of an `Experiment` object.
459+
If you've been following along with the article, you'll have an instantiated `run` object. But you can also retrieve completed `Run` objects from the `Workspace` by way of an `Experiment` object.
462460

463-
The workspace contains a complete record of all your experiments and jobs. You can either use the portal to find and download the outputs of experiments or use code. To access the records from a historic job, use Azure Machine Learning to find the ID of the job in which you are interested. With that ID, you can choose the specific `job` by way of the `Workspace` and `Experiment`.
461+
The workspace contains a complete record of all your experiments and runs. You can either use the portal to find and download the outputs of experiments or use code. To access the records from a historic run, use Azure Machine Learning to find the ID of the run in which you are interested. With that ID, you can choose the specific `run` by way of the `Workspace` and `Experiment`.
464462

465463
```python
466464
# Retrieved from Azure Machine Learning web UI
@@ -469,9 +467,9 @@ experiment = ws.experiments['titanic_automl']
469467
run = next(run for run in experiment.get_runs() if run.id == run_id)
470468
```
471469

472-
You would have to change the strings in the above code to the specifics of your historical job. The snippet above assumes that you've assigned `ws` to the relevant `Workspace` with the normal `from_config()`. The experiment of interest is directly retrieved and then the code finds the `Job` of interest by matching the `run.id` value.
470+
You would have to change the strings in the above code to the specifics of your historical run. The snippet above assumes that you've assigned `ws` to the relevant `Workspace` with the normal `from_config()`. The experiment of interest is directly retrieved and then the code finds the `Run` of interest by matching the `run.id` value.
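
The ID-matching lookup can be sketched in isolation. This is a minimal sketch, not the article's code: `FakeRun` and `find_run` are hypothetical names, and `FakeRun` stands in for real run objects by exposing only an `id` attribute. Passing a default to `next()` is one way to get `None` back instead of a `StopIteration` when no run matches.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeRun:
    """Hypothetical stand-in exposing only the `id` attribute used by the lookup."""
    id: str


def find_run(runs: Iterable, run_id: str) -> Optional[object]:
    """Return the first run whose `id` matches, or None if nothing matches."""
    return next((r for r in runs if r.id == run_id), None)


# Made-up run history for illustration only.
history = [FakeRun("run-001"), FakeRun("run-002"), FakeRun("run-003")]

match = find_run(history, "run-002")
missing = find_run(history, "run-999")
print(match, missing)
```

With a real workspace, the iterable would presumably be the result of `experiment.get_runs()`, and a `None` result would indicate that the ID or the experiment name needs checking.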
473471

474-
Once you have a `Job` object, you can download the metrics and model.
472+
Once you have a `Run` object, you can download the metrics and model.
475473

476474
```python
477475
automl_run = next(r for r in run.get_children() if r.name == 'AutoML_Classification')
@@ -483,7 +481,7 @@ metrics.get_port_data_reference().download('.')
483481
model.get_port_data_reference().download('.')
484482
```
485483

486-
Each `Job` object contains `StepRun` objects that contain information about the individual pipeline step job. The `job` is searched for the `StepRun` object for the `AutoMLStep`. The metrics and model are retrieved using their default names, which are available even if you don't pass `PipelineData` objects to the `outputs` parameter of the `AutoMLStep`.
484+
Each `Run` object contains `StepRun` objects that contain information about the individual pipeline step run. The `run` is searched for the `StepRun` object for the `AutoMLStep`. The metrics and model are retrieved using their default names, which are available even if you don't pass `PipelineData` objects to the `outputs` parameter of the `AutoMLStep`.
487485

488486
Finally, the actual metrics and model are downloaded to your local machine, as was discussed in the "Examine pipeline results" section above.
489487
