
Commit 6432ae3

Runs to Jobs
1 parent 53c7983 commit 6432ae3

8 files changed: +64 −62 lines

articles/machine-learning/how-to-use-automlstep-in-pipelines.md

Lines changed: 15 additions & 13 deletions
@@ -13,6 +13,8 @@ ms.topic: how-to
 ms.custom: devx-track-python, automl, sdkv1, event-tier1-build-2022
 ---
 
+[//]: # (needs PM review; is stepjob correct?; jobconfiguration?)
+
 # Use automated ML in an Azure Machine Learning pipeline in Python
 
 [!INCLUDE [sdk v1](../../includes/machine-learning-sdk-v1.md)]
@@ -33,11 +35,11 @@ Automated ML in a pipeline is represented by an `AutoMLStep` object. The `AutoML
 
 There are several subclasses of `PipelineStep`. In addition to the `AutoMLStep`, this article will show a `PythonScriptStep` for data preparation and another for registering the model.
 
-The preferred way to initially move data _into_ an ML pipeline is with `Dataset` objects. To move data _between_ steps and possible save data output from runs, the preferred way is with [`OutputFileDatasetConfig`](/python/api/azureml-core/azureml.data.outputfiledatasetconfig) and [`OutputTabularDatasetConfig`](/python/api/azureml-core/azureml.data.output_dataset_config.outputtabulardatasetconfig) objects. To be used with `AutoMLStep`, the `PipelineData` object must be transformed into a `PipelineOutputTabularDataset` object. For more information, see [Input and output data from ML pipelines](how-to-move-data-in-out-of-pipelines.md).
+The preferred way to initially move data _into_ an ML pipeline is with `Dataset` objects. To move data _between_ steps and possibly save data output from jobs, the preferred way is with [`OutputFileDatasetConfig`](/python/api/azureml-core/azureml.data.outputfiledatasetconfig) and [`OutputTabularDatasetConfig`](/python/api/azureml-core/azureml.data.output_dataset_config.outputtabulardatasetconfig) objects. To be used with `AutoMLStep`, the `PipelineData` object must be transformed into a `PipelineOutputTabularDataset` object. For more information, see [Input and output data from ML pipelines](how-to-move-data-in-out-of-pipelines.md).
 
 The `AutoMLStep` is configured via an `AutoMLConfig` object. `AutoMLConfig` is a flexible class, as discussed in [Configure automated ML experiments in Python](./how-to-configure-auto-train.md#configure-your-experiment-settings).
 
-A `Pipeline` runs in an `Experiment`. The pipeline `Run` has, for each step, a child `StepRun`. The outputs of the automated ML `StepRun` are the training metrics and highest-performing model.
+A `Pipeline` runs in an `Experiment`. The pipeline `Job` has, for each step, a child `StepJob`. The outputs of the automated ML `StepJob` are the training metrics and highest-performing model.
 
 To make things concrete, this article creates a simple pipeline for a classification task. The task is predicting Titanic survival, but we won't be discussing the data or task except in passing.
 
@@ -102,9 +104,9 @@ After that, the code checks if the AML compute target `'cpu-cluster'` already ex
 
 The code blocks until the target is provisioned and then prints some details of the just-created compute target. Finally, the named compute target is retrieved from the workspace and assigned to `compute_target`.
 
-### Configure the training run
+### Configure the training job
 
-The runtime context is set by creating and configuring a `RunConfiguration` object. Here we set the compute target.
+The runtime context is set by creating and configuring a `JobConfiguration` object. Here we set the compute target.
 
 ```python
 from azureml.core.runconfig import RunConfiguration
@@ -249,7 +251,7 @@ Comparing the two techniques:
 |-|-|
 |`OutputTabularDatasetConfig`| Higher performance |
 || Natural route from `OutputFileDatasetConfig` |
-|| Data isn't persisted after pipeline run |
+|| Data isn't persisted after pipeline job |
 || |
 | Registered `Dataset` | Lower performance |
 | | Can be generated in many ways |
@@ -311,7 +313,7 @@ train_step = AutoMLStep(name='AutoML_Classification',
                        enable_default_metrics_output=False,
                        allow_reuse=True)
 ```
-The snippet shows an idiom commonly used with `AutoMLConfig`. Arguments that are more fluid (hyperparameter-ish) are specified in a separate dictionary while the values less likely to change are specified directly in the `AutoMLConfig` constructor. In this case, the `automl_settings` specify a brief run: the run will stop after only 2 iterations or 15 minutes, whichever comes first.
+The snippet shows an idiom commonly used with `AutoMLConfig`. Arguments that are more fluid (hyperparameter-ish) are specified in a separate dictionary while the values less likely to change are specified directly in the `AutoMLConfig` constructor. In this case, the `automl_settings` specify a brief job: the job will stop after only 2 iterations or 15 minutes, whichever comes first.
 
 The `automl_settings` dictionary is passed to the `AutoMLConfig` constructor as kwargs. The other parameters aren't complex:
 
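The settings-dictionary idiom described in this hunk can be sketched in plain Python without the Azure SDK. The `make_config` helper and the setting keys below are hypothetical stand-ins, not the real `AutoMLConfig` signature:

```python
# Fluid, hyperparameter-ish values live in a dictionary...
automl_settings = {
    "iteration_timeout_minutes": 10,
    "iterations": 2,
    "primary_metric": "AUC_weighted",
}

def make_config(task, path, **kwargs):
    # ...while stable arguments are named explicitly; the fluid ones arrive
    # via **kwargs, just as automl_settings is splatted into AutoMLConfig.
    return {"task": task, "path": path, **kwargs}

config = make_config("classification", ".", **automl_settings)
print(config["iterations"])  # 2
```

Keeping the volatile values in one dictionary makes it easy to tweak a single experiment knob without touching the constructor call.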
@@ -400,7 +402,7 @@ run = experiment.submit(pipeline, show_output=True)
 run.wait_for_completion()
 ```
 
-The code above combines the data preparation, automated ML, and model-registering steps into a `Pipeline` object. It then creates an `Experiment` object. The `Experiment` constructor will retrieve the named experiment if it exists or create it if necessary. It submits the `Pipeline` to the `Experiment`, creating a `Run` object that will asynchronously run the pipeline. The `wait_for_completion()` function blocks until the run completes.
+The code above combines the data preparation, automated ML, and model-registering steps into a `Pipeline` object. It then creates an `Experiment` object. The `Experiment` constructor will retrieve the named experiment if it exists or create it if necessary. It submits the `Pipeline` to the `Experiment`, creating a `Run` object that will asynchronously run the pipeline. The `wait_for_completion()` function blocks until the job completes.
 
 ### Examine pipeline results
 
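The submit-then-block control flow in this hunk can be illustrated with a toy class. `ToyRun` is a hypothetical stand-in for the object `experiment.submit()` returns, not an Azure class:

```python
import threading

class ToyRun:
    """Toy illustration of an asynchronously running job handle."""
    def __init__(self, work):
        self._thread = threading.Thread(target=work)
        self._thread.start()            # the work starts running asynchronously

    def wait_for_completion(self):
        self._thread.join()             # block the caller until the job completes

results = []
run = ToyRun(lambda: results.append("finished"))
run.wait_for_completion()
print(results)  # ['finished']
```

The real `wait_for_completion()` additionally streams logs and raises on failure, but the blocking semantics are the same.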
@@ -454,11 +456,11 @@ with open(model_filename, "rb" ) as f:
 
 For more information on loading and working with existing models, see [Use an existing model with Azure Machine Learning](how-to-deploy-and-where.md).
 
-### Download the results of an automated ML run
+### Download the results of an automated ML job
 
-If you've been following along with the article, you'll have an instantiated `run` object. But you can also retrieve completed `Run` objects from the `Workspace` by way of an `Experiment` object.
+If you've been following along with the article, you'll have an instantiated `job` object. But you can also retrieve completed `Job` objects from the `Workspace` by way of an `Experiment` object.
 
-The workspace contains a complete record of all your experiments and runs. You can either use the portal to find and download the outputs of experiments or use code. To access the records from a historic run, use Azure Machine Learning to find the ID of the run in which you are interested. With that ID, you can choose the specific `run` by way of the `Workspace` and `Experiment`.
+The workspace contains a complete record of all your experiments and jobs. You can either use the portal to find and download the outputs of experiments or use code. To access the records from a historic job, use Azure Machine Learning to find the ID of the job in which you are interested. With that ID, you can choose the specific `job` by way of the `Workspace` and `Experiment`.
 
 ```python
 # Retrieved from Azure Machine Learning web UI
@@ -467,9 +469,9 @@ experiment = ws.experiments['titanic_automl']
 run = next(run for run in ex.get_runs() if run.id == run_id)
 ```
 
-You would have to change the strings in the above code to the specifics of your historical run. The snippet above assumes that you've assigned `ws` to the relevant `Workspace` with the normal `from_config()`. The experiment of interest is directly retrieved and then the code finds the `Run` of interest by matching the `run.id` value.
+You would have to change the strings in the above code to the specifics of your historical job. The snippet above assumes that you've assigned `ws` to the relevant `Workspace` with the normal `from_config()`. The experiment of interest is directly retrieved and then the code finds the `Job` of interest by matching the `run.id` value.
 
-Once you have a `Run` object, you can download the metrics and model.
+Once you have a `Job` object, you can download the metrics and model.
 
 ```python
 automl_run = next(r for r in run.get_children() if r.name == 'AutoML_Classification')
@@ -481,7 +483,7 @@ metrics.get_port_data_reference().download('.')
 model.get_port_data_reference().download('.')
 ```
 
-Each `Run` object contains `StepRun` objects that contain information about the individual pipeline step run. The `run` is searched for the `StepRun` object for the `AutoMLStep`. The metrics and model are retrieved using their default names, which are available even if you don't pass `PipelineData` objects to the `outputs` parameter of the `AutoMLStep`.
+Each `Job` object contains `StepRun` objects that contain information about the individual pipeline step job. The `job` is searched for the `StepRun` object for the `AutoMLStep`. The metrics and model are retrieved using their default names, which are available even if you don't pass `PipelineData` objects to the `outputs` parameter of the `AutoMLStep`.
 
 Finally, the actual metrics and model are downloaded to your local machine, as was discussed in the "Examine pipeline results" section above.
 

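The `next(... if run.id == run_id)` lookup used in the hunks above raises `StopIteration` when nothing matches. A plain-Python sketch with hypothetical stand-in objects (`FakeRun` is not an Azure class) shows the pattern and a safer variant that supplies a default:

```python
class FakeRun:
    """Hypothetical stand-in for a run/job object with an .id attribute."""
    def __init__(self, run_id):
        self.id = run_id

runs = [FakeRun("prep_01"), FakeRun("AutoML_abc123")]
run_id = "AutoML_abc123"

# Same shape as the doc snippet: raises StopIteration if no ID matches.
run = next(r for r in runs if r.id == run_id)

# Safer variant: pass a default so a missing ID yields None instead of raising.
missing = next((r for r in runs if r.id == "no-such-id"), None)
print(run.id, missing)  # AutoML_abc123 None
```

The two-argument form of `next()` is worth the extra parentheses when the ID comes from user input or a copy-pasted portal value.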
articles/machine-learning/how-to-use-batch-endpoint.md

Lines changed: 1 addition & 1 deletion
@@ -257,7 +257,7 @@ Follow the below steps to view the scoring results in Azure Storage Explorer whe
 
 :::code language="azurecli" source="~/azureml-examples-main/cli/batch-score.sh" ID="show_job_in_studio" :::
 
-1. In the graph of the run, select the `batchscoring` step.
+1. In the graph of the job, select the `batchscoring` step.
 1. Select the __Outputs + logs__ tab and then select **Show data outputs**.
 1. From __Data outputs__, select the icon to open __Storage Explorer__.

articles/machine-learning/how-to-use-batch-endpoints-studio.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -118,7 +118,7 @@ To change where the results are stored, providing a blob store and output path w
118118

119119
### Summary of all submitted jobs
120120

121-
To see a summary of all the submitted jobs for an endpoint, select the endpoint and then select the **Runs** tab.
121+
To see a summary of all the submitted jobs for an endpoint, select the endpoint and then select the **Jobs** tab.
122122

123123
:::image type="content" source="media/how-to-use-batch-endpoints-studio/summary-jobs.png" alt-text="Screenshot of summary of jobs submitted to a batch endpoint":::
124124
## Check batch scoring results
