
Commit 9b8c935

Added 'Examine pipeline results' H3
1 parent 66af786 commit 9b8c935


1 file changed (+62, -10 lines changed)


articles/machine-learning/how-to-use-automlstep-in-pipelines.md

Lines changed: 62 additions & 10 deletions
@@ -282,7 +282,7 @@ model_data = PipelineData(name='best_model_data',
282282
training_output=TrainingOutput(type='Model'))
283283
```
284284

285-
The snippet above creates the two `PipelineData` objects for the metrics and model output. Each is named, assigned to the default datastore retrieved earlier, and associated with the particular `type` of `TrainingOutput` from the `AutoMLStep`.
285+
The snippet above creates the two `PipelineData` objects for the metrics and model output. Each is named, assigned to the default datastore retrieved earlier, and associated with the particular `type` of `TrainingOutput` from the `AutoMLStep`. Because we assign `pipeline_output_name` on these `PipelineData` objects, their values are available not just from the individual pipeline step but also from the pipeline as a whole, as discussed in the "Examine pipeline results" section below.
286286

287287
### Configure and create the automated ML pipeline step
288288

@@ -407,16 +407,71 @@ run.wait_for_completion()
407407

408408
The code above combines the data preparation, automated ML, and model-registering steps into a `Pipeline` object. It then creates an `Experiment` object. The `Experiment` constructor will retrieve the named experiment if it exists or create it if necessary. It submits the `Pipeline` to the `Experiment`, creating a `Run` object that will asynchronously run the pipeline. The `wait_for_completion()` function blocks until the run completes.
409409

410+
### Examine pipeline results
411+
412+
Once the `run` completes, you can retrieve the outputs of `PipelineData` objects that have been assigned a `pipeline_output_name`.
413+
414+
```python
415+
metrics_output = run.get_pipeline_output('metrics_output')
416+
model_output = run.get_pipeline_output('model_output')
417+
```
418+
419+
You can work directly with the results or download and reload them at a later time for further processing.
420+
421+
```python
422+
metrics_output.download('.', show_progress=True)
423+
model_output.download('.', show_progress=True)
424+
```
425+
426+
Downloaded files are written to the subdirectory `azureml/{run.id}/`. The metrics file is JSON-formatted and can be converted into a pandas DataFrame for examination.
427+
428+
```python
429+
import pandas as pd
430+
import json
431+
432+
metrics_filename = metrics_output._path_on_datastore
433+
# metrics_filename = path to downloaded file
434+
with open(metrics_filename) as f:
435+
    metrics_output_result = f.read()
436+
437+
deserialized_metrics_output = json.loads(metrics_output_result)
438+
df = pd.DataFrame(deserialized_metrics_output)
439+
```
440+
441+
The code snippet above shows the metrics file being loaded from its location on the Azure datastore. You can also load it from the downloaded file, as shown in the comment. Once you've deserialized it and converted it to a pandas DataFrame, you can see detailed metrics for each iteration of the automated ML step.
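As a minimal, standard-library-only sketch of the alternative in the comment (loading from the downloaded file), the helper below builds the local path under `azureml/<run_id>/` and parses the JSON. The helper name `load_downloaded_metrics`, its `relative_path` argument, and the sample file name and contents in the demo are hypothetical; the article shows neither the metrics file's exact name nor its schema.

```python
import json
import tempfile
from pathlib import Path


def load_downloaded_metrics(download_root, run_id, relative_path):
    """Hypothetical helper: parse a JSON file that download() wrote under
    <download_root>/azureml/<run_id>/<relative_path>."""
    metrics_path = Path(download_root) / "azureml" / run_id / relative_path
    with open(metrics_path, encoding="utf-8") as f:
        # json.load() reads and parses in one step, like f.read() + json.loads()
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        run_id = "example-run-id"  # hypothetical run ID
        # Hypothetical file name and contents; the real metrics schema is not shown here
        sample = Path(root) / "azureml" / run_id / "metrics_data"
        sample.parent.mkdir(parents=True)
        sample.write_text(json.dumps({"accuracy": [0.81, 0.84]}), encoding="utf-8")
        print(load_downloaded_metrics(root, run_id, "metrics_data"))
        # prints {'accuracy': [0.81, 0.84]}
```

The returned object can then be passed to `pd.DataFrame` in the same way as `deserialized_metrics_output` in the snippet above.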
442+
443+
The model file can be deserialized into a model object that you can use for inferencing, further metrics analysis, and so forth.
444+
445+
```python
446+
import pickle
447+
448+
model_filename = model_output._path_on_datastore
449+
# model_filename = path to downloaded file
450+
451+
with open(model_filename, "rb") as f:
452+
    best_model = pickle.load(f)
453+
454+
# ... inferencing code not shown ...
455+
```
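The snippet above depends on two properties of `pickle`: files must be opened in binary mode (`"wb"` to write, `"rb"` to read), and the classes that define the pickled object must be importable wherever you load it. Python's documentation also warns that unpickling untrusted data can execute arbitrary code. The sketch below shows a round trip with a hypothetical `ThresholdModel` class standing in for a trained model; it is not an AutoML class, and `save_model`/`load_model` are illustrative names.

```python
import pickle
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for a trained model (not an AutoML class)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        # Label each value 1 if it reaches the threshold, else 0
        return [1 if v >= self.threshold else 0 for v in values]


def save_model(model, path):
    # Pickle data is binary, so write with "wb"
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    # Read back with "rb"; only unpickle files from a source you trust
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "model.pkl"
        save_model(ThresholdModel(0.5), model_path)
        best_model = load_model(model_path)
        print(best_model.predict([0.2, 0.5, 0.9]))  # prints [0, 1, 1]
```

Loading succeeds here because `ThresholdModel` is defined in the same script; loading a pickle in an environment where its classes are missing raises an error instead.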
456+
410457
### Download the results of an automated ML run
411458

412-
While the `run` object in the code above is from the actively running context, you can also retrieve completed `Run` objects from the `Workspace` by way of an `Experiment` object.
459+
If you've been following along with the article, you'll have an instantiated `run` object. But you can also retrieve completed `Run` objects from the `Workspace` by way of an `Experiment` object.
413460

414-
The workspace contains a complete record of all your experiments and runs. You can either use the portal to find and download the outputs of experiments or use code.
461+
The workspace contains a complete record of all your experiments and runs. You can either use the portal to find and download the outputs of experiments or use code. To access the records from a historic run, use the Azure Machine Learning web UI to find the ID of the run you're interested in. With that ID, you can retrieve the specific `run` by way of the `Workspace` and `Experiment`.
415462

416463
```python
417-
# Run on local machine
464+
# Retrieved from Azure Machine Learning web UI
465+
run_id = 'aaaaaaaa-bbbb-cccc-dddd-0123456789AB'
418466
experiment = ws.experiments['titanic_automl']
419-
run = next(run for run in ex.get_runs() if run.id == 'aaaaaaaa-bbbb-cccc-dddd-0123456789AB')
467+
run = next(run for run in experiment.get_runs() if run.id == run_id)
468+
```
469+
470+
Change the strings in the code above to match your historical run. The snippet assumes that you've assigned `ws` to the relevant `Workspace` with the usual `Workspace.from_config()`. The code retrieves the experiment of interest directly and then finds the `Run` of interest by matching the `run.id` value.
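The `next(...)` lookup in the snippet above raises `StopIteration` if no run has a matching `id`. A minimal sketch of a more forgiving version passes a default to `next()` and returns `None` when nothing matches. `FakeRun` and `find_first` are hypothetical stand-ins so the example runs without the Azure ML SDK; only the attribute names `id` and `name` come from the article's code.

```python
from collections import namedtuple

# Hypothetical stand-in for Run/StepRun objects: only the attributes the lookup reads
FakeRun = namedtuple("FakeRun", ["id", "name"])


def find_first(items, attribute, value):
    """Return the first item whose `attribute` equals `value`, or None.

    Passing None as next()'s default avoids the StopIteration that a bare
    next(generator) raises when no item matches."""
    return next((item for item in items if getattr(item, attribute) == value), None)


if __name__ == "__main__":
    runs = [FakeRun("run-1", "prep"), FakeRun("run-2", "AutoML_Classification")]
    print(find_first(runs, "id", "run-2"))
    # prints FakeRun(id='run-2', name='AutoML_Classification')
    print(find_first(runs, "name", "missing"))  # prints None
```

With real SDK objects, the same helper would presumably be called as `find_first(experiment.get_runs(), "id", run_id)` or `find_first(run.get_children(), "name", "AutoML_Classification")`; check the result for `None` before using it.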
471+
472+
Once you have a `Run` object, you can download the metrics and model.
473+
474+
```python
420475
automl_run = next(r for r in run.get_children() if r.name == 'AutoML_Classification')
421476
outputs = automl_run.get_outputs()
422477
metrics = outputs['default_metrics_AutoML_Classification']
@@ -426,12 +481,9 @@ metrics.get_port_data_reference().download('.')
426481
model.get_port_data_reference().download('.')
427482
```
428483

429-
The above snippet would run on your local machine. First, it logs on to the workspace. It retrieves the `Experiment` named `titanic_automl` and from that `Experiment`, the `Run` in which you're interested. Notice that you'd set the value being compared to `run.id` to that of the run in which you're interested.
430-
431-
Each `Run` object contains `StepRun` objects that contain information about the individual pipeline step run. The `run` is searched for the `StepRun` object for the `AutoMLStep`. The outputs are retrieved using their default names, which are available even if you don't pass `PipelineData` objects to the `outputs` parameter of the `AutoMLStep`.
432-
433-
Finally, the actual metrics and model are downloaded to your local machine for further processing.
484+
Each pipeline `Run` object contains `StepRun` objects that hold information about the individual pipeline step runs. The code searches `run` for the `StepRun` object of the `AutoMLStep`, then retrieves the metrics and model using their default names, which are available even if you don't pass `PipelineData` objects to the `outputs` parameter of the `AutoMLStep`.
434485

486+
Finally, the actual metrics and model are downloaded to your local machine, as was discussed in the "Examine pipeline results" section above.
435487

436488
## Next Steps
437489
