Commit 1ef698a

Merge pull request #269560 from cdpark/msakande-customize-outputs
User Story 229636: Q&M: Azure Machine Learning freshness updates - Custom outputs
2 parents 7c37678 + e4994ef commit 1ef698a

File tree

3 files changed: +54 −53 lines changed

articles/machine-learning/how-to-deploy-model-custom-output.md

Lines changed: 47 additions & 46 deletions
@@ -8,7 +8,7 @@ ms.subservice: inferencing
 ms.topic: how-to
 author: santiagxf
 ms.author: fasantia
-ms.date: 10/10/2022
+ms.date: 03/18/2024
 ms.reviewer: mopeakande
 ms.custom: devplatv2, update-code
 ---
@@ -17,21 +17,21 @@ ms.custom: devplatv2, update-code
 
 [!INCLUDE [ml v2](includes/machine-learning-dev-v2.md)]
 
-Sometimes you need to execute inference having a higher control of what is being written as output of the batch job. Those cases include:
+This guide explains how to create deployments that generate custom outputs and files. Sometimes you need more control over what's written as output from batch inference jobs. These cases include the following situations:
 
 > [!div class="checklist"]
-> * You need to control how the predictions are being written in the output. For instance, you want to append the prediction to the original data (if data is tabular).
-> * You need to write your predictions in a different file format from the one supported out-of-the-box by batch deployments.
+> * You need to control how predictions are written in the output. For instance, you want to append the prediction to the original data if the data is tabular.
+> * You need to write your predictions in a different file format than the one supported out of the box by batch deployments.
 > * Your model is a generative model that can't write the output in a tabular format. For instance, models that produce images as outputs.
-> * Your model produces multiple tabular files instead of a single one. This is the case for instance of models that perform forecasting considering multiple scenarios.
+> * Your model produces multiple tabular files instead of a single one. For example, models that perform forecasting by considering multiple scenarios.
 
-In any of those cases, Batch Deployments allow you to take control of the output of the jobs by allowing you to write directly to the output of the batch deployment job. In this tutorial, we'll see how to deploy a model to perform batch inference and writes the outputs in `parquet` format by appending the predictions to the original input data.
+Batch deployments let you take control of job outputs by writing directly to the output of the batch deployment job. In this tutorial, you learn how to deploy a model to perform batch inference and write the outputs in *parquet* format by appending the predictions to the original input data.
 
 ## About this sample
 
-This example shows how you can deploy a model to perform batch inference and customize how your predictions are written in the output. This example uses a model based on the [UCI Heart Disease Data Set](https://archive.ics.uci.edu/ml/datasets/Heart+Disease). The database contains 76 attributes, but we are using a subset of 14 of them. The model tries to predict the presence of heart disease in a patient. It is integer valued from 0 (no presence) to 1 (presence).
+This example shows how you can deploy a model to perform batch inference and customize how your predictions are written in the output. The model is based on the [UCI Heart Disease dataset](https://archive.ics.uci.edu/ml/datasets/Heart+Disease). The database contains 76 attributes, but this example uses a subset of 14 of them. The model tries to predict the presence of heart disease in a patient. The prediction is an integer value from 0 (no presence) to 1 (presence).
 
-The model has been trained using an `XGBBoost` classifier and all the required preprocessing has been packaged as a `scikit-learn` pipeline, making this model an end-to-end pipeline that goes from raw data to predictions.
+The model was trained using an `XGBoost` classifier, and all the required preprocessing was packaged as a `scikit-learn` pipeline, making this model an end-to-end pipeline that goes from raw data to predictions.
 
 [!INCLUDE [machine-learning-batch-clone](includes/azureml-batch-clone-samples.md)]
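The first checklist item — appending predictions to tabular input — amounts to adding a column to the original data. The following pandas sketch is illustrative only; the feature names and values are assumptions, not taken from the sample:

```python
import pandas as pd

# A hypothetical mini-batch of tabular input (two of the 14 attributes).
data = pd.DataFrame({"age": [63, 37], "chol": [233, 250]})

# Predictions as the model might return them: 0 = no presence, 1 = presence.
predictions = [0, 1]

# Append the predictions so the output keeps the original features
# alongside the scores.
data["prediction"] = predictions
```

Writing this combined frame out (for example as parquet) gives an output file that carries both the inputs and the scores.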

@@ -41,34 +41,35 @@ The files for this example are in:
 cd endpoints/batch/deploy-models/custom-outputs-parquet
 ```
 
-### Follow along in Jupyter Notebooks
+### Follow along in a Jupyter notebook
 
-You can follow along this sample in a Jupyter Notebook. In the cloned repository, open the notebook: [custom-output-batch.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/batch/deploy-models/custom-outputs-parquet/custom-output-batch.ipynb).
+There's a Jupyter notebook that you can use to follow this example. In the cloned repository, open the notebook called [custom-output-batch.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/batch/deploy-models/custom-outputs-parquet/custom-output-batch.ipynb).
 
 ## Prerequisites
 
 [!INCLUDE [machine-learning-batch-prereqs](includes/azureml-batch-prereqs.md)]
 
-## Creating a batch deployment with a custom output
+## Create a batch deployment with a custom output
 
-In this example, we are going to create a deployment that can write directly to the output folder of the batch deployment job. The deployment will use this feature to write custom parquet files.
+In this example, you create a deployment that can write directly to the output folder of the batch deployment job. The deployment uses this feature to write custom parquet files.
 
-### Registering the model
+### Register the model
+
+You can only deploy registered models using a batch endpoint. In this case, you already have a local copy of the model in the repository, so you only need to publish the model to the registry in the workspace. You can skip this step if the model you're trying to deploy is already registered.
 
-Batch Endpoint can only deploy registered models. In this case, we already have a local copy of the model in the repository, so we only need to publish the model to the registry in the workspace. You can skip this step if the model you are trying to deploy is already registered.
-
 # [Azure CLI](#tab/cli)
 
 :::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/custom-outputs-parquet/deploy-and-run.sh" ID="register_model" :::
 
 # [Python](#tab/python)
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/custom-outputs-parquet/custom-output-batch.ipynb?name=register_model)]
+
 ---
 
-### Creating a scoring script
+### Create a scoring script
 
-We need to create a scoring script that can read the input data provided by the batch deployment and return the scores of the model. We are also going to write directly to the output folder of the job. In summary, the proposed scoring script does as follows:
+You need to create a scoring script that can read the input data provided by the batch deployment and return the scores of the model. The script also writes directly to the output folder of the job. In summary, the scoring script does the following:
 
 1. Reads the input data as CSV files.
 2. Runs an MLflow model `predict` function over the input data.
@@ -81,31 +82,31 @@ __code/batch_driver.py__
 
 __Remarks:__
 * Notice how the environment variable `AZUREML_BI_OUTPUT_PATH` is used to get access to the output path of the deployment job.
-* The `init()` function is populating a global variable called `output_path` that can be used later to know where to write.
-* The `run` method returns a list of the processed files. It is required for the `run` function to return a `list` or a `pandas.DataFrame` object.
+* The `init()` function populates a global variable called `output_path` that can be used later to know where to write.
+* The `run` method returns a list of the processed files. The `run` function is required to return a `list` or a `pandas.DataFrame` object.
 
 > [!WARNING]
-> Take into account that all the batch executors will have write access to this path at the same time. This means that you need to account for concurrency. In this case, we are ensuring each executor writes its own file by using the input file name as the name of the output folder.
+> All the batch executors have write access to this path at the same time, so you need to account for concurrency. In this example, each executor writes its own file by using the input file name as the name of the output folder.
 
-## Creating the endpoint
+## Create the endpoint
 
-We are going to create a batch endpoint named `heart-classifier-batch` where to deploy the model.
+Next, create a batch endpoint named `heart-classifier-batch` where the model is deployed.
 
-1. Decide on the name of the endpoint. The name of the endpoint will end-up in the URI associated with your endpoint. Because of that, __batch endpoint names need to be unique within an Azure region__. For example, there can be only one batch endpoint with the name `mybatchendpoint` in `westus2`.
+1. Decide on the name of the endpoint. The name of the endpoint appears in the URI associated with your endpoint, so *batch endpoint names need to be unique within an Azure region*. For example, there can be only one batch endpoint with the name `mybatchendpoint` in `westus2`.
 
 # [Azure CLI](#tab/cli)
 
-In this case, let's place the name of the endpoint in a variable so we can easily reference it later.
+In this case, place the name of the endpoint in a variable so you can easily reference it later.
 
 :::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/custom-outputs-parquet/deploy-and-run.sh" ID="name_endpoint" :::
 
 # [Python](#tab/python)
 
-In this case, let's place the name of the endpoint in a variable so we can easily reference it later.
+In this case, place the name of the endpoint in a variable so you can easily reference it later.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/custom-outputs-parquet/custom-output-batch.ipynb?name=name_endpoint)]
 
-1. Configure your batch endpoint
+1. Configure your batch endpoint.
 
 # [Azure CLI](#tab/cli)
 
@@ -129,32 +130,32 @@ We are going to create a batch endpoint named `heart-classifier-batch` where to
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/custom-outputs-parquet/custom-output-batch.ipynb?name=create_endpoint)]
 
-### Creating the deployment
+### Create the deployment
 
 Follow the next steps to create a deployment using the previous scoring script:
 
-1. First, let's create an environment where the scoring script can be executed:
+1. First, create an environment where the scoring script can be executed:
 
 # [Azure CLI](#tab/cli)
 
-No extra step is required for the Azure Machine Learning CLI. The environment definition will be included in the deployment file.
+No extra step is required for the Azure Machine Learning CLI. The environment definition is included in the deployment file.
 
 :::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/custom-outputs-parquet/deployment.yml" range="7-10":::
 
 # [Python](#tab/python)
 
-Let's get a reference to the environment:
+Get a reference to the environment:
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/custom-outputs-parquet/custom-output-batch.ipynb?name=configure_environment)]
 
-2. Create the deployment. Notice that now `output_action` is set to `SUMMARY_ONLY`.
+2. Create the deployment. Notice that `output_action` is now set to `SUMMARY_ONLY`.
 
 > [!NOTE]
-> This example assumes you have aa compute cluster with name `batch-cluster`. Change that name accordinly.
+> This example assumes you have a compute cluster named `batch-cluster`. Change that name accordingly.
 
 # [Azure CLI](#tab/cli)
 
-To create a new deployment under the created endpoint, create a `YAML` configuration like the following. You can check the [full batch endpoint YAML schema](reference-yaml-endpoint-batch.md) for extra properties.
+To create a new deployment under the created endpoint, create a YAML configuration like the following. You can check the [full batch endpoint YAML schema](reference-yaml-endpoint-batch.md) for extra properties.
 
 :::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/custom-outputs-parquet/deployment.yml":::
 
@@ -174,18 +175,18 @@ Follow the next steps to create a deployment using the previous scoring script:
 
 3. At this point, our batch endpoint is ready to be used.
 
-## Testing out the deployment
+## Test the deployment
 
-For testing our endpoint, we are going to use a sample of unlabeled data located in this repository and that can be used with the model. Batch endpoints can only process data that is located in the cloud and that is accessible from the Azure Machine Learning workspace. In this example, we are going to upload it to an Azure Machine Learning data store. Particularly, we are going to create a data asset that can be used to invoke the endpoint for scoring. However, notice that batch endpoints accept data that can be placed in multiple type of locations.
+To test your endpoint, use a sample of unlabeled data located in this repository that can be used with the model. Batch endpoints can only process data that's located in the cloud and accessible from the Azure Machine Learning workspace. In this example, you upload the sample data to an Azure Machine Learning data store and create a data asset that can be used to invoke the endpoint for scoring. However, batch endpoints accept data placed in multiple types of locations.
 
-1. Let's invoke the endpoint with data from a storage account:
+1. Invoke the endpoint with data from a storage account:
 
 # [Azure CLI](#tab/cli)
 
 :::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/custom-outputs-parquet/deploy-and-run.sh" ID="start_batch_scoring_job" :::
 
 > [!NOTE]
-> The utility `jq` may not be installed on every installation. You can get instructions in [this link](https://stedolan.github.io/jq/download/).
+> The utility `jq` might not be installed on your system. For installation instructions, see the [jq download page](https://jqlang.github.io/jq/download).
 
 # [Python](#tab/python)
 
@@ -210,12 +211,12 @@ For testing our endpoint, we are going to use a sample of unlabeled data located
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/custom-outputs-parquet/custom-output-batch.ipynb?name=get_job)]
 
-## Analyzing the outputs
+## Analyze the outputs
 
-The job generates a named output called `score` where all the generated files are placed. Since we wrote into the directory directly, one file per each input file, then we can expect to have the same number of files. In this particular example we decided to name the output files the same as the inputs, but they will have a parquet extension.
+The job generates a named output called `score` where all the generated files are placed. Since the script writes one file directly into the directory for each input file, you can expect the same number of output files as input files. In this example, the output files are named the same as the inputs, but they have a parquet extension.
 
 > [!NOTE]
-> Notice that a file `predictions.csv` is also included in the output folder. This file contains the summary of the processed files.
+> Notice that a file *predictions.csv* is also included in the output folder. This file contains the summary of the processed files.
 
 You can download the results of the job by using the job name:
 
@@ -228,6 +229,7 @@ To download the predictions, use the following command:
 # [Python](#tab/python)
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/custom-outputs-parquet/custom-output-batch.ipynb?name=download_outputs)]
+
 ---
 
 Once the file is downloaded, you can open it using your favorite tool. The following example loads the predictions using `Pandas` dataframe.
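As a concrete sketch of that step, the per-input parquet files under the `score` output could be combined with pandas like this. The folder layout is an assumption; adjust the path to wherever you downloaded the job output:

```python
import glob
import os

import pandas as pd


def load_scores(folder):
    """Read every parquet file the job wrote and stack them into one frame."""
    files = sorted(glob.glob(os.path.join(folder, "*.parquet")))
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
```

Because each file already carries the original features plus the `prediction` column, the concatenated frame is ready for analysis without a separate join back to the inputs.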
@@ -247,20 +249,19 @@ The output looks as follows:
 
 # [Azure CLI](#tab/cli)
 
-Run the following code to delete the batch endpoint and all the underlying deployments. Batch scoring jobs won't be deleted.
+Run the following code to delete the batch endpoint and all the underlying deployments. Batch scoring jobs aren't deleted.
 
 ::: code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/custom-outputs-parquet/deploy-and-run.sh" ID="delete_endpoint" :::
 
 # [Python](#tab/python)
 
-Run the following code to delete the batch endpoint and all the underlying deployments. Batch scoring jobs won't be deleted.
+Run the following code to delete the batch endpoint and all the underlying deployments. Batch scoring jobs aren't deleted.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/custom-outputs-parquet/custom-output-batch.ipynb?name=delete_endpoint)]
 
 ---
 
-## Next steps
-
-* [Using batch deployments for image file processing](how-to-image-processing-batch.md)
-* [Using batch deployments for NLP processing](how-to-nlp-processing-batch.md)
+## Related content
+
+* [Image processing with batch model deployments](how-to-image-processing-batch.md)
+* [Deploy language models in batch endpoints](how-to-nlp-processing-batch.md)

articles/machine-learning/includes/azureml-batch-clone-samples.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ cd azureml-examples/cli
 # [Python](#tab/python)
 
 ```azurecli
-!git clone https://github.com/Azure/azureml-examples --depth 1
-!cd azureml-examples/sdk/python
+git clone https://github.com/Azure/azureml-examples --depth 1
+cd azureml-examples/sdk/python
 ```
 ---

articles/machine-learning/includes/azureml-batch-prereqs.md

Lines changed: 5 additions & 5 deletions
@@ -10,15 +10,15 @@ Before following the steps in this article, make sure you have the following pre
 
 * An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/).
 
-* An Azure Machine Learning workspace. If you don't have one, use the steps in the [How to manage workspaces](../how-to-manage-workspace.md) article to create one.
+* An Azure Machine Learning workspace. If you don't have one, use the steps in the [Manage Azure Machine Learning workspaces](../how-to-manage-workspace.md) article to create one.
 
-* Ensure you have the following permissions in the workspace:
+* Ensure that you have the following permissions in the workspace:
 
-* Create/manage batch endpoints and deployments: Use roles Owner, contributor, or custom role allowing `Microsoft.MachineLearningServices/workspaces/batchEndpoints/*`.
+* Create or manage batch endpoints and deployments: Use an Owner, Contributor, or custom role that allows `Microsoft.MachineLearningServices/workspaces/batchEndpoints/*`.
 
-* Create ARM deployments in the workspace resource group: Use roles Owner, contributor, or custom role allowing `Microsoft.Resources/deployments/write` in the resource group where the workspace is deployed.
+* Create ARM deployments in the workspace resource group: Use an Owner, Contributor, or custom role that allows `Microsoft.Resources/deployments/write` in the resource group where the workspace is deployed.
 
-* You will need to install the following software to work with Azure Machine Learning:
+* You need to install the following software to work with Azure Machine Learning:
 
 # [Azure CLI](#tab/cli)
 