Commit 2548c6a

Merge pull request #267544 from lgayhardt/patch-254
Update how-to-mlflow-batch.md
2 parents 13929db + d4c5f35

File tree

1 file changed (+13 −13 lines)

articles/machine-learning/how-to-mlflow-batch.md

Lines changed: 13 additions & 13 deletions
@@ -10,7 +10,7 @@ author: santiagxf
ms.author: fasantia
ms.date: 10/10/2022
ms.reviewer: mopeakande
-ms.custom: devplatv2, update-code
+ms.custom: update-code, devplatv2
---

# Deploy MLflow models in batch deployments
@@ -27,7 +27,7 @@ In this article, learn how to deploy [MLflow](https://www.mlflow.org) models to

## About this example

-This example shows how you can deploy an MLflow model to a batch endpoint to perform batch predictions. This example uses an MLflow model based on the [UCI Heart Disease Data Set](https://archive.ics.uci.edu/ml/datasets/Heart+Disease). The database contains 76 attributes, but we are using a subset of 14 of them. The model tries to predict the presence of heart disease in a patient. It is integer valued from 0 (no presence) to 1 (presence).
+This example shows how you can deploy an MLflow model to a batch endpoint to perform batch predictions. This example uses an MLflow model based on the [UCI Heart Disease Data Set](https://archive.ics.uci.edu/ml/datasets/Heart+Disease). The database contains 76 attributes, but we're using a subset of 14 of them. The model tries to predict the presence of heart disease in a patient, as an integer value from 0 (no presence) to 1 (presence).

The model has been trained using an `XGBoost` classifier and all the required preprocessing has been packaged as a `scikit-learn` pipeline, making this model an end-to-end pipeline that goes from raw data to predictions.


@@ -51,7 +51,7 @@ You can follow along this sample in the following notebooks. In the cloned repos

Follow these steps to deploy an MLflow model to a batch endpoint for running batch inference over new data:

-1. Batch Endpoint can only deploy registered models. In this case, we already have a local copy of the model in the repository, so we only need to publish the model to the registry in the workspace. You can skip this step if the model you are trying to deploy is already registered.
+1. Batch Endpoint can only deploy registered models. In this case, we already have a local copy of the model in the repository, so we only need to publish the model to the registry in the workspace. You can skip this step if the model you're trying to deploy is already registered.

# [Azure CLI](#tab/cli)

@@ -61,7 +61,7 @@ Follow these steps to deploy an MLflow model to a batch endpoint for running bat

[!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/heart-classifier-mlflow/mlflow-for-batch-tabular.ipynb?name=register_model)]

-1. Before moving any forward, we need to make sure the batch deployments we are about to create can run on some infrastructure (compute). Batch deployments can run on any Azure Machine Learning compute that already exists in the workspace. That means that multiple batch deployments can share the same compute infrastructure. In this example, we are going to work on an Azure Machine Learning compute cluster called `cpu-cluster`. Let's verify the compute exists on the workspace or create it otherwise.
+1. Before moving forward, we need to make sure the batch deployments we're about to create can run on some infrastructure (compute). Batch deployments can run on any Azure Machine Learning compute that already exists in the workspace. That means that multiple batch deployments can share the same compute infrastructure. In this example, we're going to work on an Azure Machine Learning compute cluster called `cpu-cluster`. Let's verify that the compute exists in the workspace, or create it otherwise.

# [Azure CLI](#tab/cli)

@@ -142,7 +142,7 @@ Follow these steps to deploy an MLflow model to a batch endpoint for running bat
> [!IMPORTANT]
> Configure `timeout` in your deployment based on how long it takes for your model to run inference on a single batch. The bigger the batch size, the longer this value has to be. Remember that `mini_batch_size` indicates the number of files in a batch, not the number of samples. When working with tabular data, each file may contain multiple rows, which increases the time it takes for the batch endpoint to process each file. Use high values in those cases to avoid timeout errors.

-7. Although you can invoke a specific deployment inside of an endpoint, you will usually want to invoke the endpoint itself and let the endpoint decide which deployment to use. Such deployment is named the "default" deployment. This gives you the possibility of changing the default deployment and hence changing the model serving the deployment without changing the contract with the user invoking the endpoint. Use the following instruction to update the default deployment:
+7. Although you can invoke a specific deployment inside an endpoint, you'll usually want to invoke the endpoint itself and let the endpoint decide which deployment to use. Such a deployment is called the "default" deployment. This lets you change the default deployment, and hence the model serving it, without changing the contract with the user invoking the endpoint. Use the following instruction to update the default deployment:

# [Azure CLI](#tab/cli)

@@ -156,7 +156,7 @@ Follow these steps to deploy an MLflow model to a batch endpoint for running bat

## Testing out the deployment

-For testing our endpoint, we are going to use a sample of unlabeled data located in this repository and that can be used with the model. Batch endpoints can only process data that is located in the cloud and that is accessible from the Azure Machine Learning workspace. In this example, we are going to upload it to an Azure Machine Learning data store. Particularly, we are going to create a data asset that can be used to invoke the endpoint for scoring. However, notice that batch endpoints accept data that can be placed in various locations.
+For testing our endpoint, we're going to use a sample of unlabeled data located in this repository that can be used with the model. Batch endpoints can only process data that is located in the cloud and accessible from the Azure Machine Learning workspace. In this example, we're going to upload it to an Azure Machine Learning data store. In particular, we're going to create a data asset that can be used to invoke the endpoint for scoring. However, notice that batch endpoints accept data placed in various locations.

1. Let's create the data asset first. This data asset consists of a folder with multiple CSV files that we want to process in parallel using batch endpoints. You can skip this step if your data is already registered as a data asset or you want to use a different input type.

@@ -205,7 +205,7 @@ For testing our endpoint, we are going to use a sample of unlabeled data located
---

> [!TIP]
-> Notice how we are not indicating the deployment name in the invoke operation. That's because the endpoint automatically routes the job to the default deployment. Since our endpoint only has one deployment, then that one is the default one. You can target an specific deployment by indicating the argument/parameter `deployment_name`.
+> Notice how we're not indicating the deployment name in the invoke operation. That's because the endpoint automatically routes the job to the default deployment. Since our endpoint has only one deployment, that one is the default. You can target a specific deployment by indicating the argument/parameter `deployment_name`.

3. A batch job is started as soon as the command returns. You can monitor the status of the job until it finishes:

@@ -270,7 +270,7 @@ Azure Machine Learning supports deploying MLflow models to batch endpoints witho
Batch Endpoints distribute work at the file level, for both structured and unstructured data. As a consequence, only [URI file](reference-yaml-data.md) and [URI folders](reference-yaml-data.md) are supported for this feature. Each worker processes batches of `Mini batch size` files at a time. For tabular data, batch endpoints don't take into account the number of rows inside of each file when distributing the work.

> [!WARNING]
-> Nested folder structures are not explored during inference. If you are partitioning your data using folders, make sure to flatten the structure beforehand.
+> Nested folder structures are not explored during inference. If you're partitioning your data using folders, make sure to flatten the structure beforehand.

Batch deployments call the `predict` function of the MLflow model once per file. For CSV files containing multiple rows, this may put memory pressure on the underlying compute and may increase the time it takes for the model to score a single file (especially for expensive models like large language models). If you encounter several out-of-memory exceptions or timeout entries in logs, consider splitting the data into smaller files with fewer rows, or implement batching at the row level inside the model/scoring script.

@@ -284,7 +284,7 @@ The following data types are supported for batch inference when deploying MLflow
| `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.gif` | `np.ndarray` | `TensorSpec`. Input is reshaped to match tensors shape if available. If no signature is available, tensors of type `np.uint8` are inferred. For additional guidance, read [Considerations for MLflow models processing images](how-to-image-processing-batch.md#considerations-for-mlflow-models-processing-images). |

> [!WARNING]
-> Be advised that any unsupported file that may be present in the input data will make the job to fail. You will see an error entry as follows: *"ERROR:azureml:Error processing input file: '/mnt/batch/tasks/.../a-given-file.avro'. File type 'avro' is not supported."*.
+> Be advised that any unsupported file present in the input data will cause the job to fail. You'll see an error entry as follows: *"ERROR:azureml:Error processing input file: '/mnt/batch/tasks/.../a-given-file.avro'. File type 'avro' is not supported."*.

### Signature enforcement for MLflow models

@@ -303,7 +303,7 @@ Batch deployments only support deploying MLflow models with a `pyfunc` flavor. I

MLflow models can be deployed to batch endpoints without indicating a scoring script in the deployment definition. However, you can opt in to indicate this file (usually referred to as the *batch driver*) to customize how inference is executed.

-You will typically select this workflow when:
+You'll typically select this workflow when:
> [!div class="checklist"]
> * You need to process a file type not supported by MLflow batch deployments.
> * You need to customize the way the model is run, for instance, use a specific flavor to load it with `mlflow.<flavor>.load_model()`.
@@ -312,7 +312,7 @@ You will typically select this workflow when:
> * Your model can't process each file at once because of memory constraints and needs to read it in chunks.

> [!IMPORTANT]
-> If you choose to indicate a scoring script for an MLflow model deployment, you will also have to specify the environment where the deployment will run.
+> If you choose to indicate a scoring script for an MLflow model deployment, you'll also have to specify the environment where the deployment will run.

### Steps
@@ -325,7 +325,7 @@ Use the following steps to deploy an MLflow model with a custom scoring script.

b. Go to the __Models__ section.

-c. Select the model you are trying to deploy and click on the tab __Artifacts__.
+c. Select the model you're trying to deploy and select the __Artifacts__ tab.

d. Take note of the folder that is displayed. This folder was indicated when the model was registered.


@@ -337,7 +337,7 @@ Use the following steps to deploy an MLflow model with a custom scoring script.

:::code language="python" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deployment-custom/code/batch_driver.py" :::

-1. Let's create an environment where the scoring script can be executed. Since our model is MLflow, the conda requirements are also specified in the model package (for more details about MLflow models and the files included on it see [The MLmodel format](concept-mlflow-models.md#the-mlmodel-format)). We are going then to build the environment using the conda dependencies from the file. However, __we need also to include__ the package `azureml-core` which is required for Batch Deployments.
+1. Let's create an environment where the scoring script can be executed. Since our model is MLflow, the conda requirements are also specified in the model package (for more details about MLflow models and the files included in them, see [The MLmodel format](concept-mlflow-models.md#the-mlmodel-format)). We'll then build the environment using the conda dependencies from the file. However, __we also need to include__ the package `azureml-core`, which is required for batch deployments.

> [!TIP]
> If your model is already registered in the model registry, you can download/copy the `conda.yml` file associated with your model by going to [Azure Machine Learning studio](https://ml.azure.com) > Models > Select your model from the list > Artifacts. Open the root folder in the navigation and select the `conda.yml` file listed. Click on Download or copy its content.
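The "conda dependencies from the file, plus `azureml-core`" step amounts to appending one pip dependency to the MLflow-generated conda environment. A sketch over the parsed `conda.yml` as a dict (an illustrative helper, not an Azure ML API — the environment contents shown are example values):

```python
import copy

def add_pip_dependency(conda_env, package):
    """Return a copy of an MLflow conda environment dict with one extra
    pip package (e.g. 'azureml-core') appended to its pip section."""
    env = copy.deepcopy(conda_env)
    for dep in env.setdefault("dependencies", []):
        if isinstance(dep, dict) and "pip" in dep:
            if package not in dep["pip"]:
                dep["pip"].append(package)
            return env
    # No pip section yet: create one.
    env["dependencies"].append({"pip": [package]})
    return env
```

Serializing the result back to YAML gives the environment definition the deployment needs, with the model's original conda file left untouched.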
