Commit 09d2cb9

Merge pull request #233328 from santiagxf/santiagxf/azureml-mlflow-batch
Update how-to-mlflow-batch.md
2 parents 35b929a + 2559608 commit 09d2cb9

1 file changed: +48 -124 lines changed

articles/machine-learning/how-to-mlflow-batch.md

Lines changed: 48 additions & 124 deletions
Original file line number | Diff line number | Diff line change
@@ -33,16 +33,16 @@ This example shows how you can deploy an MLflow model to a batch endpoint to per
3333

3434
The model has been trained using an `XGBoost` classifier and all the required preprocessing has been packaged as a `scikit-learn` pipeline, making this model an end-to-end pipeline that goes from raw data to predictions.
3535

36-
The information in this article is based on code samples contained in the [azureml-examples](https://github.com/azure/azureml-examples) repository. To run the commands locally without having to copy/paste YAML and other files, clone the repo and then change directories to `cli/endpoints/batch` if you are using the Azure CLI or to `sdk/endpoints/batch` if you are using our SDK for Python.
36+
The information in this article is based on code samples contained in the [azureml-examples](https://github.com/azure/azureml-examples) repository. To run the commands locally without having to copy/paste YAML and other files, clone the repo and then change directories to `cli/endpoints/batch/deploy-models/heart-classifier-mlflow` if you are using the Azure CLI or to `sdk/python/endpoints/batch/deploy-models/heart-classifier-mlflow` if you are using our SDK for Python.
3737

3838
```azurecli
3939
git clone https://github.com/Azure/azureml-examples --depth 1
40-
cd azureml-examples/cli/endpoints/batch
40+
cd azureml-examples/cli/endpoints/batch/deploy-models/heart-classifier-mlflow
4141
```
4242

4343
### Follow along in Jupyter Notebooks
4444

45-
You can follow along with this sample in a Jupyter notebook. In the cloned repository, open the notebook: [mlflow-for-batch-tabular.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/batch/mlflow-for-batch-tabular.ipynb).
45+
You can follow along with this sample in a Jupyter notebook. In the cloned repository, open the notebook: [mlflow-for-batch-tabular.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/batch/deploy-models/heart-classifier-mlflow/mlflow-for-batch-tabular.ipynb).
4646

4747
## Prerequisites
4848

@@ -85,15 +85,15 @@ Follow these steps to deploy an MLflow model to a batch endpoint for running bat
8585

8686
ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)
8787
```
88-
8988

90-
2. Batch Endpoint can only deploy registered models. In this case, we already have a local copy of the model in the repository, so we only need to publish the model to the registry in the workspace. You can skip this step if the model you are trying to deploy is already registered.
89+
90+
1. Batch Endpoint can only deploy registered models. In this case, we already have a local copy of the model in the repository, so we only need to publish the model to the registry in the workspace. You can skip this step if the model you are trying to deploy is already registered.
9191

9292
# [Azure CLI](#tab/cli)
9393

9494
```azurecli
9595
MODEL_NAME='heart-classifier'
96-
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "heart-classifier-mlflow/model"
96+
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "model"
9797
```
9898

9999
# [Python](#tab/sdk)
@@ -106,7 +106,7 @@ Follow these steps to deploy an MLflow model to a batch endpoint for running bat
106106
)
107107
```
108108

109-
3. Before moving forward, we need to make sure the batch deployments we are about to create can run on some infrastructure (compute). Batch deployments can run on any Azure Machine Learning compute that already exists in the workspace. That means that multiple batch deployments can share the same compute infrastructure. In this example, we are going to work on an Azure Machine Learning compute cluster called `cpu-cluster`. Let's verify that the compute exists in the workspace, or create it otherwise.
109+
1. Before moving forward, we need to make sure the batch deployments we are about to create can run on some infrastructure (compute). Batch deployments can run on any Azure Machine Learning compute that already exists in the workspace. That means that multiple batch deployments can share the same compute infrastructure. In this example, we are going to work on an Azure Machine Learning compute cluster called `cpu-cluster`. Let's verify that the compute exists in the workspace, or create it otherwise.
110110

111111
# [Azure CLI](#tab/cli)
112112

@@ -141,33 +141,43 @@ Follow these steps to deploy an MLflow model to a batch endpoint for running bat
141141
ml_client.begin_create_or_update(compute_cluster)
142142
```
143143

144-
4. Now it is time to create the batch endpoint and deployment. Let's start with the endpoint first. Endpoints only require a name and a description to be created:
144+
1. Now it is time to create the batch endpoint and deployment. Let's start with the endpoint. Endpoints only require a name and a description to be created. The name of the endpoint ends up in the URI associated with your endpoint. Because of that, __batch endpoint names need to be unique within an Azure region__. For example, there can be only one batch endpoint with the name `mybatchendpoint` in `westus2`.
145+
146+
# [Azure CLI](#tab/cli)
147+
148+
In this case, let's place the name of the endpoint in a variable so we can easily reference it later.
149+
150+
```azurecli
151+
ENDPOINT_NAME="heart-classifier-batch"
152+
```
153+
154+
# [Python](#tab/sdk)
155+
156+
In this case, let's place the name of the endpoint in a variable so we can easily reference it later.
157+
158+
```python
159+
endpoint_name="heart-classifier-batch"
160+
```
161+
162+
1. Create the endpoint:
145163
146164
# [Azure CLI](#tab/cli)
147165
148166
To create a new endpoint, create a `YAML` configuration like the following:
149167
150-
```yaml
151-
$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
152-
name: heart-classifier-batch
153-
description: A heart condition classifier for batch inference
154-
auth_mode: aad_token
155-
```
168+
:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/endpoint.yml" :::
156169
157170
Then, create the endpoint with the following command:
158171
159-
```azurecli
160-
ENDPOINT_NAME='heart-classifier-batch'
161-
az ml batch-endpoint create -n $ENDPOINT_NAME -f endpoint.yml
162-
```
172+
:::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deploy-and-run.sh" ID="create_batch_endpoint" :::
163173
164174
# [Python](#tab/sdk)
165175
166176
To create a new endpoint, use the following script:
167177
168178
```python
169179
endpoint = BatchEndpoint(
170-
name="heart-classifier-batch",
180+
name=endpoint_name,
171181
description="A heart condition classifier for batch inference",
172182
)
173183
```
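
Because endpoint names must be unique within an Azure region, a fixed name such as `heart-classifier-batch` may already be taken. The following is a minimal sketch of one way to reduce collisions by appending a random suffix. The helper `make_endpoint_name` and its `suffix_length` parameter are hypothetical names introduced here for illustration; they are not part of the Azure SDK or of the sample repository.

```python
import random
import string


def make_endpoint_name(base: str, suffix_length: int = 5) -> str:
    """Return `base` followed by a hyphen and a random lowercase/digit suffix.

    Hypothetical helper: it only makes a name collision less likely,
    it does not check whether the name is actually free in the region.
    """
    allowed = string.ascii_lowercase + string.digits
    suffix = "".join(random.choice(allowed) for _ in range(suffix_length))
    return f"{base}-{suffix}"


endpoint_name = make_endpoint_name("heart-classifier-batch")
print(endpoint_name)
```

The resulting `endpoint_name` can then be passed to `BatchEndpoint(name=endpoint_name, ...)` as in the Python tab of this hunk.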
@@ -184,32 +194,11 @@ Follow these steps to deploy an MLflow model to a batch endpoint for running bat
184194

185195
To create a new deployment under the created endpoint, create a `YAML` configuration like the following:
186196

187-
```yaml
188-
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
189-
endpoint_name: heart-classifier-batch
190-
name: classifier-xgboost-mlflow
191-
description: A heart condition classifier based on XGBoost
192-
model: azureml:heart-classifier@latest
193-
compute: azureml:cpu-cluster
194-
resources:
195-
instance_count: 2
196-
max_concurrency_per_instance: 2
197-
mini_batch_size: 2
198-
output_action: append_row
199-
output_file_name: predictions.csv
200-
retry_settings:
201-
max_retries: 3
202-
timeout: 300
203-
error_threshold: -1
204-
logging_level: info
205-
```
197+
:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deployment-simple/deployment.yml" :::
206198

207199
Then, create the deployment with the following command:
208200

209-
```azurecli
210-
DEPLOYMENT_NAME="classifier-xgboost-mlflow"
211-
az ml batch-deployment create -n $DEPLOYMENT_NAME -f endpoint.yml
212-
```
201+
:::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deploy-and-run.sh" ID="create_batch_deployment_set_default" :::
213202

214203
# [Python](#tab/sdk)
215204

@@ -246,9 +235,7 @@ Follow these steps to deploy an MLflow model to a batch endpoint for running bat
246235

247236
# [Azure CLI](#tab/cli)
248237

249-
```azurecli
250-
az ml batch-endpoint update --name $ENDPOINT_NAME --set defaults.deployment_name=$DEPLOYMENT_NAME
251-
```
238+
:::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deploy-and-run.sh" ID="update_default_deployment" :::
252239

253240
# [Python](#tab/sdk)
254241

@@ -271,26 +258,19 @@ For testing our endpoint, we are going to use a sample of unlabeled data located
271258
a. Create a data asset definition in `YAML`:
272259

273260
__heart-dataset-unlabeled.yml__
274-
```yaml
275-
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
276-
name: heart-dataset-unlabeled
277-
description: An unlabeled dataset for heart classification.
278-
type: uri_folder
279-
path: heart-classifier-mlflow/data
280-
```
261+
262+
:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/heart-dataset-unlabeled.yml" :::
281263

282264
b. Create the data asset:
283265

284-
```azurecli
285-
az ml data create -f heart-dataset-unlabeled.yml
286-
```
266+
:::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deploy-and-run.sh" ID="register_dataset" :::
287267

288268
# [Python](#tab/sdk)
289269

290270
a. Create a data asset definition:
291271

292272
```python
293-
data_path = "heart-classifier-mlflow/data"
273+
data_path = "data"
294274
dataset_name = "heart-dataset-unlabeled"
295275

296276
heart_dataset_unlabeled = Data(
@@ -317,9 +297,7 @@ For testing our endpoint, we are going to use a sample of unlabeled data located
317297

318298
# [Azure CLI](#tab/cli)
319299

320-
```azurecli
321-
JOB_NAME = $(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input azureml:heart-dataset-unlabeled@latest | jq -r '.name')
322-
```
300+
:::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deploy-and-run.sh" ID="start_batch_scoring_job" :::
323301

324302
> [!NOTE]
325303
> The utility `jq` may not be installed on every system. You can find installation instructions at [this link](https://stedolan.github.io/jq/download/).
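
If `jq` is not available, the same extraction can be sketched with Python's standard `json` module. The removed command piped the JSON printed by the invoke call into `jq -r '.name'`, which reads the top-level `name` field. In the sketch below, the helper `extract_job_name` and the sample payload are assumptions made for illustration; a real response contains more fields than shown.

```python
import json


def extract_job_name(invoke_output: str) -> str:
    """Return the top-level `name` field, like `jq -r '.name'` would."""
    payload = json.loads(invoke_output)
    return payload["name"]


# Hypothetical, heavily trimmed stand-in for the JSON printed by the CLI.
sample_output = '{"name": "batchjob-0000", "status": "Running"}'

job_name = extract_job_name(sample_output)
print(job_name)
```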
@@ -342,9 +320,7 @@ For testing our endpoint, we are going to use a sample of unlabeled data located
342320

343321
# [Azure CLI](#tab/cli)
344322

345-
```azurecli
346-
az ml job show --name $JOB_NAME
347-
```
323+
:::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deploy-and-run.sh" ID="show_job_in_studio" :::
348324

349325
# [Python](#tab/sdk)
350326

@@ -370,9 +346,7 @@ You can download the results of the job by using the job name:
370346

371347
To download the predictions, use the following command:
372348

373-
```azurecli
374-
az ml job download --name $JOB_NAME --output-name score --download-path ./
375-
```
349+
:::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deploy-and-run.sh" ID="download_scores" :::
376350

377351
# [Python](#tab/sdk)
378352

@@ -426,7 +400,7 @@ The following data types are supported for batch inference when deploying MLflow
426400

427401
| File extension | Type returned as model's input | Signature requirement |
428402
| :- | :- | :- |
429-
| `.csv` | `pd.DataFrame` | `ColSpec`. If not provided, columns typing is not enforced. |
403+
| `.csv`, `.parquet` | `pd.DataFrame` | `ColSpec`. If not provided, columns typing is not enforced. |
430404
| `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.gif` | `np.ndarray` | `TensorSpec`. Input is reshaped to match tensors shape if available. If no signature is available, tensors of type `np.uint8` are inferred. For additional guidance read [Considerations for MLflow models processing images](how-to-image-processing-batch.md#considerations-for-mlflow-models-processing-images). |
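
The table above can be read as a lookup from file extension to the type handed to the model. The sketch below restates that mapping in plain Python so you can check which input files in a folder would be accepted. The function `expected_input_type` is a hypothetical helper, and the case-insensitive matching of extensions is an assumption of this sketch, not documented behavior of batch endpoints.

```python
from pathlib import Path

# Extensions taken from the table above (after this change adds `.parquet`).
TABULAR_EXTENSIONS = {".csv", ".parquet"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif"}


def expected_input_type(file_path: str) -> str:
    """Return the name of the type the model would receive for this file."""
    # Assumption: extensions are compared case-insensitively.
    extension = Path(file_path).suffix.lower()
    if extension in TABULAR_EXTENSIONS:
        return "pd.DataFrame"
    if extension in IMAGE_EXTENSIONS:
        return "np.ndarray"
    raise ValueError(f"Unsupported file extension: {extension!r}")


for sample in ["data/heart.csv", "data/heart.parquet", "images/scan.png"]:
    print(sample, "->", expected_input_type(sample))
```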
431405

432406
> [!WARNING]
@@ -485,34 +459,9 @@ Use the following steps to deploy an MLflow model with a custom scoring script.
485459

486460
1. Create a scoring script. Notice how the folder name `model` you identified before has been included in the `init()` function.
487461

488-
__batch_driver.py__
489-
490-
```python
491-
import os
492-
import mlflow
493-
import pandas as pd
494-
495-
def init():
496-
global model
462+
__deployment-custom/code/batch_driver.py__
497463

498-
# AZUREML_MODEL_DIR is an environment variable created during deployment
499-
# It is the path to the model folder
500-
model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")
501-
model = mlflow.pyfunc.load_model(model_path)
502-
503-
def run(mini_batch):
504-
results = pd.DataFrame(columns=['file', 'predictions'])
505-
506-
for file_path in mini_batch:
507-
data = pd.read_csv(file_path)
508-
pred = model.predict(data)
509-
510-
df = pd.DataFrame(pred, columns=['predictions'])
511-
df['file'] = os.path.basename(file_path)
512-
results = pd.concat([results, df])
513-
514-
return results
515-
```
464+
:::code language="python" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deployment-custom/code/batch_driver.py" :::
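
As a reminder of what `init()` relies on, the sketch below isolates the path resolution used by the scoring script: the model folder name (`model`) is joined to the directory given by the `AZUREML_MODEL_DIR` environment variable. The helper `resolve_model_path` is a hypothetical name, and the directory assigned below is a made-up value used only so the snippet runs locally; during a real deployment the variable is set for you.

```python
import os


def resolve_model_path(model_folder: str = "model") -> str:
    """Join the deployment's model directory with the MLflow model folder."""
    # AZUREML_MODEL_DIR is created during deployment and points to the model folder.
    model_dir = os.environ["AZUREML_MODEL_DIR"]
    return os.path.join(model_dir, model_folder)


# Hypothetical value for local experimentation only.
os.environ["AZUREML_MODEL_DIR"] = os.path.join("azureml-models", "heart-classifier")

model_path = resolve_model_path()
print(model_path)
```

In the scoring script, the resulting path is what gets passed to `mlflow.pyfunc.load_model`.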
516465

517466
1. Let's create an environment where the scoring script can be executed. Since our model is an MLflow model, the conda requirements are also specified in the model package (for more details about MLflow models and the files included in them, see [The MLmodel format](concept-mlflow-models.md#the-mlmodel-format)). We are then going to build the environment using the conda dependencies from that file. However, __we also need to include__ the package `azureml-core`, which is required for Batch Deployments.
518467

@@ -532,8 +481,9 @@ Use the following steps to deploy an MLflow model with a custom scoring script.
532481

533482
```python
534483
environment = Environment(
535-
conda_file="./heart-classifier-mlflow/environment/conda.yaml",
536-
image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
484+
name="batch-mlflow-xgboost",
485+
conda_file="deployment-custom/environment/conda.yaml",
486+
image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
537487
)
538488
```
539489

@@ -543,37 +493,11 @@ Use the following steps to deploy an MLflow model with a custom scoring script.
543493

544494
To create a new deployment under the created endpoint, create a `YAML` configuration like the following:
545495

546-
```yaml
547-
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
548-
endpoint_name: heart-classifier-batch
549-
name: classifier-xgboost-custom
550-
description: A heart condition classifier based on XGBoost
551-
model: azureml:heart-classifier@latest
552-
environment:
553-
image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest
554-
conda_file: ./heart-classifier-mlflow/environment/conda.yaml
555-
code_configuration:
556-
code: ./heart-classifier-custom/code/
557-
scoring_script: batch_driver.py
558-
compute: azureml:cpu-cluster
559-
resources:
560-
instance_count: 2
561-
max_concurrency_per_instance: 2
562-
mini_batch_size: 2
563-
output_action: append_row
564-
output_file_name: predictions.csv
565-
retry_settings:
566-
max_retries: 3
567-
timeout: 300
568-
error_threshold: -1
569-
logging_level: info
570-
```
496+
:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deployment-custom/deployment.yml" :::
571497

572498
Then, create the deployment with the following command:
573499

574-
```azurecli
575-
az ml batch-deployment create -f deployment.yml
576-
```
500+
:::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/heart-classifier-mlflow/deploy-and-run.sh" ID="create_new_deployment_not_default" :::
577501

578502
# [Python](#tab/sdk)
579503

@@ -587,7 +511,7 @@ Use the following steps to deploy an MLflow model with a custom scoring script.
587511
model=model,
588512
environment=environment,
589513
code_configuration=CodeConfiguration(
590-
code="./heart-classifier-mlflow/code/",
514+
code="deployment-custom/code/",
591515
scoring_script="batch_driver.py",
592516
),
593517
compute=compute_name,
