Commit a97e3e9

Merge pull request #233686 from santiagxf/santiagxf/azureml-batch-scoring
2 parents aad2921 + 3fbb788

File tree

3 files changed: +35 −6 lines changed

articles/machine-learning/how-to-batch-scoring-script.md

Lines changed: 35 additions & 6 deletions
@@ -20,14 +20,43 @@ ms.custom: how-to
Batch endpoints allow you to deploy models to perform long-running inference at scale. To indicate how batch endpoints should use your model over the input data to create predictions, you need to create and specify a scoring script (also known as a batch driver script). In this article, you learn how to use scoring scripts in different scenarios, along with best practices.

> [!TIP]
-> MLflow models don't require a scoring script as it is autogenerated for you. For more details about how batch endpoints work with MLflow models, see the dedicated tutorial [Using MLflow models in batch deployments](how-to-mlflow-batch.md). If you want to change the default inference routine, write an scoring script for your MLflow models as explained at [Using MLflow models with a scoring script](how-to-mlflow-batch.md#customizing-mlflow-models-deployments-with-a-scoring-script).
+> MLflow models don't require a scoring script because one is autogenerated for you. For more details about how batch endpoints work with MLflow models, see the dedicated tutorial [Using MLflow models in batch deployments](how-to-mlflow-batch.md).
> [!WARNING]
> If you are deploying an Automated ML model under a batch endpoint, notice that the scoring script that Automated ML provides only works for online endpoints and is not designed for batch execution. Follow this guideline to learn how to create one, depending on what your model does.
## Understanding the scoring script

-The scoring script is a Python file (`.py`) that contains the logic about how to run the model and read the input data submitted by the batch deployment executor driver. Each model deployment has to provide a scoring script, however, an endpoint may host multiple deployments using different scoring script versions.
+The scoring script is a Python file (`.py`) that contains the logic for how to run the model and read the input data submitted by the batch deployment executor. Each model deployment provides the scoring script (along with any other required dependencies) at creation time. It is usually indicated as follows:
+# [Azure CLI](#tab/cli)
+
+__deployment.yml__
+
+:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/mnist-classifier/deployment-torch/deployment.yml" range="8-10":::
+
+# [Python](#tab/python)
+
+```python
+deployment = BatchDeployment(
+    ...
+    code_path="code",
+    scoring_script="batch_driver.py",
+    ...
+)
+```
+
+# [Studio](#tab/azure-studio)
+
+When creating a new deployment, you will be prompted for a scoring script and dependencies as follows:
+
+:::image type="content" source="./media/how-to-batch-scoring-script/configure-scoring-script.png" alt-text="Screenshot of the step where you can configure the scoring script in a new deployment.":::
+
+For MLflow models, scoring scripts are automatically generated, but you can indicate one by checking the following option:
+
+:::image type="content" source="./media/how-to-batch-scoring-script/configure-scoring-script-mlflow.png" alt-text="Screenshot of the step where you can configure the scoring script in a new deployment when the model has MLflow format.":::
+
+---

The scoring script must contain two methods:

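As a hedged sketch of what such a script can look like (not the article's exact code; the dummy model, CSV inputs, and prediction logic are illustrative assumptions):

```python
import pandas as pd

# Illustrative stand-in for a real model. A real deployment would load a
# serialized model from the AZUREML_MODEL_DIR folder inside init().
class DummyModel:
    def predict(self, df: pd.DataFrame):
        # One prediction per row: here, just the row sum (an assumption).
        return df.sum(axis=1).tolist()

model = None

def init():
    """Runs once per worker before any mini-batch: load the model into memory."""
    global model
    # Real scripts typically read os.environ["AZUREML_MODEL_DIR"] here.
    model = DummyModel()

def run(mini_batch):
    """Runs once per mini-batch: `mini_batch` is a list of input file paths."""
    results = []
    for file_path in mini_batch:
        data = pd.read_csv(file_path)
        data["prediction"] = model.predict(data)
        results.append(data)
    # Returning a DataFrame lets predictions carry extra columns,
    # such as the original record alongside the prediction.
    return pd.concat(results)
```

The exact signature of `run()` and the mini-batch contents depend on the deployment configuration; treat this as a shape to adapt, not a drop-in script.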
@@ -78,9 +107,9 @@ The `run()` method should return a Pandas `DataFrame` or an array/list. Each ret
> [!IMPORTANT]
> __How to write predictions?__
>
-> Use __arrays__ when you need to output a single prediction. Use __pandas DataFrames__ when you need to return multiple pieces of information. For instance, for tabular data, you may want to append your predictions to the original record. Use a pandas DataFrame for this case. For file datasets, __we still recommend to output a pandas DataFrame__ as they provide a more robust approach to read the results.
->
-> Although pandas DataFrame may contain column names, they are not included in the output file. If needed, please see [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md).
+> Whatever you return from the `run()` function is appended to the output predictions file generated by the batch job, so it is important to return the right data type. Return __arrays__ when you need to output a single prediction. Return __pandas DataFrames__ when you need to return multiple pieces of information. For instance, for tabular data you may want to append your predictions to the original record; use a pandas DataFrame for this case. Although a pandas DataFrame may contain column names, they are not included in the output file.
+>
+> If you need to write predictions in a different way, you can [customize outputs in batch deployments](how-to-deploy-model-custom-output.md).
> [!WARNING]
> Do not output complex data types (or lists of complex data types) in the `run` function. Those outputs are transformed to strings and become hard to read.
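One way to honor this warning, shown as an illustrative sketch (the helper name is an assumption, not part of the article), is to serialize complex per-row outputs to JSON strings before returning them:

```python
import json
import pandas as pd

def to_serializable(predictions):
    """Flatten complex per-row outputs (dicts, nested lists) into JSON
    strings so the batch output file stays readable."""
    return pd.DataFrame({"prediction": [json.dumps(p) for p in predictions]})
```

Each row of the resulting DataFrame is then a plain string, which writes cleanly into the predictions file.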
@@ -161,7 +190,7 @@ For an example about how to achieve it see [Text processing with batch deploymen

### Using models that are folders

-When authoring scoring scripts, the environment variable `AZUREML_MODEL_DIR` is typically used in the `init()` function to load the model. However, some models may contain its files inside of a folder. When reading the files in this variable, you may need to account for that. You can identify the folder where your MLflow model is placed as follows:
+The environment variable `AZUREML_MODEL_DIR` contains the path to where the selected model is located, and it is typically used in the `init()` function to load the model into memory. However, some models may contain their files inside a folder, and you may need to account for that when reading files from this path. You can identify the folder where your MLflow model is placed as follows:

1. Go to [Azure Machine Learning portal](https://ml.azure.com).

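Programmatically, accounting for a model stored as a folder can be sketched like this (a hedged illustration; the helper name and single-subfolder heuristic are assumptions, not the article's code):

```python
import os

def resolve_model_dir(model_dir: str) -> str:
    """Return the folder that actually holds the model files.

    AZUREML_MODEL_DIR may point at a parent directory; when it contains a
    single subfolder (as MLflow models typically do), descend into it.
    """
    subdirs = [
        os.path.join(model_dir, entry)
        for entry in os.listdir(model_dir)
        if os.path.isdir(os.path.join(model_dir, entry))
    ]
    return subdirs[0] if len(subdirs) == 1 else model_dir
```

In `init()`, you would call this on `os.environ["AZUREML_MODEL_DIR"]` before loading the model files.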