---
title: 'Author scoring scripts for batch deployments'
titleSuffix: Azure Machine Learning
description: In this article, learn how to author scoring scripts to perform batch inference in batch deployments.
services: machine-learning
ms.service: machine-learning
ms.subservice: mlops
ms.topic: conceptual
author: santiagxf
ms.author: fasantia
ms.reviewer: larryfr
ms.date: 11/03/2022
ms.custom: how-to
---

# Author scoring scripts for batch deployments

[!INCLUDE [cli v2](../../../includes/machine-learning-dev-v2.md)]

Batch endpoints allow you to deploy models to perform inference at scale. Because the way inference should be executed varies with the model's format, the model's type, and the use case, batch endpoints require a scoring script (also known as a batch driver script) to tell the deployment how to use the model over the provided data. In this article, you learn how to use scoring scripts in different scenarios, along with best practices for writing them.

> [!TIP]
> MLflow models don't require a scoring script because it's autogenerated for you. For more details about how batch endpoints work with MLflow models, see the dedicated tutorial [Using MLflow models in batch deployments](how-to-mlflow-batch.md). Notice that this feature doesn't prevent you from writing a specific scoring script for MLflow models, as explained in [Using MLflow models with a scoring script](how-to-mlflow-batch.md#using-mlflow-models-with-a-scoring-script).

> [!WARNING]
> If you're deploying an Automated ML model under a batch endpoint, notice that the scoring script Automated ML provides only works for online endpoints and isn't designed for batch execution. Follow the guidance in this article to learn how to create one depending on what your model does.

## Understanding the scoring script

The scoring script is a Python file (`.py`) that contains the logic for running the model and reading the input data submitted by the batch deployment executor. Each model deployment has to provide a scoring script; however, an endpoint may host multiple deployments that use different scoring script versions.

The scoring script must contain two methods:

#### The `init` method

Use the `init()` method for any costly or common preparation. For example, use it to load the model into a global object. This function is called once at the beginning of the process. Your model's files are available in the path indicated by the environment variable `AZUREML_MODEL_DIR`. Use this variable to locate the files associated with the model.

```python
import os

def init():
    global model

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    # The path "model" is the name of the registered model's folder
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # Load the model with your framework's loading routine
    # (load_model stands in for that routine here)
    model = load_model(model_path)
```

Notice that in this example the model is placed in a global variable `model`. Use global variables to make any asset needed to perform inference available to your scoring function.

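For instance, a minimal sketch of this idea, assuming a scikit-learn model serialized with `joblib` and a hypothetical `labels.json` file packaged alongside it, could expose both assets as globals:

```python
import json
import os

import joblib

def init():
    global model, labels

    model_dir = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # Load every asset the run() function will need and share it as a global
    model = joblib.load(os.path.join(model_dir, "model.pkl"))
    with open(os.path.join(model_dir, "labels.json")) as labels_file:
        labels = json.load(labels_file)
```
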
#### The `run` method

Use the `run(mini_batch: List[str]) -> Union[List[Any], pandas.DataFrame]` method to perform the scoring of each mini-batch generated by the batch deployment. This method is called once for each `mini_batch` generated from your input data. Batch deployments read data in batches according to how the deployment is configured.

```python
import pandas as pd

def run(mini_batch):
    results = []

    for file in mini_batch:
        # Process each file and append its results
        (...)

    return pd.DataFrame(results)
```

The method receives a list of file paths as a parameter (`mini_batch`). You can use this list to either iterate over each file and process it one by one, or to read the entire batch and process it at once. The best option depends on your compute memory and the throughput you need to achieve. For an example of how to read entire batches of data at once, see [High throughput deployments](how-to-image-processing-batch.md#high-throughput-deployments).

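As a rough illustration of the file-by-file approach, the following sketch assumes each input file is a CSV that the model loaded in `init()` can score directly; the column names and the `model.predict` call are assumptions, not requirements of batch deployments:

```python
import os

import pandas as pd

def run(mini_batch):
    results = []

    for file_path in mini_batch:
        # Read one file at a time to keep memory usage bounded
        data = pd.read_csv(file_path)

        # Score the file's rows with the model loaded in init()
        predictions = model.predict(data)

        # Keep track of which file each prediction came from
        results.append(pd.DataFrame({
            "file": os.path.basename(file_path),
            "prediction": predictions,
        }))

    return pd.concat(results)
```
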
> [!NOTE]
> __How is work distributed?__
>
> Batch deployments distribute work at the file level, which means that a folder containing 100 files with mini-batches of 10 files will generate 10 batches of 10 files each. Notice that this happens regardless of the size of the files involved. If your files are too big to be processed in large mini-batches, we suggest either splitting the files into smaller ones to achieve a higher level of parallelism or decreasing the number of files per mini-batch. At this moment, batch deployments can't account for skews in the file size distribution.

The `run()` method should return a pandas DataFrame or an array/list. Each returned output element indicates one successful run of an input element in the input `mini_batch`. For file datasets, each row/element represents a single file processed. For a tabular dataset, each row/element represents a row in a processed file.

> [!IMPORTANT]
> __How to write predictions?__
>
> Use __arrays__ when you need to output a single prediction. Use __pandas DataFrames__ when you need to return multiple pieces of information. For instance, for tabular data, you may want to append your predictions to the original record. Use a pandas DataFrame for this case. For file datasets, __we still recommend outputting a pandas DataFrame__, as it provides a more robust way to read the results.
>
> Although a pandas DataFrame may contain column names, they are not included in the output file. If you need them, see [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md).

> [!WARNING]
> Don't output complex data types (or lists of complex data types) in the `run` function. Those outputs are transformed to strings and become hard to read.

The resulting DataFrame or array is appended to the indicated output file. There's no requirement on the cardinality of the results: one file can generate one or many rows/elements in the output. All elements in the resulting DataFrame or array are written to the output file as-is (considering the `output_action` isn't `summary_only`).

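For instance, for tabular inputs, a sketch like the following (again assuming CSV files and a scikit-learn-style model loaded in `init()`) appends the predictions to the original records, so each output row carries both the input columns and the prediction:

```python
import pandas as pd

def run(mini_batch):
    scored = []

    for file_path in mini_batch:
        data = pd.read_csv(file_path)

        # Keep the original columns and append the prediction to each record
        data["prediction"] = model.predict(data)
        scored.append(data)

    # One output row per input row, across all files in the mini-batch
    return pd.concat(scored)
```
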
## Writing predictions in a different way

By default, the batch deployment writes the model's predictions in a single file as indicated in the deployment. However, in some cases you need to write the predictions in multiple files. For instance, if the input data is partitioned, you typically want to generate your output partitioned too. In those cases, you can [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md) to indicate:

> [!div class="checklist"]
> * The file format used (CSV, Parquet, JSON, and so on).
> * The way data is partitioned in the output.

Read the article [Customize outputs in batch deployments](how-to-deploy-model-custom-output.md) for an example of how to achieve it.

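As a rough, hypothetical sketch of the idea only (the linked article describes the actual mechanism and configuration), a scoring script could write one Parquet file per input file to a folder of your choosing, represented here by an assumed `output_folder` value, and return just a summary:

```python
import os

import pandas as pd

# Hypothetical location where this script writes its own outputs;
# see the linked article for how to configure this in a real deployment
output_folder = os.environ.get("MY_OUTPUT_FOLDER", ".")

def run(mini_batch):
    summary = []

    for file_path in mini_batch:
        data = pd.read_csv(file_path)
        data["prediction"] = model.predict(data)

        # Write one Parquet file per input file to preserve the partitioning
        file_name = os.path.splitext(os.path.basename(file_path))[0]
        data.to_parquet(os.path.join(output_folder, f"{file_name}.parquet"))

        summary.append(f"{file_name}.parquet")

    # Return one summary element per processed file
    return summary
```
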
## Source control of scoring scripts

We highly recommend placing scoring scripts under source control.

## Best practices for writing scoring scripts

When writing scoring scripts that work with large amounts of data, you need to take into account several factors, including:

* The size of each file.
* The amount of data in each file.
* The amount of memory required to read each file.
* The amount of memory required to read an entire batch of files.
* The memory footprint of the model.
* The memory footprint of the model when running over the input data.
* The available memory in your compute.

Batch deployments distribute work at the file level, which means that a folder containing 100 files with mini-batches of 10 files generates 10 batches of 10 files each, regardless of the size of the files involved. If your files are too big to be processed in large mini-batches, we suggest either splitting the files into smaller ones to achieve a higher level of parallelism or decreasing the number of files per mini-batch. At this moment, batch deployments can't account for skews in the file size distribution.

### Running inference at the mini-batch, file, or row level

Batch endpoints call the `run()` function in your scoring script once per mini-batch. However, you decide whether to run inference over the entire batch, over one file at a time, or over one row at a time (if your data happens to be tabular).

#### Mini-batch level

You typically want to run inference over the entire batch at once when you need high throughput in your batch scoring process. This is the case, for instance, when you run inference over a GPU and want to saturate the inference device. You may also be relying on a data loader that can handle the batching itself if the data doesn't fit in memory, like the `TensorFlow` or `PyTorch` data loaders. In those cases, consider running inference over the entire batch.

> [!WARNING]
> Running inference at the batch level may require close control over the input data size to correctly account for the memory requirements and avoid out-of-memory exceptions. Whether you can load the entire mini-batch in memory depends on the size of the mini-batch, the size of the instances in the cluster, and the number of workers on each node.

For an example of how to achieve it, see [High throughput deployments](how-to-image-processing-batch.md#high-throughput-deployments).

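As a rough illustration (assuming CSV inputs small enough to fit in memory and a model whose `predict` can score one concatenated DataFrame), the whole mini-batch can be read up front and scored with a single call:

```python
import pandas as pd

def run(mini_batch):
    # Read every file in the mini-batch before calling the model
    batch = pd.concat(pd.read_csv(file_path) for file_path in mini_batch)

    # A single call to the model over the entire mini-batch
    batch["prediction"] = model.predict(batch)

    return batch
```
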
#### File level

One of the easiest ways to perform inference is to iterate over all the files in the mini-batch and run the model over each one. In some cases, like image processing, this may be a good idea. If your data is tabular, you may need to make a good estimate of the number of rows in each file to judge whether your model can handle the memory requirements of not only loading the entire data into memory but also performing inference over it. Remember that some models (especially those based on recurrent neural networks) unfold and present a memory footprint that may not be linear with the number of rows. If your model is expensive in terms of memory, consider running inference at the row level.

> [!TIP]
> If file sizes are too big to be read even one at a time, consider breaking the files down into multiple smaller files to achieve a higher level of parallelization.

For an example of how to achieve it, see [Image processing with batch deployments](how-to-image-processing-batch.md).

#### Row level (tabular)

For models that present challenges with the size of their inputs, you may want to run inference at the row level. Your batch deployment still provides your scoring script with a mini-batch of files; however, you read one file, one row at a time. This may look inefficient, but for some deep learning models it may be the only way to perform inference without scaling up your hardware requirements.

For an example of how to achieve it, see [Text processing with batch deployments](how-to-nlp-processing-batch.md).

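A minimal sketch of this pattern, assuming CSV inputs and a model that can score a single row at a time, reads each file in one-row chunks so only a few rows are ever held in memory:

```python
import pandas as pd

def run(mini_batch):
    results = []

    for file_path in mini_batch:
        # chunksize=1 yields one row at a time, keeping the memory footprint small
        for chunk in pd.read_csv(file_path, chunksize=1):
            chunk["prediction"] = model.predict(chunk)
            results.append(chunk)

    return pd.concat(results)
```
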
### Relationship between the degree of parallelism and the scoring script

Your deployment configuration controls both the size of each mini-batch and the number of workers on each node. Take them into account when deciding whether to read the entire mini-batch to perform inference. When running multiple workers on the same instance, remember that memory is shared across all of them; for example, doubling the number of workers per node roughly halves the memory available to each one for the same mini-batch size. Usually, increasing the number of workers per node should be accompanied by a decrease in the mini-batch size or by a change in the scoring strategy (if the data size remains the same).

## Next steps

* [Troubleshooting batch endpoints](how-to-troubleshoot-batch-endpoints.md).
* [Use MLflow models in batch deployments](how-to-mlflow-batch.md).
* [Image processing with batch deployments](how-to-image-processing-batch.md).