Learn how to use a custom container to deploy a model to an online endpoint in Azure Machine Learning.

Custom container deployments can use web servers other than the default Python Flask server used by Azure Machine Learning. Users of these deployments can still take advantage of Azure Machine Learning's built-in monitoring, scaling, alerting, and authentication.
The following table lists various [deployment examples](https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/online/custom-container) that use custom containers such as TensorFlow Serving, TorchServe, Triton Inference Server, Plumber R package, and Azure Machine Learning Inference Minimal image.

|Example|Script (CLI)|Description|
|-------|------|---------|
|[minimal/multimodel](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/minimal/multimodel)|[deploy-custom-container-minimal-multimodel](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-minimal-multimodel.sh)|Deploy multiple models to a single deployment by extending the Azure Machine Learning Inference Minimal image.|
|[minimal/single-model](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/minimal/single-model)|[deploy-custom-container-minimal-single-model](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-minimal-single-model.sh)|Deploy a single model by extending the Azure Machine Learning Inference Minimal image.|
|[mlflow/multideployment-scikit](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/mlflow/multideployment-scikit)|[deploy-custom-container-mlflow-multideployment-scikit](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-mlflow-multideployment-scikit.sh)|Deploy two MLFlow models with different Python requirements to two separate deployments behind a single endpoint using the Azure Machine Learning Inference Minimal Image.|
|[r/multimodel-plumber](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/r/multimodel-plumber)|[deploy-custom-container-r-multimodel-plumber](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-r-multimodel-plumber.sh)|Deploy three regression models to one endpoint using the Plumber R package.|
|[tfserving/half-plus-two](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/tfserving/half-plus-two)|[deploy-custom-container-tfserving-half-plus-two](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-tfserving-half-plus-two.sh)|Deploy a Half Plus Two model using a TensorFlow Serving custom container using the standard model registration process.|
|[tfserving/half-plus-two-integrated](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/tfserving/half-plus-two-integrated)|[deploy-custom-container-tfserving-half-plus-two-integrated](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-tfserving-half-plus-two-integrated.sh)|Deploy a Half Plus Two model using a TensorFlow Serving custom container with the model integrated into the image.|
|[torchserve/densenet](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/torchserve/densenet)|[deploy-custom-container-torchserve-densenet](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-torchserve-densenet.sh)|Deploy a single model using a TorchServe custom container.|
|[torchserve/huggingface-textgen](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/torchserve/huggingface-textgen)|[deploy-custom-container-torchserve-huggingface-textgen](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-torchserve-huggingface-textgen.sh)|Deploy Hugging Face models to an online endpoint and follow along with the Hugging Face Transformers TorchServe example.|
|[triton/single-model](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/triton/single-model)|[deploy-custom-container-triton-single-model](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-triton-single-model.sh)|Deploy a Triton model using a custom container.|

This article focuses on serving a TensorFlow model with TensorFlow (TF) Serving.
> [!WARNING]
> Microsoft might not be able to help troubleshoot problems caused by a custom image. If you encounter problems, you might be asked to use the default image or one of the images Microsoft provides to see if the problem is specific to your image.
* You, or the service principal you use, must have *Contributor* access to the Azure resource group that contains your workspace. You have such a resource group if you configured your workspace using the quickstart article.
* To deploy locally, you must have [Docker engine](https://docs.docker.com/engine/install/) running locally. This step is **highly recommended**. It helps you debug issues.
## Download source code
To follow along with this tutorial, clone the source code from GitHub.

See also [the example notebook](https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/online/custom-container/online-endpoints-custom-container.ipynb), but note that the `3. Test locally` section in the notebook assumes that it runs under the `azureml-examples/sdk` directory.
---
### Check that you can send liveness and scoring requests to the image
First, check that the container is *alive*, meaning that the process inside the container is still running. You should get a 200 (OK) response.
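For TF Serving, both checks can be exercised with `curl`. The commands below are a sketch, not the article's exact script: they assume the container publishes TF Serving's REST API on its default port, 8501, and serves a model named `half_plus_two`; adjust both to match your image.

```shell
# Liveness-style check: TF Serving's model status route.
# Expect HTTP 200 with a model version state of AVAILABLE.
curl -v http://localhost:8501/v1/models/half_plus_two

# Scoring request: expect HTTP 200 and a JSON body with predictions.
curl -H "Content-Type: application/json" \
  -d '{"instances": [1.0, 2.0, 5.0]}' \
  http://localhost:8501/v1/models/half_plus_two:predict
```

If the second command fails while the first succeeds, the server is running but the model isn't loaded or the model name in the URL doesn't match.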

A deployment is a set of resources required for hosting the model that does the actual inferencing. Create a deployment for your endpoint using the `ManagedOnlineDeployment` class.

> [!TIP]
> - `name` - Name of the deployment.

There are a few important concepts to notice in this YAML/Python parameter:

#### Readiness route vs. liveness route
An HTTP server defines paths for both _liveness_ and _readiness_. A liveness route is used to check whether the server is running. A readiness route is used to check whether the server is ready to do work. In machine learning inference, a server could respond 200 OK to a liveness request before loading a model. The server could respond 200 OK to a readiness request only after the model is loaded into memory.
For more information about liveness and readiness probes, see the [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
Notice that this deployment uses the same path for both liveness and readiness, since TF Serving only defines a liveness route.
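In the deployment definition, the routes are set on the environment's inference configuration. The following fragment is a sketch, not the article's exact file: the port assumes TF Serving's default REST port 8501, and the paths assume the `half_plus_two` example model.

```yaml
environment:
  image: docker.io/tensorflow/serving:latest
  inference_config:
    liveness_route:
      port: 8501
      path: /v1/models/half_plus_two
    readiness_route:
      port: 8501
      path: /v1/models/half_plus_two
    scoring_route:
      port: 8501
      path: /v1/models/half_plus_two:predict
```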
#### Locating the mounted model
When you deploy a model as an online endpoint, Azure Machine Learning _mounts_ your model to your endpoint. Model mounting allows you to deploy new versions of the model without having to create a new Docker image. By default, a model registered with the name *foo* and version *1* would be located at the following path inside of your deployed container: */var/azureml-app/azureml-models/foo/1*
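A small sketch of this default path convention (the function name is illustrative, not part of any SDK):

```python
def default_model_path(name: str, version: str) -> str:
    # Default location Azure Machine Learning mounts a registered model
    # to inside the deployed container: /var/azureml-app/azureml-models/<name>/<version>
    return f"/var/azureml-app/azureml-models/{name}/{version}"

print(default_model_path("foo", "1"))  # /var/azureml-app/azureml-models/foo/1
```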
For example, if you have a directory structure of */azureml-examples/cli/endpoints/online/custom-container* on your local machine, where the model is named *half_plus_two*:
:::image type="content" source="./media/how-to-deploy-custom-container/local-directory-structure.png" alt-text="Diagram showing a tree view of the local directory structure.":::
# [Azure CLI](#tab/cli)
And *tfserving-deployment.yml* contains:
```yaml
model:
  name: tfserving-mounted
  version: 1
  path: ./half_plus_two
```

# [Python SDK](#tab/python)

And the `Model` class contains:

```python
model = Model(name="tfserving-mounted", version="1", path="half_plus_two")
```
---

Then your model is located under */var/azureml-app/azureml-models/tfserving-deployment/1* in your deployment:

:::image type="content" source="./media/how-to-deploy-custom-container/deployment-location.png" alt-text="Diagram showing a tree view of the deployment directory structure.":::
You can optionally configure your `model_mount_path`. It lets you change the path where the model is mounted.
> [!IMPORTANT]
> The `model_mount_path` must be a valid absolute path in Linux (the OS of the container image).
# [Azure CLI](#tab/cli)
For example, you can have the `model_mount_path` parameter in your *tfserving-deployment.yml*:
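As a sketch (the field values assume the endpoint, deployment, and model names used earlier in this article; `model_mount_path` is the line of interest):

```yaml
name: tfserving-deployment
endpoint_name: tfserving-endpoint
model:
  name: tfserving-mounted
  version: 1
  path: ./half_plus_two
model_mount_path: /var/tfserving-model-mount
```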
Then your model is located at */var/tfserving-model-mount/tfserving-deployment/1* in your deployment. Note that it's no longer under *azureml-app/azureml-models*, but under the mount path you specified:
:::image type="content" source="./media/how-to-deploy-custom-container/mount-path-deployment-location.png" alt-text="Diagram showing a tree view of the deployment directory structure when using mount_model_path.":::
### Create your endpoint and deployment
# [Azure CLI](#tab/cli)
Now that you understand how the YAML was constructed, create your endpoint.
```azurecli
az ml online-endpoint create --name tfserving-endpoint -f endpoints/online/custom-container/tfserving-endpoint.yml
```
Creating a deployment might take a few minutes.
```azurecli
az ml online-deployment create --name tfserving-deployment -f endpoints/online/custom-container/tfserving-deployment.yml --all-traffic
```
# [Python SDK](#tab/python)
Using the `MLClient` created earlier, create the endpoint in the workspace. This command starts the endpoint creation and returns a confirmation response while the endpoint creation continues.
```python
ml_client.begin_create_or_update(endpoint)
```
Create the deployment by running:
```python
ml_client.begin_create_or_update(blue_deployment)
```

Once your deployment completes, see if you can make a scoring request to the deployed model.

# [Python SDK](#tab/python)
Using the `MLClient` created earlier, you get a handle to the endpoint. The endpoint can be invoked using the `invoke` command with the following parameters:

- `endpoint_name` - Name of the endpoint
- `request_file` - File with request data
- `deployment_name` - Name of the specific deployment to test in an endpoint

Send a sample request using a JSON file. The sample JSON is in the [example repository](https://github.com/Azure/azureml-examples/tree/main/sdk/python/endpoints/online/custom-container).
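If you want to construct such a file yourself, the body follows TF Serving's REST request format. This sketch writes an illustrative payload to a hypothetical *sample-request.json* (for the real walkthrough, use the file shipped in the example repository):

```python
import json

# TF Serving's REST scoring format: a JSON object with an "instances" list.
# The input values here are illustrative, not the repository's sample data.
payload = {"instances": [1.0, 2.0, 5.0]}

with open("sample-request.json", "w") as f:
    json.dump(payload, f)
```

You would then pass the file's path as the `request_file` argument when invoking the endpoint.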