In Azure Machine Learning, you can use a custom container to deploy a model to an online endpoint. Custom container deployments can use web servers other than the default Python Flask server that Azure Machine Learning uses.

When you use a custom deployment, you can:

- Use various tools and technologies, such as TensorFlow Serving, TorchServe, Triton Inference Server, the Plumber R package, and the Azure Machine Learning inference minimal image.
- Still take advantage of the built-in monitoring, scaling, alerting, and authentication that Azure Machine Learning offers.

This article shows you how to use a TensorFlow (TF) Serving image to serve a TF model.
## Prerequisites

* An Azure resource group that contains your workspace and that you or your service principal have Contributor access to. If you use the steps in [Create the workspace](quickstart-create-resources.md#create-the-workspace) to configure your workspace, you meet this requirement.
* [Docker Engine](https://docs.docker.com/engine/install/), installed and running locally. This prerequisite is **highly recommended**. You need it to deploy a model locally, and it's helpful for debugging.
## Deployment examples
The following table lists [deployment examples](https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/online/custom-container) that use custom containers and take advantage of various tools and technologies.

|Example|Azure CLI script|Description|
|-------|------|---------|
|[minimal/multimodel](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/minimal/multimodel)|[deploy-custom-container-minimal-multimodel](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-minimal-multimodel.sh)|Deploys multiple models to a single deployment by extending the Azure Machine Learning inference minimal image.|
|[minimal/single-model](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/minimal/single-model)|[deploy-custom-container-minimal-single-model](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-minimal-single-model.sh)|Deploys a single model by extending the Azure Machine Learning inference minimal image.|
|[torchserve/densenet](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/torchserve/densenet)|[deploy-custom-container-torchserve-densenet](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-torchserve-densenet.sh)|Deploys a single model by using a TorchServe custom container.|
|[triton/single-model](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/triton/single-model)|[deploy-custom-container-triton-single-model](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-triton-single-model.sh)|Deploys a Triton model by using a custom container.|

This article shows you how to use the `tfserving/half-plus-two` example.
> [!WARNING]
> Microsoft support teams might not be able to help troubleshoot problems caused by a custom image. If you encounter problems, you might be asked to use the default image or one of the images that Microsoft provides to see whether the problem is specific to your image.
## Download the source code
The steps in this article use code samples from the [azureml-examples](https://github.com/Azure/azureml-examples) repository. Use the following commands to clone the repository:
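
If you prefer to run the clone step from Python (for example, from within the notebook), the following sketch is equivalent. The shallow-clone flag is a convenience choice, not a requirement from the repository; the `cli` working folder matches the structure this article uses.

```python
import os
import subprocess

# Clone the examples repository and switch to the cli folder that this
# article's steps use. (Sketch: a shallow clone keeps the download small.)
subprocess.run(
    ["git", "clone", "--depth", "1", "https://github.com/Azure/azureml-examples"],
    check=True,
)
os.chdir("azureml-examples/cli")
```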
In the examples repository, most Python samples are under the `sdk/python` folder. For this article, go to the `cli` folder instead. The folder structure under the `cli` folder is slightly different from the `sdk/python` structure in this case. Most steps in this article require the `cli` structure.

To follow along with the example steps, see a [Jupyter notebook in the examples repository](https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/online/custom-container/online-endpoints-custom-container.ipynb). But in the following sections of that notebook, the steps run from the `azureml-examples/sdk/python` folder instead of the `cli` folder:

- 3. Test locally
- 5. Test the endpoint with sample data

---
## Initialize environment variables
To use a TF model, you need several environment variables. Run the following commands to define those variables:
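
The exact commands live in the example files you cloned. As a rough Python sketch only (the values here are assumptions, not values copied from the repository), the variables follow this pattern:

```python
import os

# Illustrative values; replace them with the values that the
# tfserving/half-plus-two example actually uses.
os.environ["MODEL_NAME"] = "half_plus_two"
os.environ["MODEL_BASE_PATH"] = "/var/azureml-app/azureml-models/tfserving-mounted/1"
```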
For more information, see [Deploy machine learning models to managed online endpoint using Python SDK v2](how-to-deploy-managed-online-endpoint-sdk-v2.md) and [Deploy and score a machine learning model by using an online endpoint](how-to-deploy-online-endpoints.md?view=azureml-api-2&tabs=python).
### Configure an online endpoint
Use the following code to configure an online endpoint. Keep the following points in mind:
- The name of the endpoint must be unique in its Azure region. An endpoint name must start with a letter and only consist of alphanumeric characters and hyphens. For more information about the naming rules, see [Azure Machine Learning online endpoints and batch endpoints](how-to-manage-quotas.md#azure-machine-learning-online-endpoints-and-batch-endpoints).
- For the `auth_mode` value, use `key` for key-based authentication. Use `aml_token` for Azure Machine Learning token-based authentication. A key doesn't expire, but a token does expire. For more information about authentication, see [Authenticate clients for online endpoints](how-to-authenticate-online-endpoint.md).
- The description and tags are optional.
```python
# To create a unique endpoint name, use a time stamp of the current date and time.
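# The rest of this block is a sketch, not the exact code from the example
# notebook; the name prefix, description, and tags are assumptions.
import datetime

from azure.ai.ml.entities import ManagedOnlineEndpoint

online_endpoint_name = "tfserving-endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")

endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="A TF Serving model in a custom container",
    auth_mode="key",
    tags={"example": "tfserving"},
)
```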
A deployment is a set of resources that are required for hosting the model that does the actual inferencing. You can use the `ManagedOnlineDeployment` class to configure a deployment for your endpoint. The constructor of that class uses the following parameters:
- `name`: The name of the deployment.
- `endpoint_name`: The name of the endpoint to create the deployment under.
- `model`: The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
- `environment`: The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification.
- `environment_variables`: Environment variables that are set during deployment.
  - `MODEL_BASE_PATH`: The parent folder that contains a folder for your model.
  - `MODEL_NAME`: The name of your model.
- `instance_type`: The virtual machine size to use for the deployment. For a list of supported sizes, see [Managed online endpoints SKU list](reference-managed-online-endpoints-vm-sku-list.md).
- `instance_count`: The number of instances to use for the deployment.

Use the following code to configure a deployment for your endpoint:
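
The following block is a minimal sketch that wires the parameters from the preceding list together. The model path, environment variable values, route paths, port, and VM size are illustrative assumptions; check the example notebook for the values the sample actually uses.

```python
from azure.ai.ml.entities import Environment, ManagedOnlineDeployment, Model

# Sketch only: names, paths, the port, and the VM size are assumptions.
tfserving_env = Environment(
    image="docker.io/tensorflow/serving:latest",
    inference_config={
        "liveness_route": {"port": 8501, "path": "/v1/models/half_plus_two"},
        "readiness_route": {"port": 8501, "path": "/v1/models/half_plus_two"},
        "scoring_route": {"port": 8501, "path": "/v1/models/half_plus_two:predict"},
    },
)

deployment = ManagedOnlineDeployment(
    name="tfserving-deployment",
    endpoint_name=online_endpoint_name,  # the endpoint defined in the previous step
    model=Model(name="tfserving-mounted", path="./half_plus_two"),
    environment=tfserving_env,
    environment_variables={
        "MODEL_BASE_PATH": "/var/azureml-app/azureml-models/tfserving-mounted/1",
        "MODEL_NAME": "half_plus_two",
    },
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
```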
The following sections discuss a few important concepts about the YAML and Python parameters.
#### Base image
In the `environment` section in YAML, or the `Environment` constructor in Python, you specify the base image as a parameter. This example uses `docker.io/tensorflow/serving:latest` as the `image` value.

If you inspect your container, you can see that this server uses `ENTRYPOINT` commands to start an entry point script. That script takes environment variables such as `MODEL_BASE_PATH` and `MODEL_NAME`, and it exposes ports such as `8501`. These details all pertain to this server, and you can use this information to determine how to define your deployment. For example, if you set the `MODEL_BASE_PATH` and `MODEL_NAME` environment variables in your deployment definition, TF Serving uses those values to initiate the server. Likewise, if you set the port for each route to be `8501` in the deployment definition, user requests to those routes are correctly routed to the TF Serving server.
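
One way to confirm these details is to inspect the image locally. The following sketch assumes the Docker Engine prerequisite is running and that the `docker` (docker-py) Python package is installed; running `docker image inspect tensorflow/serving:latest` from a shell shows the same information.

```python
import docker  # the docker (docker-py) package; Docker Engine must be running

client = docker.from_env()
image = client.images.pull("tensorflow/serving", tag="latest")

# The entry point script and exposed ports show how the server starts and which
# port (8501 for the REST API) the deployment's routes should target.
print(image.attrs["Config"].get("Entrypoint"))
print(image.attrs["Config"].get("ExposedPorts"))
```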
This example is based on the TF Serving case, but you can use any container that stays up and responds to requests that go to liveness, readiness, and scoring routes. To see how to form a Dockerfile to create a container, you can refer to other examples. Some servers use `CMD` instructions instead of `ENTRYPOINT` instructions.
#### The inference_config parameter
In the `environment` section or the `Environment` class, `inference_config` is a parameter. It specifies the port and path for three types of routes: liveness, readiness, and scoring routes. The `inference_config` parameter is required if you want to run your own container with a managed online endpoint.
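
As an illustration, the `inference_config` value for this TF Serving example might take the following shape. The route key names, port, and paths are assumptions based on the TF Serving details discussed earlier; verify them against the example files.

```python
# Illustrative only: each route targets TF Serving's REST port and the paths
# that TF Serving exposes for the half_plus_two model.
inference_config = {
    "liveness_route": {"port": 8501, "path": "/v1/models/half_plus_two"},
    "readiness_route": {"port": 8501, "path": "/v1/models/half_plus_two"},
    "scoring_route": {"port": 8501, "path": "/v1/models/half_plus_two:predict"},
}
```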
#### Readiness route vs liveness route