You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There are a few important concepts to notice in this YAML/Python parameter:
229
+
There are a few important concepts to note in this YAML/Python parameter:
230
+
231
+
#### Base image
232
+
233
+
The base image is specified as a parameter in environment, and `docker.io/tensorflow/serving:latest` is used in this example. As you inspect the container, you can find that this server uses `ENTRYPOINT` to start an entry point script, which takes the environment variables such as `MODEL_BASE_PATH` and `MODEL_NAME`, and exposed ports like `8501`. These details are all specific information for this chosen server. You can use this understanding of the server, to determine how to define the deployment. For example, if you set environment variables for `MODEL_BASE_PATH` and `MODEL_NAME` in the deployment definition, the server (in this case, TF Serving) will take the values to initiate the server. Likewise, if you set the port for the routes to be `8501` in the deployment definition, the user request to such routes will be correctly routed to the TF Serving server.
234
+
235
+
Note that this specific example is based on the TF Serving case, but you can use any containers that will stay up and respond to requests coming to liveness, readiness, and scoring routes.
236
+
237
+
#### Inference config
238
+
239
+
Inference config is a parameter in environment, and it specifies the port and path for 3 types of the route: liveness, readiness, and scoring route. Inference config is required if you want to run your own container with managed online endpoint.
230
240
231
241
#### Readiness route vs. liveness route
232
242
233
243
The API server you choose may provide a way to check the status of the server. There are two types of the route that you can specify: _liveness_ and _readiness_. A liveness route is used to check whether the server is running. A readiness route is used to check whether the server is ready to do work. In the context of machine learning inferencing, a server could respond 200 OK to a liveness request before loading a model, and the server could respond 200 OK to a readiness request only after the model is loaded into the memory.
234
244
235
245
For more information about liveness and readiness probes in general, see the [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
236
246
237
-
The liveness and readiness routes will be determined by the API server of your choice. Note that the example deployment in this article uses the same path for both liveness and readiness, since TFServing only defines a liveness route. Refer to other examples for different patterns to define the routes.
247
+
The liveness and readiness routes will be determined by the API server of your choice, as you would have identified when testing the container locally in earlier step. Note that the example deployment in this article uses the same path for both liveness and readiness, since TFServing only defines a liveness route. Refer to other examples for different patterns to define the routes.
238
248
239
249
#### Scoring route
240
250
241
-
The API server you choose would provide a way to receive the payload to work on. In the context of machine learning inference, a server would receive the input data via a specific route. Identify this route for your API server and specify it when you define the deployment to create. Successful creation of the deployment will update the scoring_uri parameter of the endpoint as well, which you can verify with `az ml online-endpoint show -n <name> --query scoring_uri`.
251
+
The API server you choose would provide a way to receive the payload to work on. In the context of machine learning inferencing, a server would receive the input data via a specific route. Identify this route for your API server as you test the container locally in earlier step, and specify it when you define the deployment to create. Successful creation of the deployment will update the scoring_uri parameter of the endpoint as well, which you can verify with `az ml online-endpoint show -n <name> --query scoring_uri`.
0 commit comments