Learn how to troubleshoot and solve, or work around, common Docker deployment errors.
## Prerequisites
* An **Azure subscription**. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree).
* The [Azure Machine Learning SDK](https://docs.microsoft.com/python/api/overview/azure/ml/install?view=azure-ml-py&preserve-view=true).
* The [Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli?view=azure-cli-latest).
* The [CLI extension for Azure Machine Learning](reference-azure-machine-learning-cli.md).
## Steps for Docker deployment of machine learning models
When deploying a model in Azure Machine Learning, you use the [Model.deploy()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.model%28class%29?view=azure-ml-py#&preserve-view=truedeploy-workspace--name--models--inference-config-none--deployment-config-none--deployment-target-none--overwrite-false-) API and an [Environment](how-to-use-environments.md) object. The service creates a base Docker image during the deployment stage and mounts the required models, all in one call. The basic deployment tasks are:
1. Register the model in the workspace model registry.
2. Define Inference Configuration:
    1. Create an [Environment](how-to-use-environments.md) object. This object can use the dependencies you specify in an environment YAML file, or one of our curated environments.
2. Create an inference configuration (InferenceConfig object) based on the environment and the scoring script.
3. Deploy the model to Azure Container Instance (ACI) service or to Azure Kubernetes Service (AKS).
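The tasks above can be sketched with the azureml-core SDK. This is a minimal sketch, not a complete example: it assumes a workspace `config.json`, a model file, an environment YAML, and a scoring script, and all names (`my-model`, `score.py`, and so on) are illustrative placeholders.

```python
from azureml.core import Workspace, Environment, Model
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()  # assumes a config.json for your workspace

# 1. Register the model in the workspace model registry.
model = Model.register(workspace=ws, model_name="my-model", model_path="model.pkl")

# 2. Define the inference configuration from an environment and a scoring script.
env = Environment.from_conda_specification(name="my-env", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# 3. Deploy to ACI (use AksWebservice.deploy_configuration for AKS).
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
aci_service = Model.deploy(workspace=ws, name="my-service", models=[model],
                           inference_config=inference_config,
                           deployment_config=deployment_config)
aci_service.wait_for_deployment(show_output=True)
```

Running this requires a live Azure Machine Learning workspace; it is a template for the three tasks rather than a standalone script.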
Learn more about this process in the [Model Management](concept-model-management-and-deployment.md) article.
If you run into an issue, the first thing to do is break down the deployment task (described previously) into individual steps to isolate the problem.
When using [Model.deploy()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.model%28class%29?view=azure-ml-py#&preserve-view=truedeploy-workspace--name--models--inference-config-none--deployment-config-none--deployment-target-none--overwrite-false-) with an [Environment](how-to-use-environments.md) object as an input parameter, your code can be broken down into three major steps:
1. Register the model.

2. Define the inference configuration.

3. Deploy the model and wait for the deployment to complete:

    ```python
    aci_service.wait_for_deployment(show_output=True)
    ```
Breaking the deployment process into individual tasks makes it easier to identify the most common errors.
## Debug locally
If you have problems when deploying a model to ACI or AKS, deploy it as a local web service. Using a local web service makes it easier to troubleshoot problems. The Docker image containing the model is downloaded and started on your local system.
You can find a sample [local deployment notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb) in the [MachineLearningNotebooks](https://github.com/Azure/MachineLearningNotebooks) repo to explore a runnable example.
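A local deployment differs from the ACI or AKS case mainly in the deployment configuration. The following is a minimal sketch, assuming `ws`, `model`, and `inference_config` are defined as in the earlier steps, that Docker is running locally, and that port 6789 is an arbitrary choice:

```python
from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice

deployment_config = LocalWebservice.deploy_configuration(port=6789)
local_service = Model.deploy(workspace=ws, name="local-test", models=[model],
                             inference_config=inference_config,
                             deployment_config=deployment_config)
local_service.wait_for_deployment(show_output=True)
print(local_service.port)  # the port the local web service is listening on
```

Because the container runs on your machine, you can inspect its logs and restart it quickly while you debug the scoring script.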
If you are defining your own conda specification YAML, list azureml-defaults version >=1.0.45 as a pip dependency. This package is needed to host the model as a web service.
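A conda specification that satisfies this requirement might look like the following. Everything beyond the `azureml-defaults` pin is an illustrative assumption; list the packages your model actually needs.

```yaml
name: project_environment
dependencies:
  - python=3.6.2
  - pip:
    # Required to host the model as a web service.
    - azureml-defaults>=1.0.45
    # Model dependencies (example).
    - scikit-learn
```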
At this point, you can work with the service as normal. The following code demonstrates sending data to the service:
```python
import json

# Hypothetical two-feature payload; match your model's expected input schema.
input_payload = json.dumps({"data": [[1, 2]]})
prediction = service.run(input_data=input_payload)
print(prediction)
```
## Container cannot be scheduled
When deploying a service to an Azure Kubernetes Service compute target, Azure Machine Learning will attempt to schedule the service with the requested amount of resources. If there are no nodes in the cluster with the appropriate amount of resources available after 5 minutes, the deployment will fail. The failure message is `Couldn't Schedule because the kubernetes cluster didn't have available resources after trying for 00:05:00`. You can address this error by adding more nodes, changing the SKU of your nodes, or changing the resource requirements of your service.
The error message typically indicates which resource you need more of. For instance, an error message indicating `0/3 nodes are available: 3 Insufficient nvidia.com/gpu` means that the service requires GPUs and none of the three nodes in the cluster have available GPUs. You can address this by adding more nodes if you are using a GPU SKU, switching to a GPU-enabled SKU if you are not, or changing your environment to not require GPUs.
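Changing the service's resource requirements is done through its deployment configuration. A minimal sketch, assuming an AKS target; the numbers are placeholders to adjust to what your cluster's nodes can actually provide:

```python
from azureml.core.webservice import AksWebservice

# Request resources that fit on the cluster's available nodes.
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)
```

Redeploying the service with this configuration asks the scheduler for a smaller footprint, which can succeed on nodes that rejected the original request.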
## Advanced debugging
You may need to interactively debug the Python code contained in your model deployment, for example, if the entry script fails and the reason can't be determined through additional logging. By using Visual Studio Code and debugpy, you can attach to the code running inside the Docker container.
For more information, visit the [interactive debugging in VS Code guide](how-to-debug-visual-studio-code.md#debug-and-troubleshoot-deployments).
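As a sketch of the approach that guide describes, the entry script can start a debugpy listener and block until VS Code attaches before scoring begins. The port number is an assumption; it must match the port you forward from the container and configure in your VS Code attach settings:

```python
import debugpy

# Listen on all interfaces inside the container so VS Code can attach.
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger attach...")
debugpy.wait_for_client()  # blocks until VS Code connects
```

Because `wait_for_client()` blocks, only add this while actively debugging and remove it before redeploying the service.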