Update troubleshooting with help for unschedulable

csteegz · web-flow · commit a28f75eae53c · 2020-05-26T13:35:56.000-07:00
Update troubleshooting with more help for unschedulable nodes including specific error strings.
diff --git a/articles/machine-learning/how-to-troubleshoot-deployment.md b/articles/machine-learning/how-to-troubleshoot-deployment.md
@@ -175,6 +175,11 @@ print(service.get_logs())
 # if you only know the name of the service (note there might be multiple services with the same name but different version number)
 print(ws.webservices['mysvc'].get_logs())
 ```
+## Container cannot be scheduled
+
+When you deploy a service to an Azure Kubernetes Service (AKS) compute target, Azure Machine Learning attempts to schedule the service with the requested amount of resources. If no node in the cluster has enough available resources after 5 minutes, the deployment fails with the message `Couldn't Schedule because the kubernetes cluster didn't have available resources after trying for 00:05:00`. You can address this error by adding more nodes, changing the SKU of your nodes, or changing the resource requirements of your service.
+
+The error message typically indicates which resource is in short supply. For example, `0/3 nodes are available: 3 Insufficient nvidia.com/gpu` means that the service requires GPUs and that none of the 3 nodes in the cluster has an available GPU. You can address this by adding more nodes if you are already using a GPU SKU, switching to a GPU-enabled SKU if you are not, or changing your environment so that it does not require GPUs.
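The scheduler message quoted above follows a regular `0/N nodes are available: ...` pattern, so the scarce resource can be pulled out of a log line programmatically. The following is a minimal sketch, assuming the message keeps that shape; `parse_unschedulable` is a hypothetical helper written for illustration and is not part of the Azure Machine Learning SDK.

```python
import re

# Hypothetical helper, not part of the Azure ML SDK. It assumes the scheduler
# message looks like "0/3 nodes are available: 3 Insufficient nvidia.com/gpu".
_HEADER = re.compile(r"(\d+)/(\d+) nodes are available:")
_REASON = re.compile(r"(\d+) Insufficient ([\w./-]+)")


def parse_unschedulable(message):
    """Return (total_nodes, {resource: nodes_lacking_it}), or None if no match."""
    header = _HEADER.search(message)
    if header is None:
        return None
    total_nodes = int(header.group(2))
    # Collect every "<count> Insufficient <resource>" reason after the header,
    # dropping a trailing period if the message ends with one.
    shortages = {
        resource.rstrip("."): int(count)
        for count, resource in _REASON.findall(message[header.end():])
    }
    return total_nodes, shortages


result = parse_unschedulable("0/3 nodes are available: 3 Insufficient nvidia.com/gpu")
print(result)  # (3, {'nvidia.com/gpu': 3})
```

A result that names `nvidia.com/gpu` points to the GPU remedies described above, while `cpu` or `memory` would suggest adding nodes or adjusting the service's resource requirements.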
 
 ## Service launch fails