Commit a28f75e

authored
Update troubleshooting with help for unschedulable
Update troubleshooting with more help for unschedulable nodes including specific error strings.
1 parent 94fdba8 commit a28f75e

1 file changed: +5 -0 lines changed

articles/machine-learning/how-to-troubleshoot-deployment.md

Lines changed: 5 additions & 0 deletions
@@ -175,6 +175,11 @@ print(service.get_logs())
# if you only know the name of the service (note there might be multiple services with the same name but different version number)
print(ws.webservices['mysvc'].get_logs())
```
## Container cannot be scheduled

When deploying a service to an Azure Kubernetes Service compute target, Azure Machine Learning attempts to schedule the service with the requested amount of resources. If, after 5 minutes, no node in the cluster has the requested resources available, the deployment fails with the message `Couldn't Schedule because the kubernetes cluster didn't have available resources after trying for 00:05:00`. You can address this error by adding more nodes, changing the SKU of your nodes, or changing the resource requirements of your service.

The error message typically indicates which resource you need more of. For instance, an error message containing `0/3 nodes are available: 3 Insufficient nvidia.com/gpu` means that the service requires GPUs and that none of the 3 nodes in the cluster has GPUs available. You can address this by adding more nodes if you are using a GPU SKU, by switching to a GPU-enabled SKU if you are not, or by changing your environment so that it does not require GPUs.
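
If you want to check programmatically which resource is short, the scheduler message can be parsed with a regular expression. The following is a minimal sketch, assuming you already have the error text as a string and that it follows the `N/M nodes are available: K Insufficient <resource>` pattern quoted above. The helper name `parse_unschedulable` is hypothetical and is not part of the Azure Machine Learning SDK.

```python
import re

# Matches the node summary, for example "0/3 nodes are available".
_SUMMARY = re.compile(r"(\d+)/(\d+) nodes are available")
# Matches each shortage, for example "3 Insufficient nvidia.com/gpu".
_SHORTAGE = re.compile(r"(\d+) Insufficient ([A-Za-z0-9._/-]+)")


def parse_unschedulable(message):
    """Hypothetical helper, not an SDK function.

    Returns (available_nodes, total_nodes, {resource: node_count}),
    or None if the message has no "N/M nodes are available" summary.
    """
    summary = _SUMMARY.search(message)
    if summary is None:
        return None
    # Strip a trailing period that may end the sentence after the resource name.
    shortages = {
        resource.rstrip("."): int(count)
        for count, resource in _SHORTAGE.findall(message)
    }
    return int(summary.group(1)), int(summary.group(2)), shortages


msg = "0/3 nodes are available: 3 Insufficient nvidia.com/gpu"
print(parse_unschedulable(msg))
```

A shortage of `nvidia.com/gpu` points to the GPU-related fixes described above. Other resource names in the message would suggest adding nodes, choosing a larger SKU, or lowering the service's resource requirements.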
## Service launch fails