Below is a list of common resources that might run out of quota when using Azure services:
* [CPU](#cpu-quota)
* [Role assignments](#role-assignment-quota)
Before deploying a model, you need to have enough compute quota. This quota defines how many virtual cores are available per subscription, per workspace, per SKU, and per region. Each deployment subtracts from the available quota and adds it back after deletion, based on the type of the SKU.

A possible mitigation is to check if there are unused deployments that can be deleted. Or you can submit a [request for a quota increase](./how-to-manage-quotas.md#request-quota-increases).
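To look for unused deployments that can be freed up, one option (a sketch assuming the Azure CLI with the `ml` extension installed and a configured workspace) is to list the endpoints in the workspace, and then the deployments under each:

```azurecli
az ml online-endpoint list --output table
az ml online-deployment list --endpoint-name <endpoint-name> --output table
```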
#### Role assignment quota
To run the `score.py` provided as part of the deployment, Azure creates a container that includes all the resources that the `score.py` needs, and runs the scoring script on that container.

If your container couldn't start, this means scoring couldn't happen. It might be that the container is requesting more resources than the `instance_type` can support. If so, consider updating the `instance_type` of the online deployment.
To get the exact reason for an error, run:
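For example, with placeholder endpoint and deployment names:

```azurecli
az ml online-deployment get-logs -n <endpoint-name> --deployment <deployment-name>
```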
### ERROR: BadArgument

Below is a list of reasons you might run into this error:
* [Resource request was greater than limits](#resource-requests-greater-than-limits)
* [Unable to download resources](#unable-to-download-resources)
- If you created the associated endpoint with the `UserAssigned` identity type, the user's managed identity must have the Storage Blob Data Reader permission on the workspace storage account.

During this process, you can run into a few different issues, depending on the stage at which the operation failed:
* [Unable to download user container image](#unable-to-download-user-container-image)
* [Unable to download user model or code artifacts](#unable-to-download-user-model-or-code-artifacts)
To get more details about these errors, run:
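For example, with placeholder endpoint and deployment names:

```azurecli
az ml online-deployment get-logs -n <endpoint-name> --deployment <deployment-name>
```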

#### Unable to download user container image

It is possible that the user container could not be found.

Make sure the container image is available in the workspace Azure Container Registry (ACR).

For example, if the image is `testacr.azurecr.io/azureml/azureml_92a029f831ce58d2ed011c3c42d35acb:latest`, check the repository with:
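One way to perform that check, as a sketch with the Azure CLI (the registry and repository names come from the image path above; your login must have access to the registry):

```azurecli
az acr repository show-tags --name testacr --repository azureml/azureml_92a029f831ce58d2ed011c3c42d35acb --output table
```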
#### Unable to download user model or code artifacts

It is possible that the user model or code artifacts can't be found.
Make sure the model and code artifacts are registered to the same workspace as the deployment. Use the `show` command to show details for a model or code artifact in a workspace.

For example:

```azurecli
az ml model show --name <model-name>
az ml code show --name <code-name> --version <version>
```

You can also check if the blobs are present in the workspace storage account.
-
- For example, if the blob is `https://foobar.blob.core.windows.net/210212154504-1517266419/WebUpload/210212154504-1517266419/GaussianNB.pkl` you can use this command to check if it exists:
205
+
- For example, if the blob is `https://foobar.blob.core.windows.net/210212154504-1517266419/WebUpload/210212154504-1517266419/GaussianNB.pkl`, you can use this command to check if it exists:
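A sketch of that check with the Azure CLI, using the account, container, and blob names parsed from the example URL above (the authentication mode depends on your setup):

```azurecli
az storage blob exists --account-name foobar --container-name 210212154504-1517266419 --name WebUpload/210212154504-1517266419/GaussianNB.pkl --auth-mode login
```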

To run the `score.py` provided as part of the deployment, Azure creates a container that includes all the resources that the `score.py` needs, and runs the scoring script on that container. The error in this scenario is that this container is crashing when running, which means scoring can't happen. This error happens when:
- There's an error in `score.py`. Use `get-logs` to help diagnose common problems:
  - A package that was imported but is not in the conda environment.
  - A syntax error.
  - A failure in the `init()` method.
- If `get-logs` isn't producing any logs, it usually means that the container has failed to start. To debug this issue, try [deploying locally](https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/how-to-troubleshoot-online-endpoints.md#deploy-locally) instead.
- Readiness or liveness probes are not set up correctly.
- There's an error in the environment setup of the container, such as a missing dependency.
### ERROR: ResourceNotFound

This error occurs when Azure Resource Manager can't find a required resource. For example, you'll receive this error if a storage account was referred to but can't be found at the path where it was specified. Be sure to double-check resources that might have been supplied by exact path or the spelling of their names.

For more information, see [Resolve resource not found errors](../azure-resource-manager/troubleshooting/error-not-found).
### ERROR: OperationCancelled

Azure operations have a certain priority level and are executed from highest to lowest. This error happens when your operation is overridden by another operation that has a higher priority. Retrying the operation might allow it to be performed without cancellation.
### ERROR: InternalServerError

Although we do our best to provide a stable and reliable service, sometimes things don't go according to plan. If you get this error, it means that something isn't right on our side, and we need to fix it. Submit a [customer support ticket](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest) with all related information, and we'll address the issue.