articles/machine-learning/how-to-troubleshoot-kubernetes-compute.md (8 additions & 8 deletions)
@@ -7,22 +7,22 @@ ms.author: chenlujiao
 ms.reviewer: ssalgado
 ms.service: machine-learning
 ms.subservice: core
-ms.date: 11/11/2022
+ms.date: 02/11/2024
 ms.topic: how-to
 ms.custom: build-spring-2022, cliv2, sdkv2
 ---
 
 # Troubleshoot Kubernetes Compute
 
-In this article, you learn how to troubleshoot common workload (including training jobs and endpoints) errors on the [Kubernetes compute](./how-to-attach-kubernetes-to-workspace.md).
+In this article, you learn how to troubleshoot common workload errors on the [Kubernetes compute](./how-to-attach-kubernetes-to-workspace.md). Common errors include training jobs and endpoint errors.
 
 ## Inference guide
 
-The common Kubernetes endpoint errors on Kubernetes compute are categorized into two scopes: **compute scope** and **cluster scope**. The compute scope errors are related to the compute target, such as the compute target is not found, or the compute target is not accessible. The cluster scope errors are related to the underlying Kubernetes cluster, such as the cluster itself is not reachable, or the cluster is not found.
+The common Kubernetes endpoint errors on Kubernetes compute are categorized into two scopes: **compute scope** and **cluster scope**. The compute scope errors are related to the compute target, such as the compute target isn't found, or the compute target isn't accessible. The cluster scope errors are related to the underlying Kubernetes cluster, such as the cluster itself isn't reachable, or the cluster isn't found.
 
 ### Kubernetes compute errors
 
-The common error types in **compute scope** that you might encounter when using Kubernetes compute to create online endpoints and online deployments for real-time model inference, which you can trouble shoot by following the guidelines:
+The following are common error types in **compute scope** that you might encounter when using Kubernetes compute to create online endpoints and online deployments for real-time model inference. You can trouble shoot by following the linked sections for guidelines:
@@ -307,7 +307,7 @@ We could use the method to check private link setup by logging into one pod in t
 
 * Find workspace ID in Azure portal or get this ID by running `az ml workspace show` in the command line.
 * Show all azureml-fe pods run by `kubectl get po -n azureml -l azuremlappname=azureml-fe`.
-* Login into any of them run `kubectl exec -it -n azureml {scorin_fe_pod_name} bash`.
+* Sign in into any of them run `kubectl exec -it -n azureml {scorin_fe_pod_name} bash`.
 * If the cluster doesn't use proxy run `nslookup {workspace_id}.workspace.{region}.api.azureml.ms`.
 If you set up private link from VNet to workspace correctly, then the internal IP in VNet should be responded through the *DNSLookup* tool.
 
@@ -316,13 +316,13 @@ If you set up private link from VNet to workspace correctly, then the internal I
 curl https://{workspace_id}.workspace.westcentralus.api.azureml.ms/metric/v2.0/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}/api/2.0/prometheus/post -X POST -x {proxy_address} -d {} -v -k
 ```
 
-When the proxy and workspace are correctly set up with a private link, you should observe an attempt to connect to an internal IP. A response with an HTTP 401 status code is expected in this scenario if a token is not provided.
+When the proxy and workspace are correctly set up with a private link, you should observe an attempt to connect to an internal IP. A response with an HTTP 401 status code is expected in this scenario if a token isn't provided.
 
 ## Other known issues
 
-### Kubernetes compute update does not take effect
+### Kubernetes compute update doesn't take effect
 
-At this time, the CLI v2 and SDK v2 do not allow updating any configuration of an existing Kubernetes compute. For example, changing the namespace does not take effect.
+At this time, the CLI v2 and SDK v2 don't allow updating any configuration of an existing Kubernetes compute. For example, changing the namespace doesn't take effect.
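
The private link check described in the second hunk (resolve `{workspace_id}.workspace.{region}.api.azureml.ms` from inside an azureml-fe pod and expect an internal VNet IP) could be sketched in Python roughly as follows. This is only an illustration, not part of the article or the commit: the function names are hypothetical, the hostname pattern is the one quoted in the article, and "internal" is approximated with the standard library's `is_private` check, which also matches loopback and link-local addresses.

```python
import ipaddress
import socket


def workspace_fqdn(workspace_id: str, region: str) -> str:
    """Build the workspace hostname using the pattern quoted in the article."""
    return f"{workspace_id}.workspace.{region}.api.azureml.ms"


def is_internal_ip(address: str) -> bool:
    """Approximate 'internal IP in VNet' as a non-public address.

    Assumption: ipaddress's is_private is a good enough stand-in; it is also
    True for loopback and link-local ranges, not only VNet ranges.
    """
    return ipaddress.ip_address(address).is_private


def resolves_to_internal_ip(hostname: str) -> bool:
    """Resolve hostname with the local resolver (the nslookup step) and
    report whether every returned address looks internal."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_internal_ip(a) for a in addresses)
```

To mirror the article's steps, `resolves_to_internal_ip(workspace_fqdn(workspace_id, region))` would need to run inside one of the azureml-fe pods, since resolution from another machine may return a different address.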