Commit e92401c

Merge pull request #266235 from ssalgadodev/patch-72
Update how-to-troubleshoot-kubernetes-compute.md
2 parents 7991f16 + faa8880 commit e92401c

1 file changed (+8, -8 lines changed)

articles/machine-learning/how-to-troubleshoot-kubernetes-compute.md

Lines changed: 8 additions & 8 deletions
@@ -7,22 +7,22 @@ ms.author: chenlujiao
 ms.reviewer: ssalgado
 ms.service: machine-learning
 ms.subservice: core
-ms.date: 11/11/2022
+ms.date: 02/11/2024
 ms.topic: how-to
 ms.custom: build-spring-2022, cliv2, sdkv2
 ---

 # Troubleshoot Kubernetes Compute

-In this article, you learn how to troubleshoot common workload (including training jobs and endpoints) errors on the [Kubernetes compute](./how-to-attach-kubernetes-to-workspace.md).
+In this article, you learn how to troubleshoot common workload errors on the [Kubernetes compute](./how-to-attach-kubernetes-to-workspace.md). Common errors include training jobs and endpoint errors.

 ## Inference guide

-The common Kubernetes endpoint errors on Kubernetes compute are categorized into two scopes: **compute scope** and **cluster scope**. The compute scope errors are related to the compute target, such as the compute target is not found, or the compute target is not accessible. The cluster scope errors are related to the underlying Kubernetes cluster, such as the cluster itself is not reachable, or the cluster is not found.
+The common Kubernetes endpoint errors on Kubernetes compute are categorized into two scopes: **compute scope** and **cluster scope**. The compute scope errors are related to the compute target, such as the compute target isn't found, or the compute target isn't accessible. The cluster scope errors are related to the underlying Kubernetes cluster, such as the cluster itself isn't reachable, or the cluster isn't found.

 ### Kubernetes compute errors

-The common error types in **compute scope** that you might encounter when using Kubernetes compute to create online endpoints and online deployments for real-time model inference, which you can trouble shoot by following the guidelines:
+The following are common error types in **compute scope** that you might encounter when using Kubernetes compute to create online endpoints and online deployments for real-time model inference. You can trouble shoot by following the linked sections for guidelines:


 * [ERROR: GenericComputeError](#error-genericcomputeerror)
@@ -307,7 +307,7 @@ We could use the method to check private link setup by logging into one pod in t

 * Find workspace ID in Azure portal or get this ID by running `az ml workspace show` in the command line.
 * Show all azureml-fe pods run by `kubectl get po -n azureml -l azuremlappname=azureml-fe`.
-* Login into any of them run `kubectl exec -it -n azureml {scorin_fe_pod_name} bash`.
+* Sign in into any of them run `kubectl exec -it -n azureml {scorin_fe_pod_name} bash`.
 * If the cluster doesn't use proxy run `nslookup {workspace_id}.workspace.{region}.api.azureml.ms`.
 If you set up private link from VNet to workspace correctly, then the internal IP in VNet should be responded through the *DNSLookup* tool.

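The private-link check in the second hunk (resolve `{workspace_id}.workspace.{region}.api.azureml.ms` from an azureml-fe pod and expect an internal VNet IP) can be sketched in Python as an alternative to reading `nslookup` output by eye. This is a minimal sketch, not part of the commit or the article: it assumes Python 3.9+ is available wherever you run it (the article runs `nslookup` inside the pod, which may not ship Python), and the helper names `resolve_workspace_host` and `looks_like_private_link` are hypothetical. It treats "every resolved address falls in a private range" as the signal that private link DNS is set up correctly.

```python
import ipaddress
import socket


def resolve_workspace_host(workspace_id: str, region: str) -> list[str]:
    """Hypothetical helper: resolve the workspace FQDN (as nslookup would) and return unique IPs."""
    host = f"{workspace_id}.workspace.{region}.api.azureml.ms"
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); the IP string is sockaddr[0].
    return sorted({info[4][0] for info in infos})


def looks_like_private_link(addresses: list[str]) -> bool:
    """Hypothetical helper: True only if at least one address resolved and all of them are private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)
```

For example, `looks_like_private_link(resolve_workspace_host("<workspace-id>", "<region>"))` returns `True` when DNS answers with internal addresses only, and `False` for a public answer or an empty result.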

@@ -316,13 +316,13 @@ If you set up private link from VNet to workspace correctly, then the internal I
 curl https://{workspace_id}.workspace.westcentralus.api.azureml.ms/metric/v2.0/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}/api/2.0/prometheus/post -X POST -x {proxy_address} -d {} -v -k
 ```

-When the proxy and workspace are correctly set up with a private link, you should observe an attempt to connect to an internal IP. A response with an HTTP 401 status code is expected in this scenario if a token is not provided.
+When the proxy and workspace are correctly set up with a private link, you should observe an attempt to connect to an internal IP. A response with an HTTP 401 status code is expected in this scenario if a token isn't provided.

 ## Other known issues

-### Kubernetes compute update does not take effect
+### Kubernetes compute update doesn't take effect

-At this time, the CLI v2 and SDK v2 do not allow updating any configuration of an existing Kubernetes compute. For example, changing the namespace does not take effect.
+At this time, the CLI v2 and SDK v2 don't allow updating any configuration of an existing Kubernetes compute. For example, changing the namespace doesn't take effect.

 ### Workspace or resource group name end with '-'
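The proxy check in the last hunk (POST to the workspace's Prometheus metric endpoint through `{proxy_address}` without a token and expect HTTP 401) can be sketched in Python the same way. This is a hypothetical helper, not something the commit or Azure Machine Learning provides: `build_metric_url`, `probe_through_proxy`, and `private_link_path_ok` are illustrative names, the region is a parameter (the curl command hard-codes `westcentralus`), and certificate verification is disabled only to mirror curl's `-k`. It checks the status code alone; confirming that the connection went to an internal IP still needs the `-v` output from curl.

```python
import ssl
import urllib.error
import urllib.request

# Path copied from the curl command in the diff; region is parameterized here (assumption).
METRIC_URL = (
    "https://{workspace_id}.workspace.{region}.api.azureml.ms/metric/v2.0"
    "/subscriptions/{subscription}/resourceGroups/{resource_group}"
    "/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}"
    "/api/2.0/prometheus/post"
)


def build_metric_url(workspace_id: str, region: str, subscription: str,
                     resource_group: str, workspace_name: str) -> str:
    """Hypothetical helper: fill in the metric endpoint template used by the curl check."""
    return METRIC_URL.format(workspace_id=workspace_id, region=region,
                             subscription=subscription, resource_group=resource_group,
                             workspace_name=workspace_name)


def probe_through_proxy(url: str, proxy_address: str, timeout: float = 10.0) -> int:
    """Hypothetical helper: POST '{}' through the proxy with no token and return the HTTP status."""
    # Like curl -k: skip TLS verification. Use for diagnostics only.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"https": proxy_address}),  # like curl -x
        urllib.request.HTTPSHandler(context=ctx),
    )
    request = urllib.request.Request(url, data=b"{}", method="POST")  # like -X POST -d {}
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; return the code so callers can inspect it.
        return err.code


def private_link_path_ok(status: int) -> bool:
    """A 401 means the request reached the workspace and was rejected only for the missing token."""
    return status == 401
```

A typical call is `private_link_path_ok(probe_through_proxy(build_metric_url(...), "http://<proxy-host>:<port>"))`. Connection failures such as an unreachable proxy surface as `urllib.error.URLError` rather than a status code, which keeps them distinct from an unexpected HTTP response.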