Commit e2c92ca

Merge pull request #123545 from siyuZL/siyuzl/amlarc-usererror-ta
Add a new required role for TA-enabled and two new user errors from AML
2 parents 2ccb84b + 1802077 commit e2c92ca

2 files changed: +30 -0 lines changed

articles/machine-learning/how-to-attach-kubernetes-to-workspace.md

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@ Otherwise, if a [user-assigned managed identity is specified in Azure Machine Le
|--|--|--|
|Azure Relay|Azure Relay Owner|Only applicable for Arc-enabled Kubernetes cluster. Azure Relay isn't created for AKS cluster without Arc connected.|
|Kubernetes - Azure Arc or Azure Kubernetes Service|Reader <br> Kubernetes Extension Contributor <br> Azure Kubernetes Service Cluster Admin |Applicable for both Arc-enabled Kubernetes cluster and AKS cluster.|
+|Azure Kubernetes Service|Contributor|Required only for AKS clusters that use the Trusted Access feature. The workspace uses user-assigned managed identity. See [AzureML access to AKS clusters with special configurations](https://github.com/Azure/AML-Kubernetes/blob/master/docs/azureml-aks-ta-support.md) for details.|
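
For illustration only (this snippet is not part of the changed file): the Contributor role from the added row could be granted with the Azure CLI roughly as sketched here. The function name `assign_aks_contributor` and both arguments are hypothetical placeholders. The sketch assumes the first argument is the principal (object) ID of the workspace's user-assigned managed identity and the second is the resource ID of the AKS cluster.

```bash
# Sketch: grant Contributor on an AKS cluster to a user-assigned managed identity.
# Assumptions: $1 is the identity's principal (object) ID,
#              $2 is the full resource ID of the AKS cluster.
assign_aks_contributor() {
  principal_id="$1"
  aks_resource_id="$2"
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Contributor" \
    --scope "$aks_resource_id"
}
```

It might be called as `assign_aks_contributor "<principal-id>" "<aks-cluster-resource-id>"`, with both placeholders filled in from your own environment.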

> [!TIP]

articles/machine-learning/how-to-troubleshoot-kubernetes-compute.md

Lines changed: 29 additions & 0 deletions
@@ -110,6 +110,8 @@ Below is a list of error types in **cluster scope** that you might encounter whe
* [ERROR: GenericClusterError](#error-genericclustererror)
* [ERROR: ClusterNotReachable](#error-clusternotreachable)
* [ERROR: ClusterNotFound](#error-clusternotfound)
+* [ERROR: ClusterServiceNotFound](#error-clusterservicenotfound)
+* [ERROR: ClusterUnauthorized](#error-clusterunauthorized)

#### ERROR: GenericClusterError

@@ -163,6 +165,33 @@ You can check the following items to troubleshoot the issue:
* First, check the cluster resource ID in the Azure portal to verify whether the Kubernetes cluster resource still exists and is running normally.
* If the cluster exists and is running, then you can try to detach and reattach the compute to the workspace. Pay attention to more notes on [reattach](#error-genericcomputeerror).

+#### ERROR: ClusterServiceNotFound
+
+The error message is as follows:
+
+````bash
+AzureML extension service not found in cluster.
+````
+
+This error occurs when the extension-owned ingress service doesn't have enough backend pods.
+
+You can:
+
+* Access the cluster and check the status of the service `azureml-ingress-nginx-controller` and its backend pod under the `azureml` namespace.
+* If the cluster doesn't have any running backend pods, check the reason by describing the pod. For example, if the pod doesn't have enough resources to run, you can delete some pods to free enough resources for the ingress pod.
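
For illustration only (not part of the changed file): the two checks above might look like the following `kubectl` sketch. The service name and namespace come from the text; the function names `check_ingress_service` and `describe_ingress_pod` are hypothetical, and the sketch assumes `kubectl` is already configured for the affected cluster.

```bash
# Sketch: inspect the AzureML ingress service and its backend pods.
NAMESPACE="azureml"
SERVICE="azureml-ingress-nginx-controller"

# Show the service, the endpoints behind it, and all pods in the namespace.
check_ingress_service() {
  kubectl get service "$SERVICE" --namespace "$NAMESPACE"
  kubectl get endpoints "$SERVICE" --namespace "$NAMESPACE"
  kubectl get pods --namespace "$NAMESPACE" -o wide
}

# Describe one pod to see why it is not running (for example, insufficient resources).
# $1 is a pod name taken from the pod listing above.
describe_ingress_pod() {
  kubectl describe pod "$1" --namespace "$NAMESPACE"
}
```

An empty endpoints list for the service would be consistent with the "not enough backend pods" condition; the events section of `kubectl describe pod` usually states why a pod could not be scheduled.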
+
+#### ERROR: ClusterUnauthorized
+
+The error message is as follows:
+
+````bash
+Request to Kubernetes cluster unauthorized.
+````
+
+This error should only occur in a cluster with Trusted Access (TA) enabled. It means the access token expired during the deployment.
+
+You can try again after several minutes.
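
For illustration only (not part of the changed file): if the retry is scripted, a small wrapper along these lines could rerun the deployment after a wait. The function name `retry_after_wait` is hypothetical, and the attempt count and wait time are assumptions rather than values from the documentation.

```bash
# Sketch: retry a command a fixed number of times, waiting between attempts.
# $1 = maximum attempts, $2 = seconds to wait between attempts, rest = command to run.
retry_after_wait() {
  attempts="$1"
  wait_seconds="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command succeeds.
    if "$@"; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$wait_seconds"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry_after_wait 3 300 my_deploy_command` would try up to three times with five minutes between attempts, where `my_deploy_command` stands for whatever command triggered the deployment.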
+
> [!TIP]
> For troubleshooting guidance on common errors when creating or updating Kubernetes online endpoints and deployments, see [How to troubleshoot online endpoints](how-to-troubleshoot-online-endpoints.md).
