Commit a2d4608

Update k8s compute TSG and log info
1 parent 4ded1b7 commit a2d4608

3 files changed: 4 additions and 4 deletions

articles/machine-learning/how-to-troubleshoot-kubernetes-compute.md

Lines changed: 1 addition & 1 deletion
@@ -173,7 +173,7 @@ Cannot found Kubernetes cluster.
 This error should occur when the system cannot find the AKS/Arc-Kubernetes cluster.
 
 You can check the following items to troubleshoot the issue:
-* First, check the cluster resource ID in the Azure Portal to verify whether Kubernetes cluster resource still exists and is running normally.
+* First, check the cluster resource ID in the Azure portal to verify whether Kubernetes cluster resource still exists and is running normally.
 * If the cluster exists and is running, then you can try to detach and reattach the compute to the workspace. Pay attention to more notes on [reattach](#error-genericcomputeerror).
 
 > [!TIP]

articles/machine-learning/how-to-troubleshoot-online-endpoints.md

Lines changed: 2 additions & 2 deletions
@@ -450,7 +450,7 @@ To run the `score.py` provided as part of the deployment, Azure creates a contai
 - A failure in the `init()` method.
 - If `get-logs` isn't producing any logs, it usually means that the container has failed to start. To debug this issue, try [deploying locally](#deploy-locally) instead.
 - Readiness or liveness probes aren't set up correctly.
-- There's an error in the environment setup of the container, such as a missing dependency.
+- There's an error in the environment set up of the container, such as a missing dependency.
 - When you face `TypeError: register() takes 3 positional arguments but 4 were given` error, the error may be caused by the dependency between flask v2 and `azureml-inference-server-http`. See [FAQs for inference HTTP server](how-to-inference-server-http.md#1-i-encountered-the-following-error-during-server-startup) for more details.
 
 ### ERROR: ResourceNotFound
@@ -524,7 +524,7 @@ Below is a list of reasons you might run into this error when creating/updating
 * Role assignment has not yet been completed. In this case, please wait for a few seconds and try again later.
 * The Azure ARC (For Azure Arc Kubernetes cluster) or Azure Machine Learning extension (For AKS) is not properly installed or configured. Please try to check the Azure ARC or Azure Machine Learning extension configuration and status.
 * The Kubernetes cluster has improper network configuration, please check the proxy, network policy or certificate.
-* If you are using a private AKS cluster, it is necessary to setup private endpoints for ACR, storage account, workspace in the AKS vnet.
+* If you are using a private AKS cluster, it is necessary to set up private endpoints for ACR, storage account, workspace in the AKS vnet.
 
 ### ERROR: EndpointNotFound

articles/machine-learning/reference-kubernetes.md

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ Some logs about AzureML workloads in the cluster will be collected through exten
 |amlarc-identity-controller |Request and renew Azure Blob/Azure Container Registry token through managed identity. |Only used when `enableInference=true` is set when installing the extension. It has trace logs for status on getting identity for endpoints to authenticate with AzureML service.|
 |amlarc-identity-proxy |Request and renew Azure Blob/Azure Container Registry token through managed identity. |Only used when `enableInference=true` is set when installing the extension. It has trace logs for status on getting identity for the cluster to authenticate with AzureML service.|
 |aml-operator | Manage the lifecycle of training jobs. |The logs contain AzureML training job pod status in the cluster.|
-|azureml-fe-v2| The front-end component that routes incoming inference requests to deployed services. |Access logs at request level, including request Id, start time, response code, error details and durations for request latency. Trace logs for service metadata changes, service running healthy status, etc. for debugging purpose.|
+|azureml-fe-v2| The front-end component that routes incoming inference requests to deployed services. |Access logs at request level, including request ID, start time, response code, error details and durations for request latency. Trace logs for service metadata changes, service running healthy status, etc. for debugging purpose.|
 | gateway | The gateway is used to communicate and send data back and forth. | Trace logs on requests from AzureML services to the clusters.|
 |healthcheck |--| The logs contain azureml namespace resource (AzureML extension) status to diagnose what make the extension not functional. |
 |inference-operator-controller-manager| Manage the lifecycle of inference endpoints. |The logs contain AzureML inference endpoint and deployment pod status in the cluster.|
