Commit 6f2dc09

Update k8s compute TSG and log info

1 parent b9fdfd8

3 files changed: +4 -4 lines changed

articles/machine-learning/how-to-troubleshoot-kubernetes-compute.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -142,7 +142,7 @@ For AKS clusters:
 
 
 For an AKS cluster or an Azure Arc enabled Kubernetes cluster:
-1. Check if the Kubernetes API server is accessible by running `kubectl` command in cluster.
+* Check if the Kubernetes API server is accessible by running `kubectl` command in cluster.
 
 #### ERROR: ClusterNotReachable
 
```
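The check reworded in this hunk, confirming that the Kubernetes API server is reachable with `kubectl`, can be sketched as a small shell function. This is a minimal sketch, assuming `kubectl` is installed and the current kubeconfig context points at the AKS or Azure Arc enabled Kubernetes cluster; `check_api_server` is a hypothetical helper name. It probes the API server's `/readyz` endpoint with a bounded timeout.

```shell
# Minimal sketch (hypothetical helper): report whether the Kubernetes API server answers.
# Assumes kubectl is installed and the current kubeconfig context points at the
# AKS or Azure Arc enabled Kubernetes cluster being troubleshot.
check_api_server() {
  # /readyz is the API server's readiness endpoint; --request-timeout bounds the
  # wait so an unreachable server fails after about 10 seconds instead of hanging.
  if kubectl get --raw /readyz --request-timeout=10s >/dev/null 2>&1; then
    echo "reachable"
    return 0
  fi
  echo "unreachable"
  return 1
}
```

`check_api_server` prints `reachable` and returns 0 when the API server answers within the timeout, and prints `unreachable` and returns 1 otherwise.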

articles/machine-learning/how-to-troubleshoot-kubernetes-extension.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -230,7 +230,7 @@ You need to use the same config settings as above, and you need to disable `job/
 #### Volcano scheduler integration supporting cluster autoscaler
 As discussed in this [thread](https://github.com/volcano-sh/volcano/issues/2558) , the **gang plugin** is not working well with the cluster autoscaler(CA) and also the node autoscaler in AKS.
 
-If you use the volcano that comes with the AzureML extension via setting `installVolcano=true`, the extension will have a scheduler config by default, which configures the **gang** plugin to prevent job deadlock. Therefore, the the cluster autoscaler(CA) in AKS cluster will not be supported with the volcano installed by extension.
+If you use the volcano that comes with the AzureML extension via setting `installVolcano=true`, the extension will have a scheduler config by default, which configures the **gang** plugin to prevent job deadlock. Therefore, the cluster autoscaler(CA) in AKS cluster will not be supported with the volcano installed by extension.
 
 For the case above, if you prefer the AKS cluster autoscaler could work normally, you can configure this `volcanoScheduler.schedulerConfigMap` parameter through updating extension, and specify a custom config of **no gang** volcano scheduler to it, for example:
 
```
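This hunk explains that the extension's default scheduler config enables the **gang** plugin. A minimal sketch for checking whether a given volcano scheduler config lists that plugin, assuming the config uses the usual `- name: <plugin>` entries under `tiers`; `uses_gang_plugin` is a hypothetical helper that reads the config text on stdin.

```shell
# Minimal sketch (hypothetical helper): succeed if a volcano scheduler config
# enables the gang plugin. Reads the scheduler config text on stdin and assumes
# plugins are listed as "- name: <plugin>" lines, as in typical volcano configs.
uses_gang_plugin() {
  # Match "- name: gang" exactly (any indentation), so names such as
  # "gangster" are not counted as the gang plugin.
  grep -Eq '^[[:space:]]*-[[:space:]]*name:[[:space:]]*gang[[:space:]]*$'
}
```

To inspect a configmap that is already in the cluster, you could pipe its data key into the function, for example `kubectl get configmap <configmap-name> -n azureml -o jsonpath='{.data.volcano-scheduler\.conf}' | uses_gang_plugin`, where `<configmap-name>` is a placeholder for whichever configmap holds the scheduler config.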
````diff
@@ -255,7 +255,7 @@ volcano-scheduler.conf: |
 
 To use this config in your AKS cluster, you need to follow the steps below:
 1. Create a configmap file with the above config in the azureml namespace. This namespace will generally be created when you install the AzureML extension.
-1. Set `volcanoScheduler.schedulerConfigMap=<configmap name>` in the extension config to apply this configmap. And you need to skip the resource validation when install the extension by configuring `amloperator.skipResourceValidation=true`. For example:
+1. Set `volcanoScheduler.schedulerConfigMap=<configmap name>` in the extension config to apply this configmap. And you need to skip the resource validation when installing the extension by configuring `amloperator.skipResourceValidation=true`. For example:
 ```azurecli
 az k8s-extension update --name <extension-name> --extension-type Microsoft.AzureML.Kubernetes --config volcanoScheduler.schedulerConfigMap=<configmap name> amloperator.skipResourceValidation=true --cluster-type managedClusters --cluster-name <your-AKS-cluster-name> --resource-group <your-RG-name> --scope cluster
 ```
````
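The two steps in this hunk, creating the configmap and then updating the extension, can be combined into one shell function. This is a minimal sketch under stated assumptions: `kubectl` and the Azure CLI `k8s-extension` extension are installed, you are signed in with `az`, the no-gang scheduler config sits in a local file, and the configmap data key is `volcano-scheduler.conf`, as the hunk context `volcano-scheduler.conf: |` suggests. The function name `apply_no_gang_scheduler` and its argument order are hypothetical; the `az k8s-extension update` flags are copied from the command in the diff.

```shell
# Minimal sketch (hypothetical helper) of the two steps above.
# Assumptions: kubectl and the Azure CLI k8s-extension extension are installed,
# you are signed in with az, and the no-gang scheduler config is in a local file.
apply_no_gang_scheduler() {
  local configmap_name="$1" conf_file="$2" extension_name="$3"
  local cluster_name="$4" resource_group="$5"

  # Step 1: create the configmap in the azureml namespace. The data key is
  # assumed to be volcano-scheduler.conf. Stop here if creation fails.
  kubectl create configmap "$configmap_name" \
    --namespace azureml \
    --from-file=volcano-scheduler.conf="$conf_file" || return 1

  # Step 2: point the extension at the configmap and skip resource validation,
  # using the same flags as the az k8s-extension update command in the diff.
  az k8s-extension update --name "$extension_name" \
    --extension-type Microsoft.AzureML.Kubernetes \
    --config volcanoScheduler.schedulerConfigMap="$configmap_name" amloperator.skipResourceValidation=true \
    --cluster-type managedClusters \
    --cluster-name "$cluster_name" \
    --resource-group "$resource_group" \
    --scope cluster
}
```

For example, `apply_no_gang_scheduler volcano-no-gang ./volcano-scheduler.conf <extension-name> <your-AKS-cluster-name> <your-RG-name>`, where `volcano-no-gang` is an illustrative configmap name. Returning early on a failed `kubectl create` keeps the extension from being pointed at a configmap that does not exist.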

articles/machine-learning/how-to-troubleshoot-online-endpoints.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -523,7 +523,7 @@ Below is a list of reasons you might run into this error when creating/updating
 To mitigate this error, refer to the following steps:
 * Check the `node selector` definition of the `instance type` you used, and `node label` configuration of your cluster nodes.
 * Check `instance type` and the node SKU size for AKS cluster or the node resource for Arc-Kubernetes cluster.
-* If the cluster is under-resourced, you can reduce the instance type resource requirement or use the another instance type with smaller resource required.
+* If the cluster is under-resourced, you can reduce the instance type resource requirement or use another instance type with smaller resource required.
 * If the cluster has no more resource to meet the requirement of the deployment, delete some deployment to release resources.
 
 
```
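The first mitigation step in this hunk compares the instance type's node selector with the labels on the cluster nodes. A minimal sketch of that comparison, assuming both sides are written as comma-separated `key=value` pairs, which is how `kubectl get nodes --show-labels` prints its LABELS column; `node_matches_selector` is a hypothetical helper, and it only checks exact key/value matches, the same rule a Kubernetes `nodeSelector` applies.

```shell
# Minimal sketch (hypothetical helper): succeed if a node's labels satisfy a
# node selector. Both arguments are comma-separated key=value lists, the format
# shown in the LABELS column of `kubectl get nodes --show-labels`.
node_matches_selector() {
  local selector="$1" labels=",$2,"
  local pair
  local IFS=','   # split the selector on commas only
  for pair in $selector; do
    [ -z "$pair" ] && continue       # ignore empty entries such as "a=1,,b=2"
    case "$labels" in
      *",$pair,"*) ;;                # this selector entry is present on the node
      *) return 1 ;;                 # missing or different value: no match
    esac
  done
  return 0                           # every entry matched (an empty selector matches any node)
}
```

For example, `node_matches_selector "agentpool=gpu" "kubernetes.io/os=linux,agentpool=gpu"` returns 0, while the same selector against a node labeled `agentpool=cpu` returns 1, meaning a pod with that selector could not be placed on that node. The label values here are illustrative only.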
