
Commit 1f1a4f6

Update k8s compute TSG and log info
1 parent 4668d55 commit 1f1a4f6


1 file changed (+8, -6 lines changed)

articles/machine-learning/how-to-troubleshoot-kubernetes-extension.md

Lines changed: 8 additions & 6 deletions
@@ -228,11 +228,13 @@ volcano-scheduler.conf: |
 You need to use the same config settings as above, and you need to disable `job/validate` webhook in the volcano admission if your **volcano version is lower than 1.6**, so that AzureML training workloads can perform properly.
 
 #### Volcano scheduler integration supporting cluster autoscaler
-As discussed in this [thread](https://github.com/volcano-sh/volcano/issues/2558) , the **gang plugin** is not working well with the cluster autoscaler(CA) and also the node autoscaler in AKS.
+As discussed in this [thread](https://github.com/volcano-sh/volcano/issues/2558) , the **gang plugin** is not working well with the cluster autoscaler(CA) and also the node autoscaler in AKS.
 
-In this case, you could use this config of **no gang** volcano scheduler when using cluster autoscaler:
+If you use the volcano that comes with the AzureML extension via setting `installVolcano=true`, the extension will have a scheduler config by default, which configures the **gang** plugin to prevent job deadlock. Therefore, the the cluster autoscaler(CA) in AKS cluster will not be supported with the volcano installed by extension.
 
-```yaml
+For the case above, if you prefer the AKS cluster autoscaler could work normally, you can configure this `volcanoScheduler.schedulerConfigMap` parameter through updating extension, and specify a custom config of **no gang** volcano scheduler to it, for example:
+
+```yaml
 volcano-scheduler.conf: |
 actions: "enqueue, allocate, backfill"
 tiers:
@@ -251,11 +253,11 @@ volcano-scheduler.conf: |
 - name: binpack
 ```
 
-To use this config after you install the Azureml extension with the configuration setting of `installVolcano=true`, you need to follow the steps below:
+To use this config in your AKS cluster, you need to follow the steps below:
 1. Create a configmap file with the above config in the azureml namespace. This namespace will generally be created when you install the AzureML extension.
-1. Set `volcanoSchedulerConfig=<configmap name>` in the extension config to apply this configmap. And you need to skip the resource validation when install the extension by configuring `amloperator.skipResourceValidation=true`. For example:
+1. Set `volcanoScheduler.schedulerConfigMap=<configmap name>` in the extension config to apply this configmap. And you need to skip the resource validation when install the extension by configuring `amloperator.skipResourceValidation=true`. For example:
 ```azurecli
-az k8s-extension update --name <extension-name> --extension-type Microsoft.AzureML.Kubernetes --config volcanoSchedulerConfig=<configmap name> amloperator.skipResourceValidation=true --cluster-type managedClusters --cluster-name <your-AKS-cluster-name> --resource-group <your-RG-name> --scope cluster
+az k8s-extension update --name <extension-name> --extension-type Microsoft.AzureML.Kubernetes --config volcanoScheduler.schedulerConfigMap=<configmap name> amloperator.skipResourceValidation=true --cluster-type managedClusters --cluster-name <your-AKS-cluster-name> --resource-group <your-RG-name> --scope cluster
 ```
 
 > [!NOTE]
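
Step 1 of the updated instructions asks for a configmap in the azureml namespace, but the diff does not show one. What follows is a minimal sketch of such a manifest under stated assumptions. The name `volcano-scheduler-no-gang` is hypothetical. The data key `volcano-scheduler.conf` is assumed from the key used in the scheduler config in the diff. The plugin tiers are left as a placeholder because the lines between the two hunks are not part of this diff.

```yaml
# Sketch of a ConfigMap holding a "no gang" volcano scheduler config.
# Assumption: "volcano-scheduler-no-gang" is a hypothetical name.
# Assumption: the extension reads the data key "volcano-scheduler.conf".
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-no-gang
  namespace: azureml          # namespace named in step 1 of the diff
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    # Placeholder: copy the full tiers/plugins list of the "no gang" config
    # from the article here (it ends with "- name: binpack").
    # Leave the gang plugin out; that omission is the point of this config.
```

If the manifest is saved to a file, `kubectl apply -f <file>` would create it. Its name is then the value to pass as `volcanoScheduler.schedulerConfigMap=<configmap name>` in the `az k8s-extension update` command.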
