articles/machine-learning/how-to-troubleshoot-kubernetes-extension.md
8 additions & 6 deletions
@@ -228,11 +228,13 @@ volcano-scheduler.conf: |
 You need to use the same config settings as above, and you need to disable `job/validate` webhook in the volcano admission if your **volcano version is lower than 1.6**, so that AzureML training workloads can perform properly.
-As discussed in this [thread](https://github.com/volcano-sh/volcano/issues/2558) , the **gang plugin** is not working well with the cluster autoscaler(CA) and also the node autoscaler in AKS.
+As discussed in this [thread](https://github.com/volcano-sh/volcano/issues/2558) , the **gang plugin** is not working well with the cluster autoscaler(CA) and also the node autoscaler in AKS.

-In this case, you could use this config of **no gang** volcano scheduler when using cluster autoscaler:
+If you use the volcano that comes with the AzureML extension via setting `installVolcano=true`, the extension will have a scheduler config by default, which configures the **gang** plugin to prevent job deadlock. Therefore, the cluster autoscaler (CA) in an AKS cluster is not supported with the volcano installed by the extension.

-```yaml
+If you prefer that the AKS cluster autoscaler work normally in this case, you can set the `volcanoScheduler.schedulerConfigMap` parameter by updating the extension and point it at a custom **no gang** volcano scheduler config, for example:
+
+```yaml
 volcano-scheduler.conf: |
     actions: "enqueue, allocate, backfill"
     tiers:
@@ -251,11 +253,11 @@ volcano-scheduler.conf: |
       - name: binpack
 ```

-To use this config after you install the Azureml extension with the configuration setting of `installVolcano=true`, you need to follow the steps below:
+To use this config in your AKS cluster, you need to follow the steps below:
 1. Create a configmap file with the above config in the azureml namespace. This namespace will generally be created when you install the AzureML extension.
-1. Set `volcanoSchedulerConfig=<configmap name>` in the extension config to apply this configmap. And you need to skip the resource validation when install the extension by configuring `amloperator.skipResourceValidation=true`. For example:
+1. Set `volcanoScheduler.schedulerConfigMap=<configmap name>` in the extension config to apply this configmap. You also need to skip the resource validation when installing the extension by configuring `amloperator.skipResourceValidation=true`. For example:
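
As a rough illustration of step 1, the ConfigMap could be declared with a manifest like the sketch below. The ConfigMap name `volcano-scheduler-no-gang` is a hypothetical placeholder, the `azureml` namespace comes from the step text, and the `tiers` list is deliberately left for you to fill in from the full no-gang config, since this diff shows only its first and last lines.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-no-gang   # hypothetical name, choose your own
  namespace: azureml                # namespace named in step 1
data:
  # The key matches the config shown above; the value is a literal block
  # that the volcano scheduler reads as its own YAML document.
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    # Paste the complete plugin tiers of the no-gang config here.
    # They are elided in this diff, so they are not reproduced in this sketch.
```

Under that assumed name, step 2 would set `volcanoScheduler.schedulerConfigMap=volcano-scheduler-no-gang`.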