
Commit 3b95901

Merge pull request #249785 from jiaochenlu/update-230830
Update new TSG of K8s compute
2 parents f15d3b4 + b7b305c

1 file changed: +13 -0 lines

articles/machine-learning/how-to-troubleshoot-kubernetes-extension.md

Lines changed: 13 additions & 0 deletions
@@ -286,7 +286,20 @@ To update the extension with a custom controller class:
```
az ml extension update --config nginxIngress.controller="k8s.io/amlarc-ingress-nginx"
```

#### Nginx ingress controller installed with the Azure Machine Learning extension crashes due to out-of-memory (OOM) errors
**Symptom**
The nginx ingress controller installed with the Azure Machine Learning extension crashes with out-of-memory (OOM) errors even when there is no workload, and the controller logs don't show any information that helps diagnose the problem.
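To confirm that the restarts are OOM kills rather than application failures, you can check the pod status directly. The following is a quick diagnostic sketch that assumes the controller pod runs in the `azureml` namespace used by the extension components; adjust the namespace and pod name to match your cluster.

```
# List the pods deployed by the extension and look for restarts
kubectl get pods -n azureml

# Inspect the ingress controller pod; an OOM kill appears as
# "Last State: Terminated" with "Reason: OOMKilled"
kubectl describe pod <nginx-ingress-controller-pod-name> -n azureml
```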
**Possible Cause**
This issue can occur when the nginx ingress controller runs on a node with many CPUs. By default, the controller spawns worker processes according to the number of CPUs, so on nodes with many CPUs it consumes more memory and can run into OOM errors. This is a known [issue](https://github.com/kubernetes/ingress-nginx/issues/8166) reported on GitHub.
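If you want to verify this behavior on your cluster, you can inspect the configuration that the controller rendered. This is only a sketch; it assumes the ingress-nginx default configuration path `/etc/nginx/nginx.conf` and the `azureml` namespace, so substitute your own pod name and namespace.

```
# Show the worker_processes directive in the rendered nginx configuration;
# the default value "auto" starts one worker process per CPU on the node
kubectl exec -n azureml <nginx-ingress-controller-pod-name> -- \
  grep worker_processes /etc/nginx/nginx.conf
```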
**Resolution**
To resolve this issue, you can:
* Adjust the number of worker processes by installing the extension with the parameter `nginxIngress.controllerConfig.worker-processes=8`.
* Increase the memory limit by using the parameter `nginxIngress.resources.controller.limits.memory=<new limit>`.
Be sure to adjust these two parameters to match your node specifications and workload requirements; the example that follows shows the update pattern for both.
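For illustration, the following commands reuse the `az ml extension update --config` pattern shown earlier in this article to apply each parameter. The values `8` and `512Mi` are example placeholders, not recommendations; pick values that fit your node size and traffic.

```
# Cap the number of nginx worker processes instead of one per CPU
az ml extension update --config nginxIngress.controllerConfig.worker-processes=8

# Raise the memory limit of the ingress controller container
az ml extension update --config nginxIngress.resources.controller.limits.memory=512Mi
```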
