Commit 9062bea

minor clarification changes
1 parent 5142d9e commit 9062bea

1 file changed: +24 -2 lines changed

articles/operator-nexus/troubleshoot-memory-limits.md

Lines changed: 24 additions & 2 deletions
@@ -13,7 +13,7 @@ author: matternst7258
 
 ## Alerting for memory limits
 
-It is recommended to have alerts setup for the Operator Nexus cluster to look for Kubernetes pods restarting from OOMKill errors. These alerts will allow customers to know if a component on a server is working appropriately.
+It's recommended to have alerts set up for the Operator Nexus cluster to look for Kubernetes pods restarting from OOMKill errors. These alerts allow customers to know if a component on a server is working appropriately.
 
 ## Identifying Out of Memory (OOM) pods
 
@@ -51,11 +51,33 @@ The data from these commands identify whether a pod is restarting due to `OOMKil
 
 ## Patching memory limits
 
-It is recommended for all memory limit changes be reported to Microsoft support for further investigation or adjustments.
+Raise all memory limit changes be reported to Microsoft support for further investigation or adjustments.
 
 > [!WARNING]
 > Patching memory limits to a pod are not permanent and can be overwritten if the pod restarts.
 
+## Confirm memory limit changes
+
+When memory limits change, the pods should return to `Ready` state and stop restarting.
+
+The following commands can be used to confirm the behavior.
+
+```azcli
+az networkcloud baremetalmachine run-read-command --name "<bareMetalMachineName>" \
+   --limit-time-seconds 60 \
+   --commands "[{command:'kubectl get',arguments:[pods,-n,nc-system]}]" \
+   --resource-group "<cluster_MRG>" \
+   --subscription "<subscription>"
+```
+
+```azcli
+az networkcloud baremetalmachine run-read-command --name "<bareMetalMachineName>" \
+   --limit-time-seconds 60 \
+   --commands "[{command:'kubectl describe',arguments:[pod,<podName>,-n,nc-system]}]" \
+   --resource-group "<cluster_MRG>" \
+   --subscription "<subscription>"
+```
+
 ## Known services susceptible to OOM issues
 
 * cdi-operator
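
The confirmation step added in this commit relies on reading the `kubectl get pods` table by eye. As a rough sketch only, the check could be scripted locally once that table has been saved as plain text. The function name `unready_pods` and the file name `pods.txt` are hypothetical and not part of the article or the Azure CLI, and the sketch assumes the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (not part of the article or the Azure CLI).
# Reads a default `kubectl get pods` table on stdin and prints the name of
# every pod whose READY count is incomplete or whose STATUS is not Running.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
# Note: pods of finished jobs (STATUS Completed) are also listed.
unready_pods() {
  awk 'NR > 1 {
    # READY looks like "1/2": ready containers / total containers.
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}
```

With the table saved locally, `unready_pods < pods.txt` prints one pod name per line; an empty result suggests every listed pod is `Running` and fully ready. Any pod it lists is a candidate for the `kubectl describe` command shown in the diff.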
