Commit 0d30e3a

Merge pull request #261074 from neilverse/main
Main
2 parents 99aad51 + d7d823c commit 0d30e3a

2 files changed: +78 -1 lines changed

articles/operator-nexus/TOC.yml

Lines changed: 2 additions & 1 deletion
@@ -136,7 +136,6 @@
 href: howto-use-azure-policy.md
 - name: MDE Runtime Protection
 href: howto-use-mde-runtime-protection.md
-
 - name: Install CLI Extension
 href: howto-install-cli-extensions.md
 - name: Troubleshooting
@@ -151,6 +150,8 @@
 href: troubleshoot-kubernetes-cluster-dual-stack-configuration.md
 - name: Troubleshoot BMM reboot issues
 href: troubleshoot-bmm-node-reboot.md
+- name: Enable node down cleaner
+href: troubleshoot-enable-node-down-cleaner.md
 - name: BareMetal Actions
 expanded: false
 items:
articles/operator-nexus/troubleshoot-enable-node-down-cleaner.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
---
title: "Azure Operator Nexus: Enable node down cleaner"
description: Learn how to enable node down cleaner.
author: neilverse
ms.author: soumyamaitra
ms.service: azure-operator-nexus
ms.topic: troubleshooting
ms.date: 12/12/2023
ms.custom: troubleshooting
---

# Enable node down cleaner

Azure Operator Nexus includes a feature called node down cleaner, which is disabled by default.
It moves NFS server pods from a failed node to a new node when a bare metal host is powered off through the Azure CLI.
The following procedure enables node down cleaner and applies to both greenfield and brownfield environments.

## Prerequisites

- You've installed the Azure command-line interface (CLI) and the `networkcloud` CLI extension. For more information, see [How to Install CLI Extensions](./howto-install-cli-extensions.md).
- You're logged in to the Azure CLI with the correct subscription.
- The target bare metal machine is powered on and has `readyState` set to `True`.
- You have the permissions required to run `az networkcloud baremetalmachine run-command`.

## Steps to enable node down cleaner on cluster

Run the procedure against a management node. To determine which nodes are management nodes, run the following Azure CLI `baremetalmachine run-read-command`, which lists the nodes labeled with the control-plane role:

```azurecli
az networkcloud baremetalmachine run-read-command --name <any-ready-baremetal-machine> \
  --commands "[{command:'kubectl get',arguments:[nodes,-l,platform.afo-nc.microsoft.com/role=control-plane]}]" \
  --limit-time-seconds 60 \
  --resource-group <cluster-managed-resource-group> \
  --subscription <subscription>
```

Run the following command against one of the management nodes to enable node down cleaner:

```azurecli
az networkcloud baremetalmachine run-command --bare-metal-machine-name <management-node-baremetal-machine> \
  --subscription <subscription> \
  --resource-group <cluster-managed-resource-group> \
  --limit-time-seconds 60 \
  --script "IyEvYmluL2Jhc2gKCmt1YmVjdGwgZ2V0IGRlcGxveW1lbnQgLW4gbmMtc3lzdGVtIG5vZGUtZG93
bi1jbGVhbmVyCgprdWJlY3RsIHNjYWxlIGRlcGxveW1lbnQgLW4gbmMtc3lzdGVtIG5vZGUtZG93
bi1jbGVhbmVyIC0tcmVwbGljYXM9MQoKa3ViZWN0bCBnZXQgZGVwbG95bWVudCAtbiBuYy1zeXN0
ZW0gbm9kZS1kb3duLWNsZWFuZXIKCmt1YmVjdGwgZ2V0IHBvZHMgLW4gbmMtc3lzdGVtIC1sIGFw
cC5rdWJlcm5ldGVzLmlvL25hbWU9bm9kZS1kb3duLWNsZWFuZXIKCg=="
```
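
The `--script` value is a base64-encoded Bash script. Decoded, it runs the following kubectl commands (the `sleep 5s` pause in this listing is not part of the encoded payload shown above):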

```console
kubectl get deployment -n nc-system node-down-cleaner

kubectl scale deployment -n nc-system node-down-cleaner --replicas=1

kubectl get deployment -n nc-system node-down-cleaner

sleep 5s

kubectl get pods -n nc-system -l app.kubernetes.io/name=node-down-cleaner
```
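
Because the `--script` payload is plain base64, it can be inspected locally before it is run on a management node. The following is a minimal sketch, assuming a shell where the `base64` utility accepts `--decode` (some platforms spell it `-d` or `-D`). The variable names `payload` and `decoded` are illustrative and not part of the procedure.

```bash
# Payload copied from the --script argument, joined into a single string.
payload="IyEvYmluL2Jhc2gKCmt1YmVjdGwgZ2V0IGRlcGxveW1lbnQgLW4gbmMtc3lzdGVtIG5vZGUtZG93"
payload="${payload}bi1jbGVhbmVyCgprdWJlY3RsIHNjYWxlIGRlcGxveW1lbnQgLW4gbmMtc3lzdGVtIG5vZGUtZG93"
payload="${payload}bi1jbGVhbmVyIC0tcmVwbGljYXM9MQoKa3ViZWN0bCBnZXQgZGVwbG95bWVudCAtbiBuYy1zeXN0"
payload="${payload}ZW0gbm9kZS1kb3duLWNsZWFuZXIKCmt1YmVjdGwgZ2V0IHBvZHMgLW4gbmMtc3lzdGVtIC1sIGFw"
payload="${payload}cC5rdWJlcm5ldGVzLmlvL25hbWU9bm9kZS1kb3duLWNsZWFuZXIKCg=="

# Decode the payload to see the script that run-command would execute.
# Assumption: this platform's base64 accepts --decode.
decoded="$(printf '%s' "$payload" | base64 --decode)"

# Print the decoded script for review.
printf '%s\n' "$decoded"
```

The decoded text starts with `#!/bin/bash` and contains the four `kubectl` commands, which makes it easy to compare the payload against the listing before running it.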

When the `baremetalmachine run-command` completes, the node down cleaner deployment is scaled to one replica and its pod should be in the `Running` state. The output looks similar to the following:

```output
====Action Command Output====
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
node-down-cleaner   0/0     0            0           4d9h
deployment.apps/node-down-cleaner scaled
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
node-down-cleaner   0/1     1            0           4d9h
NAME                               READY   STATUS    RESTARTS   AGE
node-down-cleaner-xxxxxxxxxxxxxx   1/1     Running   0          5s
```
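
If the command output is captured to a variable or file, the final check can be scripted. The following is a rough sketch, assuming the pod listing keeps the `NAME READY STATUS` column order shown above. The function name `pod_is_running` and the `sample_output` variable are hypothetical helpers, not part of the Azure CLI or the procedure.

```bash
# Hypothetical helper: reads command output on stdin and succeeds only if a
# node-down-cleaner pod line reports READY 1/1 and STATUS Running.
# The trailing hyphen in the pattern skips the deployment rows, whose name
# is exactly "node-down-cleaner".
pod_is_running() {
  awk '$1 ~ /^node-down-cleaner-/ && $2 == "1/1" && $3 == "Running" { found = 1 }
       END { exit !found }'
}

# Sample text modeled on the output shown in this article.
sample_output='====Action Command Output====
NAME READY UP-TO-DATE AVAILABLE AGE
node-down-cleaner 0/1 1 0 4d9h
NAME READY STATUS RESTARTS AGE
node-down-cleaner-xxxxxxxxxxxxxx 1/1 Running 0 5s'

if printf '%s\n' "$sample_output" | pod_is_running; then
  echo "node-down-cleaner pod is running"
else
  echo "node-down-cleaner pod is not running yet"
fi
```

If the pod is not yet running, rerunning the `kubectl get pods` check after a short wait is usually enough, because the deployment can report `0/1` for a few seconds after scaling.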