
Commit 3ecefb7

Merge pull request #21 from razo7/update-readme-2

update Readme - 2

2 parents 31c6d16 + 87a2eea

File tree: 1 file changed (+53 / -41 lines)

README.md (53 additions & 41 deletions)

# Node Maintenance Operator (NMO)

The node-maintenance-operator (NMO) is an operator generated from the [operator-sdk](https://github.com/operator-framework/operator-sdk).
The purpose of this operator is to watch for new or deleted custom resources (CRs) called `NodeMaintenance` which indicate that a node in the cluster should either:

- `NodeMaintenance` CR created: move the node into maintenance, cordon it (set it as unschedulable), and evict the pods (which can be evicted) from that node.
- `NodeMaintenance` CR deleted: remove the node from maintenance and uncordon it (set it as schedulable).

> *Note*: The current behavior of the operator is to mimic `kubectl drain <node name>`.
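
For reference, the manual equivalent of the create/delete flow would be something along these lines; the node name and flags are illustrative, not taken from this README:

```sh
# Roughly what NMO automates when a NodeMaintenance CR is created:
$ kubectl drain node02 --ignore-daemonsets --delete-emptydir-data

# And the rough manual equivalent of deleting the CR:
$ kubectl uncordon node02
```
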
## Build and run the operator

There are two ways to run the operator:

- Deploy the latest version, which was built from the master branch, to a running OpenShift/Kubernetes cluster.
- Build and deploy from sources to a running or to-be-created OpenShift/Kubernetes cluster.

### Deploy the latest version

After every PR merge to master, images are built and pushed to `quay.io`.
For deployment of NMO using these images you need:

- a running OpenShift cluster, or a Kubernetes cluster with Operator Lifecycle Manager (OLM) installed.
- the `operator-sdk` binary installed, see https://sdk.operatorframework.io/docs/installation/.
- a valid `$KUBECONFIG` configured to access your cluster.

Then run `operator-sdk run bundle quay.io/medik8s/node-maintenance-operator-bundle:latest`.
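
A minimal session might look as follows; the verification step and its CSV name are illustrative, not part of this README:

```sh
# Deploy the latest bundle via OLM, using the current $KUBECONFIG context.
$ operator-sdk run bundle quay.io/medik8s/node-maintenance-operator-bundle:latest

# Verify the install: the operator's ClusterServiceVersion should reach phase Succeeded.
$ kubectl get csv -A | grep node-maintenance
```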
### Set Maintenance on - Create a NodeMaintenance CR

To set maintenance on a node, a `NodeMaintenance` custom resource should be created.
The `NodeMaintenance` CR spec contains:

- nodeName: The name of the node which will be put into maintenance mode.
- reason: The reason why the node will be under maintenance.

Create the example `NodeMaintenance` CR found at `config/samples/nodemaintenance_v1beta1_nodemaintenance.yaml`:
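
The sample file's exact contents are elided in this diff view; judging from the names and fields echoed in the status example below, it is presumably close to this sketch:

```yaml
# Assumed contents of config/samples/nodemaintenance_v1beta1_nodemaintenance.yaml
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: nodemaintenance-sample
spec:
  nodeName: node02                 # node to put into maintenance
  reason: "Test node maintenance"  # free-text reason recorded on the CR
```

Then apply it and watch the operator logs:
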
```sh
$ kubectl apply -f config/samples/nodemaintenance_v1beta1_nodemaintenance.yaml

$ kubectl logs <nmo-pod-name>
2022-02-23T07:33:58.924Z INFO controller-runtime.manager.controller.nodemaintenance Reconciling NodeMaintenance {"reconciler group": "nodemaintenance.medik8s.io", "reconciler kind": "NodeMaintenance", "name": "nodemaintenance-sample", "namespace": ""}
2022-02-23T07:33:59.266Z INFO controller-runtime.manager.controller.nodemaintenance Applying maintenance mode {"reconciler group": "nodemaintenance.medik8s.io", "reconciler kind": "NodeMaintenance", "name": "nodemaintenance-sample", "namespace": "", "node": "node02", "reason": "Test node maintenance"}
time="2022-02-24T11:58:20Z" level=info msg="Maintenance taints will be added to node node02"
time="2022-02-24T11:58:20Z" level=info msg="Applying medik8s.io/drain taint add on Node: node02"
time="2022-02-24T11:58:20Z" level=info msg="Patching taints on Node: node02"
2022-02-23T07:33:59.336Z INFO controller-runtime.manager.controller.nodemaintenance Evict all Pods from Node {"reconciler group": "nodemaintenance.medik8s.io", "reconciler kind": "NodeMaintenance", "name": "nodemaintenance-sample", "namespace": "", "nodeName": "node02"}
E0223 07:33:59.498801 1 nodemaintenance_controller.go:449] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-jrprj, openshift-dns/dns-default-kf6jj, openshift-dns/node-resolver-72jzb, openshift-image-registry/node-ca-czgc6, openshift-ingress-canary/ingress-canary-44tgv, openshift-machine-config-operator/machine-config-daemon-csv6c, openshift-monitoring/node-exporter-rzwhz, openshift-multus/multus-additional-cni-plugins-829bh, openshift-multus/multus-qwfc9, openshift-multus/network-metrics-daemon-pxt6n, openshift-network-diagnostics/network-check-target-qqcbr, openshift-sdn/sdn-s5cqx; deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: openshift-marketplace/nmo-downstream-8-8nms7
I0223 07:33:59.500418 1 nodemaintenance_controller.go:449] evicting pod openshift-network-diagnostics/network-check-source-865d4b5578-n2cxg
I0223 07:33:59.500790 1 nodemaintenance_controller.go:449] evicting pod openshift-ingress/router-default-7548cf6fb5-rgxrq
I0223 07:33:59.500944 1 nodemaintenance_controller.go:449] evicting pod openshift-marketplace/12a4cfa0c2be01867daf1d9b7ad7c0ae7a988fd957a2ad6df0d72ff6875lhcx
I0223 07:33:59.501061 1 nodemaintenance_controller.go:449] evicting pod openshift-marketplace/nmo-downstream-8-8nms7
...
```
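
While maintenance is on, the `medik8s.io/drain` taint from the log above should be visible on the node; a quick check, with illustrative and abridged output:

```sh
# The node's taints should include the drain taint NMO applied.
$ kubectl get node node02 -o jsonpath='{.spec.taints}'
[{"effect":"NoSchedule","key":"medik8s.io/drain"}, ...]
```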

### Set Maintenance off - Delete the NodeMaintenance CR

To remove maintenance from a node, delete the corresponding `NodeMaintenance` CR (or `nm`, its short name):

```sh
$ kubectl delete nm nodemaintenance-sample
nodemaintenance.nodemaintenance.medik8s.io "nodemaintenance-sample" deleted

$ kubectl logs <nmo-pod-name>
2022-02-24T14:27:35.332Z INFO controller-runtime.manager.controller.nodemaintenance Reconciling NodeMaintenance {"reconciler group": "nodemaintenance.medik8s.io", "reconciler kind": "NodeMaintenance", "name": "nodemaintenance-sample", "namespace": ""}
time="2022-02-24T14:27:35Z" level=info msg="Maintenance taints will be removed from node node02"
time="2022-02-24T14:27:35Z" level=info msg="Applying medik8s.io/drain taint remove on Node: node02"
...
```
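
Once the CR is deleted, the node should be schedulable again; a quick way to confirm, with illustrative output:

```sh
# While under maintenance the STATUS column shows Ready,SchedulingDisabled.
$ kubectl get node node02
NAME     STATUS   ROLES    AGE   VERSION
node02   Ready    worker   42d   v1.23.3
```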

## NodeMaintenance Status

The `NodeMaintenance` CR can contain the following status fields:

```yaml
$ kubectl get nm nodemaintenance-sample -o yaml
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: nodemaintenance-sample
spec:
  nodeName: node02
  reason: Test node maintenance
status:
  evictionPods: 5
  lastError: 'Last failure message'
  pendingPods:
  - pod-A
  - pod-B
  - pod-C
  - pod-D
  - pod-E
  phase: Running
  totalpods: 19
```

`evictionPods` is the total number of pods up for eviction from the start.

`lastError` represents the latest error, if any, from the latest reconciliation.

`pendingPods` is a list of pods still pending eviction.

`phase` is the representation of the maintenance progress and can hold a string value of: Running|Succeeded.
The phase is updated for each processing attempt on the CR.

`totalPods` is the total number of all pods on the node from the start.
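
For scripting against these fields, the phase can be read directly; a small sketch, with illustrative output:

```sh
# Poll the maintenance progress; it moves from Running to Succeeded once draining is done.
$ kubectl get nm nodemaintenance-sample -o jsonpath='{.status.phase}'
Running
```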

## Debug

### Collecting cluster data with must-gather

Use NMO's must-gather from [here](https://github.com/medik8s/node-maintenance-operator/tree/master/must-gather) to collect related debug data.
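
On OpenShift, such images are typically run via `oc adm must-gather`; the image reference below is a placeholder, the exact one is documented in the linked directory:

```sh
# <nmo-must-gather-image> is a placeholder; see the must-gather directory for the real image.
$ oc adm must-gather --image=<nmo-must-gather-image>
```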

## Tests