
Commit 2dec8eb

Introduce playbook to update kubernetes cluster from one patch version to another
1 parent 0882aa2 commit 2dec8eb

File tree

18 files changed: +267 −62 lines changed

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@

## Update build cluster's Kubernetes version

This guide covers the steps to update the build cluster's Kubernetes version from one patch version to another, or
across minor versions, through a playbook.

This update guide is applicable to HA clusters and is used to automate updating the nodes to a particular
version of Kubernetes.

The playbook first installs the kubeadm utility through a package manager and then performs
the cluster upgrade. All the master nodes are updated first, followed by the worker nodes.
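
For reference, the usual kubeadm upgrade sequence that a playbook like this automates looks roughly as follows. This is a hedged sketch: the package manager, version pins, and exact ordering shown here are illustrative assumptions and are not taken from this commit.

```
# On the first master: upgrade kubeadm, review the plan and apply the control-plane upgrade.
apt-get install -y kubeadm='1.32.2-*'
kubeadm upgrade plan
kubeadm upgrade apply v1.32.2 -y

# On the remaining masters and then on each worker, one node at a time:
kubeadm upgrade node

# On every node: upgrade kubelet/kubectl and restart the kubelet.
# (each node is typically drained before and uncordoned after this step)
apt-get install -y kubelet='1.32.2-*' kubectl='1.32.2-*'
systemctl daemon-reload && systemctl restart kubelet
```
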
#### Prerequisites
```
Ansible
Kubeconfig of the cluster
```

#### Steps to follow:
1. From the k8s-ansible directory, generate the hosts.yml file on which the Kubernetes cluster updates are to be performed.
   In this case, one can use the `hosts.yml` file under `examples/containerd-cluster/hosts.yml` to contain the IP(s)
   of the following nodes: masters and workers.
   ```
   [masters]
   10.20.177.51
   10.20.177.26
   10.20.177.227
   [workers]
   10.20.177.39
   ```

   The following lines may additionally be needed in the hosts.yml file, in case the cluster is accessed through a bastion node.
   Here, the private key is used to establish the SSH connection to the bastion, and X refers to the bastion's IP address.
   ```
   [workers:vars]
   ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -i <path-to-private-key> -q root@X" -i <path-to-private-key>'

   [masters:vars]
   ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -i <path-to-private-key> -q root@X" -i <path-to-private-key>'
   ```
2. Set the path to the `kubeconfig` of the cluster in `group_vars/all`, under the `kubeconfig_path` variable.
3. Set the target Kubernetes version in `group_vars/all` through the following variables, for example:
   ```
   kubernetes_major_minor: "1.32"
   kubernetes_patch: "2"
   ```
4. Once the above are set, use the following command to update the nodes:
   `ansible-playbook -i examples/containerd-cluster/hosts.yml update-k8s-version.yml --extra-vars group_vars/all`
5. This will proceed to update the nodes. Post update, the result can be verified by running `kubectl version` and checking the
   `Server Version` field:
   ```
   # kubectl version
   Client Version: v1.32.3
   Kustomize Version: v5.5.0
   Server Version: v1.33.1
   ```

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@

## Update build cluster's nodes with latest Kernel/patches

This guide covers the steps to update the build cluster's OS/kernel/packages to the latest versions available
through the package managers. It is necessary to keep the nodes up to date with the latest security patches
and kernel.

This update guide is applicable to HA clusters and is used to automate rolling updates of the nodes.

The nodes are updated in a rolling fashion, after confirming that there are no pods on the node from the
`test-pods` namespace, which generally contains the prow-job workloads. The playbook waits until the namespace
is free from running pods on the node; however, it may be necessary to terminate the boskos-related pods manually,
as these are generally long-running in nature.
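
The wait condition counts the running, non-DaemonSet pods of the `test-pods` namespace on the node (see the cordon-phase tasks added in this commit). Roughly the same check can be run by hand before starting the update; the kubeconfig path and node name below are placeholders:

```
kubectl get pods -n test-pods \
  --kubeconfig <path-to-kubeconfig> \
  --field-selector spec.nodeName=<node-name>,status.phase=Running \
  -o go-template='{{range .items}}{{if or (not .metadata.ownerReferences) (ne (index .metadata.ownerReferences 0).kind "DaemonSet")}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
  | wc -l
```

A count of 0 means the node only carries DaemonSet pods and is safe to drain.
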
#### Prerequisites
```
ansible
private key of the bastion node.
```

#### Steps to follow:
1. From the k8s-ansible directory, generate the hosts.yml file on which the OS updates are to be performed.
   In this case, one can use the `hosts.yml` file under `examples/containerd-cluster/hosts.yml` to contain the IP(s)
   of the following nodes: bastion, masters and workers.
   ```
   [bastion]
   1.2.3.4
   [masters]
   10.20.177.51
   10.20.177.26
   10.20.177.227
   [workers]
   10.20.177.39

   [workers:vars]
   ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -i <path-to-private-key> -q root@X" -i <path-to-private-key>'

   [masters:vars]
   ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -i <path-to-private-key> -q root@X" -i <path-to-private-key>'
   ```
2. Set the path to the `kubeconfig` of the cluster in `group_vars/all`, under the `kubeconfig_path` variable.
3. Once the above are set, use the following command to update the nodes:
   `ansible-playbook -i examples/containerd-cluster/hosts.yml update-os-packages.yml --extra-vars group_vars/all`
4. This will proceed to update the nodes, rebooting them serially if necessary.
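
A quick post-update check, assuming `kubectl` access to the cluster (this is a manual verification step, not part of the playbook):

```
kubectl get nodes -o wide --kubeconfig <path-to-kubeconfig>
```

The STATUS column should report Ready for every node, and the KERNEL-VERSION column should show the newly installed kernel after the serial reboots.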

kubetest2-tf/data/k8s-ansible/group_vars/all

Lines changed: 5 additions & 0 deletions
@@ -94,3 +94,8 @@ cni_plugins_tarball: "cni-plugins-linux-{{ ansible_architecture }}-{{ cni_plugin
 
 # NFS server details
 nfs_directory: "/var/nfsshare"
+
+
+##### Kubernetes version update related fields #####
+kubernetes_major_minor: ""
+kubernetes_patch: ""
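
Since both variables default to empty strings here, they can presumably also be supplied on the ansible-playbook command line instead of editing group_vars/all, for example (an illustrative invocation, not taken from this commit):

```
ansible-playbook -i examples/containerd-cluster/hosts.yml update-k8s-version.yml \
  --extra-vars "kubernetes_major_minor=1.32 kubernetes_patch=2"
```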

kubetest2-tf/data/k8s-ansible/roles/download-k8s/tasks/main.yml

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@
 - name: Extract the k8s bits
   unarchive:
     src: "{{ src_template }}/{{ item }}"
-    dest: "/usr/local/bin/"
+    dest: "/usr/bin/"
     remote_src: yes
     extra_opts:
       - --strip-components=3
@@ -21,7 +21,7 @@
 - name: Import control plane component container images
   shell: |
     export ARCH={{ ansible_architecture }}
-    ctr -n k8s.io images import "/usr/local/bin/{{ item }}.tar"
+    ctr -n k8s.io images import "/usr/bin/{{ item }}.tar"
     ctr -n k8s.io images ls -q | grep -e {{ item }} | xargs -L 1 -I '{}' /bin/bash -c 'ctr -n k8s.io images tag "{}" "$(echo "{}" | sed s/-'$ARCH':/:/)"'
   with_items:
     - kube-apiserver
@@ -33,7 +33,7 @@
 - name: Import common container images required for setting up cluster
   shell: |
     export ARCH={{ ansible_architecture }}
-    ctr -n k8s.io images import "/usr/local/bin/{{ item }}.tar"
+    ctr -n k8s.io images import "/usr/bin/{{ item }}.tar"
     ctr -n k8s.io images ls -q | grep -e {{ item }} | xargs -L 1 -I '{}' /bin/bash -c 'ctr -n k8s.io images tag "{}" "$(echo "{}" | sed s/-'$ARCH':/:/)"'
   with_items:
     - kube-proxy

kubetest2-tf/data/k8s-ansible/roles/install-k8s/tasks/main.yml

Lines changed: 14 additions & 0 deletions
@@ -45,6 +45,20 @@
     dest: /usr/lib/systemd/system/kubelet.service
     mode: '0644'
 
+- name: Ensure destination directory exists
+  ansible.builtin.file:
+    path: /etc/systemd/system/kubelet.service.d
+    state: directory
+    mode: '0755'
+
+# https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/kubelet-integration/#the-kubelet-drop-in-file-for-systemd
+- name: Generate an override configuration to /etc/systemd/system/kubelet.service
+  template:
+    src: override.conf.j2
+    dest: /etc/systemd/system/kubelet.service.d/override.conf
+    mode: '0644'
+
+
 - name: Enable and start kubelet
   systemd:
     name: kubelet
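
The drop-in written by the tasks above overrides settings of the kubelet.service unit installed at /usr/lib/systemd/system/kubelet.service; the empty `ExecStart=` line in the rendered override clears the original value before setting the new one. A quick manual check that the override is picked up on a node (illustrative only, not part of the playbook):

```
systemctl cat kubelet
# Prints /usr/lib/systemd/system/kubelet.service followed by
# /etc/systemd/system/kubelet.service.d/override.conf; the ExecStart defined
# in the drop-in (pointing at /usr/bin/kubelet) is the one that takes effect.
```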

kubetest2-tf/data/k8s-ansible/roles/install-k8s/templates/kubelet.service.j2

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ After=network-online.target
 [Service]
 Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
 Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
-ExecStart=/usr/local/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_EXTRA_ARGS {{ kubelet_extra_args }}
+ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_EXTRA_ARGS {{ kubelet_extra_args }}
 Restart=always
 StartLimitInterval=0
 RestartSec=10
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_EXTRA_ARGS {{ kubelet_extra_args }}
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
- name: Resolve Kubernetes node name from inventory IP
  shell: |
    kubectl get nodes -o jsonpath="{range .items[*]}{.metadata.name} {.status.addresses[?(@.type=='InternalIP')].address}{'\n'}{end}" --kubeconfig {{ kubeconfig_path }} |\
    grep {{ inventory_hostname }} | awk '{print $1}'
  register: node_name
  delegate_to: "{{ groups['masters'][0] }}"

- name: Cordon the kubernetes node
  shell: |
    kubectl cordon {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }}
  register: drain_output
  changed_when: "'already cordoned' not in drain_output.stdout"
  delegate_to: "{{ groups['masters'][0] }}"

- name: Check and wait if there are any running jobs that need to complete before draining.
  shell: |
    kubectl get pods -n test-pods \
      --kubeconfig {{ kubeconfig_path }} \
      --field-selector spec.nodeName={{ node_name.stdout }},status.phase=Running \
      -o go-template={% raw %}'{{range .items}}{{if or (not .metadata.ownerReferences) (ne (index .metadata.ownerReferences 0).kind "DaemonSet")}}{{.metadata.name}}{{"\n"}} {{end}}{{end}}'{% endraw %} \
      | wc -l
  register: running_pod_count
  retries: 360
  delay: 30
  until: running_pod_count.stdout | int == 0
  delegate_to: "{{ groups['masters'][0] }}"

- name: Drain Kubernetes Node
  shell: |
    kubectl drain {{ node_name.stdout }} --ignore-daemonsets --delete-emptydir-data --kubeconfig {{ kubeconfig_path }}
  register: drain_output
  changed_when: "'already cordoned' not in drain_output.stdout"
  delegate_to: "{{ groups['masters'][0] }}"

- name: Wait for all pods to be evicted
  shell: |
    kubectl get pods -n test-pods --field-selector spec.nodeName={{ node_name.stdout }},status.phase=Running -o go-template='{% raw %}{{range .items}}{{if or (not .metadata.ownerReferences) (ne (index .metadata.ownerReferences 0).kind "DaemonSet")}}{{.metadata.name}}{{"\\n"}}{{end}}{{end}}{% endraw %}' | wc -l
  register: pods_remaining
  until: pods_remaining.stdout | int == 0
  retries: 10
  delay: 15
  delegate_to: "{{ groups['masters'][0] }}"
Lines changed: 4 additions & 54 deletions
@@ -1,61 +1,11 @@
 - block:
-    - name: Resolve Kubernetes node name from inventory IP
-      shell: |
-        kubectl get nodes -o jsonpath="{range .items[*]}{.metadata.name} {.status.addresses[?(@.type=='InternalIP')].address}{'\n'}{end}" --kubeconfig {{ kubeconfig_path }} |\
-        grep {{ inventory_hostname }} | awk '{print $1}'
-      register: node_name
-      delegate_to: "{{ groups['masters'][0] }}"
-
-    - name: Cordon the kubernetes node
-      shell: |
-        kubectl cordon {{ node_name.stdout }}
-      register: drain_output
-      changed_when: "'already cordoned' not in drain_output.stdout"
-      delegate_to: "{{ groups['masters'][0] }}"
-
-    - name: Check and wait if there are any running jobs that need to complete before draining.
-      shell: |
-        kubectl get pods -n test-pods \
-          --kubeconfig {{ kubeconfig_path }} \
-          --field-selector spec.nodeName={{ node_name.stdout }},status.phase=Running \
-          -o go-template={% raw %}'{{range .items}}{{if or (not .metadata.ownerReferences) (ne (index .metadata.ownerReferences 0).kind "DaemonSet")}}{{.metadata.name}}{{"\n"}} {{end}}{{end}}'{% endraw %} \
-          | wc -l
-      register: running_pod_count
-      retries: 360
-      delay: 30
-      until: running_pod_count.stdout | int == 0
-      delegate_to: "{{ groups['masters'][0] }}"
-
-    - name: Drain Kubernetes Node
-      shell: |
-        kubectl drain {{ node_name.stdout }} --ignore-daemonsets --delete-emptydir-data --kubeconfig {{ kubeconfig_path }}
-      register: drain_output
-      changed_when: "'already cordoned' not in drain_output.stdout"
-      delegate_to: "{{ groups['masters'][0] }}"
-
-    - name: Wait for all pods to be evicted
-      shell: |
-        kubectl get pods -n test-pods --field-selector spec.nodeName={{ node_name.stdout }},status.phase=Running -o go-template='{% raw %}{{range .items}}{{if or (not .metadata.ownerReferences) (ne (index .metadata.ownerReferences 0).kind "DaemonSet")}}{{.metadata.name}}{{"\\n"}}{{end}}{{end}}{% endraw %}' | wc -l
-      register: pods_remaining
-      until: pods_remaining.stdout | int == 0
-      retries: 10
-      delay: 15
-      delegate_to: "{{ groups['masters'][0] }}"
+    - name: Include the cordon phase tasks to prepare nodes for upgrade
+      include_tasks: cordon-phase.yaml
 
     - name: Reboot node
       reboot:
 
-    - name: Wait for node to become Ready
-      shell: |
-        kubectl get node {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }} -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
-      register: node_status
-      until: node_status.stdout == "True"
-      retries: 20
-      delay: 15
-      delegate_to: "{{ groups['masters'][0] }}"
-
-    - name: Uncordon the node
-      shell: kubectl uncordon {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }}
-      delegate_to: "{{ groups['masters'][0] }}"
+    - name: Include the uncordon phase tasks post upgrade of node
+      include_tasks: uncordon-phase.yaml
 
   when: reboot_check is defined and reboot_check.rc == 1
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
- name: Wait for node to become Ready
  shell: |
    kubectl get node {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }} -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
  register: node_status
  until: node_status.stdout == "True"
  retries: 20
  delay: 15
  delegate_to: "{{ groups['masters'][0] }}"

- name: Uncordon the node
  shell: kubectl uncordon {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }}
  delegate_to: "{{ groups['masters'][0] }}"
