56 changes: 56 additions & 0 deletions kubetest2-tf/data/k8s-ansible/docs/update-k8s.md
@@ -0,0 +1,56 @@
## Update the build cluster's Kubernetes version

This guide covers the steps to update the build cluster's Kubernetes version from one patch version to another, or
across minor versions, using a playbook.

This guide applies to HA clusters and is used to automate updating the nodes to a particular version of Kubernetes.

The playbook first installs the kubeadm utility through the package manager and then performs the cluster upgrade.
All the master nodes are updated first, followed by the worker nodes.
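
At a high level, the playbook follows the standard kubeadm upgrade flow. A rough per-node sketch of what it performs, assuming the example target version 1.32.2 set in step 3 below (each node is also cordoned and drained around these steps):
```
# Install the target kubeadm from the Kubernetes yum repository
yum install -y kubeadm-'1.32.2-*' --disableexcludes=kubernetes

# On the first control-plane node only
kubeadm upgrade plan
kubeadm upgrade apply v1.32.2

# On every other master and on the workers
kubeadm upgrade node

# On all nodes: update kubelet and kubectl, then restart the kubelet
yum install -y kubelet-'1.32.2-*' kubectl-'1.32.2-*' --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet
```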


#### Prerequisites
```
Ansible
Kubeconfig of the cluster
```

#### Steps to follow:
1. From the k8s-ansible directory, generate the hosts.yml file listing the nodes on which the Kubernetes update is to be performed.
For example, the `hosts.yml` file under `examples/containerd-cluster/hosts.yml` can be edited to contain the IPs
of the master and worker nodes:
```
[masters]
10.20.177.51
10.20.177.26
10.20.177.227
[workers]
10.20.177.39
```

The following lines may additionally be needed in the hosts.yml file if the cluster is accessed through a bastion node.
Here, the private key is used to establish the SSH connection to the bastion, and X refers to the bastion's IP address
(a quick connectivity check is sketched at the end of this guide).
```
[workers:vars]
ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -i <path-to-private-key> -q root@X" -i <path-to-private-key>'

[masters:vars]
ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -i <path-to-private-key> -q root@X" -i <path-to-private-key>'
```
2. Set the path to the cluster's `kubeconfig` in `group_vars/all` via the `kubeconfig_path` variable.
3. Set the target Kubernetes version in `group_vars/all` via the following variables, for example:
```
kubernetes_major_minor: "1.32"
kubernetes_patch: "2"
```
4. Once the above are set, use the following command to update the nodes:
`ansible-playbook -i examples/containerd-cluster/hosts.yml update-k8s-version.yml --extra-vars group_vars/all`
5. This will proceed to update the nodes. After the update, the new version can be verified by running `kubectl version` and checking the
`Server Version` field:
```
# kubectl version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.33.1
```
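
As a pre-flight check before step 4, one can confirm that Ansible can reach every node in the inventory (through the bastion, where configured). A minimal sketch, assuming the example inventory above:
```
ansible -i examples/containerd-cluster/hosts.yml all -m ping
```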
44 changes: 44 additions & 0 deletions kubetest2-tf/data/k8s-ansible/docs/update-os.md
@@ -0,0 +1,44 @@
## Update the build cluster's nodes with the latest kernel and patches

This guide covers the steps to update the build cluster's OS, kernel, and packages to the latest versions available
through the package manager. Keeping the nodes current ensures the latest security patches are installed and the
kernel is up to date.

This guide applies to HA clusters and is used to automate rolling updates of the nodes.

The nodes are updated through rolling updates, after confirming that there are no running pods in the `test-pods`
namespace, which generally contains the prow-job workloads. The playbook waits until the namespace is free of running
pods; however, it may be necessary to terminate boskos-related pods manually, as these are generally long-running.
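
Before starting, the pods still running in that namespace can be listed, and any blocking boskos pods removed, with standard kubectl commands. A minimal sketch, where `<path-to-kubeconfig>` and the pod name are placeholders:
```
kubectl get pods -n test-pods --field-selector status.phase=Running --kubeconfig <path-to-kubeconfig>
kubectl delete pod <boskos-pod-name> -n test-pods --kubeconfig <path-to-kubeconfig>
```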

#### Prerequisites
```
Ansible
Private key of the bastion node
```

#### Steps to follow:
1. From the k8s-ansible directory, generate the hosts.yml file listing the nodes on which the OS updates are to be performed.
In this case, one can use the hosts.yml file under `examples/containerd-cluster/hosts.yml` to contain the IPs
of the bastion, master, and worker nodes.
If a bastion is involved in the setup, the `hosts.yml` file needs a `[bastion]` section with the bastion's IP, along with the `ansible_ssh_common_args` settings shown commented out below:
```
[masters]
10.20.177.51
10.20.177.26
10.20.177.227
[workers]
10.20.177.39

## The following section is needed if a bastion is involved.
##[workers:vars]
##ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -i <path-to-private-key> -q root@X" -i <path-to-private-key>'
##
##[masters:vars]
##ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -i <path-to-private-key> -q root@X" -i <path-to-private-key>'
```
2. Set the path to the cluster's `kubeconfig` in `group_vars/all` via the `kubeconfig_path` variable.
3. Once the above are set, use the following command to update the nodes:
`ansible-playbook -i examples/containerd-cluster/hosts.yml update-os-packages.yml --extra-vars group_vars/all`
4. This will proceed to update the nodes, and reboot them serially if necessary.
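
After the playbook completes, the running kernel on each node can be spot-checked with an ad-hoc Ansible command, for example:
```
ansible -i examples/containerd-cluster/hosts.yml all -a "uname -r"
```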
5 changes: 5 additions & 0 deletions kubetest2-tf/data/k8s-ansible/group_vars/all
@@ -94,3 +94,8 @@ cni_plugins_tarball: "cni-plugins-linux-{{ ansible_architecture }}-{{ cni_plugin

# NFS server details
nfs_directory: "/var/nfsshare"


##### Kubernetes version update related fields #####
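# Example values (see docs/update-k8s.md): kubernetes_major_minor: "1.32", kubernetes_patch: "2"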
kubernetes_major_minor: ""
kubernetes_patch: ""
@@ -10,7 +10,7 @@
- name: Extract the k8s bits
unarchive:
src: "{{ src_template }}/{{ item }}"
dest: "/usr/local/bin/"
dest: "/usr/bin/"
remote_src: yes
extra_opts:
- --strip-components=3
@@ -21,7 +21,7 @@
- name: Import control plane component container images
shell: |
export ARCH={{ ansible_architecture }}
ctr -n k8s.io images import "/usr/local/bin/{{ item }}.tar"
ctr -n k8s.io images import "/usr/bin/{{ item }}.tar"
ctr -n k8s.io images ls -q | grep -e {{ item }} | xargs -L 1 -I '{}' /bin/bash -c 'ctr -n k8s.io images tag "{}" "$(echo "{}" | sed s/-'$ARCH':/:/)"'
with_items:
- kube-apiserver
@@ -33,7 +33,7 @@
- name: Import common container images required for setting up cluster
shell: |
export ARCH={{ ansible_architecture }}
ctr -n k8s.io images import "/usr/local/bin/{{ item }}.tar"
ctr -n k8s.io images import "/usr/bin/{{ item }}.tar"
ctr -n k8s.io images ls -q | grep -e {{ item }} | xargs -L 1 -I '{}' /bin/bash -c 'ctr -n k8s.io images tag "{}" "$(echo "{}" | sed s/-'$ARCH':/:/)"'
with_items:
- kube-proxy
14 changes: 14 additions & 0 deletions kubetest2-tf/data/k8s-ansible/roles/install-k8s/tasks/main.yml
@@ -45,6 +45,20 @@
dest: /usr/lib/systemd/system/kubelet.service
mode: '0644'

- name: Ensure destination directory exists
ansible.builtin.file:
path: /etc/systemd/system/kubelet.service.d
state: directory
mode: '0755'

# https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/kubelet-integration/#the-kubelet-drop-in-file-for-systemd
- name: Generate an override configuration at /etc/systemd/system/kubelet.service.d/override.conf
template:
src: override.conf.j2
dest: /etc/systemd/system/kubelet.service.d/override.conf
mode: '0644'


- name: Enable and start kubelet
systemd:
name: kubelet
@@ -7,7 +7,7 @@ After=network-online.target
[Service]
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
ExecStart=/usr/local/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_EXTRA_ARGS {{ kubelet_extra_args }}
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_EXTRA_ARGS {{ kubelet_extra_args }}
Restart=always
StartLimitInterval=0
RestartSec=10
@@ -0,0 +1,17 @@
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_EXTRA_ARGS {{ kubelet_extra_args }}
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
@@ -0,0 +1,43 @@
- name: Resolve Kubernetes node name from inventory IP
shell: |
kubectl get nodes -o jsonpath="{range .items[*]}{.metadata.name} {.status.addresses[?(@.type=='InternalIP')].address}{'\n'}{end}" --kubeconfig {{ kubeconfig_path }} |\
grep {{ inventory_hostname }} | awk '{print $1}'
register: node_name
delegate_to: "{{ groups['masters'][0] }}"

- name: Cordon the kubernetes node
shell: |
kubectl cordon {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }}
register: drain_output
changed_when: "'already cordoned' not in drain_output.stdout"
delegate_to: "{{ groups['masters'][0] }}"

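# The go-template below skips DaemonSet-owned pods; with retries=360 and delay=30 the task
# waits up to ~3 hours for prow-job pods in the test-pods namespace to finish before draining.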
- name: Check and wait if there are any running jobs that need to complete before draining.
shell: |
kubectl get pods -n test-pods \
--kubeconfig {{ kubeconfig_path }} \
--field-selector spec.nodeName={{ node_name.stdout }},status.phase=Running \
-o go-template={% raw %}'{{range .items}}{{if or (not .metadata.ownerReferences) (ne (index .metadata.ownerReferences 0).kind "DaemonSet")}}{{.metadata.name}}{{"\n"}} {{end}}{{end}}'{% endraw %} \
| wc -l
register: running_pod_count
retries: 360
delay: 30
until: running_pod_count.stdout | int == 0
delegate_to: "{{ groups['masters'][0] }}"

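# DaemonSet pods cannot be drained, hence --ignore-daemonsets; --delete-emptydir-data assumes
# any emptyDir contents on these nodes are disposable.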
- name: Drain Kubernetes Node
shell: |
kubectl drain {{ node_name.stdout }} --ignore-daemonsets --delete-emptydir-data --kubeconfig {{ kubeconfig_path }}
register: drain_output
changed_when: "'already cordoned' not in drain_output.stdout"
delegate_to: "{{ groups['masters'][0] }}"

- name: Wait for all pods to be evicted
shell: |
kubectl get pods -n test-pods --field-selector spec.nodeName={{ node_name.stdout }},status.phase=Running --kubeconfig {{ kubeconfig_path }} -o go-template='{% raw %}{{range .items}}{{if or (not .metadata.ownerReferences) (ne (index .metadata.ownerReferences 0).kind "DaemonSet")}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}{% endraw %}' | wc -l
register: pods_remaining
until: pods_remaining.stdout | int == 0
retries: 10
delay: 15
delegate_to: "{{ groups['masters'][0] }}"

@@ -1,61 +1,11 @@
- block:
- name: Resolve Kubernetes node name from inventory IP
shell: |
kubectl get nodes -o jsonpath="{range .items[*]}{.metadata.name} {.status.addresses[?(@.type=='InternalIP')].address}{'\n'}{end}" --kubeconfig {{ kubeconfig_path }} |\
grep {{ inventory_hostname }} | awk '{print $1}'
register: node_name
delegate_to: "{{ groups['masters'][0] }}"

- name: Cordon the kubernetes node
shell: |
kubectl cordon {{ node_name.stdout }}
register: drain_output
changed_when: "'already cordoned' not in drain_output.stdout"
delegate_to: "{{ groups['masters'][0] }}"

- name: Check and wait if there are any running jobs that need to complete before draining.
shell: |
kubectl get pods -n test-pods \
--kubeconfig {{ kubeconfig_path }} \
--field-selector spec.nodeName={{ node_name.stdout }},status.phase=Running \
-o go-template={% raw %}'{{range .items}}{{if or (not .metadata.ownerReferences) (ne (index .metadata.ownerReferences 0).kind "DaemonSet")}}{{.metadata.name}}{{"\n"}} {{end}}{{end}}'{% endraw %} \
| wc -l
register: running_pod_count
retries: 360
delay: 30
until: running_pod_count.stdout | int == 0
delegate_to: "{{ groups['masters'][0] }}"

- name: Drain Kubernetes Node
shell: |
kubectl drain {{ node_name.stdout }} --ignore-daemonsets --delete-emptydir-data --kubeconfig {{ kubeconfig_path }}
register: drain_output
changed_when: "'already cordoned' not in drain_output.stdout"
delegate_to: "{{ groups['masters'][0] }}"

- name: Wait for all pods to be evicted
shell: |
kubectl get pods -n test-pods --field-selector spec.nodeName={{ node_name.stdout }},status.phase=Running -o go-template='{% raw %}{{range .items}}{{if or (not .metadata.ownerReferences) (ne (index .metadata.ownerReferences 0).kind "DaemonSet")}}{{.metadata.name}}{{"\\n"}}{{end}}{{end}}{% endraw %}' | wc -l
register: pods_remaining
until: pods_remaining.stdout | int == 0
retries: 10
delay: 15
delegate_to: "{{ groups['masters'][0] }}"
- name: Include the cordon phase tasks to prepare nodes for upgrade
include_tasks: cordon-phase.yaml

- name: Reboot node
reboot:

- name: Wait for node to become Ready
shell: |
kubectl get node {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }} -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
register: node_status
until: node_status.stdout == "True"
retries: 20
delay: 15
delegate_to: "{{ groups['masters'][0] }}"

- name: Uncordon the node
shell: kubectl uncordon {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }}
delegate_to: "{{ groups['masters'][0] }}"
- name: Include the uncordon phase tasks post upgrade of node
include_tasks: uncordon-phase.yaml

when: reboot_check is defined and reboot_check.rc == 1
@@ -0,0 +1,13 @@
- name: Wait for node to become Ready
shell: |
kubectl get node {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }} -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
register: node_status
until: node_status.stdout == "True"
retries: 20
delay: 15
delegate_to: "{{ groups['masters'][0] }}"

- name: Uncordon the node
shell: kubectl uncordon {{ node_name.stdout }} --kubeconfig {{ kubeconfig_path }}
delegate_to: "{{ groups['masters'][0] }}"

@@ -9,7 +9,7 @@ After=network-online.target
Type=notify
EnvironmentFile=-/etc/sysconfig/crio
Environment=GOTRACEBACK=crash
ExecStart=/usr/local/bin/crio \
ExecStart=/usr/bin/crio \
$CRIO_CONFIG_OPTIONS \
$CRIO_RUNTIME_OPTIONS \
$CRIO_STORAGE_OPTIONS \
@@ -18,7 +18,7 @@
- name: Download CRI-O - {{ crio_version }}
unarchive:
src: "https://storage.googleapis.com/cri-o/artifacts/cri-o.{{ ansible_architecture }}.v{{ crio_version }}.tar.gz"
dest: "/usr/local/bin/"
dest: "/usr/bin/"
remote_src: yes
include: cri-o/bin
extra_opts:
4 changes: 2 additions & 2 deletions kubetest2-tf/data/k8s-ansible/roles/runtime/tasks/main.yaml
@@ -38,7 +38,7 @@
- name: Install crictl - {{ critools_version }}
unarchive:
src: "https://github.com/kubernetes-sigs/cri-tools/releases/download/v{{ critools_version }}/crictl-v{{ critools_version }}-linux-{{ ansible_architecture }}.tar.gz"
dest: "/usr/local/bin/"
dest: "/usr/bin/"
remote_src: yes

- name: Install iptables
@@ -56,7 +56,7 @@
- name: Install runc - {{ runc_version }}
get_url:
url: "https://github.com/opencontainers/runc/releases/download/v{{ runc_version }}/runc.{{ ansible_architecture }}"
dest: /usr/local/bin/runc
dest: /usr/bin/runc
mode: '0755'

- name: Install and Configure Runtime - Containerd
@@ -0,0 +1,11 @@
- name: Add the Kubernetes repo to /etc/yum.repos.d/kubernetes.repo
template:
src: kubernetes.repo.j2
dest: /etc/yum.repos.d/kubernetes.repo
mode: '0644'

- name: Update the package repository's cache
shell: yum makecache

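# The repo file excludes the kube* packages by default (see kubernetes.repo.j2), so the
# exclusion has to be lifted explicitly for this install.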
- name: Install the target version of kubeadm
shell: sudo yum install -y kubeadm-'{{ kubernetes_major_minor }}.{{ kubernetes_patch }}-*' --disableexcludes=kubernetes
@@ -0,0 +1,7 @@
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v{{ kubernetes_major_minor }}/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v{{ kubernetes_major_minor }}/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
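# With kubernetes_major_minor set to e.g. "1.32", baseurl renders to https://pkgs.k8s.io/core:/stable:/v1.32/rpm/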
@@ -0,0 +1,30 @@
- name: Import Cordon related tasks to upgrade kubernetes cluster nodes
include_tasks: ../reboot-sequentially/tasks/cordon-phase.yaml

- name: Plan the Kubernetes version update to {{ kubernetes_major_minor }}.{{ kubernetes_patch }}
shell: kubeadm upgrade plan
when: groups['masters']|length > 1 and inventory_hostname == groups['masters'][0]

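# Stopping kube-apiserver ensures no new requests are accepted while the control-plane
# static pods are being replaced; the kubelet restarts it as part of the upgrade.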
- name: Kill the API server to stop it from accepting any in-flight requests
shell: killall -s SIGTERM kube-apiserver && sleep 20
when: node_type == "master"

- name: Perform kubeadm upgrade on the first control-plane node
shell: kubeadm upgrade apply v{{ kubernetes_major_minor }}.{{ kubernetes_patch }}
when: groups['masters']|length > 1 and inventory_hostname == groups['masters'][0]

- name: Update the kubelet and the kubectl utilities
shell: sudo yum install -y kubelet-'{{ kubernetes_major_minor }}.{{ kubernetes_patch }}-*' kubectl-'{{ kubernetes_major_minor }}.{{ kubernetes_patch }}-*' --disableexcludes=kubernetes

- name: Perform kubeadm upgrade on the rest of the nodes
shell: kubeadm upgrade node
when: inventory_hostname != groups['masters'][0] and (node_type == "master" or node_type == "worker")

- name: Reload the systemd processes
shell: sudo systemctl daemon-reload

- name: Restart the kubelet
shell: sudo systemctl restart kubelet

- name: Import Uncordon related tasks to upgrade kubernetes cluster nodes
include_tasks: ../reboot-sequentially/tasks/uncordon-phase.yaml