What happened?
TASK [remove-node/remove-etcd-node : Remove member from cluster] *****************************
fatal: [timy-rm-c7 -> timy-rm-c4]: FAILED! => {"msg": "The conditional check 'etcd_removed_nodes != []' failed. The error was: An unhandled exception occurred while templating '{{ (etcd_members.stdout | from_json).members | selectattr('peerURLs.0', '==', etcd_peer_url) }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while templating 'https://{{ etcd_access_address | ansible.utils.ipwrap }}:2380'. Error was a <class 'ansible.errors.AnsibleFilterError'>, original message: Unrecognized type <<class 'ansible.template.native_helpers.AnsibleUndefined'>> for ipwrap filter <value>
The error appears to be in '/home/rocky/k8s-deploy/roles/remove-node/remove-etcd-node/tasks/main.yml': line 38, column 11, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
# This should always have at most one member, since the etcd_peer_url should be unique in the etcd cluster
when: etcd_removed_nodes != []
^ here
"}
etcd_access_address is defined as "{{ hostvars[inventory_hostname]['main_access_ip'] }}", and main_access_ip is undefined when the node is turned off, so the ipwrap filter receives an AnsibleUndefined value and fails.
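For illustration, a minimal playbook sketch that we believe reproduces the same failure mode (assuming the ansible.utils collection is installed; the variable name undefined_ip is hypothetical and deliberately never set):

---
# repro.yml - running `ansible-playbook repro.yml` should fail with the same
# "Unrecognized type ... for ipwrap filter" AnsibleFilterError, because the
# filter receives an AnsibleUndefined value instead of an address string.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Pipe an undefined variable through ipwrap
      debug:
        msg: "https://{{ undefined_ip | ansible.utils.ipwrap }}:2380"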
What did you expect to happen?
We should be able to remove the node from the cluster using the IP from the hosts file, even when the node itself is unreachable.
How can we reproduce it (as minimally and precisely as possible)?
- create the cluster with the hosts file below
- intentionally shut down timy-rm-c7, then try to remove the node from the cluster with the extra vars reset_nodes=false allow_ungraceful_removal=true
OS
RHEL 8
Version of Ansible
[rocky@k8s-node-001 ~]$ ansible-playbook --version
ansible-playbook [core 2.16.15]
config file = None
configured module search path = ['/home/rocky/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/rocky/k8s-deploy/ansible-py3/ansible
ansible collection location = /home/rocky/.ansible/collections:/usr/share/ansible/collections
executable location = /home/rocky/k8s-deploy/ansible-playbook-py3
python version = 3.12.12 (main, Jan 6 2026, 17:15:29) [GCC 8.5.0 20210514 (Red Hat 8.5.0-28)] (/usr/bin/python3.12)
jinja version = 3.1.6
libyaml = True
Version of Python
3.12.12
Version of Kubespray (commit)
Network plugin used
flannel
Full inventory with variables
[all:vars]
ansible_user='rocky'
ansible_ssh_private_key_file='/home/rocky/k8s-deploy/xxx-us-west-2.pem'
#ansible_ssh_pass=''
ansible_become=true
ansible_become_method='sudo'
#ansible_become_pass=''
# Uncomment and set ansible_python_interpreter to '/usr/libexec/platform-python' when OS is RockyLinux 8
ansible_python_interpreter='/usr/libexec/platform-python'
# Uncomment and configure 'ip' variable to bind K8s services on a different IP than the default network interface IP
[all]
timy-rm-c1 ansible_host=10.10.1.171
timy-rm-c2 ansible_host=10.10.1.208
timy-rm-c3 ansible_host=10.10.1.229
timy-rm-c4 ansible_host=10.10.1.227
timy-rm-c5 ansible_host=10.10.1.141
timy-rm-c6 ansible_host=10.10.1.31
timy-rm-c7 ansible_host=10.10.1.201
[etcd]
timy-rm-c1
timy-rm-c2
timy-rm-c3
timy-rm-c4
timy-rm-c5
timy-rm-c6
timy-rm-c7
[kube_control_plane]
timy-rm-c1
timy-rm-c2
timy-rm-c3
timy-rm-c4
timy-rm-c5
timy-rm-c6
timy-rm-c7
[kube_node]
timy-rm-c1
timy-rm-c2
timy-rm-c3
timy-rm-c4
timy-rm-c5
timy-rm-c6
timy-rm-c7
[kube_ingress]
timy-rm-c1
timy-rm-c2
timy-rm-c3
timy-rm-c4
timy-rm-c5
timy-rm-c6
timy-rm-c7
[vos_generic_node]
timy-rm-c1
timy-rm-c2
timy-rm-c3
timy-rm-c4
timy-rm-c5
timy-rm-c6
timy-rm-c7
[vos_controller_node]
[vos_egress_node]
[vos_ingest_node]
[pgdb]
timy-rm-c3
timy-rm-c4
timy-rm-c5
[prometheus]
# DO NOT MODIFY BELOW UNLESS YOU KNOW WHAT YOU ARE DOING
[kube_node:children]
kube_ingress
vos_node
pgdb
prometheus
[vos_node:children]
vos_controller_node
vos_egress_node
vos_ingest_node
[vos_controller_node:children]
vos_generic_node
[vos_egress_node:children]
vos_generic_node
[vos_ingest_node:children]
vos_generic_node
Command used to invoke ansible
ansible-playbook -vb remove-node.yml -e node=timy-rm-c7 -e 'reset_nodes=false allow_ungraceful_removal=true' --flush-cache
Output of ansible run
Our modified task:
---
# Such that first etcd can be deleted
- name: Set delegated_etcd_host
  set_fact:
    delegated_etcd_host: "{{ groups['etcd'] | difference([(node | default(''))]) | first }}"

- name: Remove etcd member from cluster
  environment:
    ETCDCTL_API: "3"
    ETCDCTL_CERT: "{{ kube_cert_dir + '/etcd/server.crt' if etcd_deployment_type == 'kubeadm' else etcd_cert_dir + '/admin-' + delegated_etcd_host + '.pem' }}"
    ETCDCTL_KEY: "{{ kube_cert_dir + '/etcd/server.key' if etcd_deployment_type == 'kubeadm' else etcd_cert_dir + '/admin-' + delegated_etcd_host + '-key.pem' }}"
    ETCDCTL_CACERT: "{{ kube_cert_dir + '/etcd/ca.crt' if etcd_deployment_type == 'kubeadm' else etcd_cert_dir + '/ca.pem' }}"
    ETCDCTL_ENDPOINTS: "https://127.0.0.1:2379"
  # delegate_to: "{{ groups['etcd'] | first }}"
  delegate_to: "{{ delegated_etcd_host }}"
  block:
    - name: Lookup members infos
      command: "{{ bin_dir }}/etcdctl member list -w json"
      register: etcd_members
      changed_when: false
      check_mode: false
      tags:
        - facts
    - name: Debug etcdctl member list
      debug:
        msg: "{{ etcd_members.stdout_lines }}"
    - name: Remove member from cluster
      command:
        argv:
          - "{{ bin_dir }}/etcdctl"
          - member
          - remove
          # Merge https://github.com/kubernetes-sigs/kubespray/commit/22fb8f8c988954eecb2ab1bf36dbfb0d13668327
          # - "{{ '%x' | format(((etcd_members.stdout | from_json).members | selectattr('peerURLs.0', '==', etcd_peer_url))[0].ID) }}"
          - "{{ '%x' | format(etcd_removed_nodes[0].ID) }}"
      vars:
        etcd_removed_nodes: "{{ (etcd_members.stdout | from_json).members | selectattr('peerURLs.0', '==', etcd_peer_url) }}"
      # This should always have at most one member, since the etcd_peer_url should be unique in the etcd cluster
      when: etcd_removed_nodes != []
      register: etcd_removal_output
      changed_when: "'Removed member' in etcd_removal_output.stdout"
Error Output:
Identical to the failure and traceback shown under "What happened?" above.
Anything else we need to know
Behavior before commit a0261a11aa4c18b53a6b733996e5df5241169314, in roles/kubespray-defaults/defaults/main/main.yml:
# Vars for pointing to etcd endpoints
etcd_address: "{{ ip | default(fallback_ip) }}"
etcd_access_address: "{{ access_ip | default(etcd_address) }}"
etcd_events_access_address: "{{ access_ip | default(etcd_address) }}"
etcd_peer_url: "https://{{ etcd_access_address }}:2380"
New behavior, after that commit:
# Vars for pointing to etcd endpoints
etcd_address: "{{ hostvars[inventory_hostname]['main_ip'] }}"
etcd_access_address: "{{ hostvars[inventory_hostname]['main_access_ip'] }}"
etcd_events_access_address: "{{ hostvars[inventory_hostname]['main_access_ip'] }}"
etcd_peer_url: "https://{{ etcd_access_address | ansible.utils.ipwrap }}:2380"
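A possible direction for a fix (a sketch, not a tested patch; we are assuming the pre-a0261a1 fallback chain is still acceptable and use ansible_host, which comes straight from the static inventory, as the facts-free last resort):

# Sketch: keep the main_access_ip plumbing, but give it a default chain so
# the address never templates to undefined for hosts whose facts were not
# gathered, e.g. because the host is powered off.
etcd_access_address: "{{ hostvars[inventory_hostname]['main_access_ip'] | default(access_ip | default(ip | default(ansible_host))) }}"
etcd_peer_url: "https://{{ etcd_access_address | ansible.utils.ipwrap }}:2380"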