MySQL K8s fails unit address resolution when pod IP has multiple PTR records #137
Description
Steps to reproduce
- Deploy the MySQL K8s charm on Kubernetes.
- Let a non-primary unit resolve its unit address while DNS still exposes multiple PTR records for the pod IP.
- Observe get_unit_address() retrying until it raises.
Expected behavior
The charm should resolve the Kubernetes canonical hostname for the unit reliably, even when multiple PTR records exist for the same pod IP.
Actual behavior
get_unit_address() retries and then raises because the DNS name returned is not the expected unit endpoint name:
File "/var/lib/juju/agents/unit-mysql-1/charm/src/charm.py", line 378, in get_unit_address
raise RuntimeError("unit DNS domain name is not fully propagated yet")
RuntimeError: unit DNS domain name is not fully propagated yet
The affected unit repeatedly logs:
unit-mysql-1: 07:18:44 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
unit-mysql-1: 07:19:06 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
unit-mysql-1: 07:19:08 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
unit-mysql-1: 07:19:10 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
unit-mysql-1: 07:19:12 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
Reverse lookups show multiple PTR records for the same IPs, including service and endpoint names:
root@nova-0:/# nslookup 10.1.1.210
;; Got recursion not available from 10.152.183.70
210.1.1.10.in-addr.arpa name = 10-1-1-210.mysql.openstack.svc.cluster.local.
210.1.1.10.in-addr.arpa name = mysql-0.mysql-endpoints.openstack.svc.cluster.local.
210.1.1.10.in-addr.arpa name = 10-1-1-210.mysql-primary.openstack.svc.cluster.local.
root@nova-0:/# nslookup 10.1.0.143
;; Got recursion not available from 10.152.183.70
143.0.1.10.in-addr.arpa name = 10-1-0-143.mysql.openstack.svc.cluster.local.
143.0.1.10.in-addr.arpa name = mysql-1.mysql-endpoints.openstack.svc.cluster.local.
143.0.1.10.in-addr.arpa name = 10-1-0-143.mysql-replicas.openstack.svc.cluster.local.
root@nova-0:/# nslookup 10.1.2.155
;; Got recursion not available from 10.152.183.70
155.2.1.10.in-addr.arpa name = 10-1-2-155.mysql.openstack.svc.cluster.local.
155.2.1.10.in-addr.arpa name = mysql-2.mysql-endpoints.openstack.svc.cluster.local.
155.2.1.10.in-addr.arpa name = 10-1-2-155.mysql-replicas.openstack.svc.cluster.local.
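To illustrate why a check against "the" reverse name can fail here, the following minimal sketch picks the endpoint name out of a list of PTR names such as the ones above. The helper find_endpoint_name and its parameters are hypothetical and are not taken from the charm's code; it only assumes that the expected name starts with "<unit hostname>.<endpoints service>.", as in the nslookup output.

```python
from typing import Iterable, Optional


def find_endpoint_name(
    ptr_names: Iterable[str], unit_hostname: str, endpoints_service: str
) -> Optional[str]:
    """Return the PTR name that matches the unit's endpoint name, if any.

    Hypothetical helper for illustration: it scans every PTR name instead of
    trusting whichever one the resolver happens to return first.
    """
    prefix = f"{unit_hostname}.{endpoints_service}."
    for name in ptr_names:
        if name.startswith(prefix):
            # Drop the trailing dot that DNS tools print for absolute names.
            return name.rstrip(".")
    return None


# PTR names reported for 10.1.0.143 in the nslookup output above.
ptr_names = [
    "10-1-0-143.mysql.openstack.svc.cluster.local.",
    "mysql-1.mysql-endpoints.openstack.svc.cluster.local.",
    "10-1-0-143.mysql-replicas.openstack.svc.cluster.local.",
]
endpoint = find_endpoint_name(ptr_names, "mysql-1", "mysql-endpoints")
```

With these records the first PTR name is the service name rather than the endpoint name, so any logic that inspects only one returned name may never see the value it expects.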
Versions
Operating system: Not provided
Juju CLI: Not provided
Juju agent: Not provided
Charm revision: 346 (mysql 8.0/stable)
Microk8s: Not provided
LXD: Not applicable
Log output
Juju debug log: Not attached
Relevant log excerpts: the same traceback and warnings shown under Actual behavior.
Additional context
The failure appears to come from relying on reverse DNS / getfqdn() semantics when Kubernetes DNS returns multiple PTR records for a pod IP. A canonical-name lookup via getaddrinfo(..., AI_CANONNAME) appears to be more reliable for this case.
https://bugs.python.org/issue5004
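The canonical-name lookup mentioned above could look roughly like the sketch below. This is not the charm's implementation: the function name canonical_hostname is hypothetical, and whether the returned name is the unit endpoint name depends on how the pod's resolver and hosts file are configured, which is an assumption here.

```python
import socket


def canonical_hostname(hostname: str) -> str:
    """Return the canonical name the resolver reports for `hostname`.

    Uses a forward lookup with AI_CANONNAME instead of a reverse (PTR)
    lookup. Falls back to the input name if no canonical name is returned.
    """
    infos = socket.getaddrinfo(hostname, None, flags=socket.AI_CANONNAME)
    # Each entry is (family, type, proto, canonname, sockaddr); with
    # AI_CANONNAME set, the canonical name is carried in the canonname field.
    for _family, _type, _proto, canonname, _sockaddr in infos:
        if canonname:
            return canonname
    return hostname
```

Inside a unit's pod, this would typically be called as canonical_hostname(socket.gethostname()). Because it resolves the hostname forward rather than resolving the pod IP backward, the number of PTR records for that IP does not affect the result.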