MySQL K8s fails unit address resolution when pod IP has multiple PTR records #137

@gboutry

Description

Steps to reproduce

  1. Deploy the MySQL K8s charm on Kubernetes.
  2. Wait for a non-primary unit to resolve its unit address while cluster DNS returns multiple PTR records for its pod IP.
  3. Observe get_unit_address() retrying until it raises.

Expected behavior

The charm should resolve the Kubernetes canonical hostname for the unit reliably, even when multiple PTR records exist for the same pod IP.

Actual behavior

get_unit_address() retries and then raises because the DNS name returned is not the expected unit endpoint name:

File "/var/lib/juju/agents/unit-mysql-1/charm/src/charm.py", line 378, in get_unit_address
    raise RuntimeError("unit DNS domain name is not fully propagated yet")
RuntimeError: unit DNS domain name is not fully propagated yet

The affected unit repeatedly logs:

unit-mysql-1: 07:18:44 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
unit-mysql-1: 07:19:06 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
unit-mysql-1: 07:19:08 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
unit-mysql-1: 07:19:10 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
unit-mysql-1: 07:19:12 WARNING unit.mysql/1.juju-log get_unit_address: unit DNS domain name is not fully propagated yet, trying again
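
The log pattern above (one warning per attempt, then a RuntimeError) is consistent with a bounded retry loop around a hostname check. The sketch below is a hypothetical reconstruction for illustration, not the charm's code: `wait_for_unit_fqdn`, its parameters, and the prefix check are assumptions. Only the error message is taken from the traceback.

```python
import socket
import time


def wait_for_unit_fqdn(expected_prefix, attempts=5, delay=2.0,
                       lookup=socket.getfqdn, sleep=time.sleep):
    """Poll `lookup` until it returns a name starting with `expected_prefix`.

    Hypothetical helper: the real get_unit_address() may differ.
    """
    for _ in range(attempts):
        name = lookup()
        if name.startswith(expected_prefix):
            return name
        # Stand-in for the "trying again" warning seen in the unit log.
        sleep(delay)
    # Same message as the traceback in this report.
    raise RuntimeError("unit DNS domain name is not fully propagated yet")
```

If the lookup keeps returning a service name for the pod IP, as it could when several PTR records exist, a loop of this shape never succeeds no matter how many attempts it makes, which would match the behavior reported here.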

Reverse lookups show multiple PTR records for the same IPs, including service and endpoint names:

root@nova-0:/# nslookup 10.1.1.210
;; Got recursion not available from 10.152.183.70
210.1.1.10.in-addr.arpa    name = 10-1-1-210.mysql.openstack.svc.cluster.local.
210.1.1.10.in-addr.arpa    name = mysql-0.mysql-endpoints.openstack.svc.cluster.local.
210.1.1.10.in-addr.arpa    name = 10-1-1-210.mysql-primary.openstack.svc.cluster.local.

root@nova-0:/# nslookup 10.1.0.143
;; Got recursion not available from 10.152.183.70
143.0.1.10.in-addr.arpa    name = 10-1-0-143.mysql.openstack.svc.cluster.local.
143.0.1.10.in-addr.arpa    name = mysql-1.mysql-endpoints.openstack.svc.cluster.local.
143.0.1.10.in-addr.arpa    name = 10-1-0-143.mysql-replicas.openstack.svc.cluster.local.

root@nova-0:/# nslookup 10.1.2.155
;; Got recursion not available from 10.152.183.70
155.2.1.10.in-addr.arpa    name = 10-1-2-155.mysql.openstack.svc.cluster.local.
155.2.1.10.in-addr.arpa    name = mysql-2.mysql-endpoints.openstack.svc.cluster.local.
155.2.1.10.in-addr.arpa    name = 10-1-2-155.mysql-replicas.openstack.svc.cluster.local.
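
To make the ambiguity concrete, here is a minimal sketch that picks the unit endpoint name out of a list of PTR names like the ones above. `pick_unit_endpoint` is a hypothetical helper, and the `<app>-<n>.<app>-endpoints` naming is inferred from the nslookup output in this report rather than from the charm source.

```python
def pick_unit_endpoint(ptr_names, unit_label, app_name):
    """Return the PTR name for the unit's endpoint, or None if absent.

    Assumes the endpoint record looks like
    '<unit_label>.<app_name>-endpoints.<namespace>.svc.cluster.local.'
    """
    prefix = f"{unit_label}.{app_name}-endpoints."
    for name in ptr_names:
        if name.startswith(prefix):
            # Drop the trailing root dot printed by nslookup.
            return name.rstrip(".")
    return None


# PTR names reported for 10.1.0.143 in this issue.
ptr_names = [
    "10-1-0-143.mysql.openstack.svc.cluster.local.",
    "mysql-1.mysql-endpoints.openstack.svc.cluster.local.",
    "10-1-0-143.mysql-replicas.openstack.svc.cluster.local.",
]
endpoint = pick_unit_endpoint(ptr_names, "mysql-1", "mysql")
```

Code that only looks at the first name returned by a reverse lookup would see `10-1-0-143.mysql...` in this listing, not the `mysql-1.mysql-endpoints...` record.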

Versions

Operating system: Not provided
Juju CLI: Not provided
Juju agent: Not provided
Charm revision: 346 (mysql 8.0/stable)
Microk8s: Not provided
LXD: Not applicable

Log output

Juju debug log: Not attached

Relevant log excerpts: the traceback and warnings are quoted in full under "Actual behavior" above.

Additional context

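The failure appears to come from relying on reverse DNS (Python's socket.getfqdn() semantics). When Kubernetes DNS returns multiple PTR records for a pod IP, the name that comes back may be one of the service names rather than the unit's mysql-N.mysql-endpoints name. A forward, canonical-name lookup via socket.getaddrinfo(..., flags=socket.AI_CANONNAME) appears to be more reliable for this case. See: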
https://bugs.python.org/issue5004
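
A minimal sketch of the canonical-name lookup suggested above, using only the standard library. The helper names (`unit_endpoint_hostname`, `canonical_name`) and the injectable `resolver` parameter are assumptions made for illustration, and the `<app>-<n>.<app>-endpoints` short name is inferred from the PTR records in this report. This is not the charm's implementation.

```python
import socket


def unit_endpoint_hostname(unit_name, app_name):
    """Build '<app>-<n>.<app>-endpoints' from a Juju unit name like 'mysql/1'.

    Naming scheme assumed from the nslookup output in this issue.
    """
    return f"{unit_name.replace('/', '-')}.{app_name}-endpoints"


def canonical_name(hostname, resolver=socket.getaddrinfo):
    """Forward-resolve `hostname` and return the resolver's canonical name.

    `resolver` defaults to socket.getaddrinfo and is injectable for tests.
    """
    infos = resolver(hostname, None, flags=socket.AI_CANONNAME)
    # Each entry is (family, type, proto, canonname, sockaddr); with
    # AI_CANONNAME the canonical name is reported on the first entry.
    canonname = infos[0][3]
    # Fall back to the input if the resolver gave no canonical name.
    return canonname or hostname
```

Inside a pod, calling `canonical_name(unit_endpoint_hostname("mysql/1", "mysql"))` would be expected to go through the resolver's search domains and avoid the PTR lookup entirely, though that depends on the pod's resolv.conf and has not been verified here.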

Metadata

Labels

bug: Something isn't working as expected
