Skip to content

🐛DNS resolver refresh logic enters a failure state that persists until the pod is restarted. #1571

@lmcmipl

Description

@lmcmipl

Describe the bug

We experienced a transient connectivity issue to the CoreDNS service in one of our K8s clusters that coincided with a cloudflared pod entering its DNS resolver refresh logic.

This resulted in a traffic outage for requests routed through that connector (502 errors)

From the Cloudflare Zero Trust dashboard, the tunnel remained in a “working” state for the entire duration of the outage, despite traffic failures.

The error log below occurred every 5 minutes.

Failing traffic and the cloudflared errors stopped only after restarting the pod.

To mitigate this type of issue in the future, I've tried to use the --dns-resolver-addrs flag to configure a static DNS resolver (i.e., set to the CoreDNS service CLUSTER_IP value) to disable resolver refresh behavior.

Based on this commit:

commit 398da8860f39617a2bb112f59ed3498af6a704ae
Date:   Mon Jun 30 15:20:32 2025 -0700

TUN-9473: Add --dns-resolver-addrs flag

To help support users with environments that don't work well with the
DNS local resolver's automatic resolution process for local resolver
addresses, we introduce a flag to provide them statically to the
runtime. When providing the resolver addresses, cloudflared will no
longer lookup the DNS resolver addresses and use the user input
directly.

When provided with a list of DNS resolvers larger than one, the resolver
service will randomly select one at random for each incoming request.

Closes TUN-9473

However, the flag is rejected as undefined when used with cloudflared tunnel run, even though the commit suggests it should be available to disable resolver refresh behavior.

The error is "Incorrect Usage. flag provided but not defined: --dns-resolver-addrs"

The flag is also not listed in cloudflared tunnel run --help.

To Reproduce
Steps to reproduce the behavior:

  1. Configure --dns-resolver-addrs flag by adding to cloudflared container arg
 args:
   - tunnel
   - --dns-resolver-addrs=<CoreDNS CLUSTER_IP value>
   - --config
   - /etc/cloudflared/config/config.yaml
   - run

Expected behavior

Is --dns-resolver-addrs intended to be supported for cloudflared tunnel run, or is there an alternative supported mechanism to enable static DNS resolver behavior for tunnels ?

In Kubernetes clusters, the CoreDNS ClusterIP is stable for the lifetime of the Service, so dynamic resolver refresh every 5 minutes is unnecessary and can introduce persistent failure states.

Environment and versions

  • Architecture: Kubernetes
  • Version: 2025.11.1

Logs and errors

2025-12-13T00:54:05Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T00:59:10Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T01:04:15Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T01:09:20Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T01:14:25Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T01:19:30Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T01:24:35Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T01:29:40Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T01:34:45Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T01:44:50Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T01:55:00Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T02:00:05Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"
2025-12-13T02:10:10Z ERR Failed to refresh DNS local resolver error="lookup region1.v2.argotunnel.com: i/o timeout"

Additional context
None

Metadata

Metadata

Assignees

No one assigned

    Labels

    Priority: NormalMinor issue impacting one or more usersType: BugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions