
Conversation

@bryce-anderson (Contributor) commented Dec 4, 2025

#### Motivation

We want to explore whether we can reduce some of the tail latencies that appear to be rooted in DNS request timeouts. DNS is most often carried over UDP, which is an inherently lossy transport. When a packet is lost we have to wait for timeouts and retries, which inflates tail latencies.

#### Modifications

Instead of waiting for a full timeout, introduce a backup request that fires after a fixed delay. If the first request is slow because of packet loss or other unfortunate latency, we may get a faster result from the second.

This is an experimental feature for now, but it can later be enhanced with adaptive backup-request deadlines and token buckets to make sure we remain good citizens toward the DNS servers.
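
To make the shape of the change concrete, below is a minimal, hedged sketch of the backup-request pattern described above, written against plain `CompletableFuture` rather than the Netty/ServiceTalk resolver types the PR actually touches; all class and method names here are illustrative.

```java
import java.net.InetAddress;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Illustrative only: issue the primary lookup, schedule a backup lookup after
// a fixed delay, and complete the caller's future with whichever finishes first.
final class BackupRequestSketch {

    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor();

    static CompletableFuture<List<InetAddress>> resolveWithBackup(
            Function<String, CompletableFuture<List<InetAddress>>> resolver,
            String name, long backupDelayMs) {
        CompletableFuture<List<InetAddress>> result = new CompletableFuture<>();

        // Fire the primary lookup immediately.
        resolver.apply(name).whenComplete((addrs, cause) -> finish(result, addrs, cause));

        // If nothing has completed by the deadline, fire a second (backup) lookup.
        // The losing lookup is not cancelled; it is simply left to finish on its own.
        TIMER.schedule(() -> {
            if (!result.isDone()) {
                resolver.apply(name).whenComplete((addrs, cause) -> finish(result, addrs, cause));
            }
        }, backupDelayMs, TimeUnit.MILLISECONDS);

        return result;
    }

    private static <T> void finish(CompletableFuture<T> target, T value, Throwable cause) {
        // First completion (success or failure) wins; later completions are no-ops.
        if (cause != null) {
            target.completeExceptionally(cause);
        } else {
            target.complete(value);
        }
    }
}
```

A real implementation would likely want to treat an early failure of the primary differently (for example, fall through to the backup rather than failing immediately) and to bound the extra load with a token bucket, as the description notes.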

@bryce-anderson force-pushed the bl_anderson/backup-dns-request-alt branch from bfcab9e to 7bb39c7 on December 18, 2025 at 17:26
@bryce-anderson force-pushed the bl_anderson/backup-dns-request-alt branch from b96eb2a to f611635 on December 18, 2025 at 19:11
@bryce-anderson changed the title from "WIP: backup DNS requests to reduce tail latencies associated with lost DNS packets" to "dns-discovery-netty: Add support for DNS backup requests" on Dec 18, 2025
@bryce-anderson marked this pull request as ready for review on December 18, 2025 at 19:12

@bryce-anderson (Contributor, PR author) commented:

Note that this is a simplified version of #2918.

@idelpivnitskiy (Member) left a comment:

Nice approach and abstractions! Most of my comments are minor suggestions, except the one on absolute timer value vs. adaptive. I feel like testing might be hard and risky with an absolute value, and we will quickly find ourselves needing an adaptive timer.


// Backup request static configuration: values > 0 mean allow a backup request with fixed delay, disabled otherwise.
private static final String DNS_BACKUP_REQUEST_DELAY_MS_PROPERTY =
"io.servicetalk.dns.discovery.netty.experimental.dnsBackupRequestDelayMs";

Member commented:

experimental in the name signals that it's a temporary property, but if the plan is to enable it by default and remove the property later, then finding the right integer value that fits everyone will be tricky. If we plan to give users the ability to configure this value and the property only controls the default, then we may need to add a builder API and consider renaming the property to something permanent like io.servicetalk.dns.discovery.netty.defaultDnsBackupRequestDelayMs
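
To illustrate that suggestion, a rough and entirely hypothetical shape such a builder hook could take; the interface and method names below are not part of ServiceTalk's current API:

```java
import java.time.Duration;

// Hypothetical builder hook: the (renamed) system property would only supply
// the default, while users could override the delay programmatically.
interface BackupRequestConfigBuilder {
    // Delay after which a backup DNS request is fired; Duration.ZERO disables it.
    BackupRequestConfigBuilder backupRequestDelay(Duration delay);
}
```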

Member commented:

On the topic of finding the right value: in addition to finding a perfect initial value, users (and upstream DNS servers) may be affected if network (or DNS) conditions change.

WDYT if, instead of a specific ms value, we ask users to provide a percentile? Then we can track latencies internally and send backup requests only for those that exceed a certain percentile, like p999, p99, or p95. Of course, it increases our internal complexity, but on the other hand it adjusts to the deployment environment and lets users control what percentage of their requests to retry. This may help avoid situations where an accidental spike in latency triggers a storm of backup requests that can kill the upstream DNS server.
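
As a rough illustration of the percentile idea (not code from this PR), a bounded reservoir of recent lookup latencies could be kept and the backup delay derived from a configured percentile:

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative latency tracker: records recent DNS lookup latencies in a
// bounded reservoir and approximates a percentile to use as the backup delay.
final class LatencyPercentileTracker {
    private final long[] samples;
    private int count; // total samples observed so far

    LatencyPercentileTracker(int reservoirSize) {
        this.samples = new long[reservoirSize];
    }

    // Record one lookup latency; reservoir sampling keeps memory bounded.
    synchronized void record(long latencyMs) {
        if (count < samples.length) {
            samples[count] = latencyMs;
        } else {
            int idx = ThreadLocalRandom.current().nextInt(count + 1);
            if (idx < samples.length) {
                samples[idx] = latencyMs;
            }
        }
        count++;
    }

    // Approximate the given percentile (e.g. 0.99 for p99) over the reservoir.
    synchronized long percentileMs(double p) {
        int n = Math.min(count, samples.length);
        if (n == 0) {
            return Long.MAX_VALUE; // no data yet: effectively never fire a backup
        }
        long[] sorted = Arrays.copyOf(samples, n);
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p * n) - 1;
        return sorted[Math.max(0, Math.min(idx, n - 1))];
    }
}
```

A token bucket on top of something like this (as the PR description already hints) would bound the backup-request rate during latency spikes.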

Contributor (PR author) replied:

We talked offline and decided the fixed-number approach is fine so long as we strictly control the experiment. That lets us test some values quickly in a controlled way, which can inform how to set a dynamic value (e.g., what percentile to pick by default) later, after we sort out some details of how to compute percentiles. One caveat is that we need to make the parsing of the property 'hot' instead of once on startup, with a value of 0 skipping it altogether for the life of the process, so we don't introduce changes for apps we're not testing with.
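
A small sketch of what that 'hot' parsing could look like; the property name is the one from the diff above, but the helper itself is purely illustrative:

```java
// Illustrative only: if the delay is 0 at startup the feature stays disabled
// for the life of the process; otherwise the value is re-read on every
// resolution so experiments can adjust it without a restart.
final class BackupDelaySketch {
    private static final String PROP =
            "io.servicetalk.dns.discovery.netty.experimental.dnsBackupRequestDelayMs";
    private static final boolean ENABLED = readDelayMs() > 0; // decided once, at class init

    // Returns the current backup delay in ms, or 0 if backup requests are disabled.
    static long currentDelayMs() {
        return ENABLED ? readDelayMs() : 0;
    }

    private static long readDelayMs() {
        try {
            return Math.max(0, Long.parseLong(System.getProperty(PROP, "0")));
        } catch (NumberFormatException e) {
            return 0; // unparsable value: treat as disabled
        }
    }
}
```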

verify(backupResolver, times(1)).resolveAll("foo");
List<InetAddress> result = new ArrayList<>();
backupPromise.setSuccess(result);
assertEquals(result, resolveFuture.get());

Member commented:

A few ideas for consideration:

  1. using "same" instead of "equals"
  2. checking primaryPromise is cancelled
  3. parametrizing the test to ensure it works the same regardless of which promise completes first

Contributor (PR author) replied:

Took 1 and 3, but I was deliberate about not cancelling the losing promise. Primarily, triggering cancellation of the other promise adds noise and allocations, and it's also not that expensive to let the query finish.
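
For illustration, a parametrized test along the lines discussed above could look like the following, written against the `BackupRequestSketch` helper sketched earlier in this thread rather than the PR's real resolver and test classes:

```java
import static org.junit.jupiter.api.Assertions.assertSame;

import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

// Illustrative test: whichever lookup completes first should win, and the
// losing lookup is simply left to finish on its own (not cancelled).
class BackupRequestSketchTest {

    @ParameterizedTest
    @ValueSource(booleans = {true, false})
    void firstCompletionWins(boolean primaryWins) throws Exception {
        CompletableFuture<List<InetAddress>> primary = new CompletableFuture<>();
        CompletableFuture<List<InetAddress>> backup = new CompletableFuture<>();
        // The fake resolver returns `primary` on the first call and `backup` on the second.
        Iterator<CompletableFuture<List<InetAddress>>> lookups = List.of(primary, backup).iterator();

        CompletableFuture<List<InetAddress>> result =
                BackupRequestSketch.resolveWithBackup(name -> lookups.next(), "foo", 1);

        Thread.sleep(50); // crude: give the 1ms backup timer time to fire
        List<InetAddress> addresses = new ArrayList<>();
        (primaryWins ? primary : backup).complete(addresses);

        assertSame(addresses, result.get()); // "same", per suggestion 1 above
    }
}
```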
