Add additional failure data into EDE OPT messages on SERVFAIL #16061

@johnhtodd

  • Program: Recursor
  • Issue type: Feature request

Short description

It would be useful to have additional data added to the EDE OPT field(s) describing why a SERVFAIL was produced.

Usecase

Users raise issues with us about failed lookups that result from various problems downstream of our recursive resolvers. We have to spend time debugging and relaying information. It would save us time and energy if the response from the recursive resolver carried more fields describing the problem, in addition to the EDE code, which is only a summarized, synthesized view of the result.

Description

Google DNS does what I think is a good job on this, and I would be quite happy to see a close replication of their error model and set of results. Here is an example:

root@dev5:/tmp# dig @8.8.8.8 A qaux.me.

; <<>> DiG 9.18.33-1~deb12u2-Debian <<>> @8.8.8.8 A qaux.me.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1200
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 23 (Network Error): ([76.223.26.245] rcode=REFUSED for qaux.me/a)
; EDE: 23 (Network Error): ([75.2.118.134] rcode=REFUSED for qaux.me/a)
; EDE: 23 (Network Error): ([99.83.147.209] rcode=REFUSED for qaux.me/a)
; EDE: 23 (Network Error): ([13.248.158.180] rcode=REFUSED for qaux.me/a)
; EDE: 22 (No Reachable Authority): (At delegation qaux.me for qaux.me/a)
;; QUESTION SECTION:
;qaux.me.			IN	A

;; Query time: 60 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Fri Aug 29 15:37:56 UTC 2025
;; MSG SIZE  rcvd: 273

root@dev5:/tmp#

or

root@dev5:/tmp# dig @8.8.8.8 A baskent-adn.edu.tr. +timeout=6 |grep EDE
; EDE: 22 (No Reachable Authority): (At delegation baskent-adn.edu.tr for baskent-adn.edu.tr/a)
root@dev5:/tmp#

This simply packs more, and more informative, EDE messages into the response sent to the requesting client. More data is good.
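To make the idea concrete, here is a minimal Python sketch of how several EDE options could sit side by side in the OPT record's RDATA. It assumes the wire layout from RFC 8914 (option code 15, a 16-bit info code, then UTF-8 extra text). The helper names `encode_ede` and `decode_ede_options` are hypothetical and for illustration only; this is not how the Recursor or Google DNS implements it.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS0 option code for Extended DNS Errors (RFC 8914)


def encode_ede(info_code: int, extra_text: str = "") -> bytes:
    """Encode one EDE option: OPTION-CODE, OPTION-LENGTH, INFO-CODE, EXTRA-TEXT."""
    payload = struct.pack("!H", info_code) + extra_text.encode("utf-8")
    return struct.pack("!HH", EDE_OPTION_CODE, len(payload)) + payload


def decode_ede_options(rdata: bytes) -> list[tuple[int, str]]:
    """Walk the OPT RDATA and return (info_code, extra_text) for every EDE option."""
    found = []
    offset = 0
    while offset + 4 <= len(rdata):
        code, length = struct.unpack_from("!HH", rdata, offset)
        payload = rdata[offset + 4 : offset + 4 + length]
        offset += 4 + length
        # Skip other EDNS options and malformed EDE payloads (too short for INFO-CODE).
        if code != EDE_OPTION_CODE or len(payload) < 2:
            continue
        (info_code,) = struct.unpack_from("!H", payload, 0)
        found.append((info_code, payload[2:].decode("utf-8", errors="replace")))
    return found


# Several EDE options concatenated, mirroring the shape of the example response above.
rdata = b"".join(
    [
        encode_ede(23, "[76.223.26.245] rcode=REFUSED for qaux.me/a"),
        encode_ede(23, "[75.2.118.134] rcode=REFUSED for qaux.me/a"),
        encode_ede(22, "At delegation qaux.me for qaux.me/a"),
    ]
)

for info_code, text in decode_ede_options(rdata):
    print(info_code, text)
```

Each extra option adds four bytes of header plus the info code and text, so a real implementation would presumably need to cap the number of options or the text length to stay within the client's advertised UDP payload size.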

I think this should be an optional feature at first, but it seems like a reasonable candidate to become the default after some testing time.

At the bottom of the page linked below are the extended text messages that Google includes in their results. I'm sure the list is non-exhaustive, but these messages (or even just a subset of them) would reduce the uncertainty about why errors are being generated, and would therefore reduce our support costs and burden. I have no qualms about copying Google's methods and text.

https://developers.google.com/speed/public-dns/docs/troubleshooting/domains
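On the support side, richer EDE text is easy to consume with a script. The following is a rough Python sketch that pulls the EDE lines out of `dig` output in the format shown in the examples above. The regular expression and the function name `parse_ede_lines` are assumptions based only on that output; other `dig` versions may print EDE differently.

```python
import re

# Matches lines such as:
#   ; EDE: 23 (Network Error): ([76.223.26.245] rcode=REFUSED for qaux.me/a)
# The trailing ": (extra text)" part is treated as optional.
EDE_LINE = re.compile(
    r"^; EDE: (\d+) \(([^)]*)\)(?:: \((.*)\))?[ \t]*$",
    re.MULTILINE,
)


def parse_ede_lines(dig_output: str) -> list[dict]:
    """Return one dict per EDE line found in the text printed by dig."""
    results = []
    for match in EDE_LINE.finditer(dig_output):
        results.append(
            {
                "code": int(match.group(1)),
                "name": match.group(2),
                "extra_text": match.group(3) or "",
            }
        )
    return results


sample = """\
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 23 (Network Error): ([76.223.26.245] rcode=REFUSED for qaux.me/a)
; EDE: 22 (No Reachable Authority): (At delegation qaux.me for qaux.me/a)
;; QUESTION SECTION:
"""

for entry in parse_ede_lines(sample):
    print(entry["code"], entry["name"], "|", entry["extra_text"])
```

With output like this, a support tool could report which authoritative servers refused or timed out without anyone re-running the query by hand.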
