Commit 95517ba ("Nat hole punching under the hood"), 1 parent: 3269915.

1 file changed: discv5/discv5-theory.md (267 additions, 3 deletions)
# Node Discovery Protocol v5 - Theory

**Protocol version v5.1**

This document explains the algorithms and data structures used by the protocol.

pending when WHOAREYOU is received, as in the following example:

    A -> B FINDNODE
    A -> B PING
    A -> B TOPICQUERY
    A <- B WHOAREYOU (nonce references PING)

When this happens, all buffered requests can be considered invalid (the remote end cannot
decrypt them) and the packet referenced by the WHOAREYOU `nonce` (in this example: PING)
must be re-sent as a handshake. When the response to the re-sent request is received, the
new session is established and the other pending requests (in this example: FINDNODE,
TOPICQUERY) may be re-sent.

Note that WHOAREYOU is only ever valid as a response to a previously sent request. If
the distance to retrieve more nodes from adjacent k-buckets on `B`:

Node `A` now sorts all received nodes by distance to the lookup target and proceeds by
repeating the lookup procedure on another, closer node.

## Topic Advertisement

The topic advertisement subsystem indexes participants by their provided services. A
node's provided services are identified by arbitrary strings called 'topics'. A node
providing a certain service is said to 'place an ad' for itself when it makes itself
discoverable under that topic. Depending on the needs of the application, a node can
advertise multiple topics or no topics at all. Every node participating in the discovery
protocol acts as an advertisement medium, meaning that it accepts topic ads from other
nodes and later returns them to nodes searching for the same topic.
### Topic Table

Nodes store ads for any number of topics and a limited number of ads for each topic. The
data structure holding advertisements is called the 'topic table'. The list of ads for a
particular topic is called the 'topic queue' because it functions like a FIFO queue of
limited length. The image below depicts a topic table containing three queues. The queue
for topic `T₁` is at capacity.

![topic table](./img/topic-queue-diagram.png)

The queue size limit is implementation-defined. Implementations should place a global
limit on the number of ads in the topic table regardless of the topic queue which contains
them. Reasonable limits are 100 ads per queue and 50000 ads across all queues. Since ENRs
are at most 300 bytes in size, these limits ensure that a full topic table consumes
approximately 15MB of memory.

Any node may appear at most once in any topic queue, that is, registration of a node which
is already registered for a given topic fails. Implementations may impose other
restrictions on the table, such as restrictions on the number of IP addresses in a certain
range or the number of occurrences of the same node across queues.
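As a rough illustration (not part of the spec), a topic table with the limits and the
at-most-once rule described above might look like the following sketch; the class and
names are hypothetical:

```python
from collections import deque

# Illustrative limits taken from the text above; not normative.
MAX_ADS_PER_QUEUE = 100
MAX_ADS_TOTAL = 50_000

class TopicTable:
    """FIFO topic queues with a per-queue and a global ad limit."""

    def __init__(self):
        self.queues = {}   # topic -> deque of node IDs, oldest ad first
        self.total_ads = 0

    def register(self, topic, node_id):
        q = self.queues.setdefault(topic, deque())
        if node_id in q:
            return False   # a node may appear at most once per topic queue
        if len(q) >= MAX_ADS_PER_QUEUE or self.total_ads >= MAX_ADS_TOTAL:
            return False   # no free slot; the registrant must wait (see Tickets)
        q.append(node_id)
        self.total_ads += 1
        return True
```

Ad expiry after `target-ad-lifetime` is omitted here; a real table would also pop
expired entries from the head of each queue.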
### Tickets

Ads should remain in the queue for a constant amount of time, the `target-ad-lifetime`. To
maintain this guarantee, new registrations are throttled and registrants must wait for a
certain amount of time before they are admitted. When a node attempts to place an ad, it
receives a 'ticket' which tells it how long it must wait before it will be accepted. It is
up to the registrant node to keep the ticket and present it to the advertisement medium
when the waiting time has elapsed.

The waiting time constant is:

    target-ad-lifetime = 15min
The assigned waiting time for any registration attempt is determined according to the
following rules:

- When the table is full, the waiting time is assigned based on the lifetime of the oldest
  ad across the whole table, i.e. the registrant must wait for a table slot to become
  available.
- When the topic queue is full, the waiting time depends on the lifetime of the oldest ad
  in the queue. The assigned time is `target-ad-lifetime - oldest-ad-lifetime` in this
  case.
- Otherwise the ad may be placed immediately.
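The rules above can be sketched as follows. This is a minimal illustration, not the
spec's algorithm: the function name and the boolean/age inputs are hypothetical, and the
table-full case assumes the wait is the time remaining until the oldest ad in the table
expires:

```python
TARGET_AD_LIFETIME = 15 * 60  # seconds

def assigned_wait_time(table_full, queue_full,
                       oldest_table_ad_age, oldest_queue_ad_age):
    """Waiting time (seconds) for a registration attempt, per the three rules."""
    if table_full:
        # Wait until the oldest ad anywhere in the table expires.
        return max(TARGET_AD_LIFETIME - oldest_table_ad_age, 0)
    if queue_full:
        # Wait until the oldest ad in this topic queue expires.
        return max(TARGET_AD_LIFETIME - oldest_queue_ad_age, 0)
    return 0  # a slot is free: the ad may be placed immediately
```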
Tickets are opaque objects storing arbitrary information determined by the issuing node.
While details of encoding and ticket validation are up to the implementation, tickets must
contain enough information to verify that:

- The node attempting to use the ticket is the node which requested it.
- The ticket is valid for a single topic only.
- The ticket can only be used within the registration window.
- The ticket can't be used more than once.

Implementations may choose to include arbitrary other information in the ticket, such as
the cumulative wait time spent by the advertiser. A practical way to handle tickets is to
encrypt and authenticate them with a dedicated secret key:

    ticket       = aesgcm_encrypt(ticket-key, ticket-nonce, ticket-pt, '')
    ticket-pt    = [src-node-id, src-ip, topic, req-time, wait-time, cum-wait-time]
    src-node-id  = node ID that requested the ticket
    src-ip       = IP address that requested the ticket
    topic        = the topic that the ticket is valid for
    req-time     = absolute time of the REGTOPIC request
    wait-time    = waiting time assigned when the ticket was created
    cum-wait     = cumulative waiting time of this node
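A minimal sketch of issuing and validating such a ticket. All names are hypothetical,
and stdlib HMAC authentication is used as a stand-in for the AES-GCM encryption named
above (it authenticates but does not hide the plaintext); single-use enforcement would
additionally need a replay cache, omitted here:

```python
import hashlib, hmac, json, time

TICKET_KEY = b"dedicated-secret-key"  # per-node secret, illustrative only

def issue_ticket(src_node_id, src_ip, topic, wait_time, cum_wait):
    """Serialize the ticket fields and authenticate them with the node's key."""
    pt = json.dumps([src_node_id, src_ip, topic, time.time(), wait_time, cum_wait])
    tag = hmac.new(TICKET_KEY, pt.encode(), hashlib.sha256).hexdigest()
    return pt, tag

def validate_ticket(pt, tag, src_node_id, src_ip, topic, window=10):
    # Reject forged or altered tickets.
    good = hmac.new(TICKET_KEY, pt.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, good):
        return False
    t_id, t_ip, t_topic, req_time, wait_time, _ = json.loads(pt)
    # Only the requesting node may use the ticket, and only for its topic.
    if (t_id, t_ip, t_topic) != (src_node_id, src_ip, topic):
        return False
    # Usable only within the registration window after the wait has elapsed.
    elapsed = time.time() - req_time
    return wait_time <= elapsed <= wait_time + window
```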
### Registration Window

The image below depicts a single ticket's validity over time. When the ticket is issued,
the node keeping it must wait until the registration window opens. The length of the
registration window is 10 seconds. The ticket becomes invalid after the registration
window has passed.

![ticket validity over time](./img/ticket-validity.png)

Since all ticket waiting times are assigned to expire when a slot in the queue opens, the
advertisement medium may receive multiple valid tickets during the registration window and
must choose one of them to be admitted into the topic queue. The winning node is notified
using a [REGCONFIRMATION] response.

Picking the winner can be achieved by keeping track of a single 'next ticket' per queue
during the registration window. Whenever a new ticket is submitted, first determine its
validity, then compare it against the current 'next ticket' to determine which of the two
is better according to an implementation-defined metric such as the cumulative wait time
stored in the ticket.
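The 'next ticket' bookkeeping can be sketched as below; the dict-based ticket shape and
the higher-cumulative-wait-wins metric are illustrative choices, not mandated by the
spec:

```python
# Best ticket seen so far in the current registration window, per topic queue.
next_ticket = {}

def submit_ticket(topic, ticket, is_valid):
    """Keep only the best valid ticket per queue; higher cumulative wait wins."""
    if not is_valid:
        return
    best = next_ticket.get(topic)
    if best is None or ticket["cum_wait"] > best["cum_wait"]:
        next_ticket[topic] = ticket

# When the window closes, next_ticket[topic] identifies the winner, which is
# admitted to the queue and notified with REGCONFIRMATION.
```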
### Advertisement Protocol

This section explains how the topic-related protocol messages are used to place an ad.

Let us assume that node `A` provides topic `T`. It selects node `C` as advertisement
medium and wants to register an ad, so that when node `B` (who is searching for topic `T`)
asks `C`, `C` can return the registration entry of `A` to `B`.

Node `A` first attempts to register without a ticket by sending [REGTOPIC] to `C`.

    A -> C REGTOPIC [T, ""]

`C` replies with a ticket and waiting time.

    A <- C TICKET [ticket, wait-time]

Node `A` now waits for the duration of the waiting time. When the wait is over, `A` sends
another registration request including the ticket. `C` does not need to remember its
issued tickets since the ticket is authenticated and contains enough information for `C`
to determine its validity.

    A -> C REGTOPIC [T, ticket]

Node `C` replies with another ticket. Node `A` must keep this ticket in place of the
earlier one, and must also be prepared to handle a confirmation call in case registration
was successful.

    A <- C TICKET [ticket, wait-time]

Node `C` waits for the registration window to end on the queue and selects `A` as the node
which is registered. Node `C` places `A` into the topic queue for `T` and sends a
[REGCONFIRMATION] response.

    A <- C REGCONFIRMATION [T]
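Node `A`'s side of this exchange can be sketched as a loop. This is a simplification:
the `conn` object and its `send_regtopic` call are hypothetical stand-ins for the
implementation's wire layer, and the confirmation is treated here as if it arrived as a
direct reply, whereas in the protocol [REGCONFIRMATION] is a separate response:

```python
import time

def place_ad(conn, topic):
    """Keep presenting tickets to the advertisement medium until it confirms."""
    ticket = ""                      # the first attempt carries no ticket
    while True:
        reply = conn.send_regtopic(topic, ticket)
        if reply.kind == "REGCONFIRMATION":
            return True              # admitted to the topic queue
        # Otherwise the reply is TICKET: keep the new ticket and wait out
        # the assigned time before trying again.
        ticket, wait_time = reply.ticket, reply.wait_time
        time.sleep(wait_time)
```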
### Ad Placement And Topic Radius

Since every node may act as an advertisement medium for any topic, advertisers and nodes
looking for ads must agree on a scheme by which ads for a topic are distributed. When the
number of nodes advertising a topic is at least a certain percentage of the whole
discovery network (rough estimate: at least 1%), ads may simply be placed on random nodes
because searching for the topic on randomly selected nodes will locate the ads quickly
enough.

However, topic search should be fast even when the number of advertisers for a topic is
much smaller than the number of all live nodes. Advertisers and searchers must agree on a
subset of nodes to serve as advertisement media for the topic. This subset is simply a
region of the node ID address space, consisting of nodes whose Kademlia address is within
a certain distance to the topic hash `sha256(T)`. This distance is called the 'topic
radius'.

Example: for a topic `f3b2529e...` with a radius of 2^240, the subset covers all nodes
whose IDs have prefix `f3b2...`. A radius of 2^256 means the entire network, in which case
advertisements are distributed uniformly among all nodes. The diagram below depicts a
region of the address space with topic hash `t` in the middle and several nodes close to
`t` surrounding it. Dots above the nodes represent entries in the node's queue for the
topic.

![diagram explaining the topic radius concept](./img/topic-radius-diagram.png)
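The membership test implied by the example can be sketched with a hypothetical helper,
assuming the XOR distance metric of Kademlia:

```python
import hashlib

def in_topic_radius(node_id, topic, radius):
    """True if node_id (32 bytes) lies within `radius` of sha256(topic)."""
    t = int.from_bytes(hashlib.sha256(topic).digest(), "big")
    n = int.from_bytes(node_id, "big")
    return (t ^ n) < radius  # XOR metric distance, as in Kademlia
```

With `radius = 2**240` this accepts exactly the IDs sharing the hash's first 16 bits
(four hex digits), matching the `f3b2...` prefix example; `radius = 2**256` accepts every
node in the network.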
To place their ads, participants simply perform a random walk within the currently
estimated radius and run the advertisement protocol by collecting tickets from all nodes
encountered during the walk and using them when their waiting time is over.
### Topic Radius Estimation

Advertisers must estimate the topic radius continuously in order to place their ads on
nodes where they will be found. The radius must not fall below a certain size because
restricting registration to too few nodes leaves the topic vulnerable to censorship and
leads to long waiting times. If the radius were too large, searching nodes would take too
long to find the ads.

Estimating the radius uses the waiting time as an indicator of how many other nodes are
attempting to place ads in a certain region. This is achieved by keeping track of the
average time to successful registration within segments of the address space surrounding
the topic hash. Advertisers initially assume the radius is 2^256, i.e. the entire network.
As tickets are collected, the advertiser samples the time it takes to place an ad in each
segment and adjusts the radius such that registration at the chosen distance takes
approximately `target-ad-lifetime / 2` to complete.
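One possible adjustment step is sketched below. The doubling/halving controller and the
lower bound are illustrative assumptions, not the spec's estimator; a real implementation
would average samples per address-space segment rather than react to a single one:

```python
TARGET_AD_LIFETIME = 15 * 60  # seconds
MIN_RADIUS = 2**64            # hypothetical floor against censorship
MAX_RADIUS = 2**256           # the entire network (the initial assumption)

def adjust_radius(radius, sampled_reg_time):
    """One step: shrink while registration is faster than the target, else grow."""
    if sampled_reg_time < TARGET_AD_LIFETIME / 2:
        radius //= 2  # registration is easy: concentrate ads closer to the hash
    else:
        radius *= 2   # contention is high: spread ads over a larger region
    return max(MIN_RADIUS, min(radius, MAX_RADIUS))
```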
## Topic Search

Finding nodes that provide a certain topic is a continuous process which reads the content
of topic queues inside the approximated topic radius. This is a much simpler process than
topic advertisement because collecting tickets and waiting on them is not required.

To find nodes for a topic, the searcher generates random node IDs inside the estimated
topic radius and performs Kademlia lookups for these IDs. All (intermediate) nodes
encountered during lookup are asked for topic queue entries using the [TOPICQUERY] packet.
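Generating a lookup target inside the radius can be sketched as follows (hypothetical
helper, assuming the XOR distance metric):

```python
import hashlib, secrets

def random_id_in_radius(topic, radius):
    """A random 32-byte node ID whose XOR distance to sha256(topic) is < radius."""
    t = int.from_bytes(hashlib.sha256(topic).digest(), "big")
    d = secrets.randbelow(radius)       # random distance within the radius
    return (t ^ d).to_bytes(32, "big")  # the ID at that distance from the hash
```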
Radius estimation for topic search is similar to the estimation procedure for
advertisement, but samples the average number of results from TOPICQUERY instead of the
average time to registration. The radius estimation value can be shared with the
registration algorithm if the same topic is being registered and searched for.
## Hole Punching Asymmetric NATs

### Message flow

The protocol introduces the notification packet kind. There are four message containers
in total; they are abbreviated in the sequence diagram below as follows:

- m - [message packet]
- whoareyou - [WHOAREYOU packet]
- hm - [handshake message packet]
- n - [notification packet]
```mermaid
sequenceDiagram
    participant Alice
    participant Relay
    participant Bob

    Relay-->>Alice: m(NODES[Bob's ENR])
    Alice->>Bob: m(nonce,FINDNODE)
    Note left of Alice: Hole punched in Alice's NAT for Bob
    Note left of Alice: FINDNODE timed out
    Alice->>Relay: n(RELAYINIT[nonce])
    Relay->>Bob: n(RELAYMSG[nonce])
    Bob-->>Alice: whoareyou(nonce)
    Note right of Bob: Hole punched in Bob's NAT for Alice
    Alice-->>Bob: hm(FINDNODE)
```

Bob is behind a NAT. Bob is in Relay's kbuckets, they have a session together, and Bob
has sent a packet to Relay in the last ~20 seconds[^1].

As part of a periodic recursive query to fill its kbuckets, Alice sends a [FINDNODE]
request to Bob, whose ENR it received from Relay. By making an outgoing request to
Bob, if Alice is behind a NAT, Alice's NAT adds the filtering rule
`(Alice's-LAN-ip, Alice's-LAN-port, Bob's-WAN-ip, Bob's-WAN-port, entry-lifetime)` to
its UDP session table[^2][^3]. This means a hole is now punched for Bob in Alice's NAT
for the duration of `entry-lifetime`. The request to Bob times out because Bob is behind
a NAT.
Alice initiates an attempt to punch a hole in Bob's NAT via Relay. Alice resets the
request timeout on the timed-out [FINDNODE] message, wraps the message's nonce in a
[RELAYINIT] notification, and sends it to Relay. The notification also contains Alice's
ENR and Bob's node ID.

Relay disassembles the [RELAYINIT] notification and uses the `tgt-id` to look up Bob's
ENR in its kbuckets. With high probability, Relay will find Bob's ENR there, since it
assembled a [NODES] response for Alice containing Bob's ENR only ~1 second ago (see
[UDP Communication] for the recommended timeout duration). Relay assembles a [RELAYMSG]
notification with Alice's message nonce and ENR, then sends it to the address in Bob's
ENR.
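The relay's role reduces to a lookup and a forward. A minimal sketch, with hypothetical
`kbuckets` and `send` interfaces standing in for the implementation's wire layer (the
actual RELAYINIT/RELAYMSG encodings are defined in the wire spec):

```python
def handle_relayinit(kbuckets, send, nonce, initiator_enr, tgt_id):
    """Forward the initiator's nonce and ENR to the target node, if known."""
    tgt_enr = kbuckets.get(tgt_id)  # look up Bob's ENR by tgt-id
    if tgt_enr is None:
        return False                # target unknown: drop the notification
    # RELAYMSG carries Alice's message nonce and ENR to the address in Bob's ENR.
    send(tgt_enr["ip"], tgt_enr["udp"], ("RELAYMSG", nonce, initiator_enr))
    return True
```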
Bob disassembles the [RELAYMSG] and uses the `nonce` to assemble a [WHOAREYOU packet],
then sends it to Alice using the address in the `inr-enr`. Bob's NAT adds the filtering
rule `(Bob's-LAN-ip, Bob's-LAN-port, Alice's-WAN-ip, Alice's-WAN-port, entry-lifetime)` to
its UDP session table[^2][^3]. A hole is punched in Bob's NAT for Alice for the duration
of `entry-lifetime`.

From here on it's business as usual. See [Sessions].
### Redundancy of ENRs in NODES responses and connectivity status assumptions about Relay and Bob

[EIP-778]: ../enr.md
[identity scheme]: ../enr.md#record-structure
[message packet]: ./discv5-wire.md#ordinary-message-packet-flag--0
[handshake message packet]: ./discv5-wire.md#handshake-message-packet-flag--2
[WHOAREYOU packet]: ./discv5-wire.md#whoareyou-packet-flag--1
[notification packet]: ./discv5-wire.md#notification-packet-flag--3
[PING]: ./discv5-wire.md#ping-request-0x01
[PONG]: ./discv5-wire.md#pong-response-0x02
[FINDNODE]: ./discv5-wire.md#findnode-request-0x03
[NODES]: ./discv5-wire.md#nodes-response-0x04
[REGTOPIC]: ./discv5-wire.md#regtopic-request-0x07
[REGCONFIRMATION]: ./discv5-wire.md#regconfirmation-response-0x09
[TOPICQUERY]: ./discv5-wire.md#topicquery-request-0x0a
[RELAYINIT]: ./discv5-wire.md#relayinit-0x01
[RELAYMSG]: ./discv5-wire.md#relaymsg-0x02
[UDP Communication]: ./discv5-wire.md#udp-communication
[Sessions]: ./discv5-theory.md#sessions

[^1]: https://pdos.csail.mit.edu/papers/p2pnat.pdf
[^2]: https://datatracker.ietf.org/doc/html/rfc4787
[^3]: https://www.ietf.org/rfc/rfc6146.txt
