# Node Discovery Protocol v5 - Theory

**Protocol version v5.1**

This document explains the algorithms and data structures used by the protocol.

pending when WHOAREYOU is received, as in the following example:

    A -> B FINDNODE
    A -> B PING
    A -> B TOPICQUERY
    A <- B WHOAREYOU (nonce references PING)

When this happens, all buffered requests can be considered invalid (the remote end cannot
decrypt them) and the packet referenced by the WHOAREYOU `nonce` (in this example: PING)
must be re-sent as a handshake. When the response to the re-sent request is received, the
new session is established and other pending requests (example: FINDNODE, TOPICQUERY) may
be re-sent.

Note that WHOAREYOU is only ever valid as a response to a previously sent request. If

the distance to retrieve more nodes from adjacent k-buckets on `B`:

|
334 | 334 | Node `A` now sorts all received nodes by distance to the lookup target and proceeds by
|
335 | 335 | repeating the lookup procedure on another, closer node.
|
336 | 336 |
|
## Topic Advertisement

The topic advertisement subsystem indexes participants by their provided services. A
node's provided services are identified by arbitrary strings called 'topics'. A node
providing a certain service is said to 'place an ad' for itself when it makes itself
discoverable under that topic. Depending on the needs of the application, a node can
advertise multiple topics or no topics at all. Every node participating in the discovery
protocol acts as an advertisement medium, meaning that it accepts topic ads from other
nodes and later returns them to nodes searching for the same topic.

### Topic Table

Nodes store ads for any number of topics and a limited number of ads for each topic. The
data structure holding advertisements is called the 'topic table'. The list of ads for a
particular topic is called the 'topic queue' because it functions like a FIFO queue of
limited length. The image below depicts a topic table containing three queues. The queue
for topic `T₁` is at capacity.

The queue size limit is implementation-defined. Implementations should place a global
limit on the number of ads in the topic table regardless of the topic queue which contains
them. Reasonable limits are 100 ads per queue and 50000 ads across all queues. Since ENRs
are at most 300 bytes in size, these limits ensure that a full topic table consumes
approximately 15MB of memory.

Any node may appear at most once in any topic queue, that is, registration of a node which
is already registered for a given topic fails. Implementations may impose other
restrictions on the table, such as restrictions on the number of IP addresses in a certain
range or the number of occurrences of the same node across queues.

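The table structure and its limits can be sketched as follows. This is an illustrative,
non-normative example; the `TopicTable` class and its method names are invented, and only
the per-queue limit, the global limit, and the one-ad-per-node rule from above are
modeled:

```python
import time
from collections import OrderedDict

MAX_ADS_PER_QUEUE = 100  # suggested per-queue limit
MAX_ADS_TOTAL = 50000    # suggested global table limit

class TopicTable:
    """Holds topic queues; each queue is FIFO-ordered by registration time."""

    def __init__(self, per_queue=MAX_ADS_PER_QUEUE, total=MAX_ADS_TOTAL):
        self.per_queue = per_queue
        self.total = total
        self.queues = {}  # topic -> OrderedDict(node_id -> registration time)

    def ad_count(self):
        return sum(len(q) for q in self.queues.values())

    def register(self, topic, node_id, now=None):
        """Try to place an ad. Returns False when the node is already
        registered for the topic or when a queue/table limit is reached."""
        now = time.time() if now is None else now
        q = self.queues.setdefault(topic, OrderedDict())
        if node_id in q:  # at most one ad per node per topic
            return False
        if len(q) >= self.per_queue or self.ad_count() >= self.total:
            return False
        q[node_id] = now  # newest ad goes to the back of the FIFO queue
        return True
```

Expiry of old ads (removal from the front of each queue after `target-ad-lifetime`) is
omitted here for brevity.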
### Tickets

Ads should remain in the queue for a constant amount of time, the `target-ad-lifetime`. To
maintain this guarantee, new registrations are throttled and registrants must wait for a
certain amount of time before they are admitted. When a node attempts to place an ad, it
receives a 'ticket' which tells it how long it must wait before it will be accepted. It is
up to the registrant node to keep the ticket and present it to the advertisement medium
when the waiting time has elapsed.

The waiting time constant is:

    target-ad-lifetime = 15min

The assigned waiting time for any registration attempt is determined according to the
following rules:

- When the table is full, the waiting time is assigned based on the lifetime of the oldest
  ad across the whole table, i.e. the registrant must wait for a table slot to become
  available.
- When the topic queue is full, the waiting time depends on the lifetime of the oldest ad
  in the queue. The assigned time is `target-ad-lifetime - oldest-ad-lifetime` in this
  case.
- Otherwise the ad may be placed immediately.

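The three rules above can be sketched as a single function. The helper name and the
reading of the first rule (waiting out the oldest ad in the whole table) are this sketch's
interpretation, with times in seconds:

```python
TARGET_AD_LIFETIME = 15 * 60  # target-ad-lifetime = 15min, in seconds

def assign_wait_time(table_full, queue_full,
                     oldest_table_ad_lifetime, oldest_queue_ad_lifetime):
    """Return the waiting time (seconds) for a registration attempt. The
    *_lifetime arguments state how long the respective oldest ad has
    already been in place."""
    if table_full:
        # registrant must wait for a slot anywhere in the table to free up
        return max(0, TARGET_AD_LIFETIME - oldest_table_ad_lifetime)
    if queue_full:
        # target-ad-lifetime - oldest-ad-lifetime for this topic's queue
        return max(0, TARGET_AD_LIFETIME - oldest_queue_ad_lifetime)
    return 0  # space available: the ad may be placed immediately
```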
Tickets are opaque objects storing arbitrary information determined by the issuing node.
While details of encoding and ticket validation are up to the implementation, tickets must
contain enough information to verify that:

- The node attempting to use the ticket is the node which requested it.
- The ticket is valid for a single topic only.
- The ticket can only be used within the registration window.
- The ticket can't be used more than once.

Implementations may choose to include arbitrary other information in the ticket, such as
the cumulative wait time spent by the advertiser. A practical way to handle tickets is to
encrypt and authenticate them with a dedicated secret key:

    ticket      = aesgcm_encrypt(ticket-key, ticket-nonce, ticket-pt, '')
    ticket-pt   = [src-node-id, src-ip, topic, req-time, wait-time, cum-wait]
    src-node-id = node ID that requested the ticket
    src-ip      = IP address that requested the ticket
    topic       = the topic that the ticket is valid for
    req-time    = absolute time of REGTOPIC request
    wait-time   = waiting time assigned when ticket was created
    cum-wait    = cumulative waiting time of this node

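The ticket properties listed above can be illustrated with a simplified stand-in that
authenticates the ticket fields with an HMAC instead of the AES-GCM construction shown.
This proves integrity and binds the ticket to node, topic, and time window, but unlike
`aesgcm_encrypt` it does not hide the contents; `src-ip` and the single-use check are also
omitted for brevity. All names here are invented:

```python
import hashlib
import hmac
import json

REG_WINDOW = 10  # registration window length, in seconds

def mint_ticket(key, src_node_id, topic, req_time, wait_time, cum_wait):
    """Issue a ticket binding the requester, topic, and assigned times."""
    pt = json.dumps([src_node_id, topic, req_time, wait_time, cum_wait])
    tag = hmac.new(key, pt.encode(), hashlib.sha256).hexdigest()
    return pt + "|" + tag

def validate_ticket(key, ticket, src_node_id, topic, now):
    """Accept a ticket only from its owner, for its topic, and only inside
    the registration window starting at req-time + wait-time."""
    pt, _, tag = ticket.rpartition("|")
    good = hmac.new(key, pt.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, good):
        return False  # not issued by us, or tampered with
    t_node, t_topic, req_time, wait_time, _cum_wait = json.loads(pt)
    if t_node != src_node_id or t_topic != topic:
        return False
    window_start = req_time + wait_time
    return window_start <= now <= window_start + REG_WINDOW
```

A real implementation would additionally record used tickets to enforce single use, as
required by the list above.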
### Registration Window

The image below depicts a single ticket's validity over time. When the ticket is issued,
the node keeping it must wait until the registration window opens. The length of the
registration window is 10 seconds. The ticket becomes invalid after the registration
window has passed.

Since all ticket waiting times are assigned to expire when a slot in the queue opens, the
advertisement medium may receive multiple valid tickets during the registration window and
must choose one of them to be admitted into the topic queue. The winning node is notified
using a [REGCONFIRMATION] response.

Picking the winner can be achieved by keeping track of a single 'next ticket' per queue
during the registration window. Whenever a new ticket is submitted, first determine its
validity and compare it against the current 'next ticket' to determine which of the two is
better according to an implementation-defined metric such as the cumulative wait time
stored in the ticket.

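The 'next ticket' bookkeeping described above can be sketched as follows (hypothetical
names; cumulative wait time is used as the example metric, longer being better):

```python
class QueueRegistrationWindow:
    """Tracks the best ticket seen for one topic queue during its window."""

    def __init__(self):
        self.next_ticket = None  # (cum_wait, node_id) of current best

    def submit(self, node_id, cum_wait):
        """Consider an already-validated ticket; keep it if it beats the
        current 'next ticket' on cumulative wait time."""
        candidate = (cum_wait, node_id)
        if self.next_ticket is None or candidate[0] > self.next_ticket[0]:
            self.next_ticket = candidate

    def close(self):
        """End of the registration window: return the winner's node id,
        which is then placed into the topic queue and sent REGCONFIRMATION."""
        winner = self.next_ticket[1] if self.next_ticket else None
        self.next_ticket = None
        return winner
```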
### Advertisement Protocol

This section explains how the topic-related protocol messages are used to place an ad.

Let us assume that node `A` provides topic `T`. It selects node `C` as advertisement
medium and wants to register an ad, so that when node `B` (who is searching for topic `T`)
asks `C`, `C` can return the registration entry of `A` to `B`.

Node `A` first attempts to register without a ticket by sending [REGTOPIC] to `C`.

    A -> C REGTOPIC [T, ""]

`C` replies with a ticket and waiting time.

    A <- C TICKET [ticket, wait-time]

Node `A` now waits for the duration of the waiting time. When the wait is over, `A` sends
another registration request including the ticket. `C` does not need to remember its
issued tickets since the ticket is authenticated and contains enough information for `C`
to determine its validity.

    A -> C REGTOPIC [T, ticket]

Node `C` replies with another ticket. Node `A` must keep this ticket in place of the
earlier one, and must also be prepared to handle a confirmation call in case registration
was successful.

    A <- C TICKET [ticket, wait-time]

Node `C` waits for the registration window to end on the queue and selects `A` as the node
which is registered. Node `C` places `A` into the topic queue for `T` and sends a
[REGCONFIRMATION] response.

    A <- C REGCONFIRMATION [T]

### Ad Placement And Topic Radius

Since every node may act as an advertisement medium for any topic, advertisers and nodes
looking for ads must agree on a scheme by which ads for a topic are distributed. When the
number of nodes advertising a topic is at least a certain percentage of the whole
discovery network (rough estimate: at least 1%), ads may simply be placed on random nodes,
because searching for the topic on randomly selected nodes will locate the ads quickly
enough.

However, topic search should be fast even when the number of advertisers for a topic is
much smaller than the number of all live nodes. Advertisers and searchers must agree on a
subset of nodes to serve as advertisement media for the topic. This subset is simply a
region of the node ID address space, consisting of nodes whose Kademlia address is within
a certain distance to the topic hash `sha256(T)`. This distance is called the 'topic
radius'.

Example: for a topic `f3b2529e...` with a radius of 2^240, the subset covers all nodes
whose IDs have prefix `f3b2...`. A radius of 2^256 means the entire network, in which case
advertisements are distributed uniformly among all nodes. The diagram below depicts a
region of the address space with topic hash `t` in the middle and several nodes close to
`t` surrounding it. Dots above the nodes represent entries in the node's queue for the
topic.

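In terms of the XOR distance metric over 256-bit IDs, the example above works out as
follows (a quick illustrative check; the helper name is invented):

```python
import hashlib

def in_topic_region(node_id: int, topic: str, radius: int) -> bool:
    """A node belongs to the advertisement region for a topic when the XOR
    distance between its ID and the topic hash is below the radius."""
    topic_hash = int.from_bytes(hashlib.sha256(topic.encode()).digest(), "big")
    return node_id ^ topic_hash < radius

# With radius 2^240, the top 16 bits (first two bytes of the prefix) must
# match the topic hash; with radius 2^256 every node is in the region.
```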
To place their ads, participants simply perform a random walk within the currently
estimated radius and run the advertisement protocol by collecting tickets from all nodes
encountered during the walk and using them when their waiting time is over.

### Topic Radius Estimation

Advertisers must estimate the topic radius continuously in order to place their ads on
nodes where they will be found. The radius must not fall below a certain size because
restricting registration to too few nodes leaves the topic vulnerable to censorship and
leads to long waiting times. If the radius were too large, searching nodes would take too
long to find the ads.

Estimating the radius uses the waiting time as an indicator of how many other nodes are
attempting to place ads in a certain region. This is achieved by keeping track of the
average time to successful registration within segments of the address space surrounding
the topic hash. Advertisers initially assume the radius is 2^256, i.e. the entire network.
As tickets are collected, the advertiser samples the time it takes to place an ad in each
segment and adjusts the radius such that registration at the chosen distance takes
approximately `target-ad-lifetime / 2` to complete.

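One way to realize the adjustment rule is a simple multiplicative feedback loop. This
sketch is illustrative only: the halving/doubling step and the comparison against a single
observed sample are this example's simplifications, not part of the protocol:

```python
TARGET_AD_LIFETIME = 15 * 60                # seconds
TARGET_REG_TIME = TARGET_AD_LIFETIME / 2    # aim: registration takes ~7.5min
MAX_RADIUS = 2 ** 256                       # the entire network

def adjust_radius(radius: int, observed_reg_time: float) -> int:
    """Shrink the radius when registration completes faster than the target
    (little competition near the topic hash), grow it when slower (the
    region is too contested), staying within [1, 2^256]."""
    if observed_reg_time < TARGET_REG_TIME:
        return max(1, radius // 2)
    return min(MAX_RADIUS, radius * 2)
```

An advertiser would start from `radius = MAX_RADIUS` and feed in registration-time
samples per address-space segment as tickets are collected.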
## Topic Search

Finding nodes that provide a certain topic is a continuous process which reads the content
of topic queues inside the approximated topic radius. This is a much simpler process than
topic advertisement because collecting tickets and waiting on them is not required.

To find nodes for a topic, the searcher generates random node IDs inside the estimated
topic radius and performs Kademlia lookups for these IDs. All (intermediate) nodes
encountered during lookup are asked for topic queue entries using the [TOPICQUERY] packet.

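Generating a lookup target inside the estimated radius amounts to XOR-ing the topic hash
with a random distance below the radius (an illustrative helper, not a normative
algorithm):

```python
import hashlib
import random

def random_target_in_radius(topic: str, radius: int) -> int:
    """Pick a node ID whose XOR distance to the topic hash is < radius.
    Kademlia lookups toward such IDs visit potential ad media."""
    topic_hash = int.from_bytes(hashlib.sha256(topic.encode()).digest(), "big")
    distance = random.randrange(radius)
    return topic_hash ^ distance  # ID at the chosen distance from the hash
```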
Radius estimation for topic search is similar to the estimation procedure for
advertisement, but samples the average number of results from TOPICQUERY instead of the
average time to registration. The radius estimation value can be shared with the
registration algorithm if the same topic is being registered and searched for.

## Hole punching asymmetric NATs

### Message flow

The protocol introduces the notification packet kind. There are 4 message containers in
total; they are abbreviated in the sequence diagram below as follows:

- m - [message packet]
- whoareyou - [WHOAREYOU packet]
- hm - [handshake message packet]
- n - [notification packet]

```mermaid
sequenceDiagram
    participant Alice
    participant Relay
    participant Bob

    Relay-->>Alice: m(NODES[Bob's ENR])
    Alice->>Bob: m(nonce,FINDNODE)
    Note left of Alice: Hole punched in Alice's NAT for Bob
    Note left of Alice: FINDNODE timed out
    Alice->>Relay: n(RELAYINIT[nonce])
    Relay->>Bob: n(RELAYMSG[nonce])
    Bob-->>Alice: whoareyou(nonce)
    Note right of Bob: Hole punched in Bob's NAT for Alice
    Alice-->>Bob: hm(FINDNODE)
```

Bob is behind a NAT. Bob is in Relay's k-buckets, they have a session together, and Bob
has sent a packet to Relay in the last ~20 seconds[^1].

As part of a periodic recursive query to fill its k-buckets, Alice sends a [FINDNODE]
request to Bob, whose ENR it received from Relay. By making an outgoing request to Bob,
if Alice is behind a NAT, Alice's NAT adds the filtering rule
`(Alice's-LAN-ip, Alice's-LAN-port, Bob's-WAN-ip, Bob's-WAN-port, entry-lifetime)` to
its UDP session table[^2][^3]. This means a hole is now punched for Bob in Alice's NAT
for the duration of `entry-lifetime`. The request to Bob times out as Bob is behind a NAT.

Alice initiates an attempt to punch a hole in Bob's NAT via Relay. Alice resets the
request timeout on the timed-out [FINDNODE] message, wraps the message's nonce in a
[RELAYINIT] notification, and sends it to Relay. The notification also contains Alice's
ENR and Bob's node id.

Relay disassembles the [RELAYINIT] notification and uses the `tgt-id` to look up Bob's
ENR in its k-buckets. With high probability, Relay will find Bob's ENR in its k-buckets,
as ~1 second ago Relay assembled a [NODES] response for Alice containing Bob's ENR (see
[UDP Communication] for the recommended timeout duration). Relay assembles a [RELAYMSG]
notification with Alice's message nonce and ENR, then sends it to the address in Bob's
ENR.

Bob disassembles the [RELAYMSG] and uses the `nonce` to assemble a [WHOAREYOU packet],
then sends it to Alice using the address in the `inr-enr`. Bob's NAT adds the filtering
rule `(Bob's-LAN-ip, Bob's-LAN-port, Alice's-WAN-ip, Alice's-WAN-port, entry-lifetime)`
to its UDP session table[^2][^3]. A hole is punched in Bob's NAT for Alice for the
duration of `entry-lifetime`.

From here on it's business as usual. See [Sessions].

### Redundancy of ENRs in NODES responses and connectivity status assumptions about Relay and Bob

[EIP-778]: ../enr.md
[identity scheme]: ../enr.md#record-structure
[message packet]: ./discv5-wire.md#ordinary-message-packet-flag--0
[handshake message packet]: ./discv5-wire.md#handshake-message-packet-flag--2
[WHOAREYOU packet]: ./discv5-wire.md#whoareyou-packet-flag--1
[notification packet]: ./discv5-wire.md#notification-packet-flag--3
[PING]: ./discv5-wire.md#ping-request-0x01
[PONG]: ./discv5-wire.md#pong-response-0x02
[FINDNODE]: ./discv5-wire.md#findnode-request-0x03
[NODES]: ./discv5-wire.md#nodes-response-0x04
[REGTOPIC]: ./discv5-wire.md#regtopic-request-0x07
[REGCONFIRMATION]: ./discv5-wire.md#regconfirmation-response-0x09
[TOPICQUERY]: ./discv5-wire.md#topicquery-request-0x0a
[RELAYINIT]: ./discv5-wire.md#relayinit-0x01
[RELAYMSG]: ./discv5-wire.md#relaymsg-0x02

[UDP Communication]: ./discv5-wire.md#udp-communication
[Sessions]: ./discv5-theory.md#sessions

[^1]: https://pdos.csail.mit.edu/papers/p2pnat.pdf
[^2]: https://datatracker.ietf.org/doc/html/rfc4787
[^3]: https://www.ietf.org/rfc/rfc6146.txt