diff --git a/src/ipips/ipip-0501.md b/src/ipips/ipip-0501.md new file mode 100644 index 00000000..e617c0bc --- /dev/null +++ b/src/ipips/ipip-0501.md @@ -0,0 +1,299 @@ +--- +title: "IPIP-0501: Amino DHT HTTP Provider Records" +date: 2025-04-10 +ipip: proposal +editors: + - name: Guillaume Michel + github: guillaumemichel + url: https://guillaume.michel.id/ + affiliation: + name: Shipyard + url: https://ipshipyard.com + - name: Marcin Rataj + github: lidel + url: https://lidel.org/ + affiliation: + name: Shipyard + url: https://ipshipyard.com +relatedIssues: +order: 501 +tags: ['ipips'] +--- + +## Summary + +This IPIP introduces a secure mechanism for advertising `/tls/http` +multiaddresses in the Amino DHT. HTTP servers are now required to host a text +file at the well-known path `.well-known/libp2p/amino/providers` listing the +libp2p peer IDs of authorized providers. This verification step enables DHT +servers to ensure that only approved providers can advertise content, +mitigating potential DDoS attacks and preventing malicious actors from falsely +asserting that an HTTP server hosts content, all while leaving existing libp2p +records unchanged. + +## Motivation + +Allowing content providers to advertise `/tls/http` multiaddresses within the +Amino DHT is desirable because it broadens the network's interoperability and +accessibility. With the introduction of HTTP retrievals, providers will be able +to serve content from static HTTP hosting providers, such as S3 buckets, and +they should be able to advertise these addresses to the Amino DHT. + +The current protocol already allows providers to choose which multiaddresses to +associate with their records, and DHT servers serve all the addresses along +with the provider record, even if they don’t understand them. Example: when +`/webtransport` was rolled out, DHT servers that did not speak WebTransport +still returned `/webtransport` addresses, despite not being able to use them. 
+Hence advertising `/tls/http` multiaddresses to the Amino DHT is already +possible. + +However, since `/tls/http` records are expected to be widely adopted by browser +users, it is essential to mitigate potential Distributed Denial-of-Service +(DDoS) attacks on HTTP servers. If any provider can freely associate arbitrary +`/tls/http` multiaddresses with a provider record, a malicious actor could +trigger significant HTTP traffic to a server they don’t control. We want to +restrict the advertisement of `/tls/http` multiaddresses to hosts controlled by +the provider. This verification would be performed by the DHT servers before +associating the `/tls/http` multiaddresses with the provider record. +Additionally, this check would eliminate addresses pointing to misconfigured +HTTP providers. + +This measure prevents HTTP clients (e.g., browser nodes) from being exploited +in DDoS attacks through bogus DHT records. It is essential for integrating IPFS +into browsers, as browser development teams prioritize robust DDoS prevention. + +## Detailed design + +Providers advertising content hosted on an HTTP server MUST host a text file at +the [well-known location](https://www.rfc-editor.org/rfc/rfc8615) +`.well-known/libp2p/amino/providers`. This file lists the libp2p peer IDs that are +authorized to advertise that HTTP server’s content to the Amino DHT. Each peer +ID MUST follow the [string representation from the libp2p PeerID +specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation) +(base58btc multihash or CID with libp2p codec), with one peer ID per line: + +``` +12D3KooBase58MH +k51KooBase36CID +``` + +By listing these peer IDs, the HTTP server grants permission for the +corresponding providers to advertise that the server hosts content identified +by any CID. 
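
As an illustration of the file format described above, the following minimal Python sketch parses a providers file and checks whether a peer ID is listed. The function names `parse_providers_file` and `is_listed` are hypothetical, and skipping blank lines and trimming surrounding whitespace are assumptions that this IPIP does not specify. The sketch compares peer IDs as plain strings; a real implementation would likely decode both string representations to the same underlying peer ID before comparing, otherwise a base58btc entry would not match its CID form.

```python
from typing import FrozenSet


def parse_providers_file(body: str) -> FrozenSet[str]:
    """Return the peer ID strings listed in a providers file, one per line.

    Assumption (not specified by the IPIP): blank lines are ignored and
    surrounding whitespace is trimmed.
    """
    return frozenset(line.strip() for line in body.splitlines() if line.strip())


def is_listed(peer_id: str, body: str) -> bool:
    """Check whether peer_id appears verbatim in the providers file body.

    This is a plain string comparison; it does not normalize between the
    base58btc multihash and CID string representations.
    """
    return peer_id in parse_providers_file(body)


# Placeholder peer IDs taken from the example file above.
example_body = "12D3KooBase58MH\nk51KooBase36CID\n"
print(is_listed("12D3KooBase58MH", example_body))
```
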
+ +When a DHT Server receives an `ADD_PROVIDER` RPC that includes `/tls/http` +multiaddresses, it MUST verify, for each `/tls/http` address, that the +provider’s peer ID is listed in the file located at +`.well-known/libp2p/amino/providers` on the advertised HTTP server. If the +peer ID is not found, the server MUST NOT associate that `/tls/http` address +with the provider record. + +DHT Servers SHOULD cache the resolved mapping of each `/tls/http` multiaddress +to its peer IDs for the duration of the `ReprovideInterval` to minimize +repetitive HTTP GET requests. Additionally, for addresses that fail +verification, a negative cache entry SHOULD be maintained for `15` minutes to +reduce unnecessary load and mitigate potential abuse. + +## Design rationale + +* **Lightweight Verification:** Each HTTP server only answers approximately one +GET request per DHT Server per `ReprovideInterval`, regardless of the number +of CIDs being advertised. +* **Revocation Considerations:** If a provider revokes a peer ID, the +previously published records will persist until the next reprovide cycle. Thus, +a cache duration equal to the `ReprovideInterval` is appropriate. +* **Negative Caching:** A 15-minute negative cache prevents malicious actors +from triggering repeated GET requests, as the cost of generating a DHT provide +request is higher than that of performing an HTTP GET, mitigating +amplification attacks. + +### User benefit + +* **HTTP Addresses in DHT Provider Records:** Official support for `/tls/http` +addresses in the Amino DHT. +* **DHT Delegated Provides (HTTP only):** HTTP Servers can delegate their DHT +provides to any libp2p node identified by its peer ID. They can later revoke +this permission. +* **DDoS Attack Mitigation:** The Amino DHT cannot be used to direct HTTP +clients (e.g., browser nodes) into a DDoS attack on an arbitrary HTTP server. 
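
The verification and caching rules from the Detailed design section can be sketched as follows. This is a minimal Python sketch under stated assumptions, not a reference implementation: the class `ProviderAuthCache`, its `fetch` callback (which would perform the HTTP GET of `.well-known/libp2p/amino/providers`), and the injectable `clock` are all hypothetical names. It also assumes one possible reading of "fail verification": a failed fetch or an unlisted peer ID is treated as a negative result that is trusted for 15 minutes, while a positive result is trusted for the `ReprovideInterval`, whose value is left as a parameter.

```python
import time
from typing import Callable, Dict, FrozenSet, Tuple

NEGATIVE_TTL_SECONDS = 15 * 60  # negative cache duration given in this IPIP


class ProviderAuthCache:
    """Caches, per HTTP address, the peer IDs authorized to advertise it."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        reprovide_interval_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical: returns the providers file body
        self._positive_ttl = reprovide_interval_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, FrozenSet[str]]] = {}

    def is_authorized(self, address: str, peer_id: str) -> bool:
        now = self._clock()
        entry = self._entries.get(address)
        if entry is not None:
            fetched_at, peers = entry
            age = now - fetched_at
            # A positive result stays valid for the whole ReprovideInterval.
            if peer_id in peers and age < self._positive_ttl:
                return True
            # A negative result is only trusted for 15 minutes.
            if peer_id not in peers and age < NEGATIVE_TTL_SECONDS:
                return False
        peers = self._load(address)
        self._entries[address] = (now, peers)
        return peer_id in peers

    def _load(self, address: str) -> FrozenSet[str]:
        try:
            body = self._fetch(address)
        except Exception:
            # Assumption: an unreachable or broken server authorizes nobody.
            return frozenset()
        return frozenset(line.strip() for line in body.splitlines() if line.strip())
```

With this shape, a DHT server would call `is_authorized(address, peer_id)` for each `/tls/http` address in an `ADD_PROVIDER` request and drop the addresses for which it returns `False`. Keying the cache by address rather than by CID matches the rationale that an HTTP server should see roughly one GET per DHT server per `ReprovideInterval`.
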
+ +### Cost estimation + +For simplicity, we assume that the HTTP content provider is advertising enough +CIDs so that every online DHT server stores at least one associated provider +record. + +Given that there are currently around 10k DHT servers in the Amino DHT +([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#dht-availability-classified-overall-plot)), +the HTTP server is expected to receive roughly 10k GET requests every +`ReprovideInterval`, one from each DHT server. + +Around 300k libp2p clients interact with the Amino DHT on a daily basis +([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#ipfs-servers-vs-clients-plot)). +Therefore, if an attacker advertises a bogus provider record for a popular CID, +they only need about 3% of these clients to contact the HTTP server in order to +mount an attack that would be more resource-intensive than the countermeasure. +A client trying to fetch content from the targeted server sends one GET +request. + +This analysis only covers current libp2p-based nodes. As more users adopt IPFS +in browsers, the number of nodes that could potentially participate in a DDoS +attack will increase, as will the scale of such an attack. Furthermore, users +of the Delegated Routing HTTP API could also contribute to the attack, even if +they are not DHT clients. + +The cost of the proposed countermeasure seems reasonable compared to the +potential cost of a real DDoS attack. + +### Compatibility + +Nothing changes for existing DHT Servers running an older version. Up-to-date +DHT Servers will make an additional check before associating `/tls/http` +multiaddresses with provider records. Over time, the network will stop +propagating unauthorized HTTP endpoints. + +Providers advertising content with `/tls/http` multiaddresses to the Amino DHT +MUST comply with the described check. 
We are not aware of `/tls/http` +multiaddresses currently advertised to the Amino DHT, hence no change is +expected from current providers. + +The same verification mechanism could be used by other content routing systems, +such as IPNI. For more control, it is recommended that each content routing +system use a dedicated path, e.g., `.well-known/libp2p/ipni/providers` +for IPNI. + +### Security + +In the current Amino DHT implementation, DHT servers do not verify the +multiaddresses included in a provider record when processing an `ADD_PROVIDER` +request. They only allow a node to announce itself as a provider. + +If a malicious libp2p node crafts a multiaddress that pairs its own valid peer +ID with the IP address of another actual libp2p node and advertises that node +as the provider for a particular CID, the client attempting to retrieve the +content will encounter a peer ID mismatch error during the libp2p security +handshake. This fail-fast mechanism prevents misuse in pure libp2p records. + +The challenge arises with HTTP clients because they do not use peer IDs when +fetching content from an HTTP server. As a result, an HTTP connection cannot +fail during the handshake, making it easier for a malicious actor to advertise +an arbitrary peer as the provider for a popular CID. Such misrepresentation +could negatively impact both the client and the HTTP server. + +To prevent this weakening of the system and to stop the DHT from being +exploited as a vector for DDoS attacks using HTTP clients, we introduce an +extra verification step. This step ensures that only authorized libp2p nodes +are allowed to advertise HTTP addresses. With this additional check, DHT HTTP +records will be more reliable and secure than standard libp2p-only records. + +A malicious node could still launch a DDoS attack on an HTTP server by +advertising a libp2p TCP multiaddress, such as `/ip4/A.B.C.D/tcp/443`, as the +provider. 
This deceptive advertisement might cause other libp2p nodes to +attempt a TCP connection to the HTTP server, with the connection only failing +later. The primary DDoS mitigation goal is to prevent HTTP-only clients from +being drawn into such attacks, since they use `/tls/http` addresses rather than +the unverified libp2p `/tcp/443` addresses. + +Another important consideration is maintaining a secure `CID -> peerid` +mapping. While nodes might still advertise content they do not serve, they must +not be allowed to falsely claim that another node provides a CID. This secure +mapping also supports the potential implementation of a caching layer that +verifies `peerid -> []maddrs` mappings, relying on the trustworthy DHT `CID -> +peerid` mapping. + +In summary, the extra verification for HTTP addresses does not stop nodes from +advertising content they do not possess; it only prevents them from targeting +other nodes by falsely claiming that those nodes provide content they do not +actually host. + +### Alternatives + +#### Do nothing: not verifying `/tls/http` addresses at all + +In its current state, the Amino DHT allows for `/tls/http` provider records. +However, it would be possible for malicious actors to use the DHT as a vector +for DDoS attacks in which numerous HTTP-only clients target a specific HTTP +server. + +See [Cost estimation](#cost-estimation) for the rationale behind mitigating +this risk rather than accepting it. + +#### Reuse Peer ID Authentication over HTTP + +The [Peer ID Authentication over +HTTP](https://github.com/libp2p/specs/blob/master/http/peer-id-auth.md) +mechanism could potentially be reused, but it presents several significant +drawbacks that render it less practical for HTTP-only IPFS providers. Notably, +it lacks a "server-only" authentication option. 
While mutual authentication +could be halted after the server responds with an HTTP 401 status and includes +its own PeerID in the HTTP header, this approach introduces notable challenges: + +* It increases complexity, requiring not just a standard HTTP GET request but +also the implementation of a custom Authorization header workflow. +* It restricts the HTTP server to representing only a single PeerID, preventing +the sharding of announcements across multiple PeerIDs and thus making +multi-user storage providers infeasible. +* It constrains deployment options, requiring the HTTP server to run custom +software, which eliminates the possibility of using static-only hosting +solutions like an S3 bucket. + +#### Generic `.well-known/libp2p/peerid` file + +PeerIDs that the HTTP server has authorized to advertise content to the Amino +DHT could be listed in the generic `.well-known/libp2p/peerid` file. This file +may also be used to delegate content provision requests to other content +routing systems (for example, IPNI), or generally for other applications. + +However, since modifying the DHT protocol is a long and painful process, the +file used by Amino DHT servers for verification MUST remain stable. Any +alteration to the `.well-known/libp2p/peerid` format would require months or +even years for full adoption by DHT servers. In addition, if other applications +begin using this generic file, DHT servers may end up retrieving unnecessary +extra information. + +#### Flat `.well-known/libp2p/amino/providers/{peerid}` empty files + +An alternative approach is to host an empty file for each authorized provider +peer ID at `.well-known/libp2p/amino/providers/{peerid}`. This approach allows +for HTTP HEAD requests instead of GET requests, which is more efficient on the +wire. 
+ +However, this method doesn't support the different [string representations +from the libp2p PeerID +specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation) +and would lead to false negatives if the DHT server looks for a string +representation other than the one used on the HTTP server. + +#### Reuse `did:web` Method Specification + +The [did:web Method Specification](https://w3c-ccg.github.io/did-method-web/) +outlines a mechanism for listing one or more ED25519 keys. However, adopting it +presents several challenges: + +* PeerIDs are not simple key fingerprints; they are multihashes derived from a +protobuf structure. +* The method’s JSON manifest must adhere to a specific schema. +* This results in an overly complex JSON format, necessitating additional +processing and conversion, which introduces unnecessary complexity to the DHT +server implementation. + +#### Reuse `.well-known/libp2p/protocols` file + +The [existing libp2p HTTP +specification](https://github.com/libp2p/specs/tree/master/http#namespace) +states that application protocols can be discovered by the well-known resource +`.well-known/libp2p/protocols`. Adding an `authorized_peers` field to this file +would allow DHT Servers to dispatch a single GET request to learn about both +the authorized PeerIDs and the supported HTTP protocols. + +The downside of this approach is that it mixes the responsibilities of +unrelated specs and use cases; however, the performance benefit may be worth +it. + +## Out of Scope + +* Amino DHT Providing over HTTP +* Amino DHT lookups for HTTP-only Clients +* Amino DHT Delegated Provides for libp2p nodes +* HTTP Provider Records in IPNI + +## Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).