ipfs · guillaumemichel · Apr 10, 2025 · Apr 10, 2025 · Apr 11, 2025 · lidel
@@ -0,0 +1,300 @@
+---
+title: "IPIP-0501: Amino DHT HTTP Provider Records"
+date: 2025-04-10
+ipip: proposal
+editors:
+  - name: Guillaume Michel
+    github: guillaumemichel
+    url: https://guillaume.michel.id/
+    affiliation:
+        name: Shipyard
+        url: https://ipshipyard.com
+  - name: Marcin Rataj
+    github: lidel
+    url: https://lidel.org/
+    affiliation:
+        name: Shipyard
+        url: https://ipshipyard.com
+relatedIssues:
+order: 501
+tags: ['ipips']
+---
+
+## Summary
+
+This IPIP introduces a secure mechanism for advertising `/tls/http`
+multiaddresses in the Amino DHT. By requiring HTTP servers to host an empty
+file at the well-known path `.well-known/libp2p/amino/providers/{peerid}` for
+each authorized libp2p peer ID, DHT servers can ensure that only providers
+safelisted by the HTTP server advertise its content. This additional
+verification step mitigates potential DDoS attacks and prevents malicious
+actors from falsely claiming that an HTTP server hosts content, while leaving
+existing libp2p records unaffected.
+
+## Motivation
+
+Allowing content providers to advertise `/tls/http` multiaddresses within the
+Amino DHT is desirable because it broadens the network's interoperability and
+accessibility. With the introduction of HTTP retrievals, providers will be able
+to serve content from static HTTP hosting providers, such as S3 buckets, and
+they should be able to advertise these addresses to the Amino DHT.
+
+The current protocol already allows providers to choose which multiaddresses to
+associate with their records, and DHT servers serve all the addresses along
+with the provider record, even if they don’t understand them. Example: when
+`/webtransport` was rolled out, DHT servers that did not speak WebTransport
+still returned `/webtransport` addresses, despite not being able to use them.
+Hence, advertising `/tls/http` multiaddresses to the Amino DHT is already
+possible.
+
+However, since `/tls/http` records are expected to be widely adopted by browser
+users, it is essential to mitigate potential Distributed Denial-of-Service
+(DDoS) attacks on HTTP servers. If any provider can freely associate arbitrary
+`/tls/http` multiaddresses with a provider record, a malicious actor could
+trigger significant HTTP traffic to a server they don’t control. We want to
+restrict the advertisement of `/tls/http` multiaddresses to hosts controlled
+by the provider. This verification would be performed by the DHT servers
+before associating the `/tls/http` multiaddresses with the provider record.
+Additionally, this check would eliminate addresses pointing to misconfigured
+HTTP providers.
+
+This measure prevents HTTP clients (e.g., browser nodes) from being exploited
+in DDoS attacks through bogus DHT records. It is essential for integrating IPFS
+into browsers, as browser development teams prioritize robust DDoS prevention.
+
+## Detailed design
+
+Providers advertising content hosted on an HTTP server MUST host an empty file
+for each libp2p peer ID authorized to advertise that HTTP server’s content to
+the Amino DHT at the [well-known
+location](https://www.rfc-editor.org/rfc/rfc8615)
+`.well-known/libp2p/amino/providers/{peerid}`. The existence of an empty file
+named after the peer ID serves as the authorization marker. The peer ID in the
+filename MUST follow the [string representation from the libp2p PeerID
+specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation)
+(base58btc multihash or CID with libp2p codec):
+
+```
+.well-known/libp2p/amino/providers/12D3KooBase58MH
+.well-known/libp2p/amino/providers/k51KooBase36CID
+```
+
+By hosting these individual empty files, the HTTP server grants permission for
+the corresponding providers to advertise that the server hosts content
+identified by any CID.
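
For a static web root, granting and revoking authorization amounts to creating and deleting empty files. The following is a minimal sketch, assuming the HTTP server serves a local directory as its root. The helper names `authorize_peers` and `revoke_peer` are hypothetical and not part of this proposal.

```python
from pathlib import Path

# Relative location of the marker files, as defined in this document.
PROVIDERS_DIR = Path(".well-known/libp2p/amino/providers")


def authorize_peers(webroot, peer_ids):
    """Create one empty marker file per authorized peer ID (hypothetical helper)."""
    target = Path(webroot) / PROVIDERS_DIR
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for peer_id in peer_ids:
        # Reject values that would escape the providers directory.
        if "/" in peer_id or peer_id in ("", ".", ".."):
            raise ValueError(f"invalid peer ID filename: {peer_id!r}")
        marker = target / peer_id
        marker.write_bytes(b"")  # the file only needs to exist; it stays empty
        created.append(marker)
    return created


def revoke_peer(webroot, peer_id):
    """Delete the marker file so later HEAD requests no longer return 200."""
    (Path(webroot) / PROVIDERS_DIR / peer_id).unlink(missing_ok=True)
```

On object storage such as an S3 bucket, the equivalent would be uploading or deleting zero-byte objects under the same key prefix.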
+
+When a DHT server receives an `ADD_PROVIDER` RPC from a given peer that
+includes `/tls/http` multiaddresses, it MUST verify the existence of the file
+at `.well-known/libp2p/amino/providers/{peerid}` on the advertised HTTP server
+by issuing an HTTP HEAD request for each `/tls/http` address, where `{peerid}`
+is the peer ID of the sender. If the HTTP HEAD request does not return a `200`
+response, the DHT server MUST NOT associate that `/tls/http` address with the
+provider record.
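
The check performed by a DHT server can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: it only handles textual multiaddresses of the form `/<dns|dns4|dns6|ip4|ip6>/<host>/tcp/<port>/tls/http`, and the function names (`well_known_url`, `head_status`, `filter_addresses`) are hypothetical.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

WELL_KNOWN_PREFIX = "/.well-known/libp2p/amino/providers/"
HOST_PROTOCOLS = ("dns", "dns4", "dns6", "ip4", "ip6")


def well_known_url(maddr, peer_id):
    """Map a textual /tls/http multiaddress and a peer ID to the URL to probe."""
    parts = maddr.strip("/").split("/")
    # Assumed shape: <host-proto>/<host>/tcp/<port>/tls/http
    if len(parts) != 6 or parts[2] != "tcp" or parts[4:] != ["tls", "http"]:
        raise ValueError(f"unsupported multiaddress: {maddr}")
    host_proto, host, _, port = parts[:4]
    if host_proto not in HOST_PROTOCOLS:
        raise ValueError(f"unsupported host protocol: {host_proto}")
    if not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"invalid port: {port}")
    if host_proto == "ip6":
        host = f"[{host}]"  # IPv6 literals are bracketed in URLs
    authority = host if port == "443" else f"{host}:{port}"
    return f"https://{authority}{WELL_KNOWN_PREFIX}{quote(peer_id, safe='')}"


def head_status(url, timeout=5.0):
    """Issue an HTTP HEAD request and return the status code (0 on network error)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # includes URLError, DNS failures, and timeouts
        return 0


def filter_addresses(maddrs, peer_id, probe=head_status):
    """Keep non-HTTP addresses as-is; keep /tls/http ones only if the probe returns 200."""
    kept = []
    for maddr in maddrs:
        if not maddr.endswith("/tls/http"):
            kept.append(maddr)  # existing libp2p addresses are unaffected
            continue
        try:
            url = well_known_url(maddr, peer_id)
        except ValueError:
            continue  # malformed HTTP address: do not associate it
        if probe(url) == 200:
            kept.append(maddr)
    return kept
```

Note that `urllib` follows redirects by default, so a stricter implementation may want to disable redirect handling and accept only a direct `200` response.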
+
+DHT Servers SHOULD cache the resolved mapping of each `/tls/http` multiaddress
+to its peer IDs for the duration of the `ReprovideInterval` to minimize
+repetitive HTTP HEAD requests. Additionally, for addresses that fail
+verification, a negative cache entry SHOULD be maintained for 15 minutes to
+reduce unnecessary load and mitigate potential abuse.
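
The caching behaviour described above could look like the sketch below. It assumes a caller-supplied `verify(maddr, peer_id)` callable that performs the HEAD check, and the `REPROVIDE_INTERVAL` value is a placeholder assumption, since this document does not fix it. The class name `VerificationCache` is hypothetical.

```python
import time

NEGATIVE_TTL = 15 * 60          # 15 minutes, as recommended above
REPROVIDE_INTERVAL = 22 * 3600  # placeholder value, an assumption for this sketch


class VerificationCache:
    """Cache verification results per (multiaddress, peer ID) pair."""

    def __init__(self, verify, clock=time.monotonic,
                 positive_ttl=REPROVIDE_INTERVAL, negative_ttl=NEGATIVE_TTL):
        self._verify = verify      # callable(maddr, peer_id) -> bool
        self._clock = clock        # injectable for testing
        self._positive_ttl = positive_ttl
        self._negative_ttl = negative_ttl
        self._entries = {}         # (maddr, peer_id) -> (authorized, expires_at)

    def is_authorized(self, maddr, peer_id):
        key = (maddr, peer_id)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]        # still fresh: no HEAD request is issued
        authorized = bool(self._verify(maddr, peer_id))
        ttl = self._positive_ttl if authorized else self._negative_ttl
        self._entries[key] = (authorized, now + ttl)
        return authorized
```

The dictionary here grows without bound. A real DHT server would likely cap the number of entries or evict expired ones.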
+
+## Design rationale
+
+* **Lightweight Verification:** Each HTTP server only answers approximately one
+HEAD request per DHT Server per `ReprovideInterval`, regardless of the number
+of CIDs being advertised.
+* **Revocation Considerations:** If an HTTP server revokes a peer ID's
+authorization, the previously published records will persist until the next
+reprovide cycle. Thus, a cache duration equal to the `ReprovideInterval` is
+appropriate.
+* **Negative Caching:** A 15-minute negative cache prevents malicious actors
+from triggering repeated HEAD requests, as the cost of generating a DHT provide
+request is higher than that of performing an HTTP HEAD, mitigating
+amplification attacks.
+
+### User benefit
+
+* **HTTP Addresses in DHT Provider Records:** Official support for `/tls/http`
+addresses in the Amino DHT.
+* **DHT Delegated Provides (HTTP only):** HTTP servers can delegate their DHT
+provides to any libp2p node identified by its peer ID. They can later revoke
+this permission.
+* **DDoS Attack Mitigation:** The Amino DHT cannot be used to direct HTTP
+clients (e.g., browser nodes) into a DDoS attack against an arbitrary HTTP
+server.
+
+### Cost estimation
+
+For simplicity, we assume that the HTTP content provider is advertising enough
+CIDs so that every online DHT server stores at least one associated provider
+record.
+
+Given that there are currently around 10k DHT servers in the Amino DHT
+([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#dht-availability-classified-overall-plot)),
+the HTTP server is expected to receive roughly 10k HEAD requests every
+`ReprovideInterval`, one from each DHT server.
+
+Around 300k libp2p clients interact with the Amino DHT on a daily basis
+([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#ipfs-servers-vs-clients-plot)).
+Therefore, if an attacker advertises a bogus provider record for a popular CID,
+they only need about 3% of these clients (10k / 300k ≈ 3.3%), each sending one
+GET request to the targeted server, to generate more traffic than the
+countermeasure itself.
+
+This analysis only covers current libp2p-based nodes. As more users adopt IPFS
+in browsers, the number of nodes that could potentially participate in a DDoS
+attack will increase, as will the scale of such an attack. Furthermore, users
+of the Delegated Routing HTTP API could also contribute to the attack, even if
+they are not DHT clients.
+
+The cost of the proposed countermeasure seems reasonable compared to the
+potential cost of a real DDoS attack.
+
+### Compatibility
+
+Nothing changes for existing DHT Servers running an older version. Up-to-date
+DHT Servers will make an additional check before associating `/tls/http`
+multiaddresses with provider records. Over time, the network will stop
+propagating unauthorized HTTP endpoints.
+
+Providers advertising content with `/tls/http` multiaddresses to the Amino DHT
+MUST comply with the described check. We are not aware of `/tls/http`
+multiaddresses currently advertised to the Amino DHT, hence no change is
+expected from current providers.
+
+The same verification mechanism could be used by other content routing systems,
+such as IPNI. For more control, it is recommended that each content routing
+system use a dedicated path, e.g., `.well-known/libp2p/ipni/providers/{peerid}`
+for IPNI.
+
+### Security
+
+In the current Amino DHT implementation, DHT servers do not verify the
+multiaddresses included in a provider record when processing an `ADD_PROVIDER`
+request. They only allow a node to announce itself as a provider.
+
+If a malicious libp2p node crafts a multiaddress that pairs its own valid peer
+ID with the IP address of another actual libp2p node and advertises that node
+as the provider for a particular CID, the client attempting to retrieve the
+content will encounter a peer ID mismatch error during the libp2p security
+handshake. This fail-fast mechanism prevents misuse in pure libp2p records.
+
+The challenge arises with HTTP clients because they do not use peer IDs when
+fetching content from an HTTP server. As a result, an HTTP connection cannot
+fail during the handshake, making it easier for a malicious actor to advertise
+an arbitrary peer as the provider for a popular CID. Such misrepresentation
+could negatively impact both the client and the HTTP server.
+
+To prevent this weakening of the system and to stop the DHT from being
+exploited as a vector for DDoS attacks using HTTP clients, we introduce an
+extra verification step. This step ensures that only authorized libp2p nodes
+are allowed to advertise HTTP addresses. With this additional check, DHT HTTP
+records will be more reliable and secure than standard libp2p-only records.
+
+A malicious node could still launch a DDoS attack on an HTTP server by
+advertising a libp2p TCP multiaddress, such as `/ip4/A.B.C.D/tcp/443`, as the
+provider. This deceptive advertisement might cause other libp2p nodes to
+attempt a TCP connection to the HTTP server, with the connection only failing
+later. The primary DDoS mitigation goal is to prevent HTTP-only clients from
+being drawn into such attacks, since they use `/tls/http` addresses rather than
+the unverified libp2p `/tcp/443` addresses.
+
+Another important consideration is maintaining a secure `CID -> peerid`
+mapping. While nodes might still advertise content they do not serve, they must
+not be allowed to falsely claim that another node provides a CID. This secure
+mapping also supports the potential implementation of a caching layer that
+verifies `peerid -> []maddrs` mappings, relying on the trustworthy DHT `CID ->
+peerid` mapping.
+
+In summary, the extra verification for HTTP addresses does not stop nodes from
+advertising content they do not possess; it only prevents them from targeting
+other nodes by falsely claiming that those nodes provide content they do not
+actually host.
+
+### Alternatives
+
+#### Do nothing: not verifying `/tls/http` addresses at all
+
+In its current state, the Amino DHT allows for `/tls/http` provider records.
+However, malicious actors could use the DHT as a vector for DDoS attacks in
+which numerous HTTP-only clients target a specific HTTP server.
+
+See [Cost estimation](#cost-estimation) for why it is better to address this
+risk.
+
+#### Reuse Peer ID Authentication over HTTP
+
+The [Peer ID Authentication over
+HTTP](https://github.com/libp2p/specs/blob/master/http/peer-id-auth.md)
+mechanism could potentially be reused, but it presents several significant
+drawbacks that render it less practical for HTTP-only IPFS providers. Notably,
+it lacks a "server-only" authentication option. While mutual authentication
+could be halted after the server responds with an HTTP 401 status and includes
+its own PeerID in the HTTP header, this approach introduces notable challenges:
+
+* It increases complexity, requiring not just a standard HTTP HEAD request but
+also the implementation of a custom Authorization header workflow.
+* It restricts the HTTP server to representing only a single PeerID, preventing
+the sharding of announcements across multiple PeerIDs and thus making
+multi-user storage providers unfeasible.
+* It constrains deployment options, requiring the HTTP server to run custom
+software, which eliminates the possibility of using static-only hosting
+solutions like an S3 bucket.
+
+#### Generic `.well-known/libp2p/peerid` file
+
+PeerIDs that the HTTP server has authorized to advertise content to the Amino
+DHT could be listed in the generic `.well-known/libp2p/peerid` file. This file
+may also be used to delegate content provision requests to other content
+routing systems (for example, IPNI), or generally for other applications.
+
+However, since modifying the DHT protocol is a long and painful process, the
+file used by Amino DHT servers for verification MUST remain stable. Any
+alteration to the `.well-known/libp2p/peerid` format would require months or
+even years for full adoption by DHT servers. In addition, if other applications
+begin using this generic file, DHT servers may end up retrieving unnecessary
+extra information.
+
+#### Single `.well-known/libp2p/amino/providers` file
+
+An alternative approach is to consolidate all authorized peer IDs into a single
+file located at `.well-known/libp2p/amino/providers` instead of using separate
+files at `.well-known/libp2p/amino/providers/{peerid}`. However, this method
+has drawbacks. DHT servers would need to download more data than they would
+with a simple HTTP HEAD request. Additionally, they cannot benefit from caching
+all peer IDs listed in `.well-known/libp2p/amino/providers` at once, because
+they should only cache peer IDs used by an actual DHT node to avoid caching an
+unbounded number of entries.
+
+#### Reuse `did:web` Method Specification
+
+The [did:web Method Specification](https://w3c-ccg.github.io/did-method-web/)
+outlines a mechanism for listing one or more ED25519 keys. However, adopting it
+presents several challenges:
+
+* PeerIDs are not simple key fingerprints; they are multihashes derived from a
+protobuf structure.
+* The method’s JSON manifest must adhere to a specific schema.
+* This results in an overly complex JSON format, necessitating additional
+processing and conversion, which introduces unnecessary complexity to the DHT
+server implementation.
+
+#### Reuse `.well-known/libp2p/protocols` file
+
+The [existing libp2p HTTP
+specification](https://github.com/libp2p/specs/tree/master/http#namespace)
+states that application protocols can be discovered through the well-known
+resource `.well-known/libp2p/protocols`. Adding an `authorized_peers` field to
+this file would allow DHT servers to dispatch a single GET request to learn
+about both the authorized PeerIDs and the supported HTTP protocols.
+
+The downside of this approach is that it mixes the responsibilities of
+unrelated specs and use cases; however, the performance benefit may be worth
+it.
+
+## Out of Scope
+
+* Amino DHT Providing over HTTP
+* Amino DHT lookups for HTTP-only Clients
+* Amino DHT Delegated Provides for libp2p nodes
+* HTTP Provider Records in IPNI
+
+## Copyright
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).