|
| 1 | +--- |
| 2 | +title: "IPIP-0000: Amino DHT HTTP Records" |
| 3 | +date: 2025-04-10 |
| 4 | +ipip: proposal |
| 5 | +editors: |
| 6 | + - name: Guillaume Michel |
| 7 | + github: guillaumemichel |
| 8 | + url: https://guillaume.michel.id/ |
| 9 | + affiliation: |
| 10 | + name: Shipyard |
| 11 | + url: https://ipshipyard.com |
| 12 | + - name: Marcin Rataj |
| 13 | + github: lidel |
| 14 | + url: https://lidel.org/ |
| 15 | + affiliation: |
| 16 | + name: Shipyard |
| 17 | + url: https://ipshipyard.com |
| 18 | +relatedIssues: |
| 19 | +order: 0000 |
| 20 | +tags: ['ipips'] |
| 21 | +--- |
| 22 | + |
| 23 | +## Summary |
| 24 | + |
| 25 | +This IPIP introduces a secure mechanism for advertising `/tls/http` |
| 26 | +multiaddresses in the Amino DHT. By requiring HTTP servers to host an empty |
| 27 | +file at the well-known path `.well-known/libp2p/amino/providers/{peerid}` for |
| 28 | +each authorized libp2p peer ID, DHT servers ensure that only providers |
| 29 | +safelisted by the HTTP server can advertise its content. This additional |
| 30 | +verification step mitigates potential DDoS attacks and prevents malicious |
| 31 | +actors from falsely claiming that an HTTP server hosts content, while leaving |
| 32 | +existing libp2p records unaffected. |
| 33 | + |
| 34 | +## Motivation |
| 35 | + |
| 36 | +Allowing content providers to advertise `/tls/http` multiaddresses within the |
| 37 | +Amino DHT is desirable because it broadens the network's interoperability and |
| 38 | +accessibility. With the introduction of HTTP retrievals, providers will be able |
| 39 | +to serve content from static HTTP hosting providers, such as S3 buckets, and |
| 40 | +they should be able to advertise these addresses to the Amino DHT. |
| 41 | + |
| 42 | +The current protocol already allows providers to choose which multiaddresses to |
| 43 | +associate with their records, and DHT servers serve all the addresses along |
| 44 | +with the provider record, even if they don’t understand them. Example: when |
| 45 | +`/webtransport` was rolled out, DHT servers that did not speak WebTransport |
| 46 | +still returned `/webtransport` addresses, despite not being able to use them. |
| 47 | +Hence advertising `/tls/http` multiaddresses to the Amino DHT is already |
| 48 | +possible. |
| 49 | + |
| 50 | +However, since `/tls/http` records are expected to be widely adopted by browser |
| 51 | +users, it is essential to mitigate potential Distributed Denial-of-Service |
| 52 | +(DDoS) attacks on HTTP servers. If any provider can freely associate arbitrary |
| 53 | +`/tls/http` multiaddresses with a provider record, a malicious actor could |
| 54 | +trigger significant HTTP traffic to a server they don’t control. We want to |
| 55 | +restrict the advertisement of `/tls/http` multiaddresses to hosts controlled by |
| 56 | +the provider. This verification would be performed by the DHT servers before |
| 57 | +associating the `/tls/http` multiaddresses with the provider record. |
| 58 | +Additionally, this check would eliminate addresses pointing to misconfigured |
| 59 | +HTTP providers. |
| 60 | + |
| 61 | +This measure prevents HTTP clients (e.g., browser nodes) from being exploited |
| 62 | +in DDoS attacks through bogus DHT records. It is essential for integrating IPFS |
| 63 | +into browsers, as browser development teams prioritize robust DDoS prevention. |
| 64 | + |
| 65 | +## Detailed design |
| 66 | + |
| 67 | +Providers advertising content hosted on an HTTP server MUST host an empty file |
| 68 | +for each libp2p peer ID authorized to advertise that HTTP server’s content to |
| 69 | +the Amino DHT at the [well-known |
| 70 | +location](https://www.rfc-editor.org/rfc/rfc8615) |
| 71 | +`.well-known/libp2p/amino/providers/{peerid}`. The existence of an empty file |
| 72 | +named after the peer ID serves as the authorization marker. The peer ID in the |
| 73 | +filename MUST follow the [string representation from the libp2p Peer ID |
| 74 | +specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation) |
| 75 | +(base58btc multihash or CID with libp2p codec): |
| 76 | + |
| 77 | +``` |
| 78 | +.well-known/libp2p/amino/providers/12D3KooBase58MH |
| 79 | +.well-known/libp2p/amino/providers/k51KooBase36CID |
| 80 | +``` |
| 81 | + |
| 82 | +By hosting these individual empty files, the HTTP server grants permission for |
| 83 | +the corresponding providers to advertise that the server hosts content |
| 84 | +identified by any CID. |
| 85 | + |
| 86 | +When a DHT server receives an `ADD_PROVIDER` RPC from a given `peerID` that |
| 87 | +includes `/tls/http` multiaddresses, it MUST verify the existence of the file |
| 88 | +at `.well-known/libp2p/amino/providers/{peerID}` on the advertised HTTP server by |
| 89 | +issuing an HTTP HEAD request for each `/tls/http` address. If the HTTP HEAD |
| 90 | +request does not return a `200` response, the DHT server MUST NOT associate |
| 91 | +that `/tls/http` address with the provider record. |
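
The check described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names (`authorization_url`, `http_head_status`, `filter_http_addrs`) are hypothetical, and the sketch assumes the host and port have already been extracted from the `/tls/http` multiaddress. Note that Python's `urllib` follows redirects by default; a stricter implementation might choose to disable that.

```python
import urllib.error
import urllib.request

# Well-known path prefix taken from the text above.
WELL_KNOWN_PREFIX = "/.well-known/libp2p/amino/providers/"


def authorization_url(host: str, peer_id: str, port: int = 443) -> str:
    """Build the HTTPS URL of the authorization marker for peer_id on host."""
    netloc = host if port == 443 else f"{host}:{port}"
    return f"https://{netloc}{WELL_KNOWN_PREFIX}{peer_id}"


def http_head_status(url: str, timeout: float = 5.0) -> int:
    """Issue an HTTP HEAD request; return the status code, or 0 on network failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 404) surface as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def filter_http_addrs(peer_id, http_hosts, head=http_head_status):
    """Keep only the hosts whose marker file answers HEAD with exactly 200."""
    return [h for h in http_hosts if head(authorization_url(h, peer_id)) == 200]
```

The `head` parameter is injectable so the filtering logic can be exercised without network access; only hosts that pass would be associated with the provider record.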
| 92 | + |
| 93 | +DHT Servers SHOULD cache the resolved mapping of each `/tls/http` multiaddress |
| 94 | +to its peer IDs for the duration of the `ReprovideInterval` to minimize |
| 95 | +repetitive HTTP HEAD requests. Additionally, for addresses that fail |
| 96 | +verification, a negative cache entry SHOULD be maintained for 15 minutes to |
| 97 | +reduce unnecessary load and mitigate potential abuse. |
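
One possible shape for this cache is sketched below. The class name `VerificationCache` and its methods are hypothetical, and the value of `ReprovideInterval` is passed in by the caller rather than assumed. The sketch does not bound the number of entries, which a real DHT server would likely need to do.

```python
import time

NEGATIVE_TTL_SECONDS = 15 * 60  # 15-minute negative cache from the text above


class VerificationCache:
    """Caches verification results per (HTTP address, peer ID) pair."""

    def __init__(self, reprovide_interval_seconds, clock=time.monotonic):
        self._positive_ttl = reprovide_interval_seconds
        self._clock = clock  # injectable for testing
        self._entries = {}  # (http_addr, peer_id) -> (authorized, expires_at)

    def get(self, http_addr, peer_id):
        """Return True/False for a live entry, or None if absent or expired."""
        key = (http_addr, peer_id)
        entry = self._entries.get(key)
        if entry is None:
            return None
        authorized, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return authorized

    def put(self, http_addr, peer_id, authorized):
        """Store a result; failures expire sooner than successes."""
        ttl = self._positive_ttl if authorized else NEGATIVE_TTL_SECONDS
        self._entries[(http_addr, peer_id)] = (authorized, self._clock() + ttl)
```

A DHT server would consult `get` before issuing a HEAD request and call `put` with the outcome afterwards.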
| 98 | + |
| 99 | +## Design rationale |
| 100 | + |
| 101 | +* **Lightweight Verification:** Each HTTP server only answers approximately one |
| 102 | +HEAD request per DHT Server per `ReprovideInterval`, regardless of the number |
| 103 | +of CIDs being advertised. |
| 104 | +* **Revocation Considerations:** If a provider revokes a peer ID, the |
| 105 | +previously published records will persist until the next reprovide cycle. Thus, |
| 106 | +a cache duration equal to the `ReprovideInterval` is appropriate. |
| 107 | +* **Negative Caching:** A 15-minute negative cache prevents malicious actors |
| 108 | +from triggering repeated HEAD requests, as the cost of generating a DHT provide |
| 109 | +request is higher than that of performing an HTTP HEAD, mitigating |
| 110 | +amplification attacks. |
| 111 | + |
| 112 | +### User benefit |
| 113 | + |
| 114 | +* **HTTP Addresses in DHT Provider Records:** Official support for `/tls/http` |
| 115 | +addresses in the Amino DHT. |
| 116 | +* **DHT Delegated Provides (HTTP only):** HTTP servers can delegate their DHT |
| 117 | +provides to any libp2p node identified by its peer ID. They can later revoke |
| 118 | +this permission. |
| 119 | +* **DDoS Attack Mitigation:** The Amino DHT cannot be used to direct HTTP |
| 120 | +clients (e.g., browser nodes) into a DDoS attack against an arbitrary HTTP server. |
| 121 | + |
| 122 | +### Cost estimation |
| 123 | + |
| 124 | +For simplicity, we assume that the HTTP content provider is advertising enough |
| 125 | +CIDs so that every online DHT server stores at least one associated provider |
| 126 | +record. |
| 127 | + |
| 128 | +Given that there are currently around 10k DHT servers in the Amino DHT |
| 129 | +([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#dht-availability-classified-overall-plot)), |
| 130 | +the HTTP server is expected to receive roughly 10k HEAD requests every |
| 131 | +`ReprovideInterval`, one from each DHT server. |
| 132 | + |
| 133 | +Around 300k libp2p clients interact with the Amino DHT on a daily basis |
| 134 | +([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#ipfs-servers-vs-clients-plot)). |
| 135 | +Therefore, if an attacker advertises a bogus provider record for a popular CID, |
| 136 | +they only need about 3% of these clients to contact the HTTP server in order to |
| 137 | +mount an attack that would be more resource-intensive than the countermeasure. |
| 138 | +A client trying to fetch content from the targeted server sends one GET |
| 139 | +request. |
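
The 3% figure follows from dividing the two numbers quoted above. The short calculation below makes that explicit; it is a sketch that assumes one HEAD and one GET are treated as equally costly and ignores the difference in time windows, and the function name is hypothetical.

```python
DHT_SERVERS = 10_000      # approximate DHT server count quoted above
DAILY_CLIENTS = 300_000   # approximate daily libp2p clients quoted above


def break_even_fraction(dht_servers, daily_clients):
    """Fraction of clients whose GETs would match the number of verification HEADs."""
    return dht_servers / daily_clients


fraction = break_even_fraction(DHT_SERVERS, DAILY_CLIENTS)
print(f"{fraction:.1%}")  # prints 3.3%
```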
| 140 | + |
| 141 | +This analysis only covers current libp2p-based nodes. As more users adopt IPFS |
| 142 | +in browsers, the number of nodes that could potentially participate in a DDoS |
| 143 | +attack will increase, as will the scale of such an attack. Furthermore, users |
| 144 | +of the Delegated Routing HTTP API could also contribute to the attack, even if |
| 145 | +they are not DHT clients. |
| 146 | + |
| 147 | +The cost of the proposed countermeasure seems reasonable compared to the |
| 148 | +potential cost of a real DDoS attack. |
| 149 | + |
| 150 | +### Compatibility |
| 151 | + |
| 152 | +Nothing changes for existing DHT Servers running an older version. Up-to-date |
| 153 | +DHT Servers will make an additional check before associating `/tls/http` |
| 154 | +multiaddresses with provider records. Over time, the network will stop |
| 155 | +propagating unauthorized HTTP endpoints. |
| 156 | + |
| 157 | +Providers advertising content with `/tls/http` multiaddresses to the Amino DHT |
| 158 | +MUST comply with the described check. We are not aware of `/tls/http` |
| 159 | +multiaddresses currently advertised to the Amino DHT, hence no change is |
| 160 | +expected from current providers. |
| 161 | + |
| 162 | +The same verification mechanism could be used by other content routing systems, |
| 163 | +such as IPNI. For more control, it is recommended that each content routing |
| 164 | +system use a dedicated path, e.g., `.well-known/libp2p/ipni/provider/{peerid}` |
| 165 | +for IPNI. |
| 166 | + |
| 167 | +### Security |
| 168 | + |
| 169 | +In the current Amino DHT implementation, DHT servers do not verify the |
| 170 | +multiaddresses included in a provider record when processing an `ADD_PROVIDER` |
| 171 | +request. They only allow a node to announce itself as a provider. |
| 172 | + |
| 173 | +If a malicious libp2p node crafts a multiaddress that pairs its own valid peer |
| 174 | +ID with the IP address of another actual libp2p node and advertises that node |
| 175 | +as the provider for a particular CID, the client attempting to retrieve the |
| 176 | +content will encounter a peer ID mismatch error during the libp2p security |
| 177 | +handshake. This fail-fast mechanism prevents misuse in pure libp2p records. |
| 178 | + |
| 179 | +The challenge arises with HTTP clients because they do not use peer IDs when |
| 180 | +fetching content from an HTTP server. As a result, an HTTP connection cannot |
| 181 | +fail during the handshake, making it easier for a malicious actor to advertise |
| 182 | +an arbitrary peer as the provider for a popular CID. Such misrepresentation |
| 183 | +could negatively impact both the client and the HTTP server. |
| 184 | + |
| 185 | +To prevent this weakening of the system and to stop the DHT from being |
| 186 | +exploited as a vector for DDoS attacks using HTTP clients, we introduce an |
| 187 | +extra verification step. This step ensures that only authorized libp2p nodes |
| 188 | +are allowed to advertise HTTP addresses. With this additional check, DHT HTTP |
| 189 | +records will be more reliable and secure than standard libp2p-only records. |
| 190 | + |
| 191 | +A malicious node could still launch a DDoS attack on an HTTP server by |
| 192 | +advertising a libp2p TCP multiaddress, such as `/ip4/A.B.C.D/tcp/443`, as the |
| 193 | +provider. This deceptive advertisement might cause other libp2p nodes to |
| 194 | +attempt a TCP connection to the HTTP server, with the connection only failing |
| 195 | +later. The primary DDoS mitigation goal is to prevent HTTP-only clients from |
| 196 | +being drawn into such attacks, since they use `/tls/http` addresses rather than |
| 197 | +the unverified libp2p `/tcp/443` addresses. |
| 198 | + |
| 199 | +Another important consideration is maintaining a secure `CID -> peerid` |
| 200 | +mapping. While nodes might still advertise content they do not serve, they must |
| 201 | +not be allowed to falsely claim that another node provides a CID. This secure |
| 202 | +mapping also supports the potential implementation of a caching layer that |
| 203 | +verifies `peerid -> []maddrs` mappings, relying on the trustworthy DHT `CID -> |
| 204 | +peerid` mapping. |
| 205 | + |
| 206 | +In summary, the extra verification for HTTP addresses does not stop nodes from |
| 207 | +advertising content they do not possess; it only prevents them from targeting |
| 208 | +other nodes by falsely claiming that those nodes provide content they do not |
| 209 | +actually host. |
| 210 | + |
| 211 | +### Alternatives |
| 212 | + |
| 213 | +#### Do nothing: not verifying `/tls/http` addresses at all |
| 214 | + |
| 215 | +In its current state, the Amino DHT allows for `/tls/http` provider records. |
| 216 | +However, malicious actors could use the DHT as a vector for DDoS attacks in |
| 217 | +which numerous HTTP-only clients target a specific HTTP server. |
| 218 | + |
| 219 | +See [Cost estimation](#cost-estimation) for why mitigating this risk is |
| 220 | +worthwhile. |
| 221 | + |
| 222 | +#### Reuse Peer ID Authentication over HTTP |
| 223 | + |
| 224 | +The [Peer ID Authentication over |
| 225 | +HTTP](https://github.com/libp2p/specs/blob/master/http/peer-id-auth.md) |
| 226 | +mechanism could potentially be reused, but it presents several significant |
| 227 | +drawbacks that render it less practical for HTTP-only IPFS providers. Notably, |
| 228 | +it lacks a "server-only" authentication option. While mutual authentication |
| 229 | +could be halted after the server responds with an HTTP 401 status and includes |
| 230 | +its own PeerID in the HTTP header, this approach introduces notable challenges: |
| 231 | + |
| 232 | +* It increases complexity, requiring not just a standard HTTP HEAD request but |
| 233 | +also the implementation of a custom Authorization header workflow. |
| 234 | +* It restricts the HTTP server to representing only a single PeerID, preventing |
| 235 | +the sharding of announcements across multiple PeerIDs and thus making |
| 236 | +multi-user storage providers unfeasible. |
| 237 | +* It constrains deployment options, requiring the HTTP server to run custom |
| 238 | +software, which eliminates the possibility of using static-only hosting |
| 239 | +solutions like an S3 bucket. |
| 240 | + |
| 241 | +#### Generic `.well-known/libp2p/peerid` file |
| 242 | + |
| 243 | +PeerIDs that the HTTP server has authorized to advertise content to the Amino |
| 244 | +DHT could be listed in the generic `.well-known/libp2p/peerid` file. This file |
| 245 | +may also be used to delegate content provision requests to other content |
| 246 | +routing systems (for example, IPNI), or generally for other applications. |
| 247 | + |
| 248 | +However, since modifying the DHT protocol is a long and painful process, the |
| 249 | +file used by Amino DHT servers for verification MUST remain stable. Any |
| 250 | +alteration to the `.well-known/libp2p/peerid` format would require months or |
| 251 | +even years for full adoption by DHT servers. In addition, if other applications |
| 252 | +begin using this generic file, DHT servers may end up retrieving unnecessary |
| 253 | +extra information. |
| 254 | + |
| 255 | +#### Single `.well-known/libp2p/amino/providers` file |
| 256 | + |
| 257 | +An alternative approach is to consolidate all authorized peer IDs into a single |
| 258 | +file located at `.well-known/libp2p/amino/providers` instead of using separate |
| 259 | +files at `.well-known/libp2p/amino/providers/{peerid}`. However, this method |
| 260 | +has drawbacks. DHT servers would need to download more data than they would |
| 261 | +with a simple HTTP HEAD request. Additionally, they cannot benefit from caching |
| 262 | +all addresses contained in the `.well-known/libp2p/amino/providers` file at once, |
| 263 | +because they should only cache addresses used by an actual DHT node to avoid |
| 264 | +caching an unbounded number of peer IDs. |
| 265 | + |
| 266 | +#### Reuse `did:web` Method Specification |
| 267 | + |
| 268 | +The [did:web Method Specification](https://w3c-ccg.github.io/did-method-web/) |
| 269 | +outlines a mechanism for listing one or more Ed25519 keys. However, adopting it |
| 270 | +presents several challenges: |
| 271 | + |
| 272 | +* PeerIDs are not simple key fingerprints; they are multihashes derived from a |
| 273 | +protobuf structure. |
| 274 | +* The method’s JSON manifest must adhere to a specific schema. |
| 275 | +* This results in an overly complex JSON format, necessitating additional |
| 276 | +processing and conversion, which introduces unnecessary complexity to the DHT |
| 277 | +server implementation. |
| 278 | + |
| 279 | +#### Reuse `.well-known/libp2p/protocols` file |
| 280 | + |
| 281 | +The [existing libp2p HTTP |
| 282 | +specification](https://github.com/libp2p/specs/tree/master/http#namespace) |
| 283 | +states that application protocols can be discovered via the well-known resource |
| 284 | +`.well-known/libp2p/protocols`. Adding an `authorized_peers` field to this file |
| 285 | +would allow DHT servers to dispatch a single GET request to learn both the |
| 286 | +authorized PeerIDs and the supported HTTP protocols. |
| 287 | + |
| 288 | +The downside of this approach is that it mixes the responsibilities of unrelated |
| 289 | +specs and use cases; however, the performance benefit may be worth it. |
| 290 | + |
| 291 | +## Out of Scope |
| 292 | + |
| 293 | +* Amino DHT Providing over HTTP |
| 294 | +* Amino DHT lookups for HTTP-only Clients |
| 295 | +* Amino DHT Delegated Provides for libp2p nodes |
| 296 | +* HTTP Provider Records in IPNI |
| 297 | + |
| 298 | +## Copyright |
| 299 | + |
| 300 | +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). |