300 changes: 300 additions & 0 deletions src/ipips/ipip-0501.md
Member

cc @aschmahmann as it aims to address concerns from #496
cc @willscott @masih for visibility and feedback from IPNI
cc @vasco-santos @ribasushi @alanshaw from Storacha side of things

Context: this IPIP explores the idea of only authorized PeerIDs being able to announce a /tls/http endpoint on the Amino DHT (DHT servers would ignore multiaddrs that don't pass this validation), but it could also act as a blueprint for other routing systems that don't want to be used for amplification attacks.

Why do we need this, and how does it relate to Storacha if it does not use the DHT?

  • HTTP retrieval is still wip, and remains opt-in for now

How does this relate to IPNI or other routing systems?

  • The way IPFS clients query routing systems does not depend on the details of a specific routing system. A query for a CID produces a list of providers as PeerID + multiaddrs.
  • IPNI itself could benefit from similar validation of multiaddrs and HTTP providers (afaik none exists today, addrs are accepted blindly, same amplification attack concerns as Amino DHT)
  • IPNI+DHT+FOO hybrid announcement during the initial data onboarding
    • The onboarding latency between announcing new CIDs to IPNI, IPNI pulling that info, and CID being resolvable via https://cid.contact could be mitigated by doing a one-time announcement to Amino DHT. DHT acting as a "hot storage" for routing info, before it gets propagated to systems like IPNI.
    • This would be done by storage providers that onboard data, and if they, like Storacha, move towards being HTTP-only, we need to agree on how to handle /tls/http maddrs.

@@ -0,0 +1,300 @@
---
title: "IPIP-0501: Amino DHT HTTP Provider Records"
date: 2025-04-10
ipip: proposal
editors:
  - name: Guillaume Michel
    github: guillaumemichel
    url: https://guillaume.michel.id/
    affiliation:
      name: Shipyard
      url: https://ipshipyard.com
  - name: Marcin Rataj
    github: lidel
    url: https://lidel.org/
    affiliation:
      name: Shipyard
      url: https://ipshipyard.com
relatedIssues:
order: 501
tags: ['ipips']
---

## Summary

This IPIP introduces a secure mechanism for advertising `/tls/http`
multiaddresses in the Amino DHT. By requiring HTTP servers to host an empty
file at the well-known path `.well-known/libp2p/amino/providers/{peerid}` for
each authorized libp2p peer ID, the DHT servers ensure that only providers
safelisted by the HTTP server can advertise its content. This additional
verification step mitigates potential DDoS attacks and prevents malicious
actors from falsely claiming that an HTTP server hosts content, while leaving
existing libp2p records unaffected.

## Motivation

Allowing content providers to advertise `/tls/http` multiaddresses within the
Amino DHT is desirable because it broadens the network's interoperability and
accessibility. With the introduction of HTTP retrievals, providers will be able
to serve content from static HTTP hosting providers, such as S3 buckets, and
they should be able to advertise these addresses to the Amino DHT.

The current protocol already allows providers to choose which multiaddresses to
associate with their records, and DHT servers serve all the addresses along
with the provider record, even if they don’t understand them. Example: when
`/webtransport` was rolled out, DHT servers that did not speak WebTransport
still returned `/webtransport` addresses, despite not being able to use them.
Hence advertising `/tls/http` multiaddresses to the Amino DHT is already
possible.

However, since `/tls/http` records are expected to be widely adopted by browser
users, it is essential to mitigate potential Distributed Denial-of-Service
(DDoS) attacks on HTTP servers. If any provider can freely associate arbitrary
`/tls/http` multiaddresses with a provider record, a malicious actor could
trigger significant HTTP traffic to a server they don’t control. We want to
restrict the advertisement of `/tls/http` multiaddresses to hosts controlled by
the provider. This verification would be performed by the DHT servers before
associating the `/tls/http` multiaddresses with the provider record.
Additionally, this check would eliminate addresses pointing to misconfigured
HTTP providers.

This measure prevents HTTP clients (e.g., browser nodes) from being exploited
in DDoS attacks through bogus DHT records. It is essential for integrating IPFS
into browsers, as browser development teams prioritize robust DDoS prevention.

## Detailed design
Contributor

cc @aschmahmann as it aims to address concerns from #496

This approach does not address my first issue from #496 (comment) if this is supposed to be "enough" to do trustless-gateway based retrieval.

How to handle more than 1 HTTP-based protocol

More info (background, some options, what IPNI does, potential solutions, etc.) is in the linked comment. If we're going to flat out ignore the issue then we should at least document the ramifications / implied spec changes that come from not choosing an explicit mechanism for handling more than 1 HTTP-based protocol

Contributor Author

Good point!

The focus of this IPIP is to define the verification mechanism for HTTP Trustless Gateways advertisements in the Amino DHT, to prevent HTTP-only clients from being used in a DDoS (reflection) attack.

In parallel, we should have another IPIP defining the handling of multiple transfer protocols in the DHT. The current IPIP (#501) will probably depend on the future IPIP, so it will block on that.


Providers advertising content hosted on an HTTP server MUST host an empty file
for each libp2p peer ID authorized to advertise that HTTP server’s content to
the Amino DHT at the [well-known
location](https://www.rfc-editor.org/rfc/rfc8615)
`.well-known/libp2p/amino/providers/{peerid}`. The existence of an empty file
named after the peer ID serves as the authorization marker. The filename peer
ID MUST follow the [string representation from the libp2p PeerID
specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation)
(base58btc multihash or CID with libp2p codec):

```
.well-known/libp2p/amino/providers/12D3KooBase58MH
.well-known/libp2p/amino/providers/k51KooBase36CID
```

By hosting these individual empty files, the HTTP server grants permission for
the corresponding providers to advertise that the server hosts content
identified by any CID.
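
On the provider side, granting or revoking this permission amounts to creating or deleting an empty file under the web root. The following is a minimal sketch, assuming a static site served from a local directory; the helper names `authorize_peer` and `revoke_peer` are hypothetical and not part of this IPIP.

```python
from pathlib import Path

# Relative path taken from the text above; helper names are illustrative only.
WELL_KNOWN_DIR = Path(".well-known/libp2p/amino/providers")


def authorize_peer(webroot: Path, peer_id: str) -> Path:
    """Create the empty marker file that authorizes peer_id to advertise."""
    marker = webroot / WELL_KNOWN_DIR / peer_id
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_bytes(b"")  # only the file's existence matters; it stays empty
    return marker


def revoke_peer(webroot: Path, peer_id: str) -> None:
    """Delete the marker file, withdrawing the authorization (no-op if absent)."""
    (webroot / WELL_KNOWN_DIR / peer_id).unlink(missing_ok=True)
```

With object storage such as an S3 bucket, the equivalent step would be uploading or deleting a zero-byte object at the same key.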

When a DHT server receives an `ADD_PROVIDER` RPC from a given `peerID` that
includes `/tls/http` multiaddresses, it MUST verify the existence of the file
at `.well-known/libp2p/amino/providers/{peerid}` on the advertised HTTP server by
issuing an HTTP HEAD request for each `/tls/http` address. If the HTTP HEAD
request does not return a `200` response, the DHT server MUST NOT associate
that `/tls/http` address with the provider record.
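
The check described above could look roughly like the sketch below. It is illustrative only: the function names are hypothetical, the multiaddr parsing is a simplified assumption that handles only `/<dns|dns4|dns6|ip4>/<host>/tcp/<port>/tls/http`, and a real DHT server would use a proper multiaddr library. Note also that `urllib` follows redirects by default, whereas the text does not say how redirects should be treated.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, List, Optional

WELL_KNOWN_PREFIX = "/.well-known/libp2p/amino/providers/"


def marker_url(maddr: str, peer_id: str) -> Optional[str]:
    """Map a /tls/http multiaddr to its marker URL, or None if unsupported."""
    parts = maddr.strip("/").split("/")
    if len(parts) != 6 or parts[2] != "tcp" or parts[4:] != ["tls", "http"]:
        return None
    if parts[0] not in ("dns", "dns4", "dns6", "ip4") or not parts[3].isdigit():
        return None
    # Peer ID strings are base58/base36 text; reject anything that could
    # alter the request path (slashes, dots, non-ASCII characters).
    if not (peer_id.isascii() and peer_id.isalnum()):
        return None
    return f"https://{parts[1]}:{parts[3]}{WELL_KNOWN_PREFIX}{peer_id}"


def head_status(url: str, timeout: float = 5.0) -> int:
    """Issue an HTTP HEAD request; return the status code, or 0 on failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except (OSError, http.client.HTTPException):
        return 0


def filter_http_addrs(
    peer_id: str,
    maddrs: List[str],
    head: Callable[[str], int] = head_status,
) -> List[str]:
    """Return the addresses that may be associated with the provider record."""
    kept = []
    for maddr in maddrs:
        if not maddr.endswith("/tls/http"):
            kept.append(maddr)  # non-HTTP addresses are not subject to the check
            continue
        url = marker_url(maddr, peer_id)
        if url is not None and head(url) == 200:
            kept.append(maddr)
    return kept
```

The `head` parameter is injectable so the filtering logic can be exercised without network access.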

DHT Servers SHOULD cache the resolved mapping of each `/tls/http` multiaddress
to its peer IDs for the duration of the `ReprovideInterval` to minimize
repetitive HTTP HEAD requests. Additionally, for addresses that fail
verification, a negative cache entry SHOULD be maintained for 15 minutes to
reduce unnecessary load and mitigate potential abuse.
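
One way to picture the two cache lifetimes is the sketch below. It is a simplified illustration: `VerificationCache` is a hypothetical name, the `REPROVIDE_INTERVAL` value is an assumed placeholder rather than something this document defines, and eviction of stale entries is omitted.

```python
import time
from typing import Callable, Dict, Tuple

# Assumed placeholder; use the ReprovideInterval of the actual deployment.
REPROVIDE_INTERVAL = 22 * 60 * 60  # seconds
NEGATIVE_TTL = 15 * 60             # seconds, the 15 minutes from the text above


class VerificationCache:
    """Remember verification results per (multiaddr, peer ID) pair."""

    def __init__(
        self,
        verify: Callable[[str, str], bool],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._verify = verify
        self._clock = clock
        # key -> (result, expiry time); a real server would also bound its size
        self._entries: Dict[Tuple[str, str], Tuple[bool, float]] = {}

    def is_authorized(self, maddr: str, peer_id: str) -> bool:
        key = (maddr, peer_id)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]  # still fresh: no HEAD request is issued
        ok = self._verify(maddr, peer_id)
        ttl = REPROVIDE_INTERVAL if ok else NEGATIVE_TTL
        self._entries[key] = (ok, now + ttl)
        return ok
```

Successful checks are reused for a full `ReprovideInterval`, while failures are retried only after the shorter negative lifetime has passed.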

## Design rationale

* **Lightweight Verification:** Each HTTP server only answers approximately one
HEAD request per DHT Server per `ReprovideInterval`, regardless of the number
of CIDs being advertised.
* **Revocation Considerations:** If a provider revokes a peer ID, the
previously published records will persist until the next reprovide cycle. Thus,
a cache duration equal to the `ReprovideInterval` is appropriate.
* **Negative Caching:** A 15-minute negative cache prevents malicious actors
from triggering repeated HEAD requests, as the cost of generating a DHT provide
request is higher than that of performing an HTTP HEAD, mitigating
amplification attacks.

### User benefit

* **HTTP Addresses in DHT Provider Records:** Official support for `/tls/http`
addresses in the Amino DHT.
* **DHT Delegated Provides (HTTP only):** HTTP servers can delegate their DHT
provides to any libp2p node identified by its peer ID. They can later revoke
this permission.
* **DDoS Attack Mitigation:** The Amino DHT cannot be used to direct a DDoS
attack by HTTP clients (e.g., browser nodes) against an arbitrary HTTP server.

### Cost estimation

For simplicity, we assume that the HTTP content provider is advertising enough
CIDs so that every online DHT server stores at least one associated provider
record.

Given that there are currently around 10k DHT servers in the Amino DHT
([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#dht-availability-classified-overall-plot)),
the HTTP server is expected to receive roughly 10k HEAD requests every
`ReprovideInterval`, one from each DHT server.

Around 300k libp2p clients interact with the Amino DHT on a daily basis
([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#ipfs-servers-vs-clients-plot)).
Therefore, if an attacker advertises a bogus provider record for a popular CID,
they only need about 3% of these clients to contact the HTTP server in order to
mount an attack that would be more resource-intensive than the countermeasure.
A client trying to fetch content from the targeted server sends one GET
request.
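
The "about 3%" figure follows from dividing the two approximate counts quoted above. The short calculation below only restates that arithmetic; it treats one HEAD request and one GET request as equally costly and ignores the different time windows (per `ReprovideInterval` versus per day), both of which are simplifying assumptions.

```python
DHT_SERVERS = 10_000      # approximate number of DHT servers, from the text
DAILY_CLIENTS = 300_000   # approximate number of daily libp2p clients, from the text


def break_even_fraction(servers: int, clients: int) -> float:
    """Fraction of clients whose GET requests match the HEAD requests in count."""
    return servers / clients


fraction = break_even_fraction(DHT_SERVERS, DAILY_CLIENTS)
print(f"{fraction:.1%}")  # prints 3.3%
```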

This analysis only covers current libp2p-based nodes. As more users adopt IPFS
in browsers, the number of nodes that could potentially participate in a DDoS
attack will increase, as will the scale of such an attack. Furthermore, users
of the Delegated Routing HTTP API could also contribute to the attack, even if
they are not DHT clients.

The cost of the proposed countermeasure seems reasonable compared to the
potential cost of a real DDoS attack.

### Compatibility

Nothing changes for existing DHT Servers running an older version. Up-to-date
DHT Servers will make an additional check before associating `/tls/http`
multiaddresses with provider records. Over time, the network will stop
propagating unauthorized HTTP endpoints.

Providers advertising content with `/tls/http` multiaddresses to the Amino DHT
MUST comply with the described check. We are not aware of `/tls/http`
multiaddresses currently advertised to the Amino DHT, hence no change is
expected from current providers.

The same verification mechanism could be used by other content routing systems,
such as IPNI. For more control, it is recommended that each content routing
system use a dedicated path, e.g., `.well-known/libp2p/ipni/provider/{peerid}`
for IPNI.

### Security

In the current Amino DHT implementation, DHT servers do not verify the
multiaddresses included in a provider record when processing an `ADD_PROVIDER`
request. They only allow a node to announce itself as a provider.

If a malicious libp2p node crafts a multiaddress that pairs its own valid peer
ID with the IP address of another actual libp2p node and advertises that node
as the provider for a particular CID, the client attempting to retrieve the
content will encounter a peer ID mismatch error during the libp2p security
handshake. This fail-fast mechanism prevents misuse in pure libp2p records.

The challenge arises with HTTP clients because they do not use peer IDs when
fetching content from an HTTP server. As a result, an HTTP connection cannot
fail during the handshake, making it easier for a malicious actor to advertise
an arbitrary peer as the provider for a popular CID. Such misrepresentation
could negatively impact both the client and the HTTP server.

To prevent this weakening of the system and to stop the DHT from being
exploited as a vector for DDoS attacks using HTTP clients, we introduce an
extra verification step. This step ensures that only authorized libp2p nodes
are allowed to advertise HTTP addresses. With this additional check, DHT HTTP
records will be more reliable and secure than standard libp2p-only records.

A malicious node could still launch a DDoS attack on an HTTP server by
advertising a libp2p TCP multiaddress, such as `/ip4/A.B.C.D/tcp/443`, as the
provider. This deceptive advertisement might cause other libp2p nodes to
attempt a TCP connection to the HTTP server, with the connection only failing
later. The primary DDoS mitigation goal is to prevent HTTP-only clients from
being drawn into such attacks, since they use `/tls/http` addresses rather than
the unverified libp2p `/tcp/443` addresses.

Another important consideration is maintaining a secure `CID -> peerid`
mapping. While nodes might still advertise content they do not serve, they must
not be allowed to falsely claim that another node provides a CID. This secure
mapping also supports the potential implementation of a caching layer that
verifies `peerid -> []maddrs` mappings, relying on the trustworthy DHT `CID ->
peerid` mapping.

In summary, the extra verification for HTTP addresses does not stop nodes from
advertising content they do not possess; it only prevents them from targeting
other nodes by falsely claiming that those nodes provide content they do not
actually host.

### Alternatives

#### Do nothing: not verifying `/tls/http` addresses at all

In its current state, the Amino DHT allows for `/tls/http` provider records.
However, malicious actors could use the DHT as a vector for DDoS attacks in
which numerous HTTP-only clients target a specific HTTP server.

See [Cost estimation](#cost-estimation) for why it is better to do something
about it.

#### Reuse Peer ID Authentication over HTTP

The [Peer ID Authentication over
HTTP](https://github.com/libp2p/specs/blob/master/http/peer-id-auth.md)
mechanism could potentially be reused, but it presents several significant
drawbacks that render it less practical for HTTP-only IPFS providers. Notably,
it lacks a "server-only" authentication option. While mutual authentication
could be halted after the server responds with an HTTP 401 status and includes
its own PeerID in the HTTP header, this approach introduces notable challenges:

* It increases complexity, requiring not just a standard HTTP HEAD request but
also the implementation of a custom Authorization header workflow.
* It restricts the HTTP server to representing only a single PeerID, preventing
the sharding of announcements across multiple PeerIDs and thus making
multi-user storage providers unfeasible.
* It constrains deployment options, requiring the HTTP server to run custom
software, which eliminates the possibility of using static-only hosting
solutions like an S3 bucket.

#### Generic `.well-known/libp2p/peerid` file

PeerIDs that the HTTP server has authorized to advertise content to the Amino
DHT could be listed in the generic `.well-known/libp2p/peerid` file. This file
may also be used to delegate content provision requests to other content
routing systems (for example, IPNI), or generally for other applications.

However, since modifying the DHT protocol is a long and painful process, the
file used by Amino DHT servers for verification MUST remain stable. Any
alteration to the `.well-known/libp2p/peerid` format would require months or
even years for full adoption by DHT servers. In addition, if other applications
begin using this generic file, DHT servers may end up retrieving unnecessary
extra information.

#### Single `.well-known/libp2p/amino/providers` file

An alternative approach is to consolidate all authorized peer IDs into a single
file located at `.well-known/libp2p/amino/providers` instead of using separate
files at `.well-known/libp2p/amino/providers/{peerid}`. However, this method
has drawbacks. DHT servers would need to download more data than they would
with a simple HTTP HEAD request. Additionally, they cannot benefit from caching
all addresses contained in the `.well-known/libp2p/amino/providers` at once,
because they should only cache addresses used by an actual DHT node to avoid
caching an unbounded number of peer IDs.

#### Reuse `did:web` Method Specification

The [did:web Method Specification](https://w3c-ccg.github.io/did-method-web/)
outlines a mechanism for listing one or more ED25519 keys. However, adopting it
presents several challenges:

* PeerIDs are not simple key fingerprints; they are multihashes derived from a
protobuf structure.
* The method’s JSON manifest must adhere to a specific schema.
* This results in an overly complex JSON format, necessitating additional
processing and conversion, which introduces unnecessary complexity to the DHT
server implementation.

#### Reuse `.well-known/libp2p/protocols` file

[Existing libp2p HTTP
specification](https://github.com/libp2p/specs/tree/master/http#namespace)
states that application protocols can be discovered by the well-known resource
`.well-known/libp2p/protocols`. Adding an `authorized_peers` field to this file
would allow DHT Servers to dispatch a single GET request to learn about both
the authorized PeerIDs and the supported HTTP protocols.

The downside of this approach is that it mixes the responsibilities of
unrelated specs and use cases; however, the performance benefit may be worth it.

## Out of Scope

* Amino DHT Providing over HTTP
* Amino DHT lookups for HTTP-only Clients
* Amino DHT Delegated Provides for libp2p nodes
* HTTP Provider Records in IPNI

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).