IPIP-504: `provider` query parameter as hint for HTTP Gateways #504

---
# IPIP number should match its pull request number. After you open a PR,
# please update title and update the filename to `ipip0000`.
title: "IPIP-0504: Provider Query Parameter"
date: YYYY-MM-DD
ipip: proposal
editors:
  - name: Vasco Santos
relatedIssues:
  - link to issue
order: 0504
tags: ['ipips']
---

## Summary

This IPIP defines a URI-based format for expressing content-addressed identifiers (such as IPFS CIDs), optionally augmented with one or more provider hints. The format aims to be a simple, unopinionated, transport-agnostic scheme that simplifies data retrieval in content-addressable systems by being clear, extensible, and broadly compatible.

## Motivation

Content-addressable systems, such as IPFS, allow data to be identified by the hash of its contents (CID), enabling verifiable, immutable references. However, retrieving content typically relies on separate content discovery systems (e.g., the DHT or IPNI), even when a client may already know one or more providers of the bytes. A provider in this context is any node, peer, gateway, or service that can serve content identified by a CID.

Existing solutions (e.g., magnet URIs, RASL) encode provider hints alongside the content identifier. Inspired by these solutions, and focusing particularly on ergonomics, extensibility, and ease of adoption, this IPIP augments an IPFS URI with a `provider` query parameter.

## Requirements, Goals, and Non-Goals

### Goals

| Goal | Description |
| --- | --- |
| **Low-effort adoption** | Enable IPFS implementations (e.g., Kubo, Helia), gateways, etc., to adopt the format with minimal changes, or even none by relying on current discovery systems. |
| **Extensible hint system** | Support encoding multiple transport hints (e.g., HTTP, TCP), while remaining extensible to intermediary hops (e.g., IPNI, RASL), priorities/fallbacks, etc. |
| **Preserve base compatibility** | Maintain compatibility with existing URI forms such as `ipfs://CID/...` and HTTP gateway URLs. |
| **Ergonomic for CLI and sharing** | Human-editable and URL-query-based, with no URL-encoding beyond what browsers or CLIs already handle. Easy to copy/paste, share, or edit by hand. |
| **Publisher-driven** | Allow publishers to encode as much transport/discovery information as they want, with no requirement for intermediary systems. Even if those systems disappear, the link remains useful. |
| **Fallback resilience** | The URI should encode enough information to let clients attempt various fallbacks or resolve via discovery (e.g., DHT, IPNI). |
| **Self-descriptive** | May support optional encoding of content types so clients know how to interpret the content after verification. |
| **Protocol-agnostic** | Must not be tied to HTTP-only systems. Other transport protocols, such as those supported by libp2p, must be usable if encoded as hints. |
| **Forward-compatible** | The format should support future expansions: new hint types, encodings, content representations, etc. |

### Non-Goals

| Non-Goal | Reason |
| --- | --- |
| **Replace existing `ipfs://` or HTTP gateway URLs** | This format builds upon and extends them; it is not a replacement. |
| **Strictly define a resolution order** | Clients may choose how and in what order to try hints or fallback strategies. |
| **Mandate use of a centralized service** | While some hints may include centralized endpoints (e.g., HTTP URLs), the URI format should support fully decentralized retrieval. |
| **Guarantee live access** | A hint may point to an offline, censored, or throttled node. The client may use other hints or its own discovery logic. |
| **Act as a trust layer** | These URIs do not manage identity or trust directly; verification remains based on CID integrity. |

### Requirements

| Requirement | Reasoning |
| --- | --- |
| **CID as core address** | Content addressing should always resolve to a CID. Provider hints decorate it; they do not replace it. |
| **Multi-hint support** | Support multiple hints per URI, enabling clients to try multiple fetch paths if one fails. |
| **Hinted provider must be optional** | Clients without support for hints must still be able to resolve the content using traditional discovery (DHT, IPNI, etc.), provided the publisher has made it discoverable there. |
| **No required translation step** | Links should not require dynamic translation (e.g., via an origin-based redirector). Links are self-contained. |
| **Minimal client assumptions** | Clients may safely ignore unknown hints and still function. This ensures progressive enhancement. |
| **Composable with Gateway URLs** | Hints should not break Gateway-based access patterns. For example, users should be able to use URLs like `https://gw.io/ipfs/CID?provider=http...`. |
| **Multiaddr-based hint syntax** | For transport-agnosticism, hints should use the multiaddr representation. |
| **No third-party resolution dependency** | Links should work standalone: resolution should not depend on reaching a third-party registry or lookup service. |
| **No strict encoding rules** | Beyond standard URI syntax, do not require opaque encodings. Hints should be human-readable when possible. |

## Detailed design

This section defines a URI format for expressing a content identifier (CID) along with optional provider hints that **guide clients** on how and where to fetch the associated content. The format is intended to be directly compatible with both IPFS Gateway URLs and `ipfs://` scheme URIs, while remaining flexible and extensible enough to stay compatible with other systems or future upgrades.

Note that this format is not intended to fully specify every identified use case or requirement. Instead, it focuses on leaving the door open to more in-depth specifications for specific cases.

### 📐 Format

The proposed URI format introduces a new optional query parameter, `provider`, which may appear one or more times. Each `provider` value represents a content provider hint and consists of a `multiaddr` string. The `provider` parameter is optional, and clients MAY ignore it.

The base format is:

```text
[ipfs://<CID> | https://<gateway>/ipfs/<CID> | https://<CID>.ipfs.<gateway>]?[provider=<multiaddr1>&provider=<multiaddr2>&...]
```
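
To make the format concrete, here is a minimal Python sketch of a publisher-side helper that appends one repeated `provider` parameter per hint. The function name `with_provider_hints` and the placeholder CID `bafyexamplecid` are hypothetical. The sketch assumes the base URI has no query string yet, and it leaves `/` unescaped so the multiaddrs stay human-readable, in line with the ergonomics goal.

```python
from urllib.parse import urlencode


def with_provider_hints(base_uri: str, multiaddrs: list[str]) -> str:
    """Append one repeated `provider` query parameter per multiaddr hint.

    Assumes base_uri carries no query string yet. '/' is left unescaped so
    the multiaddrs remain readable and hand-editable.
    """
    if not multiaddrs:
        return base_uri  # hints are optional; the bare URI stays valid
    query = urlencode([("provider", ma) for ma in multiaddrs], safe="/")
    return f"{base_uri}?{query}"


uri = with_provider_hints(
    "https://dweb.link/ipfs/bafyexamplecid",
    ["/dns4/hash-stream-like-server.io/tcp/443/https", "/ip4/192.0.2.1/tcp/4001/ws"],
)
print(uri)
# https://dweb.link/ipfs/bafyexamplecid?provider=/dns4/hash-stream-like-server.io/tcp/443/https&provider=/ip4/192.0.2.1/tcp/4001/ws
```

Using a list of pairs rather than a dict is what lets `urlencode` emit the same key several times, in the order the publisher listed the hints.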

### 🧠 Parsing

The CID is the core of a Provider-Hinted URI. Clients MUST extract the CID before evaluating any hints. The format is designed to be compatible with current IPFS-style URIs, while explicitly defining how to locate the CID and interpret `provider` query parameters.

#### CID Extraction Rules

> **Review comment:** I feel like the details below should link to another spec; I'm sure we have written this somewhere.
>
> **Reply:** Mind pointing out where it lives? I do not know about it and could not find anything when working on this.

To ensure consistent parsing, clients MUST extract the CID using the following precedence rules:

**1. Multi-dotted Origin Format**

If the CID is encoded as a subdomain label (e.g., `https://<CID>.ipfs.<gateway>`):

- The CID MUST be the left-most label.
- The label immediately following it MUST be `ipfs`.
- The path MUST NOT also include a CID.

**Example:**

- ✅ `https://bafy...ipfs.dweb.link`
- ❌ `https://bafy...ipfs.dweb.link/ipfs/bafybogus` (ambiguous; reject)

**2. Path-Based Format**

If the CID is encoded in the path (e.g., `https://<gateway>/ipfs/<CID>`):

- The path MUST match the pattern `/ipfs/<CID>`, where `<CID>` is a valid content identifier.

**Example:**

- ✅ `https://gateway.io/ipfs/bafy...`
- ❌ `https://gateway.io/bafy...` (no `/ipfs/` marker)

**3. ipfs:// Scheme Format**

- The CID MUST immediately follow the scheme delimiter: `ipfs://<CID>`.
- Additional path/query components MAY follow.

**Example:**

- ✅ `ipfs://bafy...`

**4. Conflict Resolution**

If a CID is present both in a multi-dotted origin and in the path (even if they match), the URI MUST be rejected as ambiguous.
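
The precedence rules above can be sketched in Python as follows. This is illustrative only: `looks_like_cid` is a hypothetical, deliberately crude regex standing in for real CID decoding (a multiformats library would be used in practice), and for rule 4 the sketch treats any `/ipfs/<segment>` path on a subdomain URL as a second CID, which is how the `bafybogus` example gets rejected.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, crude stand-in for real CID decoding: accepts CIDv0
# (base58btc, "Qm" + 44 chars) and base32 CIDv1 ("b" + lowercase base32).
_CID_RE = re.compile(r"Qm[1-9A-HJ-NP-Za-km-z]{44}|b[a-z2-7]{50,}")


def looks_like_cid(value: str) -> bool:
    return _CID_RE.fullmatch(value) is not None


def extract_cid(uri: str) -> str:
    """Extract the CID from a provider-hinted URI, or raise ValueError."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    # Candidate from a path of the form /ipfs/<CID>[/...]
    path_cid = segments[1] if len(segments) >= 2 and segments[0] == "ipfs" else None

    if parts.scheme == "ipfs":
        # Rule 3: the CID immediately follows "ipfs://".
        if looks_like_cid(parts.netloc):
            return parts.netloc
        raise ValueError("ipfs:// URI without a valid CID")

    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")

    labels = (parts.hostname or "").split(".")
    if len(labels) > 2 and labels[1] == "ipfs":
        # Rules 1 and 4: a subdomain CID must not be paired with a path CID.
        if path_cid is not None:
            raise ValueError("ambiguous: CID in both hostname and path")
        if looks_like_cid(labels[0]):
            return labels[0]
        raise ValueError("subdomain label is not a valid CID")

    # Rule 2: the path must carry the /ipfs/<CID> marker.
    if path_cid is not None and looks_like_cid(path_cid):
        return path_cid
    raise ValueError("no /ipfs/<CID> marker in path")
```

Raising instead of returning `None` keeps "ambiguous" and "no CID" distinguishable, so a client can report why a URI was rejected before it ever looks at the `provider` hints.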

---

#### Query Parameter: `provider`

- Name: `provider`
- Type: URI query parameter (may be repeated)
- Value: Multiaddr string (`?provider=multiaddr`).
- Interpretation: Optional hint for how to fetch and locate the content identified by the CID.

> **Review thread on the parameter type**
>
> **Comment:** Why a repeating parameter instead of comma-delimited?
>
> **Reply:** Probably to make the spec simpler to implement. Some reasons to repeat instead of comma-separating: …

> **Review thread on the parameter value**
>
> **Comment:** Independent of my main objection to the idea of … In Go we saw so many badly written parsers of multiaddrs that there's an in-progress regex-like library to try to make them easier to work with.
>
> **Reply:** Whoa, that parser library is neat! Would it be possible to just retrospecify how that library does it and call THAT the spec, or at least a starting point for it? I'm not sure this IPIP needs to be blocked on that specification process, but it seems as good a time/occasion as any to finally nail down multiaddr (and make it easier for a provably interoperable parser library to be made for other languages!)
>
> **Reply:** Not a fan of pulling libp2p concepts (peerid, multiaddr) into the gateway spec here. What if in the future everything will be HTTP providers: are we still cosplaying libp2p with multiaddrs and fake PeerIDs? Maybe, to do the bare minimum to future-proof this, state that this field is an opaque string that should be parsed as Multiaddrs (if starting with …
>
> **Reply:** I find it surprising that … Anyway, after syncing with @bumblefudge last week, I totally agree that we should also accept …
>
> **Reply:** Made string possible in 00f4bf6.

#### Query Parsing (`provider` Parameters)

Once a CID has been successfully extracted, clients MAY parse `provider` parameters from the query string. Each `provider` value represents a provider hint, encoded as a multiaddr string.

**1. Parsing Rules**

- The `provider` query parameter MAY appear multiple times.
- Each `provider` parameter MUST be treated as an independent, optional provider hint.
- Clients MAY ignore hints with invalid multiaddrs.
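
The parsing rules above map naturally onto Python's `urllib.parse`. In this minimal sketch, `is_plausible_multiaddr` is a hypothetical, loose syntactic check (leading `/`, no empty components, no whitespace), not a real multiaddr parser; a production client would validate hints with a proper multiaddr library.

```python
from urllib.parse import parse_qsl, urlsplit


def is_plausible_multiaddr(value: str) -> bool:
    """Loose syntactic check standing in for a real multiaddr parser."""
    if not value.startswith("/") or any(c.isspace() for c in value):
        return False
    components = value.split("/")[1:]
    # Reject empty components such as "/" or "/ip4//tcp".
    return bool(components) and all(components)


def provider_hints(uri: str) -> list[str]:
    """Return `provider` hints in order of appearance, skipping invalid ones."""
    query = urlsplit(uri).query
    return [
        value
        for key, value in parse_qsl(query)
        if key == "provider" and is_plausible_multiaddr(value)
    ]


uri = (
    "https://dweb.link/ipfs/bafy..."
    "?provider=/dns4/hash-stream-like-server.io/tcp/443/https"
    "&provider=not-a-multiaddr"
    "&provider=/ip4/192.0.2.1/tcp/4001/ws"
)
print(provider_hints(uri))
# ['/dns4/hash-stream-like-server.io/tcp/443/https', '/ip4/192.0.2.1/tcp/4001/ws']
```

`parse_qsl` preserves the order of repeated keys, which keeps left-to-right evaluation possible, and silently dropping the malformed hint matches the "MAY ignore" rule.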

**2. Evaluating Hints**

- Clients MAY:
  - Ignore all `provider` parameters (if unsupported).
  - Evaluate hints in order of appearance (left to right).
  - Evaluate hints in parallel.
  - Apply their own prioritization or fallback strategies.
  - Run default discovery strategies (e.g., DHT, IPNI) in parallel with hint evaluation.
- If all hints fail, clients SHOULD fall back to default discovery strategies (e.g., DHT, IPNI), if available.
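
One of the permitted strategies, sequential left-to-right evaluation with a discovery fallback, can be sketched as below. The `fetch_from` and `discover` callables are hypothetical placeholders for whatever transport and routing code a client has; `fetch_from` is assumed to verify the returned bytes against the CID. Parallel evaluation is equally allowed but not shown.

```python
from typing import Callable, Optional

# Hypothetical callables, not part of any real API:
#   fetch_from(multiaddr, cid) -> verified bytes, or None if the provider lacks them
#   discover(cid)              -> bytes found via default routing (e.g. DHT/IPNI), or None
Fetcher = Callable[[str, str], Optional[bytes]]
Discovery = Callable[[str], Optional[bytes]]


def retrieve(
    cid: str, hints: list[str], fetch_from: Fetcher, discover: Discovery
) -> Optional[bytes]:
    """Try provider hints left to right, then fall back to default discovery."""
    for multiaddr in hints:
        try:
            data = fetch_from(multiaddr, cid)
        except OSError:
            data = None  # an unreachable hint is not fatal; move on to the next one
        if data is not None:
            return data
    # No hint produced the content (or there were no hints): fall back.
    return discover(cid)
```

Because every hint is optional, a failure at any single provider only costs time; correctness still rests on CID verification inside the fetcher.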

Note that the multiaddr should point to the origin server from which the given CID can be retrieved; it should not embed the CID itself in the hint multiaddr (e.g., as a subdomain or path).

> **Review comment:** Are CIDs valid in multiaddr at all (besides Peer ID)? It might be worth linking to the multiaddr spec: https://github.com/libp2p/specs/blob/master/addressing/README.md#multiaddr-in-libp2p
>
> **Reply:** I am a bit confused. The CID is not encoded in the multiaddr.
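
Since a hint names only the origin, the client combines it with the CID it already extracted. A minimal Python sketch of that split follows, under two stated assumptions: it understands only a small subset of multiaddr protocols (`ip4`, `ip6`, `dns`, `dns4`, `dns6`, `tcp`, `tls`, `http`, `https`), and it assumes an HTTP provider serves content at `/ipfs/<CID>` the way a trustless gateway would. The helpers `http_origin` and `hint_request_url` are hypothetical.

```python
from typing import Optional

_HOST_PROTOCOLS = {"ip4", "ip6", "dns", "dns4", "dns6"}
_DEFAULT_PORTS = {"http": "80", "https": "443"}


def http_origin(multiaddr: str) -> Optional[str]:
    """Map an HTTP(S) multiaddr to an origin URL, or None if it is not HTTP."""
    parts = multiaddr.strip("/").split("/")
    host = port = scheme = None
    tls = False
    i = 0
    while i < len(parts):
        proto = parts[i]
        if proto in _HOST_PROTOCOLS and i + 1 < len(parts):
            host = f"[{parts[i + 1]}]" if proto == "ip6" else parts[i + 1]
            i += 2
        elif proto == "tcp" and i + 1 < len(parts):
            port = parts[i + 1]
            i += 2
        elif proto == "tls":
            tls = True
            i += 1
        elif proto in ("http", "https"):
            scheme = "https" if (tls or proto == "https") else "http"
            i += 1
        else:
            return None  # e.g. /ws or /p2p: leave it to a libp2p-capable client
    if host is None or scheme is None:
        return None
    if port is None or port == _DEFAULT_PORTS[scheme]:
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"


def hint_request_url(multiaddr: str, cid: str) -> Optional[str]:
    """The client appends the CID; it is never embedded in the hint itself."""
    origin = http_origin(multiaddr)
    return f"{origin}/ipfs/{cid}" if origin else None


print(hint_request_url("/dns4/hash-stream-like-server.io/tcp/443/https", "bafy..."))
# https://hash-stream-like-server.io/ipfs/bafy...
```

Keeping the CID out of the hint means one multiaddr can be reused verbatim across every link a publisher shares from the same provider.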

---

#### Example Parsing Flows

**Input URI:** `https://bafy....ipfs.dweb.link/ipfs/bafy...?provider=/dns4/hash-stream-like-server.io/tcp/443/https`

→ **REJECT** (CID appears in both hostname and path)

**Input URI:** `https://dweb.link/ipfs/bafy...?provider=/dns4/hash-stream-like-server.io/tcp/443/https&provider=/ip4/192.0.2.1/tcp/4001/ws`

→ Extract CID: `bafy...`

→ Parse `provider` params:

1. `/dns4/hash-stream-like-server.io/tcp/443/https` using `http`
2. `/ip4/192.0.2.1/tcp/4001/ws` using `libp2p`

→ Attempt connections via hints or fall back to default resolution.

**Input URI:** `https://dweb.link/ipfs/bafy...?provider=/dns4/hash-stream-like-server.io/tcp/443/https`

→ Extract CID: `bafy...`

→ Parse `provider` params:

1. `/dns4/hash-stream-like-server.io/tcp/443/https` using `http`

→ Attempt connections via hints or fall back to default resolution.
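
The "using `http`" / "using `libp2p`" labels in these flows can be reproduced with a small classifier. This is a sketch under assumptions: `_VALUE_PROTOCOLS` is a hand-picked subset of multiaddr protocols that take a value (a real parser consults the full protocol table), and treating every non-HTTP hint as `libp2p` simply mirrors the examples rather than any normative rule.

```python
# Multiaddr protocols followed by a value component. This subset is an
# assumption for the sketch; a real parser consults the full protocol table.
_VALUE_PROTOCOLS = {"ip4", "ip6", "dns", "dns4", "dns6", "dnsaddr", "tcp", "udp", "p2p"}


def protocol_names(multiaddr: str) -> list[str]:
    """List the protocol names in a multiaddr, skipping their values."""
    parts = [p for p in multiaddr.split("/") if p]
    names: list[str] = []
    i = 0
    while i < len(parts):
        names.append(parts[i])
        i += 2 if parts[i] in _VALUE_PROTOCOLS else 1
    return names


def transport_for(multiaddr: str) -> str:
    """Return "http" if the hint includes an http/https protocol, else "libp2p"."""
    return "http" if {"http", "https"} & set(protocol_names(multiaddr)) else "libp2p"


for hint in ("/dns4/hash-stream-like-server.io/tcp/443/https", "/ip4/192.0.2.1/tcp/4001/ws"):
    print(hint, "->", transport_for(hint))
```

Skipping value components matters: a host literally named `http` (as in `/dns4/http/tcp/80/ws`) must not be mistaken for the HTTP protocol.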

---

### Client Behavior and Potential Server Roles

In addition to guiding client-side resolution, provider hints can be interpreted by servers under certain circumstances. Where a hint is placed determines who can see and use it:

- If the `provider` parameter is included in the **query** (`?...`), it MAY be communicated to the server, depending on how the client handles the URI (for example, a browser sends the query string to an HTTP gateway as part of the request).
- If the `provider` is encoded as a **fragment** (`#...`), it is only accessible to the client (browsers do not send fragments to the server).

This distinction allows URI publishers to tailor behavior:

- **Client-only mode:** Use a fragment (`#provider=...`) to ensure the server remains unaware of hint data. This is useful for privacy-preserving client apps or when hints are intended to guide only the client.
- **Server-assisted mode:** Use query parameters (`?provider=...`) to allow the server to parse and act on provider hints. This may enable proxy behavior, similar to existing IPFS gateways like `ipfs.io` or `dweb.link`.
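
The two modes can be told apart purely from the URI structure. In this minimal sketch, the hypothetical `hints_by_mode` separates query hints (visible to a server) from fragment hints (client-only), and `request_target` shows what an HTTP client actually puts on the request line: the path and query, never the fragment.

```python
from urllib.parse import parse_qsl, urlsplit


def hints_by_mode(uri: str) -> dict[str, list[str]]:
    """Split provider hints by who can see them.

    "server": hints in the query string, sent along with an HTTP request.
    "client": hints in the fragment, which browsers never send.
    """
    parts = urlsplit(uri)
    return {
        "server": [v for k, v in parse_qsl(parts.query) if k == "provider"],
        "client": [v for k, v in parse_qsl(parts.fragment) if k == "provider"],
    }


def request_target(uri: str) -> str:
    """Path and query as sent on the HTTP request line; the fragment is dropped."""
    parts = urlsplit(uri)
    return f"{parts.path}?{parts.query}" if parts.query else parts.path


uri = (
    "https://dweb.link/ipfs/bafy"
    "?provider=/dns4/a.example/tcp/443/https"
    "#provider=/dns4/private.example/tcp/443/https"
)
print(hints_by_mode(uri))
print(request_target(uri))  # the private.example hint never leaves the client
```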

Publishers of such URIs should consider the **security profile** and **trust assumptions** of their environment when deciding how to encode hints.

This flexibility supports a spectrum of use cases, from fully local client-side fetch strategies to cooperative client-server resolution pipelines.

## Design rationale

TODO

The rationale fleshes out the specification by describing what motivated
the design and why particular design decisions were made.

Provide evidence of rough consensus and working code within the community,
and discuss important objections or concerns raised during discussion.

### User benefit

TODO

How will end users benefit from this work?

### Compatibility

TODO

Explain the upgrade considerations for existing implementations.

### Security

TODO

Explain the security implications/considerations relevant to the proposed change.

### Alternatives

TODO

Describe alternate designs that were considered and related work.

## Test fixtures

TODO

List relevant CIDs. Describe how implementations can use them to determine
specification compliance. This section can be skipped if IPIP does not deal
with the way IPFS handles content-addressed data, or the modified specification
file already includes this information.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

> **Review comment:** While I understand that there's a latency improvement that can be had here by hard-coding a provider into the URL, in practice when I've seen this come up in the past it's been due to people not wanting to use the "mainnet" routing systems while sort of pretending that the content is available via mainnet (e.g., a pinning service not wanting to advertise their data to the Amino DHT or IPNI, but instead having their users use URIs like `ipfs://bafyfoo?provider=<the-pinning-service>`). In this light, this proposal seems more likely to harm than help the IPFS ecosystem. Some examples:
>
> - `ipfs://bafyfoo` used to not be ephemeral; `ipfs://bafyfoo?provider=<pinning-service-that-had-the-cid-when-the-link-was-made>` is now ephemeral. Yes, you could fall back to ignoring the provider, but:
>   - … `ipfs://bafyfoo?provider=<pinning-service-that-had-the-data-when-the-link-was-made>` into their applications, smart contracts, etc. will now need to figure out how to update the `provider` part of the URI, vs. previously when they could just be ephemeral.
>
> Some alternatives to this approach that seem like they resolve many of the problems for users: …
>
> It'd be useful to understand why the benefits of this outweigh the associated ecosystem risks.

> **Reply:** I've seen many situations in which you want to share content-addressed data with all the benefits of verification and p2p without necessarily caring about long-term persistence. If we don't formalise this in a spec, we end up pushing users to solve this in user/application space (see example), leading to fragmentation and no ecosystem benefits from a conventional approach.
>
> This would typically be done using the Delegated Routing API, for which we have a lot of tooling and support in implementations. Practically, this would involve setting up an application-specific endpoint somewhere, which, just like a specific provider maddr, can go down or eventually become stale. But for as long as that delegated routing endpoint is up, it helps the app map CIDs to provider maddrs.
>
> It seems to me that insisting this is the recommended way to do it, over encoding it along with the CID, is overly pedantic, given that the two approaches are not all that dissimilar, with the one exception that you're mixing "permanent" information (the CID) with impermanent information (a specific provider maddr).
>
> This is the crux of the debate here, and ultimately a question of where the boundary between ephemeral and permanent should be delineated. I happen to think it maps elegantly to query parameters.
>
> But more broadly, content routing is hard and adds a performance tax that undercuts adoption in scenarios where the benefits of verification and content addressing are desired.
>
> Optimising for successful retrieval by CID should be a higher-level goal, and I think the broad strokes proposed here advance this goal. Moreover, it paves a path for incremental adoption of content addressing.
>
> So my take is that we should just encode all of these caveats into this IPIP, e.g., strongly recommend against persisting the provider hint in long-term storage such as on-chain.

> **Reply:** Thanks for writing this, @2color. This is exactly how I feel! I wrote some use cases that can benefit directly from this: https://github.com/vasco-santos/provider-hinted-uri/blob/main/EXPLORATION.md#use-cases
>
> Note that some of them actually depend on a follow-up to this (adding `tags` to the multiaddr), which for now is a different conversation, as `tags` are also optional ways to expand on this even further.

> **Reply:** Thanks @vasco-santos for flagging the use cases. As I think I've called out, the risk I'm most concerned with here is degradation of the ecosystem of URIs that have for the last 10 years been associated with "mainnet". Breaking them down into your use cases: …
>
> A thought on a potential compromise here that may address my concerns while still helping with the problem quoted above.
>
> An additional / alternative compromise introduced to me by @Stebalien:
>
> Since one of the major issues I flagged is the links becoming effectively ephemeral, introducing vendor lock-in, etc., this can largely be side-stepped by allowing these hints to be introduced at the DNS(Link) layer. Since those records are largely mutable anyway, the risks are much lower.
>
> The risks/downsides are still non-zero. For example, users in ENS land are using immutable names for versioning in ways that again resurface the problems documented here. However, it is IMO less bad than simply allowing the ephemeral links and location dependence in the immutable IPFS part of the stack.