IPIP 0499: CID Profiles #499
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
--- | ||
title: 'IPIP-0499: CID Profiles' | ||
date: 2025-04-03 | ||
ipip: proposal | ||
editors: | ||
- name: Michelle Lee | ||
github: mishmosh | ||
affiliation: | ||
name: IPFS Foundation | ||
- name: Daniel Norman | ||
github: 2color | ||
affiliation: | ||
name: Shipyard | ||
url: https://ipshipyard.com | ||
relatedIssues: | ||
- https://discuss.ipfs.tech/t/should-we-profile-cids/18507 | ||
order: 0499 | ||
tags: ['ipips'] | ||
--- | ||
|
||
## Summary | ||
|
||
This proposal introduces configuration profiles for CIDs used to represent files and directories with UnixFS. These profiles ensure deterministic CID generation for the same data, regardless of the implementation. | ||
|
||
Profiles explicitly define the UnixFS parameters that affect the resulting CID (e.g. DAG width, hash algorithm, and chunk size), such that, given the profile and the input data, different implementations will generate identical CIDs. | ||
|
||
## Motivation | ||
|
||
UnixFS CIDs are not deterministic. The same file tree can yield different CIDs depending on the parameters the implementation uses to generate it, and in some cases those parameters aren't even configurable by the user. For example, the chunk size, DAG width, and layout can vary between implementations or even between different versions of the same implementation. | ||
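
To illustrate why these parameters matter, here is a minimal sketch of a toy Merkle tree. It is not the UnixFS or `dag-pb` encoding; `toy_root` is a hypothetical helper that only hashes chunks and groups child digests. It shows the same bytes producing different root digests when chunk size and width change, and identical digests when they are held fixed.

```python
import hashlib


def toy_root(data: bytes, chunk_size: int, width: int) -> str:
    """Toy Merkle root (illustration only, NOT the UnixFS/dag-pb encoding)."""
    # Leaf layer: one sha2-256 digest per fixed-size chunk.
    layer = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ] or [hashlib.sha256(b"").digest()]
    # Group up to `width` children per parent until a single root remains.
    while len(layer) > 1:
        layer = [
            hashlib.sha256(b"".join(layer[i:i + width])).digest()
            for i in range(0, len(layer), width)
        ]
    return layer[0].hex()


data = bytes(range(256)) * 4096  # 1 MiB of sample input

# Same input, different parameters: the roots differ.
root_a = toy_root(data, chunk_size=256 * 1024, width=174)
root_b = toy_root(data, chunk_size=1024 * 1024, width=1024)
print(root_a == root_b)

# Same input, same parameters: the root is reproducible.
print(root_a == toy_root(data, chunk_size=256 * 1024, width=174))
```

The parameter pairs used here (256KiB with width 174, 1MiB with width 1024) are taken from the defaults table in this document; everything else is an assumption made for the sake of the example.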
|
||
This lack of determinism has a number of drawbacks: | ||
|
||
- It is difficult to verify content across different tools and implementations, as the same content may yield different CIDs. | ||
- Users are required to store and transfer UnixFS merkle proofs in order to verify CIDs, adding storage overhead, network bandwidth, and complexity to the verification process. | ||
- In terms of developer experience, it deviates from the mental model of a hash function, where the same input should always yield the same output. This leads to confusion and frustration when working with UnixFS CIDs. | ||
|
||
By introducing profiles that define the parameters affecting the root CID of the DAG, we can benefit from both the optionality offered by UnixFS, where users are free to choose their own parameters, and the determinism offered by profiles. | ||
|
||
## Detailed design | ||
|
||
We introduce a set of named profiles that define a set of parameters for generating UnixFS CIDs. These profiles can be used by implementations to ensure that the same content will yield the same CID across different tools and implementations. | ||
|
||
### UnixFS parameters | ||
|
||
The profiles define a set of parameters that affect the resulting string encoding of the CID. These parameters are based on the UnixFS specification and are used to generate the CID for a given file tree. The parameters include: | ||
|
||
1. CID version, e.g. CIDv0 or CIDv1 | ||
1. Multibase encoding for the CID, e.g. base32 | ||
1. Hash function used for all nodes in the DAG, e.g. sha2-256 | ||
1. UnixFS file chunking algorithm | ||
1. UnixFS file chunk size or target (if required by the chunking algorithm) | ||
1. UnixFS DAG layout (e.g. balanced, trickle) | ||
1. UnixFS DAG width (max number of links per `File` node) | ||
1. `HAMTDirectory` fanout (must be a power of 2) | ||
|
||
1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links
Review comment: If this number is dynamic based on the lengths of the actual link entries in the dag, we will need to specify what algorithm that estimation follows. I would put such things in a special "ipfs legacy" profile to be honest, along with cidv0, non-raw leaves etc. We probably should heavily discourage coming up with profiles that do weird things, like dynamically setting params or not using raw-leaves for things.
Review comment: So, each layout would have its own set of layout-params:
Review comment: Yeah, that's exactly what we're doing by defining this profile. |
||
1. Leaf Envelope: either `dag-pb` or `raw` | ||
1. Whether empty directories are included in the DAG. Some implementations apply filtering before merkleizing filesystem entries in the DAG. | ||
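
As a rough illustration of how the first parameters in this list (CID version, multibase, hash function) and the leaf envelope shape the final string, the following sketch builds the CID of a single `raw` leaf by hand. The helper names are hypothetical; the byte layout (version, codec, multihash) and the `b`/`f` multibase prefixes follow the CID and multibase conventions, and all codes used here fit in a single varint byte.

```python
import base64
import hashlib

CIDV1 = 0x01      # CID version 1
RAW_CODEC = 0x55  # multicodec code for `raw`
SHA2_256 = 0x12   # multihash code for sha2-256


def cidv1_raw_bytes(block: bytes) -> bytes:
    """Binary CIDv1 of a raw leaf: <version><codec><hash code><length><digest>."""
    digest = hashlib.sha256(block).digest()
    multihash = bytes([SHA2_256, len(digest)]) + digest
    return bytes([CIDV1, RAW_CODEC]) + multihash


def to_multibase(cid: bytes, base: str = "base32") -> str:
    """String form of a binary CID; only two multibase encodings are sketched."""
    if base == "base32":
        # Lowercase RFC 4648 base32 without padding, prefixed with 'b'.
        return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")
    if base == "base16":
        # Lowercase hexadecimal, prefixed with 'f'.
        return "f" + cid.hex()
    raise ValueError(f"unsupported multibase in this sketch: {base}")


cid = cidv1_raw_bytes(b"")
print(to_multibase(cid))            # base32 string form
print(to_multibase(cid, "base16"))  # same CID bytes, different string
```

Changing only the multibase changes the string but not the underlying CID bytes, whereas changing the hash function or leaf envelope changes the bytes themselves.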
|
||
Review comment: couple of other things to consider?
Review comment: I've added this as a parameter. According to the latest version of https://github.com/ipfs/specs/pull/331/files, the calculation is done as follows:
If calculated according to this, does that make it accurate?
Review comment: sounds about right, I remember there being some nuance in exactly what's included in the size calculation, making it not super stable if you get it slightly wrong (as we did for some variants in go-unixfsnode for a while) |
||
This would be specified as a table in the forthcoming [UnixFS spec](https://github.com/ipfs/specs/pull/331/files). | ||
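
For the `HAMTDirectory` threshold parameter, one possible reading of "counting the size of the links" is to sum the byte length of each link name and its CID. The sketch below assumes exactly that, plus a strict "greater than" comparison; both are assumptions, since the exact algorithm is still to be specified, and the function names are hypothetical.

```python
HAMT_THRESHOLD = 256 * 1024  # 256KiB, as in the profile table


def estimated_directory_size(links: list[tuple[str, bytes]]) -> int:
    """Assumed estimate: sum of UTF-8 name length plus CID byte length per link."""
    return sum(len(name.encode("utf-8")) + len(cid) for name, cid in links)


def should_use_hamt(links: list[tuple[str, bytes]],
                    threshold: int = HAMT_THRESHOLD) -> bool:
    """Assumed rule: switch to HAMTDirectory once the estimate exceeds the threshold."""
    return estimated_directory_size(links) > threshold


def make_links(count: int) -> list[tuple[str, bytes]]:
    """Sample directory: 15-byte names and 36-byte placeholder CIDs (51 bytes per link)."""
    return [(f"file-{i:06d}.txt", bytes(36)) for i in range(count)]


print(estimated_directory_size(make_links(5140)))  # 5140 * 51 = 262140 bytes
print(should_use_hamt(make_links(5140)))           # still at or below 262144
print(should_use_hamt(make_links(5141)))           # 262191 bytes exceeds 262144
```

Small differences in what is counted (for example, including `Tsize` or protobuf framing) would move the switch-over point, which is why the estimation needs to be pinned down for CIDs to match across implementations.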
|
||
## Named profiles | ||
|
||
To make it easier for users and implementations to choose a set of parameters, we define a named profile `unixfs-2025` to encapsulate the parameters established as the baseline default by multiple implementations as of 2025. | ||
|
||
The **`unixfs-2025`** profile name is designed to be referenced by implementations and users to ensure that the same content will yield the same CID across different tools and implementations. | ||
|
||
The profile is defined as follows: | ||
|
||
| Parameter | Value | | ||
| ----------------------------- | ------------------------------------------------------- | | ||
| CID version | CIDv1 | | ||
| Hash function | sha2-256 | | ||
| Max chunk size | 1MiB | | ||
| DAG layout | balanced | | ||
| DAG width (children per node) | 1024 | | ||
| `HAMTDirectory` fanout | 256 blocks | | ||
| `HAMTDirectory` threshold     | 256KiB (estimated by counting the size of PBNode.Links) | ||
| Leaves | raw | | ||
| Empty directories | TODO | | ||
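
The table above can be captured as a small data structure, and the chunk size and DAG width are enough to work out the shape of a file DAG. The sketch below is illustrative only: `UnixFSProfile` and `balanced_dag_shape` are hypothetical names, and it assumes fixed-size chunking and a balanced layout that fills every node up to the DAG width, with a single-chunk file represented by a lone leaf.

```python
from dataclasses import dataclass
from typing import Optional

KIB = 1024
MIB = 1024 * KIB


@dataclass(frozen=True)
class UnixFSProfile:
    """Hypothetical container for the parameters in the profile table."""
    cid_version: int
    hash_function: str
    max_chunk_size: int
    dag_layout: str
    dag_width: int
    hamt_fanout: int
    hamt_threshold: int
    raw_leaves: bool
    include_empty_dirs: Optional[bool]  # None: still marked TODO in the table


UNIXFS_2025 = UnixFSProfile(
    cid_version=1,
    hash_function="sha2-256",
    max_chunk_size=1 * MIB,
    dag_layout="balanced",
    dag_width=1024,
    hamt_fanout=256,
    hamt_threshold=256 * KIB,
    raw_leaves=True,
    include_empty_dirs=None,
)


def balanced_dag_shape(file_size: int, profile: UnixFSProfile) -> tuple[int, int]:
    """Return (leaf count, number of File-node levels above the leaves)."""
    chunk = profile.max_chunk_size
    leaves = max(1, (file_size + chunk - 1) // chunk)  # integer ceiling division
    levels, nodes = 0, leaves
    while nodes > 1:
        nodes = (nodes + profile.dag_width - 1) // profile.dag_width
        levels += 1
    return leaves, levels


print(balanced_dag_shape(1 * MIB, UNIXFS_2025))         # (1, 0)
print(balanced_dag_shape(1024 * MIB, UNIXFS_2025))      # (1024, 1)
print(balanced_dag_shape(1024 * MIB + 1, UNIXFS_2025))  # (1025, 2)
```

With these values, a file of up to 1GiB fits under a single root node, and one extra byte adds a second level, which is one reason the width and chunk size must be fixed for the root CID to be reproducible.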
|
||
## Current defaults | ||
|
||
Here is a summary table of current (2025-Q2) defaults: | ||
|
||
| | Helia default | Kubo `legacy-cid-v0` (default) | Storacha default | Kubo `test-cid-v1` | Kubo `test-cid-v1-wide` | DASL | | ||
| ----------------------------- | ------------- | ------------------------------ | ---------------- | ------------------ | ----------------------- | ------------- | | ||
| CID version | CIDv1 | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | | ||
| Hash function | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 | | ||
| Max chunk size | 1MiB | 256KiB | 1MiB | 1MiB | 1MiB | not specified | | ||
| DAG layout | balanced | balanced | balanced | balanced | balanced | not specified | | ||
| DAG width (children per node) | 1024 | 174 | 1024 | 174 | **1024** | not specified | | ||
| `HAMTDirectory` fanout | 256 blocks | 256 blocks | 256 blocks | 256 blocks | **1024** | not specified | | ||
| `HAMTDirectory` threshold | 256KiB (est) | 256KiB (est:links[name+cid]) | 1000 **links** | 256KiB | **1MiB** | not specified | | ||
| Leaves | raw | raw | raw | raw | raw | not specified | | ||
| Empty directories | Included | Included | Ignored | Included | Included | not specified | | ||
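
To see at a glance where two sets of defaults diverge, the table can be transcribed and compared programmatically. This is a minimal sketch covering three of the columns; the values are copied verbatim from the table above and the `differing` helper is a hypothetical name.

```python
PARAMS = [
    "CID version", "Hash function", "Max chunk size", "DAG layout",
    "DAG width", "HAMTDirectory fanout", "HAMTDirectory threshold",
    "Leaves", "Empty directories",
]

# Values transcribed from the defaults table (three columns only).
DEFAULTS = {
    "Helia default": [
        "CIDv1", "sha2-256", "1MiB", "balanced", "1024",
        "256 blocks", "256KiB (est)", "raw", "Included",
    ],
    "Kubo legacy-cid-v0": [
        "CIDv0", "sha2-256", "256KiB", "balanced", "174",
        "256 blocks", "256KiB (est:links[name+cid])", "raw", "Included",
    ],
    "Storacha default": [
        "CIDv1", "sha2-256", "1MiB", "balanced", "1024",
        "256 blocks", "1000 links", "raw", "Ignored",
    ],
}


def differing(a: str, b: str) -> list[str]:
    """Names of the parameters whose table entries differ between two columns."""
    return [p for p, x, y in zip(PARAMS, DEFAULTS[a], DEFAULTS[b]) if x != y]


print(differing("Helia default", "Storacha default"))
print(differing("Helia default", "Kubo legacy-cid-v0"))
```

Any parameter reported here is a potential source of mismatched CIDs for the same input. Note that the comparison is textual, so entries that differ only in their annotation are also reported.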
|
||
See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/ | ||
|
||
### User benefit | ||
|
||
Profiles reduce the burden of verifying UnixFS content, as users can simply choose a profile and know that the resulting CIDs will be deterministic across implementations. This eliminates the need for users to understand the underlying parameters that affect CID generation, and allows them to focus on the content itself. | ||
|
||
Moreover, profiles allow users to verify content without needing to rely on additional merkle proofs and CAR files, which can be cumbersome and inefficient. | ||
|
||
Finally, profiles improve the developer experience by aligning with the mental model of a hash function. | ||
|
||
### Compatibility | ||
|
||
UnixFS data encoded with the profiles defined in this IPIP is fully compatible with existing implementations, as it is fully compliant with the UnixFS specification.
Review comment: Cannot be compliant with details that are not specified as of today.
Review comment: Contingent on #331 |
||
|
||
To produce CIDs that are compliant with this IPIP, implementations will need to support the parameters defined in the profiles. This may require changes to existing implementations to expose configuration options for the parameters, or to implement new functionality to support the profiles. | ||
|
||
Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob/master/docs/config.md#import) options to control DAG width. | ||
|
||
### Alternatives | ||
|
||
As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle DAG nodes needed to verify the CID. | ||
|
||
## Test fixtures | ||
|
||
TODO | ||
|
||
List relevant CIDs. Describe how implementations can use them to determine | ||
specification compliance. This section can be skipped if IPIP does not deal | ||
with the way IPFS handles content-addressed data, or the modified specification | ||
file already includes this information. | ||
|
||
### Copyright | ||
|
||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). |