IPIP 0499: CID Profiles #499
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| --- | ||
| # IPIP number should match its pull request number. After you open a PR, | ||
| # please update title and update the filename to `ipip0000`. | ||
| title: "IPIP-0499: CID Profiles" | ||
| date: 2025-04-03 | ||
| ipip: proposal | ||
| editors: | ||
| - name: Michelle Lee | ||
| github: mishmosh | ||
| affiliation: | ||
| name: IPFS Foundation | ||
| relatedIssues: | ||
| - https://discuss.ipfs.tech/t/should-we-profile-cids/18507 | ||
| order: 0499 | ||
| tags: ['ipips'] | ||
| --- | ||
|
|
||
| ## Summary | ||
|
|
||
| <!--One paragraph explanation of the IPIP.--> | ||
| This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. | ||
|
|
||
| ## Motivation | ||
|
|
||
| Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. With profiles, following the same profile will produce identical CIDs for identical content, which makes verification possible regardless of implementation. | ||
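To illustrate the problem, here is a toy sketch (not real UnixFS or CID encoding): it splits the same bytes into fixed-size chunks, hashes each chunk, and hashes the concatenated digests into a root. The function name `toy_root` and the two-level structure are assumptions made for illustration only.

```python
import hashlib


def toy_root(data: bytes, chunk_size: int) -> str:
    """Hash fixed-size chunks, then hash the concatenated chunk digests.

    This is a simplified stand-in for DAG building; it is not UnixFS.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    leaf_digests = [hashlib.sha256(chunk).digest() for chunk in chunks]
    return hashlib.sha256(b"".join(leaf_digests)).hexdigest()


data = b"hello world, hello ipfs"

# Same content, same settings: the root is reproducible.
assert toy_root(data, 4) == toy_root(data, 4)

# Same content, different chunk size: the root changes.
assert toy_root(data, 4) != toy_root(data, 8)
```

The content never changed, only the chunking parameter did, which is the situation a shared profile is meant to rule out.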
|
|
||
| ## Detailed design | ||
|
|
||
| We introduce a profile naming system. | ||
|
|
||
| Each profile must specify the following characteristics: | ||
|
|
||
| 1. CID version (currently only CIDv0 or CIDv1) | ||
| 1. Hash algorithm | ||
| 1. UnixFS Chunk algorithm (e.g. size-based or content-based) | ||
| 1. UnixFS directory DAG layout (e.g. balanced, trickle) | ||
| 1. UnixFS file DAG width (max number of links per `File` node) | ||
| 1. UnixFS directory DAG width (max number of links per basic `Directory` node) | ||
| 1. UnixFS HAMT directory DAG threshold (max `Directory` size before switching to `HAMTDirectory`) | ||
| 1. HAMT directory DAG width (max number of fanout links per internal HAMTDirectory node) | ||
| 1. Leaf Envelope (historically `dag-pb`, CIDv1 introduced `raw` leaves) | ||
| 1. Empty directories (informative suggestion) | ||
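As a rough sketch of how an implementation might carry these characteristics around, the list above maps naturally onto a single immutable record. The class name `CidProfile` and all field names are hypothetical; the example values are copied from the Kubo `test-cid-v1` column of the summary table in this proposal.

```python
from dataclasses import dataclass
from typing import Optional

KIB = 1024
MIB = 1024 * KIB


@dataclass(frozen=True)
class CidProfile:
    """Hypothetical container for the characteristics a profile must specify."""

    name: str
    cid_version: int            # 0 or 1
    hash_algorithm: str         # e.g. "sha-256"
    chunker: str                # e.g. "size-1048576" (size-based) or a content-based chunker
    dag_layout: str             # "balanced" or "trickle"
    file_max_links: int         # max links per File node
    directory_max_links: int    # max links per basic Directory node
    hamt_threshold_bytes: int   # Directory size before switching to HAMTDirectory
    hamt_max_fanout: int        # max fanout links per internal HAMTDirectory node
    leaf_envelope: str          # "dag-pb" or "raw"
    empty_directories: Optional[str] = None  # informative suggestion only

    def __post_init__(self) -> None:
        if self.cid_version not in (0, 1):
            raise ValueError(f"unsupported CID version: {self.cid_version}")
        if self.dag_layout not in ("balanced", "trickle"):
            raise ValueError(f"unknown DAG layout: {self.dag_layout}")
        if self.leaf_envelope not in ("dag-pb", "raw"):
            raise ValueError(f"unknown leaf envelope: {self.leaf_envelope}")


# Values taken from the Kubo `test-cid-v1` column of the summary table.
TEST_CID_V1 = CidProfile(
    name="test-cid-v1",
    cid_version=1,
    hash_algorithm="sha-256",
    chunker=f"size-{1 * MIB}",
    dag_layout="balanced",
    file_max_links=174,
    directory_max_links=0,
    hamt_threshold_bytes=256 * KIB,
    hamt_max_fanout=256,
    leaf_envelope="raw",
    empty_directories="allowed",
)
```

Freezing the record is a design choice for this sketch: a profile that can be mutated after lookup no longer guarantees the CID it names.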
|
|
||
| Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves. | ||
|
|
||
| This would be specified as a table in the forthcoming UnixFS spec. | ||
|
|
||
| ## Design rationale | ||
|
|
||
| The profile names are chosen to be easy to pronounce. | ||
|
|
||
| Here is a summary table of current (2025-Q2) defaults, thanks to input & clarifications from @2color @achingbrain @lidel: | ||
|
|
||
| | | Helia default | Kubo `legacy-cid-v0` (default) | Storacha default | Kubo `test-cid-v1` | Kubo `test-cid-v1-wide` | DASL | | ||
| |---------------------------------|---------------|-----------------------------------|------------------|--------------------|---------------------------|---------------| | ||
| | CID version | CIDv1 | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | | ||
| | Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | | ||
| | Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | 1MiB | not specified | | ||
| | Max links `File` node | 1024 | 174 | 1024 | 174 | **1024** | not specified | | ||
| | Max links `Directory` node | ? | 0 | ? | 0 | 0 | ? | | ||
| | Max fanout `HAMTDirectory` node | 256 blocks | 256 blocks | 256 blocks | 256 blocks | **1024** | not specified | | ||
| | `HAMTDirectory` threshold | 256KiB (est) | 256KiB (est:links[name+cid]) | 1000 **links** | 256KiB | **1MiB** | not specified | | ||
| | DAG layout | balanced | balanced | balanced | balanced | balanced | not specified | | ||
| | Leaves | raw | raw | raw | raw | raw | not specified | | ||
| | Empty directories | allowed | allowed | disallowed | allowed | allowed | not specified | | ||
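To see why the chunk size and `File` link limits in the table matter, the following sketch estimates the shape of an idealized balanced DAG for a 1 GiB file. The function `balanced_dag_shape` is hypothetical, and it assumes every intermediate node is filled to its link limit; real implementations may place leaves differently, so treat the numbers as an approximation.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def balanced_dag_shape(file_size: int, chunk_size: int, max_links: int) -> tuple:
    """Return (leaf_count, levels_above_leaves) for an idealized balanced DAG."""
    leaves = max(1, -(-file_size // chunk_size))  # ceiling division
    levels = 0
    nodes = leaves
    while nodes > 1:
        nodes = -(-nodes // max_links)  # parents needed for this level
        levels += 1
    return leaves, levels


# Chunk size and max links taken from the table above.
print(balanced_dag_shape(GIB, 256 * KIB, 174))   # legacy-cid-v0:    (4096, 2)
print(balanced_dag_shape(GIB, 1 * MIB, 174))     # test-cid-v1:      (1024, 2)
print(balanced_dag_shape(GIB, 1 * MIB, 1024))    # test-cid-v1-wide: (1024, 1)
```

Three columns of the table give three different tree shapes for the same bytes, so the root CIDs cannot match.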
|
||
|
|
||
| See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/ | ||
|
|
||
| ### User benefit | ||
|
|
||
| Reliable, deterministic CIDs allow independent verification of content across tools and implementations. | ||
|
||
|
|
||
| ### Compatibility | ||
|
|
||
| Implementations will need to (1) make CID generation settings configurable and (2) support user setting of profiles. | ||
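One possible shape for these two requirements is a preset table plus explicit overrides. This is a sketch only: the setting keys, the `PROFILES` table, and `resolve_settings` are hypothetical names, not any implementation's actual configuration. The values come from the Kubo `test-cid-v1` and `test-cid-v1-wide` columns of the summary table.

```python
from typing import Any, Dict, Optional, Tuple

KIB = 1024
MIB = 1024 * KIB

# Hypothetical setting keys; values copied from the 2025-Q2 summary table.
PROFILES: Dict[str, Dict[str, Any]] = {
    "test-cid-v1": {
        "cid_version": 1,
        "hash_algorithm": "sha-256",
        "chunk_size": 1 * MIB,
        "file_max_links": 174,
        "hamt_max_fanout": 256,
        "hamt_threshold_bytes": 256 * KIB,
        "dag_layout": "balanced",
        "leaves": "raw",
    },
    "test-cid-v1-wide": {
        "cid_version": 1,
        "hash_algorithm": "sha-256",
        "chunk_size": 1 * MIB,
        "file_max_links": 1024,
        "hamt_max_fanout": 1024,
        "hamt_threshold_bytes": 1 * MIB,
        "dag_layout": "balanced",
        "leaves": "raw",
    },
}


def resolve_settings(
    profile: str, overrides: Optional[Dict[str, Any]] = None
) -> Tuple[Dict[str, Any], bool]:
    """Return (effective settings, whether they still match the named profile)."""
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile}")
    settings = dict(PROFILES[profile])
    changed = False
    for key, value in (overrides or {}).items():
        if key not in settings:
            raise ValueError(f"unknown setting: {key}")
        if settings[key] != value:
            settings[key] = value
            changed = True
    return settings, not changed
```

Returning the "still matches" flag lets a tool warn users that an override means the resulting CIDs are no longer reproducible under the profile's name.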
|
|
||
| Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob/master/docs/config.md#import) options to control DAG width. | ||
|
|
||
| ### Security | ||
|
|
||
| TODO | ||
|
|
||
| ### Alternatives | ||
|
|
||
| Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. `v1-sha256-balanced-1mib-1024w-raw`. This is longer and more convoluted. | ||
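For comparison, a parameter-based name would need a parser along these lines. The field order and unit suffixes are inferred from the single example name above and are an assumption, as are the names `DescriptiveName` and `parse_descriptive_name`.

```python
import re
from typing import NamedTuple

_UNITS = {"kib": 1024, "mib": 1024 * 1024}

# Field order inferred from the example `v1-sha256-balanced-1mib-1024w-raw`.
_PATTERN = re.compile(
    r"^v(?P<version>[01])"
    r"-(?P<hash>[a-z0-9]+)"
    r"-(?P<layout>[a-z]+)"
    r"-(?P<size>\d+)(?P<unit>kib|mib)"
    r"-(?P<links>\d+)w"
    r"-(?P<leaves>raw|dag-pb)$"
)


class DescriptiveName(NamedTuple):
    cid_version: int
    hash_algorithm: str
    dag_layout: str
    chunk_size_bytes: int
    file_max_links: int
    leaves: str


def parse_descriptive_name(name: str) -> DescriptiveName:
    """Parse a hypothetical parameter-based profile name into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized descriptive profile name: {name}")
    return DescriptiveName(
        cid_version=int(match["version"]),
        hash_algorithm=match["hash"],
        dag_layout=match["layout"],
        chunk_size_bytes=int(match["size"]) * _UNITS[match["unit"]],
        file_max_links=int(match["links"]),
        leaves=match["leaves"],
    )


print(parse_descriptive_name("v1-sha256-balanced-1mib-1024w-raw"))
```

Note that this name covers only six parameters; the directory and HAMT settings from the required list would make it longer still, which supports the argument for short profile names.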
|
|
||
|
|
||
| #### Empty directories | ||
|
|
||
| The decision whether empty directories should be included is left out of scope. | ||
|
||
|
|
||
| Tools can apply arbitrary filtering before passing filesystem entries | ||
| to be converted into a DAG, thus for 1:1 CID reproducibility one should | ||
| run without any prefilters, or ensure the same prefilters are applied. | ||
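The effect of prefilters can be sketched without building a real DAG. Here a hash over the sorted list of surviving paths stands in for the directory CID; `entries_fingerprint` and `skip_hidden` are hypothetical helpers for illustration, not part of any IPFS tool.

```python
import hashlib
from typing import Callable, Iterable, Optional


def entries_fingerprint(
    paths: Iterable[str], prefilter: Optional[Callable[[str], bool]] = None
) -> str:
    """Hash the sorted paths that survive the prefilter (stand-in for a directory CID)."""
    kept = sorted(p for p in paths if prefilter is None or prefilter(p))
    digest = hashlib.sha256()
    for path in kept:
        digest.update(path.encode("utf-8") + b"\0")
    return digest.hexdigest()


def skip_hidden(path: str) -> bool:
    """Example prefilter: drop any entry with a dot-prefixed path segment."""
    return not any(part.startswith(".") for part in path.split("/"))


tree = ["README.md", "src/main.go", ".git/config", "docs/.gitkeep"]

# Different prefilters yield different inputs, hence different results.
assert entries_fingerprint(tree) != entries_fingerprint(tree, skip_hidden)

# The same prefilter on the same tree is reproducible.
assert entries_fingerprint(tree, skip_hidden) == entries_fingerprint(tree, skip_hidden)
```

A filter such as `skip_hidden` would also turn `docs/` into an empty directory here, which is where prefiltering meets the empty-directory question above.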
|
|
||
| ## Test fixtures | ||
|
Member
Just noting this is (imo) a blocker. We did not merge the UnixFS spec until we had a sensible set of fixtures that people could use as reference. The spec may be incomplete, but a fixture will let people reverse-engineer any details, and then PR improvements to the spec. Without fixtures for each UnixFS node type, we risk unknown unknowns silently impacting the final CID (e.g. because we did not know that someone may decide to place leaves one level sooner as an "optimization" and someone else always at the bottom, as "formal consistency").
Contributor
Author
Tracking this in ipfs/kubo#11071
Member
Thanks!
|
||
|
|
||
| TODO | ||
|
|
||
| List relevant CIDs. Describe how implementations can use them to determine | ||
| specification compliance. This section can be skipped if IPIP does not deal | ||
| with the way IPFS handles content-addressed data, or the modified specification | ||
| file already includes this information. | ||
|
|
||
| ### Copyright | ||
|
|
||
| Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). | ||