From 88421763386acdd660a064ea728a75d088030655 Mon Sep 17 00:00:00 2001 From: Mosh <1306020+mishmosh@users.noreply.github.com> Date: Thu, 3 Apr 2025 10:02:35 -0400 Subject: [PATCH 01/16] Create ipip-0000.md --- src/ipips/ipip-0000.md | 102 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 src/ipips/ipip-0000.md diff --git a/src/ipips/ipip-0000.md b/src/ipips/ipip-0000.md new file mode 100644 index 00000000..68fb29b8 --- /dev/null +++ b/src/ipips/ipip-0000.md @@ -0,0 +1,102 @@ +--- +# IPIP number should match its pull request number. After you open a PR, +# please update title and update the filename to `ipip0000`. +title: "IPIP-0000: CID Profiles" +date: 2025-04-03 +ipip: proposal +editors: + - name: Michelle Lee +relatedIssues: + - n/a +order: 0000 +tags: ['ipips'] +--- + +## Summary + + +This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. + +## Motivation + +Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. Profiles offer With profiles, following the same profile will produce identical CIDs for identical content, whic makes verification regardless of implementation. + +## Detailed design + +We introduce a profile naming system, + +Each profile must specify the following characteristics: + +1. CID version (CIDv0 or CIDv1) +2. Hash algorithm +3. Chunk size +4. DAG width +5. DAG layout +6. Required + +Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves. 
+ +| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL | +|-------------|---------------|-----------------------------|------------------|-----------------------|---------------| +| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | +| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | +| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | +| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | +| DAG layout | balanced | balanced | balanced | balanced | not specified | + + + +This would be specified as a table in (forthcoming UnixFS spec). + + + +## Design rationale + +The profile names are chosen to be easy to pronounce. + +Here is a summary table of current defaults, thanks to input & clarifications from @2color @achingbrain @lidel: + +| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL | +|-------------|---------------|-----------------------------|------------------|-----------------------|---------------| +| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | +| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | +| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | +| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | +| DAG layout | balanced | balanced | balanced | balanced | not specified | + +* Kubo has 2 different default DAG widths: + * For HAMT-sharded directories, the `DefaultShardWidth` [here](https://github.com/ipfs/boxo/blob/f1d5312e3be45d151bb9c8f11c9283820687bea3/ipld/unixfs/io/directory.go#L30) is 256. + * For files, `DefaultLinksPerBlock` [here](https://github.com/ipfs/boxo/blob/v0.29.0/ipld/unixfs/importer/helpers/helpers.go#L30) is ~174 + +See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/ + +### User benefit + +Reliable, deterministic CIDs allow independent verification of content across tools and ipmlementations. 
+ +### Compatibility + +Implementations will need to (1) make CID generation settings configurable and (2) support user setting of profiles. + +Kubo currently has no CLI / RPC / Config option to control DAG width in Kubo. https://github.com/ipfs/kubo/issues/10751 is the starting point to add that ability. + +### Security + +TODO + +### Alternatives + +Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. v1-sha256-balanced-1mib-1024w-raw. This is longer and more convoluted. + +## Test fixtures + +TODO + +List relevant CIDs. Describe how implementations can use them to determine +specification compliance. This section can be skipped if IPIP does not deal +with the way IPFS handles content-addressed data, or the modified specification +file already includes this information. + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). From 4ba68f030e067ea3acaba5514e5d97ba87d535f5 Mon Sep 17 00:00:00 2001 From: Mosh <1306020+mishmosh@users.noreply.github.com> Date: Thu, 3 Apr 2025 10:03:29 -0400 Subject: [PATCH 02/16] Update and rename ipip-0000.md to ipip-0499.md --- src/ipips/{ipip-0000.md => ipip-0499.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename src/ipips/{ipip-0000.md => ipip-0499.md} (99%) diff --git a/src/ipips/ipip-0000.md b/src/ipips/ipip-0499.md similarity index 99% rename from src/ipips/ipip-0000.md rename to src/ipips/ipip-0499.md index 68fb29b8..d1947e2d 100644 --- a/src/ipips/ipip-0000.md +++ b/src/ipips/ipip-0499.md @@ -1,7 +1,7 @@ --- # IPIP number should match its pull request number. After you open a PR, # please update title and update the filename to `ipip0000`. 
-title: "IPIP-0000: CID Profiles" +title: "IPIP-0499: CID Profiles" date: 2025-04-03 ipip: proposal editors: From 6cc64cb765aaab872793b2bd3b49c7f02c8f14b2 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 15 Apr 2025 23:41:17 +0200 Subject: [PATCH 03/16] add extra attributes proposed in review Co-authored-by: Bumblefudge --- src/ipips/ipip-0499.md | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index d1947e2d..7f75d728 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -27,11 +27,15 @@ We introduce a profile naming system, Each profile must specify the following characteristics: -1. CID version (CIDv0 or CIDv1) +1. CID version (currently only CIDv0 or CIDv1) 2. Hash algorithm -3. Chunk size -4. DAG width -5. DAG layout +3. UnixFS Chunk size (explicitly set, not contextual/reactive to input) +4. UnixFS directory DAG width +5. UnixFS directory DAG layout +6. HAMT directory DAG threshold +7. HAMT directory DAG width +8. Leaf Envelope (historically dag-pb, now none/raw) +9. Allow empty directories 6. Required Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves. @@ -43,7 +47,10 @@ Additional profiles can be added at a future date. Profile names may be chosen f | Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | | DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | | DAG layout | balanced | balanced | balanced | balanced | not specified | - +| HAMT threshold | 256KiB (est) | 256KiB (est) | 1000 **links** | 256KiB | not specified | +| HAMT width | 256 blocks | 256 blocks | 256 blocks | 256 blocks | not specified | +| Leaves | raw | raw | raw | raw | not specified | +| EmptyDirs | allowed | allowed | disallowed | allowed | not specified | This would be specified as a table in (forthcoming UnixFS spec). 
From d8b83891fdef2e104278a05c085faf8c568b258f Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 16 Apr 2025 00:29:09 +0200 Subject: [PATCH 04/16] incorporate kubo#10774 Import.* config params for controlling DAG width were added in: https://github.com/ipfs/kubo/pull/10774 --- src/ipips/ipip-0499.md | 82 +++++++++++++++++++++--------------------- 1 file changed, 40 insertions(+), 42 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 7f75d728..648a48ed 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -6,16 +6,19 @@ date: 2025-04-03 ipip: proposal editors: - name: Michelle Lee + github: mishmosh + affiliation: + name: IPFS Foundation relatedIssues: - - n/a -order: 0000 + - https://discuss.ipfs.tech/t/should-we-profile-cids/18507 +order: 0499 tags: ['ipips'] --- ## Summary -This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. +This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. ## Motivation @@ -23,57 +26,43 @@ Currently, CIDs can be generated with a variety of settings and optimizations fo ## Detailed design -We introduce a profile naming system, +We introduce a profile naming system, Each profile must specify the following characteristics: 1. CID version (currently only CIDv0 or CIDv1) -2. Hash algorithm -3. UnixFS Chunk size (explicitly set, not contextual/reactive to input) -4. UnixFS directory DAG width -5. UnixFS directory DAG layout -6. HAMT directory DAG threshold -7. HAMT directory DAG width -8. Leaf Envelope (historically dag-pb, now none/raw) -9. Allow empty directories -6. Required +1. Hash algorithm +1. UnixFS Chunk algorithm (e.g. size-based or content-based) +1. UnixFS directory DAG layout (e.g. balanced, trickle) +1. UnixFS file DAG width (max number of links per `File` node) +1. 
UnixFS directory DAG width (max number of links per basic `Directory` node) +1. UnixFS HAMT directory DAG threshold (max `Directory` size before switching to `HAMTDirectory`) +1. HAMT directory DAG width (max number of fanout links per internal HAMTDirectory node) +1. Leaf Envelope (historically `dag-pb`, CIDv1 introduced `raw` leaves) +1. Empty directories (informative suggestion) Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves. -| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL | -|-------------|---------------|-----------------------------|------------------|-----------------------|---------------| -| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | -| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | -| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | -| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | -| DAG layout | balanced | balanced | balanced | balanced | not specified | -| HAMT threshold | 256KiB (est) | 256KiB (est) | 1000 **links** | 256KiB | not specified | -| HAMT width | 256 blocks | 256 blocks | 256 blocks | 256 blocks | not specified | -| Leaves | raw | raw | raw | raw | not specified | -| EmptyDirs | allowed | allowed | disallowed | allowed | not specified | - - This would be specified as a table in (forthcoming UnixFS spec). - - ## Design rationale -The profile names are chosen to be easy to pronounce. - -Here is a summary table of current defaults, thanks to input & clarifications from @2color @achingbrain @lidel: +The profile names are chosen to be easy to pronounce. 
-| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL | -|-------------|---------------|-----------------------------|------------------|-----------------------|---------------| -| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | -| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | -| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified | -| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified | -| DAG layout | balanced | balanced | balanced | balanced | not specified | +Here is a summary table of current (2025-Q2) defaults, thanks to input & clarifications from @2color @achingbrain @lidel: -* Kubo has 2 different default DAG widths: - * For HAMT-sharded directories, the `DefaultShardWidth` [here](https://github.com/ipfs/boxo/blob/f1d5312e3be45d151bb9c8f11c9283820687bea3/ipld/unixfs/io/directory.go#L30) is 256. - * For files, `DefaultLinksPerBlock` [here](https://github.com/ipfs/boxo/blob/v0.29.0/ipld/unixfs/importer/helpers/helpers.go#L30) is ~174 +| | Helia default | Kubo `legacy-cid-v0` (default) | Storacha default | Kubo `test-cid-v1` | Kubo `test-cid-v1-wide` | DASL | +|---------------------------------|---------------|-----------------------------------|------------------|--------------------|---------------------------|---------------| +| CID version | CIDv1 | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | +| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | +| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | 1MiB | not specified | +| Max links `File` node | 1024 | 174 | 1024 | 174 | **1024** | not specified | +| Max links `Directory` node | ? | 0 | ? | 0 | 0 | ? 
| +| Max fanout `HAMTDirectory` node | 256 blocks | 256 blocks | 256 blocks | 256 blocks | **1024** | not specified | +| `HAMTDirectory` threshold | 256KiB (est) | 256KiB (est:links[name+cid]) | 1000 **links** | 256KiB | **1MiB** | not specified | +| DAG layout | balanced | balanced | balanced | balanced | balanced | not specified | +| Leaves | raw | raw | raw | raw | raw | not specified | +| Empty directories | allowed | allowed | disallowed | allowed | allowed | not specified | See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/ @@ -85,7 +74,7 @@ Reliable, deterministic CIDs allow independent verification of content across to Implementations will need to (1) make CID generation settings configurable and (2) support user setting of profiles. -Kubo currently has no CLI / RPC / Config option to control DAG width in Kubo. https://github.com/ipfs/kubo/issues/10751 is the starting point to add that ability. +Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob/master/docs/config.md#import) option to control DAG width. ### Security @@ -95,6 +84,15 @@ TODO Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. v1-sha256-balanced-1mib-1024w-raw. This is longer and more convoluted. + +#### Empty directories + +Decision if empty directories should be included is left out of scope. + +Tools can apply arbitrary filtering before passing filesystem entries +to be converted into a DAG, thus for 1:1 CID reproducibility one should +run without any prefilters, or ensure the same prefilters are applied. 
+ ## Test fixtures TODO From 595588c8d4dd47bba835950a212d32769a3ec28e Mon Sep 17 00:00:00 2001 From: Daniel Norman <1992255+2color@users.noreply.github.com> Date: Tue, 12 Aug 2025 09:21:54 +0200 Subject: [PATCH 05/16] Update src/ipips/ipip-0499.md Co-authored-by: Christian Paul --- src/ipips/ipip-0499.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 648a48ed..362ca3b7 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -68,7 +68,7 @@ See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/185 ### User benefit -Reliable, deterministic CIDs allow independent verification of content across tools and ipmlementations. +Reliable, deterministic CIDs allow independent verification of content across tools and implementations. ### Compatibility From 41f9b86982d10abd32da9cf7e5fc820054011d3f Mon Sep 17 00:00:00 2001 From: Daniel N <2color@users.noreply.github.com> Date: Tue, 12 Aug 2025 10:06:47 +0200 Subject: [PATCH 06/16] add daniel as editor --- src/ipips/ipip-0499.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 362ca3b7..09146efe 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -1,7 +1,5 @@ --- -# IPIP number should match its pull request number. After you open a PR, -# please update title and update the filename to `ipip0000`. 
-title: "IPIP-0499: CID Profiles" +title: 'IPIP-0499: CID Profiles' date: 2025-04-03 ipip: proposal editors: @@ -9,6 +7,11 @@ editors: github: mishmosh affiliation: name: IPFS Foundation + - name: Daniel Norman + github: 2color + affiliation: + name: Shipyard + url: https://ipshipyard.com relatedIssues: - https://discuss.ipfs.tech/t/should-we-profile-cids/18507 order: 0499 From 229988f67d088a03850fd229c438efd8c6bb1044 Mon Sep 17 00:00:00 2001 From: Daniel N <2color@users.noreply.github.com> Date: Tue, 12 Aug 2025 12:16:41 +0200 Subject: [PATCH 07/16] edit summary and motivation --- src/ipips/ipip-0499.md | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 09146efe..a203b6e2 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -20,12 +20,27 @@ tags: ['ipips'] ## Summary - -This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. +This proposal introduces configuration profiles for CIDs used to represent files and directories with UnixFS. These ensure that the same content will yield the same CID across different implementations. + +Profiles explicitly define the following UnixFS parameters: CID version, hash algorithm, chunk size, DAG width, layout, and other parameters that affect the resulting CID. + +This allows for deterministic UnixFS CIDs. ## Motivation -Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. Profiles offer With profiles, following the same profile will produce identical CIDs for identical content, whic makes verification regardless of implementation. +UnixFS CIDs are not deterministic. 
This means that the same file tree can yield different CIDs depending on the parameters used by the implementation to generate them, which, in some cases, aren't even configurable by the user. For example, the chunk size, DAG width, and layout can vary between implementations or even between different versions of the same implementation. + +This lack of determinism has a number of drawbacks: + +- It is difficult to verify content across different tools and implementations, as the same content may yield different CIDs. +- Users need to include the UnixFS merkle proofs in order to verify the CID, adding storage overhead and complexity to the verification process. +- In terms of developer experience, it deviates from the mental model of a hash function, where the same input should always yield the same output. This leads to confusion and frustration when working with UnixFS CIDs. + +By introducing profiles, we can benefit from both the optionality offered by UnixFS, where users are free to choose their own parameters, and the determinism of CIDs, where the same content will yield the same CID across different implementations. + + same content will yield the same CID across different implementations, making it easier to verify content and improving the developer experience. + +UnixFS CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file tree can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. With profiles, following the same profile will produce identical CIDs for identical content, which makes verification possible regardless of implementation. 
## Detailed design From f37e6107f672e2f427598a29f6764e39316425bd Mon Sep 17 00:00:00 2001 From: Daniel N <2color@users.noreply.github.com> Date: Tue, 12 Aug 2025 12:17:00 +0200 Subject: [PATCH 08/16] edit summary --- src/ipips/ipip-0499.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index a203b6e2..3fcb54fd 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -20,11 +20,9 @@ tags: ['ipips'] ## Summary -This proposal introduces configuration profiles for CIDs used to represent files and directories with UnixFS. These ensure that the same content will yield the same CID across different implementations. +This proposal introduces configuration profiles for CIDs used to represent files and directories with UnixFS. These ensure deterministic CID generation for the same data, regardless of the implementation. -Profiles explicitly define the following UnixFS parameters: CID version, hash algorithm, chunk size, DAG width, layout, and other parameters that affect the resulting CID. - -This allows for deterministic UnixFS CIDs. +Profiles explicitly define the UnixFS parameters, e.g. dag width, hash algorithem, and chunk size, that affect the resulting CID, such that given the profile and input data different implementations will generate identical CIDs. ## Motivation From 7a12f0a936054e68a52ebc18d4d16f9308197e60 Mon Sep 17 00:00:00 2001 From: Daniel N <2color@users.noreply.github.com> Date: Tue, 12 Aug 2025 12:17:28 +0200 Subject: [PATCH 09/16] edit parameters and design --- src/ipips/ipip-0499.md | 58 ++++++++++++++++++++---------------------- 1 file changed, 27 insertions(+), 31 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 3fcb54fd..a9775877 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -31,33 +31,30 @@ UnixFS CIDs are not deterministic. 
This means that the same file tree can yield This lack of determinism has a number of drawbacks: - It is difficult to verify content across different tools and implementations, as the same content may yield different CIDs. -- Users need to include the UnixFS merkle proofs in order to verify the CID, adding storage overhead and complexity to the verification process. +- Users are requires to store and transfer UnixFS merkle proofs in order to verify CIDs, adding storage overhead, network bandwidth, and complexity to the verification process. - In terms of developer experience, it deviates from the mental model of a hash function, where the same input should always yield the same output. This leads to confusion and frustration when working with UnixFS CIDs. -By introducing profiles, we can benefit from both the optionality offered by UnixFS, where users are free to choose their own parameters, and the determinism of CIDs, where the same content will yield the same CID across different implementations. - - same content will yield the same CID across different implementations, making it easier to verify content and improving the developer experience. +By introducing profiles, we can benefit from both the optionality offered by UnixFS, where users are free to choose their own parameters, and determinism through profiles. UnixFS CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file tree can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. With profiles, following the same profile will produce identical CIDs for identical content, which makes verification possible regardless of implementation. ## Detailed design -We introduce a profile naming system, +We introduce a set of named profiles that define a set of parameters for generating UnixFS CIDs. 
These profiles can be used by implementations to ensure that the same content will yield the same CID across different tools and implementations. + +### UnixFS parameters -Each profile must specify the following characteristics: +The profiles define a set of parameters that affect the resulting CID. These parameters are based on the UnixFS specification and are used to generate the CID for a given file tree. The parameters include: 1. CID version (currently only CIDv0 or CIDv1) -1. Hash algorithm -1. UnixFS Chunk algorithm (e.g. size-based or content-based) -1. UnixFS directory DAG layout (e.g. balanced, trickle) -1. UnixFS file DAG width (max number of links per `File` node) -1. UnixFS directory DAG width (max number of links per basic `Directory` node) -1. UnixFS HAMT directory DAG threshold (max `Directory` size before switching to `HAMTDirectory`) -1. HAMT directory DAG width (max number of fanout links per internal HAMTDirectory node) -1. Leaf Envelope (historically `dag-pb`, CIDv1 introduced `raw` leaves) -1. Empty directories (informative suggestion) - -Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves. +1. Hash function +1. UnixFS chunk size +1. UnixFS DAG layout (e.g. balanced, trickle) +1. UnixFS DAG width (max number of links per `File` node) +1. `HAMTDirectory` fanout (must be a power of 2) +2. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PNNode.Links +3. Leaf Envelope: either `dag-pb` or `raw` +4. Whether empty directories are included in the DAG This would be specified as a table in (forthcoming UnixFS spec). @@ -65,20 +62,19 @@ This would be specified as a table in (forthcoming UnixFS spec). The profile names are chosen to be easy to pronounce. 
-Here is a summary table of current (2025-Q2) defaults, thanks to input & clarifications from @2color @achingbrain @lidel: - -| | Helia default | Kubo `legacy-cid-v0` (default) | Storacha default | Kubo `test-cid-v1` | Kubo `test-cid-v1-wide` | DASL | -|---------------------------------|---------------|-----------------------------------|------------------|--------------------|---------------------------|---------------| -| CID version | CIDv1 | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | -| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | -| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | 1MiB | not specified | -| Max links `File` node | 1024 | 174 | 1024 | 174 | **1024** | not specified | -| Max links `Directory` node | ? | 0 | ? | 0 | 0 | ? | -| Max fanout `HAMTDirectory` node | 256 blocks | 256 blocks | 256 blocks | 256 blocks | **1024** | not specified | -| `HAMTDirectory` threshold | 256KiB (est) | 256KiB (est:links[name+cid]) | 1000 **links** | 256KiB | **1MiB** | not specified | -| DAG layout | balanced | balanced | balanced | balanced | balanced | not specified | -| Leaves | raw | raw | raw | raw | raw | not specified | -| Empty directories | allowed | allowed | disallowed | allowed | allowed | not specified | +Here is a summary table of current (2025-Q2) defaults: + +| | Helia default | Kubo `legacy-cid-v0` (default) | Storacha default | Kubo `test-cid-v1` | Kubo `test-cid-v1-wide` | DASL | +| ----------------------------- | ------------- | ------------------------------ | ---------------- | ------------------ | ----------------------- | ------------- | +| CID version | CIDv1 | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | +| Hash function | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 | +| Max chunk size | 1MiB | 256KiB | 1MiB | 1MiB | 1MiB | not specified | +| DAG layout | balanced | balanced | balanced | balanced | balanced | not specified | +| DAG width (children per node) | 1024 | 174 | 1024 | 174 | **1024** | not specified 
| +| `HAMTDirectory` fanout | 256 blocks | 256 blocks | 256 blocks | 256 blocks | **1024** | not specified | +| `HAMTDirectory` threshold | 256KiB (est) | 256KiB (est:links[name+cid]) | 1000 **links** | 256KiB | **1MiB** | not specified | +| Leaves | raw | raw | raw | raw | raw | not specified | +| Empty directories | Included | Included | Ignored | Included | Included | not specified | See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/ From ff69e563c1a9e6bb2b781d08d7a3b09168318aae Mon Sep 17 00:00:00 2001 From: Daniel N <2color@users.noreply.github.com> Date: Tue, 12 Aug 2025 12:18:08 +0200 Subject: [PATCH 10/16] edit user benefit and compatibility --- src/ipips/ipip-0499.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index a9775877..4335ea63 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -80,23 +80,25 @@ See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/185 ### User benefit -Reliable, deterministic CIDs allow independent verification of content across tools and implementations. +Profiles reduce the burden of verifying UnixFS content, as users can simply choose a profile and know that the resulting CIDs will be deterministic across implementations. This eliminates the need for users to understand the underlying parameters that affect CID generation, and allows them to focus on the content itself. + +Moreover, profiles allow users to verify content without needing to rely on additional merkle proofs and CAR files, which can be cumbersome and inefficient. + +Finally, profiles improve the developer experience by aligning with the mental model of a hash function. ### Compatibility -Implementations will need to (1) make CID generation settings configurable and (2) support user setting of profiles. 
+UnixFS Data encoded with the profiles defined in this IPIP is fully compatible with existing implementations, as it is fully compliant with the UnixFS specification. -Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob/master/docs/config.md#import) option to control DAG width. +To produce CIDs that are compliant with this IPIP, implementations will need to support the parameters defined in the profiles. This may require changes to existing implementations to expose configuration options for the parameters, or to implement new functionality to support the profiles. -### Security +Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob/master/docs/config.md#import) option to control DAG width. -TODO ### Alternatives Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. v1-sha256-balanced-1mib-1024w-raw. This is longer and more convoluted. - #### Empty directories Decision if empty directories should be included is left out of scope. From 09baf68c7a5bc76f2a69a3f326c7f1d54ec578dd Mon Sep 17 00:00:00 2001 From: Daniel N <2color@users.noreply.github.com> Date: Tue, 12 Aug 2025 12:26:21 +0200 Subject: [PATCH 11/16] refine parameters and introduce a named profile --- src/ipips/ipip-0499.md | 42 ++++++++++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 16 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 4335ea63..3fc66959 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -52,15 +52,34 @@ The profiles define a set of parameters that affect the resulting CID. These par 1. UnixFS DAG layout (e.g. balanced, trickle) 1. UnixFS DAG width (max number of links per `File` node) 1. `HAMTDirectory` fanout (must be a power of 2) -2. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PNNode.Links -3. 
Leaf Envelope: either `dag-pb` or `raw` -4. Whether empty directories are included in the DAG +1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PNNode.Links +1. Leaf Envelope: either `dag-pb` or `raw` +1. Whether empty directories are included in the DAG + - Some implementations apply filtering before merkleizing filesystem entries in the DAG. -This would be specified as a table in (forthcoming UnixFS spec). +This would be specified as a table in (forthcoming [UnixFS spec](https://github.com/ipfs/specs/pull/331/files)). -## Design rationale +## Named profiles -The profile names are chosen to be easy to pronounce. +To make it easier for users and implementations to choose a set of parameters, we define a named profile `unixfs-2025` to encapsulate the parameters established as the baseline default by multiple implementations as of 2025. + +The **`unixfs-2025`** profile name is designed to be referenced by implementations and users to ensure that the same content will yield the same CID across different tools and implementations. + +The profile is defined as follows: + +| Parameter | Value | +| ----------------------------- | ------------------------------------------------------- | +| CID version | CIDv1 | +| Hash function | sha2-256 | +| Max chunk size | 1MiB | +| DAG layout | balanced | +| DAG width (children per node) | 1024 | +| `HAMTDirectory` fanout | 256 blocks | +| `HAMTDirectory` threshold | 256KiB (estimated by counting the size of PBNode.links) | +| Leaves | raw | +| Empty directories | TODO | + +## Current defaults Here is a summary table of current (2025-Q2) defaults: @@ -94,18 +113,9 @@ To produce CIDs that are compliant with this IPIP, implementations will need to Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob/master/docs/config.md#import) option to control DAG width. 
- ### Alternatives -Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. v1-sha256-balanced-1mib-1024w-raw. This is longer and more convoluted. - -#### Empty directories - -Decision if empty directories should be included is left out of scope. - -Tools can apply arbitrary filtering before passing filesystem entries -to be converted into a DAG, thus for 1:1 CID reproducibility one should -run without any prefilters, or ensure the same prefilters are applied. +As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle proofs needed to verify the CID. ## Test fixtures From cffade84d0945c2fbd06be95aa9d5f2c1d5cd8d3 Mon Sep 17 00:00:00 2001 From: Daniel Norman <1992255+2color@users.noreply.github.com> Date: Wed, 20 Aug 2025 10:31:52 +0200 Subject: [PATCH 12/16] Apply suggestions from code review Co-authored-by: Hector Sanjuan --- src/ipips/ipip-0499.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 3fc66959..53d49883 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -22,7 +22,7 @@ tags: ['ipips'] This proposal introduces configuration profiles for CIDs used to represent files and directories with UnixFS. These ensure that the deterministic CID generation for the same data, regardless of the implementation. -Profiles explicitly define the UnixFS parameters, e.g. dag width, hash algorithem, and chunk size, that affect the resulting CID, such that given the profile and input data different implementations will generate identical CIDs. +Profiles explicitly define the UnixFS parameters, e.g. dag width, hash algorithm, and chunk size, that affect the resulting CID, such that given the profile and input data different implementations will generate identical CIDs. ## Motivation @@ -31,7 +31,7 @@ UnixFS CIDs are not deterministic. 
This means that the same file tree can yield This lack of determinism makes has a number of drawbacks: - It is difficult to verify content across different tools and implementations, as the same content may yield different CIDs. -- Users are requires to store and transfer UnixFS merkle proofs in order to verify CIDs, adding storage overhead, network bandwidth, and complexity to the verification process. +- Users are required to store and transfer UnixFS merkle proofs in order to verify CIDs, adding storage overhead, network bandwidth, and complexity to the verification process. - In terms of developer experience, it deviates from the mental model of a hash function, where the same input should always yield the same output. This leads to confusion and frustration when working with UnixFS CIDs By introducing profiles, we can benefit from both the optionality offered by UnixFS, where users are free to chose their own parameters, and determinism through profiles. @@ -49,7 +49,7 @@ The profiles define a set of parameters that affect the resulting CID. These par 1. CID version (currently only CIDv0 or CIDv1) 1. Hash function 1. UnixFS chunk size -1. UnixFS DAG layout (e.g. balanced, trickle) +1. UnixFS DAG layout (e.g. balanced, trickle etc...) 1. UnixFS DAG width (max number of links per `File` node) 1. `HAMTDirectory` fanout (must be a power of 2) 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PNNode.Links @@ -115,7 +115,7 @@ Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob ### Alternatives -As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle proofs needed to verify the CID. +As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle DAG nodes needed to verify the CID. 
## Test fixtures From 0402c840d713f95c2565fef6cb1074e96fd2487b Mon Sep 17 00:00:00 2001 From: Daniel N <2color@users.noreply.github.com> Date: Wed, 20 Aug 2025 10:55:56 +0200 Subject: [PATCH 13/16] edit based on hector's feedback --- src/ipips/ipip-0499.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 53d49883..238e4796 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -34,9 +34,7 @@ This lack of determinism makes has a number of drawbacks: - Users are required to store and transfer UnixFS merkle proofs in order to verify CIDs, adding storage overhead, network bandwidth, and complexity to the verification process. - In terms of developer experience, it deviates from the mental model of a hash function, where the same input should always yield the same output. This leads to confusion and frustration when working with UnixFS CIDs -By introducing profiles, we can benefit from both the optionality offered by UnixFS, where users are free to chose their own parameters, and determinism through profiles. - -UnixFS CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file tree can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. Profiles offer With profiles, following the same profile will produce identical CIDs for identical content, whic makes verification regardless of implementation. +By introducing profiles which define the parameters that affect the root CID of the DAG, we can benefit from both the optionality offered by UnixFS, where users are free to choose their own parameters, and determinism through profiles.
## Detailed design From ec07e30d5bef63a654feac04292d450eaa1a4fef Mon Sep 17 00:00:00 2001 From: Daniel Norman <1992255+2color@users.noreply.github.com> Date: Wed, 20 Aug 2025 11:11:19 +0200 Subject: [PATCH 14/16] Apply suggestions from code review Co-authored-by: Rod Vagg --- src/ipips/ipip-0499.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 238e4796..c83ed6fb 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -46,7 +46,8 @@ The profiles define a set of parameters that affect the resulting CID. These par 1. CID version (currently only CIDv0 or CIDv1) 1. Hash function -1. UnixFS chunk size +1. UnixFS file chunking algorithm +1. UnixFS file chunk size or target (if required by the chunking algorithm) 1. UnixFS DAG layout (e.g. balanced, trickle etc...) 1. UnixFS DAG width (max number of links per `File` node) 1. `HAMTDirectory` fanout (must be a power of 2) From f454912150e6d478ca8144d8ebb495e414da0851 Mon Sep 17 00:00:00 2001 From: Daniel N <2color@users.noreply.github.com> Date: Wed, 20 Aug 2025 11:31:15 +0200 Subject: [PATCH 15/16] add multibase encoding --- src/ipips/ipip-0499.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index c83ed6fb..2058dc75 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -42,10 +42,11 @@ We introduce a set of named profiles that define a set of parameters for generat ### UnixFS parameters -The profiles define a set of parameters that affect the resulting CID. These parameters are based on the UnixFS specification and are used to generate the CID for a given file tree. The parameters include: +The profiles define a set of parameters that affect the resulting string encoding of the CID. These parameters are based on the UnixFS specification and are used to generate the CID for a given file tree. The parameters include: -1. 
CID version (currently only CIDv0 or CIDv1) -1. Hash function +1. CID version, e.g. CIDv0 or CIDv1 +1. Multibase encoding for the CID, e.g. base32 +1. Hash function used for all nodes in the DAG, e.g. sha2-256 1. UnixFS file chunking algorithm 1. UnixFS file chunk size or target (if required by the chunking algorithm) 1. UnixFS DAG layout (e.g. balanced, trickle etc...) @@ -53,8 +54,7 @@ The profiles define a set of parameters that affect the resulting CID. These par 1. `HAMTDirectory` fanout (must be a power of 2) 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PNNode.Links 1. Leaf Envelope: either `dag-pb` or `raw` -1. Whether empty directories are included in the DAG - - Some implementations apply filtering before merkleizing filesystem entries in the DAG. +1. Whether empty directories are included in the DAG. Some implementations apply filtering before merkleizing filesystem entries in the DAG. This would be specified as a table in (forthcoming [UnixFS spec](https://github.com/ipfs/specs/pull/331/files)). From 9c621ba7d7f5f80bad090656074bc6a430f28901 Mon Sep 17 00:00:00 2001 From: Daniel N <2color@users.noreply.github.com> Date: Wed, 20 Aug 2025 12:29:05 +0200 Subject: [PATCH 16/16] address feedback from rvagg --- src/ipips/ipip-0499.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/ipips/ipip-0499.md b/src/ipips/ipip-0499.md index 2058dc75..d3cc2df1 100644 --- a/src/ipips/ipip-0499.md +++ b/src/ipips/ipip-0499.md @@ -51,10 +51,12 @@ The profiles define a set of parameters that affect the resulting string encodin 1. UnixFS file chunk size or target (if required by the chunking algorithm) 1. UnixFS DAG layout (e.g. balanced, trickle etc...) 1. UnixFS DAG width (max number of links per `File` node) -1. `HAMTDirectory` fanout (must be a power of 2) +1. `HAMTDirectory` bitwidth, i.e. 
the number of bits that determines the fanout of the `HAMTDirectory` (default bitwidth is 8, giving a fanout of 2^8 = 256). 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PNNode.Links 1. Leaf Envelope: either `dag-pb` or `raw` 1. Whether empty directories are included in the DAG. Some implementations apply filtering before merkleizing filesystem entries in the DAG. +1. Directory wrapping for single files: in order to retain the name of a single file, some implementations have the option to wrap the file in a `Directory` with a link to the file. +1. Presence and accurate setting of `Tsize`. This would be specified as a table in (forthcoming [UnixFS spec](https://github.com/ipfs/specs/pull/331/files)).