**File:** `src/ipips/ipip-0499.md` (134 additions, 0 deletions)
---
title: 'IPIP-0499: CID Profiles'
date: 2025-11-14
ipip: proposal
editors:
- name: Michelle Lee
github: mishmosh
affiliation:
name: IPFS Foundation
url: https://ipfsfoundation.org
- name: Daniel Norman
github: 2color
affiliation:
name: Independent
url: https://norman.life
relatedIssues:
- https://discuss.ipfs.tech/t/should-we-profile-cids/18507
order: 0499
tags: ['ipips']
---

## Summary

This proposal introduces **configuration profiles** for CIDs that represent files and directories using [UnixFS](https://specs.ipfs.tech/unixfs/).

## Motivation

UnixFS CIDs are currently non-deterministic: the same file or directory can produce different CIDs across implementations, because parameters like chunk size, DAG width, and layout vary between them. Often, these parameters are not even configurable by users.

This creates three problems:

- **Verification difficulty:** The same content produces different CIDs across tools, making content verification unreliable.
- **Additional overhead:** Users must store and transfer UnixFS merkle proofs to verify CIDs, adding storage overhead, network bandwidth, and complexity.
- **Broken expectations:** Unlike standard hash functions where identical input produces identical output, UnixFS CIDs behave unpredictably.

Configuration profiles solve this by explicitly defining all parameters that affect CID generation. This preserves UnixFS flexibility (users can still choose parameters) while enabling deterministic results.

## Detailed design

We introduce a set of **named configuration profiles**, each specifying the complete set of parameters for generating UnixFS CIDs. When implementations use these profiles, they guarantee that the same input, processed with the same profile, will yield the same CID across different tools and implementations.

### UnixFS parameters

Here is the complete set of UnixFS parameters that affect the resulting string encoding of the CID:

1. CID version, e.g. CIDv0 or CIDv1
1. Multibase encoding for the CID, e.g. base32
1. Hash function used for all nodes in the DAG, e.g. sha2-256
1. UnixFS file chunking algorithm
1. UnixFS file chunk size or target (if required by the chunking algorithm)
1. UnixFS DAG layout (e.g. balanced, trickle, etc.)
1. UnixFS DAG width (max number of links per `File` node)
1. `HAMTDirectory` fanout, i.e. the bitwidth that determines the fanout of the `HAMTDirectory` (the default bitwidth of 8 yields 256 buckets)
1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size obtained by counting the size of `PBNode.Links`
> **Contributor:** If this number is dynamic based on the lengths of the actual link entries in the DAG, we will need to specify what algorithm that estimation follows. I would put such things in a special "ipfs legacy" profile, to be honest, along with CIDv0, non-raw leaves, etc. We probably should heavily discourage coming up with profiles that do weird things, like dynamically setting params or not using raw leaves for things.

> **Contributor:** So, each layout would have its own set of layout-params:
>
> - balanced:
>   - max-links: N
> - trickle:
>   - max-leaves-per-level: N

> **Member:**
>
> > We probably should heavily discourage coming up with profiles that do weird things, like dynamically setting params or not using raw-leaves for things.
>
> Yeah, that's exactly what we're doing by defining this profile.

> **Collaborator:** Wait, is Kubo dynamically assigning the `HAMTDirectory` threshold currently? I was assuming this was a static number!

> **Collaborator:** The current spec mentions fanout but not threshold, so I'm a little confused about what current implementations are doing, and whether it's even worth fitting this into the profile system, or just giving up and letting a significant portion of HAMT-sharded legacy data be unprofiled/not re-creatable using the profiles…

> **Contributor:** @lidel Is this written down in any of the specs? Or is it just in the code at this point?

> **Contributor (author):** @lidel @hsanjuan Trying to understand/resolve this thread. Can you confirm whether this is current Kubo behavior?
>
> > `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of `PBNode.Links`

> **Member (@lidel, Nov 13, 2025):** AFAIK the decision on when to use a `HAMTDirectory` is implementation-specific behavior. So far the rule of thumb is to keep blocks under 1–2 MiB, and it is usually a good idea to match the defined chunk size (default or user-defined).
>
> Implementation-wise, both Go (Boxo/Kubo) and JS (Helia) have a size-based heuristic that decides when to switch from a normal `Directory` to a `HAMTDirectory`.
>
> IIRC (from two-year-old memory, something to check/confirm), the size-estimation details may be, and likely are, different between Go and JS. They both estimate the serialized DAG node size by calculating the aggregate byte length of directory entries (link names + CIDs), though the JavaScript implementation appears to include additional metadata in its calculation:
>
> - Kubo's size-estimation method is likely `estimatedSize = sum(len(link.Name) + len(link.Cid.Bytes()) for each link)`
> - Helia's is likely "the size of the final DAG node (including link names, sizes, optional metadata fields, etc.)"
>
> If true, the slight differences in calculation methods might result in directories sharding at marginally different sizes.

> **Member:** If you want to be exact, you have to take into account any non-zero-value fields in the serialized root UnixFS metadata, since these affect the block size.
>
> It's quite possible that Kubo will produce a HAMT block that's too big with a certain combination of directory entry names if someone has also changed the encoded directory's default mtime or whatever, probably because the "should-I-shard" feature pre-dates Kubo's ability to add UnixFSv1.5 metadata to things.
>
> Really, there's no need to estimate anything: it's trivial to count the actual bytes that a block will take up and then shard if necessary.

> **Contributor:** Suggested change:
>
> > 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of `PBNode.Links`. We do not include details about the estimation algorithm, as we do not encourage implementations to support it.

> **Member (@lidel, Nov 13, 2025):** A bit odd to discourage it, when both of the most popular implementations, in Go and JS, use a size-based heuristic (#499 (comment)).
>
> Unsure how to handle this. Perhaps clarify that the heuristic is implementation-specific, and that when deterministic behavior is expected, a specific heuristic should be used?

> **Member (@achingbrain, Nov 13, 2025):** I don't think we should be estimating the block size, as it's trivial to calculate it exactly. Can we not just define this (and punt to the spec for the details) to make it less hand-wavy?
>
> Suggested change:
>
> > 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on the final size of the serialized form of the [PBNode protobuf message](https://specs.ipfs.tech/unixfs/#dag-pb-node) that represents the directory.

1. Leaf Envelope: either `dag-pb` or `raw`
1. Whether empty directories are included in the DAG. Some implementations may apply filtering.
1. Whether hidden entities (including dot files) are included in the DAG. Some implementations may apply filtering.
1. Directory wrapping for single files: in order to retain the name of a single file, some implementations have the option to wrap the file in a `Directory` with link to the file.
1. Presence and accurate setting of `Tsize`.

The handling of symlinks and symlink follows is defined by the [UnixFS](https://specs.ipfs.tech/unixfs/) spec.
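To make the parameter inventory concrete, the list above can be sketched as a single configuration record. This is a non-normative illustration: the schema and all field names (`UnixFSProfile`, `dag_width`, etc.) are hypothetical and not part of this proposal.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class UnixFSProfile:
    """Hypothetical record pinning down every parameter listed above."""
    cid_version: int            # 0 or 1
    multibase: str              # e.g. "base32"
    hash_function: str          # e.g. "sha2-256"
    chunker: str                # file chunking algorithm, e.g. "fixed-size"
    chunk_size: Optional[int]   # bytes, if required by the chunker
    dag_layout: str             # "balanced" or "trickle"
    dag_width: int              # max links per File node
    hamt_fanout: int            # e.g. 256 (bitwidth 8)
    hamt_threshold: int         # max Directory block size in bytes before sharding
    leaf_envelope: str          # "raw" or "dag-pb"
    include_empty_dirs: bool    # whether empty directories enter the DAG
    include_hidden: bool        # whether hidden entities (dot files) enter the DAG
    wrap_single_file: bool      # wrap a single file in a Directory to keep its name
    set_tsize: bool             # whether Tsize is present and accurate
```

A profile name would then map to exactly one instance of such a record; `frozen=True` mirrors the expectation that a published profile never changes.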

## CID profiles

To enable consistent CID generation, we define a series of named profiles that specify complete UnixFS parameter sets. Profile names may have any prefix, but must end in `YYYY-MM`.

The initial profile in the series, **`unixfs-2025`**, captures the baseline default parameters used by multiple implementations as of November 2025.

| Parameter | `unixfs-2025` |
| ----------------------------- | ------------------------------------------------------- |
| CID version | CIDv1 |
| Hash function | sha2-256 |
| Max chunk size | 1MiB |
| DAG layout | balanced |
| DAG width (children per node) | 1024 |
| `HAMTDirectory` fanout        | 256 (bitwidth 8)                                        |
| `HAMTDirectory` threshold     | 256KiB (estimated by counting the size of `PBNode.Links`) |
| Leaves | raw |
| Empty directories | TODO |
| Hidden entities | TODO |
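As a quick sanity check of these numbers, one can compute the shape of the balanced DAG that `unixfs-2025` would produce for a file of a given size. This is a back-of-the-envelope sketch, not a normative algorithm; `dag_shape` is a hypothetical helper.

```python
import math

CHUNK_SIZE = 1 << 20   # 1 MiB max chunk size (unixfs-2025)
DAG_WIDTH = 1024       # max children per File node (unixfs-2025)

def dag_shape(file_size: int) -> tuple[int, int]:
    """Return (leaf_count, tree_height) for a balanced UnixFS file DAG.

    Height 0 means the file fits in a single leaf block.
    """
    leaves = max(1, math.ceil(file_size / CHUNK_SIZE))
    height = 0
    nodes = leaves
    while nodes > 1:
        # Each level groups up to DAG_WIDTH children under one parent node.
        nodes = math.ceil(nodes / DAG_WIDTH)
        height += 1
    return leaves, height

# A 10 GiB file: 10240 leaves, grouped under 10 intermediate nodes,
# under a single root, so the tree is 2 levels above the leaves.
```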

## Legacy profiles

We also define a series of **legacy profiles**, used by various implementations as of November 2025:

| | `kubo-legacy-2015` (kubo default) | `helia-2025` | `storacha-2025` | `kubo-2025` | `kubo-wide-2025` | `dasl-2025` |
| ----------------------------- | ------------------------------ | --------------- | ------------------ | ------------------ | ----------------------- | ------------- |
| CID version | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
| Hash function | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 |
| Max chunk size | 256KiB | 1MiB | 1MiB | 1MiB | 1MiB | not specified |
| DAG layout | balanced | balanced | balanced | balanced | balanced | not specified |
| DAG width (children per node) | 174 | 1024 | 1024 | 174 | **1024** | not specified |
| `HAMTDirectory` fanout        | 256                            | 256             | 256                | 256                | **1024**                | not specified |
| `HAMTDirectory` threshold | 256KiB (est:links[name+cid]) | 256KiB (est) | 1000 **links** | 256KiB | **1MiB** | not specified |
| Leaves | raw | raw | raw | raw | raw | not specified |
| Empty directories | Included | Included | Ignored | Included | Included | not specified |

See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/
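The practical effect of the differing defaults is easy to see with a little arithmetic: the same file is split into a different number of leaf blocks under each profile, so the resulting DAGs, and therefore the root CIDs, cannot match. A minimal sketch (chunk sizes taken from the tables above; the `leaf_count` helper is hypothetical):

```python
import math

# Max chunk sizes from the profile tables above (bytes).
CHUNK_SIZES = {
    "kubo-legacy-2015": 256 * 1024,  # 256 KiB
    "unixfs-2025": 1 << 20,          # 1 MiB
}

def leaf_count(file_size: int, chunk_size: int) -> int:
    """Leaf blocks produced by a fixed-size chunker for a file of file_size bytes."""
    return max(1, math.ceil(file_size / chunk_size))

# The same 1 GiB file chunks into 4096 leaves under kubo-legacy-2015
# but only 1024 leaves under unixfs-2025: different DAG, different root CID.
leaves = {name: leaf_count(1 << 30, size) for name, size in CHUNK_SIZES.items()}
```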

### User benefit

Profiles provide three key advantages for working with content-addressed data:

1. **Predictable, deterministic behavior:** Profiles restore the expected property of content addressing: identical input data always produces identical CIDs, regardless of which implementation generates them.

2. **Lightweight verification:** Users can verify content without needing to rely on additional merkle proofs or CAR files.

3. **Simplified workflow:** Users can select a profile and automatically get consistent CIDs across all implementations, without needing to configure or understand the underlying parameters.

### Compatibility

UnixFS data encoded with the CID profiles defined in this IPIP remains fully compatible with existing implementations, since it conforms to the [specification](https://specs.ipfs.tech/unixfs/).

To generate CIDs in compliance with this IPIP, implementations must support the parameters defined in the profiles and the set of named profiles. They MAY also support legacy profiles. In practice this means:

* Adding new functionality to support the parameters and/or profiles
* Exposing configuration options for selecting profiles

### Alternatives

As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle DAG nodes needed to verify the CID.

## Test fixtures
> **Member:** Just noting this is (IMO) a blocker.
>
> We did not merge the UnixFS spec until we had a sensible set of fixtures that people could use as a reference.
>
> The spec may be incomplete, but a fixture will let people reverse-engineer any details, and then PR improvements to the spec.
>
> Without fixtures for each UnixFS node type, we risk unknown unknowns silently impacting the final CID (e.g. because we did not know that someone may decide to place leaves one level sooner as an "optimization", while someone else always places them at the bottom, for "formal consistency").

> **Contributor (author):** Tracking this in ipfs/kubo#11071

> **Member:** Thanks!


TODO

List relevant CIDs. Describe how implementations can use them to determine
specification compliance. This section can be skipped if IPIP does not deal
with the way IPFS handles content-addressed data, or the modified specification
file already includes this information.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).