
Conversation

@mishmosh
Contributor

@mishmosh mishmosh commented Apr 3, 2025

Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID.

This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. They can be used to verify data across implementations, provide recommended settings depending on retrieval performance goals, and more.
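As a rough illustration only (not normative, and not the actual spec or Kubo config format), a profile can be thought of as a fixed, named bundle of import parameters. The sketch below uses hypothetical field names and values loosely based on the parameters discussed in this thread:

```go
// Hypothetical sketch of a CID profile as a fixed bundle of import settings.
// Field names and values are illustrative, not the actual spec or Kubo config.
package main

import "fmt"

type CIDProfile struct {
	Name              string // e.g. a calendar-versioned name
	CIDVersion        int    // 0 or 1
	MultihashFunction string // e.g. "sha2-256"
	RawLeaves         bool   // whether file leaves are raw blocks
	Chunker           string // e.g. "size-1048576" for fixed-size chunking (value illustrative)
	DAGLayout         string // "balanced" or "trickle"
	FileMaxLinks      int    // max links per UnixFS File node (DAG width)
	DirectoryMaxLinks int    // max links per Directory node before sharding
	HAMTFanout        int    // must be a power of 2, e.g. 256
}

func main() {
	// A hypothetical profile; the real "cid-2025" values are still under discussion.
	p := CIDProfile{
		Name:              "example-cid-v1",
		CIDVersion:        1,
		MultihashFunction: "sha2-256",
		RawLeaves:         true,
		Chunker:           "size-1048576",
		DAGLayout:         "balanced",
		FileMaxLinks:      1024,
		DirectoryMaxLinks: 1024,
		HAMTFanout:        256,
	}
	fmt.Printf("%+v\n", p)
}
```

Two implementations that agree on such a bundle (and on the UnixFS spec details it references) should produce the same root CID for the same input.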

@mishmosh mishmosh requested a review from a team as a code owner April 3, 2025 14:03
@mishmosh mishmosh changed the title from "Create ipip-0000.md: CID profiles" to "IPIP 0499: CID Profiles" Apr 3, 2025
lidel added a commit to ipfs/kubo that referenced this pull request Apr 15, 2025
let's make the fanout match the max links from files
and rename profile to `-wide`

this will make it easier to discuss in ipfs/specs#499
lidel and others added 2 commits April 15, 2025 23:41
Import.* config params for controlling DAG width were added in:
ipfs/kubo#10774
@lidel
Member

lidel commented Apr 15, 2025

Thank you for kicking this off and filling in the initial state.

I've incorporated specific "dag width" settings for File, Directory and HAMTDirectory nodes, and updated the table to reflect the state from ipfs/kubo#10774 and the profiles that exist in Kubo's master branch: legacy-cid-v0, test-cid-v1 and test-cid-v1-wide.

Next:

  • agree on what the "cid-2025" profile should look like
    • this will be the new default in "Kubo v1.0"
    • we have test-cid-v1 and test-cid-v1-wide in Kubo as potential candidates
  • switch to a PR from the local branch (so we have a build preview)
  • figure out how to render the information (currently the table is not supported by https://github.com/ipfs/spec-generator)


1. UnixFS DAG layout (e.g. balanced, trickle)
1. UnixFS DAG width (max number of links per `File` node)
1. `HAMTDirectory` fanout (must be a power of 2)
1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links
Contributor

If this number is dynamic based on the lengths of the actual link entries in the DAG, we will need to specify what algorithm that estimation follows. I would put such things in a special "ipfs legacy" profile to be honest, along with CIDv0, non-raw leaves etc. We probably should heavily discourage coming up with profiles that do weird things, like dynamically setting params or not using raw-leaves for things.

Contributor

So, each layout would have its own set of layout-params:

  • balanced:
    • max-links: N
  • trickle:
    • max-leaves-per-level: N

Member

We probably should heavily discourage coming up with profiles that do weird things, like dynamically setting params or not using raw-leaves for things.

Yeah, that's exactly what we're doing by defining this profile.

Collaborator

Wait, is Kubo dynamically assigning the HAMTDirectory threshold currently? I was assuming this was a static number!

Collaborator

The current spec mentions fanout but not threshold, so I'm a little confused about what current implementations are doing and whether it's even worth fitting into the profile system, or if we should just give up and let a significant portion of HAMT-sharded legacy data be unprofiled/not-recreatable using the profiles...

Contributor

@lidel Is this written down in any of the specs? Or is it just in the code at this point?

Contributor Author

@lidel @hsanjuan Trying to understand/resolve this thread. Can you confirm if this is current kubo behavior?

HAMTDirectory threshold (max Directory size before switching to HAMTDirectory): based on an estimate of the block size by counting the size of PBNode.Links

Member

@lidel lidel Nov 13, 2025

AFAIK the decision of when to use a HAMTDirectory is implementation-specific behavior. So far the rule of thumb is to keep blocks under 1-2 MiB, and it is usually a good idea to match the defined chunk size (default or user-defined).

Implementation-wise, both Go (Boxo/Kubo) and JS (Helia) have a size-based heuristic that decides when to switch from a normal Directory to a HAMTDirectory:

IIRC (from two-year-old memory, something to check/confirm) the size estimation details are likely different between Go and JS. They both estimate the serialized DAGNode size by calculating the aggregate byte length of directory entries (link names + CIDs), though the JavaScript implementation appears to include additional metadata in its calculation:

  • Kubo's size estimation method is likely estimatedSize = sum(len(link.Name) + len(link.Cid.Bytes()) for each link)
  • Helia is likely "the size of the final DAGNode (including link names, sizes, optional metadata fields etc)"

If true, the slight differences in calculation methods might result in directories sharding at marginally different sizes.
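For illustration, a minimal sketch of the Kubo-style estimate described in the first bullet (the types, threshold, and function names are hypothetical, not the actual Boxo or Helia API):

```go
// Sketch of a size-based "should I shard?" heuristic as described above.
// Types and the threshold value are illustrative; the real implementations live in Boxo and Helia.
package main

import "fmt"

type dirEntry struct {
	name     string
	cidBytes []byte
}

// estimatedSize approximates the serialized directory size by summing the
// byte length of each link name and CID, per the Kubo-style estimate above.
func estimatedSize(entries []dirEntry) int {
	total := 0
	for _, e := range entries {
		total += len(e.name) + len(e.cidBytes)
	}
	return total
}

func main() {
	const threshold = 256 * 1024 // hypothetical switch-to-HAMT threshold in bytes
	entries := []dirEntry{
		{name: "file-a.txt", cidBytes: make([]byte, 36)},
		{name: "file-b.txt", cidBytes: make([]byte, 36)},
	}
	if estimatedSize(entries) > threshold {
		fmt.Println("switch to HAMTDirectory")
	} else {
		fmt.Println("keep plain Directory")
	}
}
```

If the JS estimate additionally counts metadata fields, the two implementations would cross the same threshold at slightly different directory contents, which is the divergence discussed in this thread.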

Member

If you want to be exact, you have to take into account any non-zero-value fields in the serialized root UnixFS metadata, since these affect the block size.

It's quite possible that Kubo will produce a HAMT block that's too big with a certain combination of directory entry names if someone has also changed the encoded directory's default mtime or whatever, probably because the "should-I-shard" feature pre-dates Kubo's ability to add UnixFSv1.5 metadata to things.

Really there's no need to estimate anything - it's trivial to count the actual bytes that a block will take up and then shard if necessary.
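A rough sketch of that exact approach, assuming a hypothetical encodeDagPB helper standing in for a real dag-pb encoder (the stand-in only concatenates fields so the example runs; it is not the canonical encoding):

```go
// Sketch of sharding based on the exact serialized block size rather than an estimate.
// encodeDagPB is a hypothetical stand-in; a real implementation would produce
// the canonical dag-pb protobuf encoding of the node.
package main

import "fmt"

type pbLink struct {
	name string
	cid  []byte
}

type pbNode struct {
	links []pbLink
	data  []byte // UnixFS metadata (type, mode, mtime, ...) also contributes to the block size
}

// encodeDagPB is a placeholder: it only concatenates fields so the example runs,
// whereas a real encoder would emit the canonical dag-pb bytes.
func encodeDagPB(n pbNode) []byte {
	out := append([]byte{}, n.data...)
	for _, l := range n.links {
		out = append(out, []byte(l.name)...)
		out = append(out, l.cid...)
	}
	return out
}

// shouldShard counts the actual bytes the block would occupy and shards if over the threshold.
func shouldShard(n pbNode, threshold int) bool {
	return len(encodeDagPB(n)) > threshold
}

func main() {
	n := pbNode{
		data:  []byte{0x08, 0x01}, // placeholder UnixFS "directory" marker bytes
		links: []pbLink{{name: "file-a.txt", cid: make([]byte, 36)}},
	}
	fmt.Println(shouldShard(n, 1<<20)) // false: well under a hypothetical 1 MiB threshold
}
```

Because the whole node (including any UnixFS metadata fields) is counted, this avoids the over-threshold blocks that an estimate based only on link names and CIDs can produce.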

Comment on lines 57 to 58
1. Whether empty directories are included in the DAG
- Some implementations apply filtering before merkleizing filesystem entries in the DAG.
Contributor

This is weird, because then we need to consider empty files, hidden files, unreadable files, symlinks and symlink follows, so we probably need to mention all those as part of the profile too?

Member

@2color 2color Aug 20, 2025

This is motivated by Git's default behaviour which ignores empty directories.

But we can mention the rest here.

Collaborator

Wait, @hsanjuan, do you mean mentioning whether empty files, hidden files, etc. affect the decision of whether a directory is empty, or do you mean that each of those file types might be divergently handled by different implementations and should be a variable in the profile? I would much rather the behavior for all of those file types be a UnixFS concern, specified in the UnixFS spec, modulo any historic variations worth including in a profile...

Contributor

Do we have existing implementations that support filtering differently on all of these? Because unless we do, I would really rather not specify all possible variants. And I agree with @bumblefudge: let's have two behaviours if possible, and punt to the UnixFS spec for how to describe them.

Member

Yeah, these choices are being made today and it'd be nice to be explicit about them. E.g. Helia by default leaves all of these options at false: https://github.com/ipfs/helia/blob/027bd3549da9ef5a6f07eaac346942cf24f3fc24/packages/unixfs/src/utils/glob-source.ts#L12-L42

But in filecoin-pin currently I've opted to include hidden files: https://github.com/filecoin-project/filecoin-pin/blob/9ab3f8c110ce0b6c6bf21c1fcdbcf84ade557953/src/core/unixfs/car-builder.ts#L30-L32 (I'm rethinking that choice now, but I'd like to know Kubo's defaults as well).

I'd prefer to align to a standard profile for file filtering so we collectively have "one standard default behaviour", but I understand it's a bit more work to explicate all of that. So maybe it can be a hand-wave for now and tightened up later because you could argue it's external to a unixfs spec and more about the choice of what to feed into a unixfsification process.

Member

@lidel lidel Nov 13, 2025

Feels like mixing abstractions, no? To me, filtering of input is a userland software decision and out of scope. I don't think the Ext4, NTFS or APFS specs guide implementers on whether hidden files are included when copying directories. We could mention it in "Appendix: Notes for Implementers" to flag the potential discrepancy if filtering is involved, and note that implementations should provide users with the ability to disable/adjust default filters.

Contributor

Right, so we would include a directory if it's there, even if empty. I was never sure if there is a good reason for Git's approach, but it always seemed exotic.

Member

filtering of input is a userland software decision

And yet it still impacts the resulting root CID, so I think it's in scope here, no? How about we make it a "SHOULD" advisory.

For consistent output, implementations should by default not apply file or directory filtering and should include empty directories, but they may opt to allow user-driven decisions to filter out entities such as hidden files, dot files, and empty directories.

1. UnixFS DAG layout (e.g. balanced, trickle etc...)
1. UnixFS DAG width (max number of links per `File` node)
1. `HAMTDirectory` bitwidth, i.e. the number of bits that determines the fanout of the `HAMTDirectory` (the default bitwidth is 8 == 256 leaves).
1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links
Contributor

Suggested change
1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links
1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links. We do not include details about the estimation algorithm as we do not encourage implementations to support it.

Member

@lidel lidel Nov 13, 2025

Bit odd to discourage, when the two most popular implementations (Go and JS) both use a size-based heuristic - #499 (comment)

Unsure how to handle this. Perhaps clarify that the heuristic is implementation-specific, and that when deterministic behavior is expected, a specific heuristic should be used?

Member

@achingbrain achingbrain Nov 13, 2025

I don't think we should be estimating the block size as it's trivial to calculate it exactly. Can we not just define this (and punt to the spec for the details) to make it less hand-wavey?

Suggested change
1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links
1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on the final size of the serialized form of the [PBNode protobuf message](https://specs.ipfs.tech/unixfs/#dag-pb-node) that represents the directory.

@rvagg
Member

rvagg commented Nov 12, 2025

Hey, I'd love to be able to reference this, even if it's in "draft" form, could we just merge it and continue to iterate on top of it to get it right?

Fixed outdated references, consistent profile names, streamlined Summary and Motivation sections.
@github-actions

github-actions bot commented Nov 15, 2025

🚀 Build Preview on IPFS ready

@mishmosh
Contributor Author

I made a few changes/fixes, aiming to land this early next week.

  • Added links to UnixFS spec (now that it exists)
  • Specified calendar versioning for profile names (line 64), per @b5's suggestion
    • @lidel I gave the 3 Kubo profiles names that match the naming scheme. This would mean minor updates to Kubo, but is probably better for future-proofing. Acceptable? Also happy to discuss live.
  • Changed the "current defaults" section into a series of legacy profile names that implementations MAY support. This allows those profile sets to be referenced/used across implementations.
  • We were using fanout and bitwidth interchangeably. I changed them all to fanout, in keeping with the UnixFS terminology. If we prefer bitwidth, I can PR that to UnixFS spec and then also here.
  • Streamlined lots of duplicate language from Summary and Motivation sections

Open questions:


As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle DAG nodes needed to verify the CID.

## Test fixtures
Member

Just noting this is (imo) a blocker.

We did not merge the UnixFS spec until we had a sensible set of fixtures that people could use as a reference.

The spec may be incomplete, but a fixture will let people reverse-engineer any details and then PR improvements to the spec.

Without fixtures for each UnixFS node type, we risk unknown unknowns silently impacting the final CID (e.g. because we did not know that someone may decide to place leaves one level sooner as an "optimization" while someone else always places them at the bottom for "formal consistency").

Contributor Author

Tracking this in ipfs/kubo#11071

Member

Thanks!

@mishmosh
Contributor Author

Just synced with @lidel. He wants to ship this with test fixtures in place (tracked in kubo/issues/11071). In the meantime, we don't anticipate changes to the profiles themselves, so you can reference this PR.

@icidasset

Great work, glad to see this!

Couple notes/questions:

  • The profiles (legacy + new) don't say if the chunks are of a fixed size, or which algorithm they use.
  • Small typo under "Compatibility": "support the the set of" (double the)
  • Would it also be interesting to note if an implementation respects symlinks and if so, how the different kinds of symlinks are translated?
