102 changes: 102 additions & 0 deletions src/ipips/ipip-0499.md
@@ -0,0 +1,102 @@
---
# IPIP number should match its pull request number. After you open a PR,
# please update title and update the filename to `ipip0000`.
title: "IPIP-0499: CID Profiles"
date: 2025-04-03
ipip: proposal
editors:
- name: Michelle Lee
relatedIssues:
- n/a
order: 0000
tags: ['ipips']
---

## Summary

<!--One paragraph explanation of the IPIP.-->
This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters.

## Motivation

Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. With profiles, following the same profile will produce identical CIDs for identical content, which makes verification possible regardless of implementation.
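
To make the problem concrete, here is a minimal sketch of how chunking settings change the resulting root hash. It is a toy Merkle tree built with SHA-256 only; it does not implement UnixFS or dag-pb encoding, so the digests are not real CIDs. The function name `toy_root_digest` and the sample data are assumptions made for illustration.

```python
import hashlib

KIB = 1024
MIB = 1024 * KIB


def toy_root_digest(data: bytes, chunk_size: int, width: int) -> str:
    """Return the root digest of a toy Merkle tree (NOT a real CID)."""
    if chunk_size <= 0 or width < 2:
        raise ValueError("chunk_size must be positive and width at least 2")
    # Leaf layer: one SHA-256 digest per fixed-size chunk
    # (a single leaf over empty bytes when the input is empty).
    level = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, max(len(data), 1), chunk_size)
    ]
    # Interior layers: hash groups of up to `width` child digests
    # until a single root digest remains.
    while len(level) > 1:
        level = [
            hashlib.sha256(b"".join(level[i:i + width])).digest()
            for i in range(0, len(level), width)
        ]
    return level[0].hex()


# 4 MiB of deterministic sample content.
data = bytes(range(256)) * (4 * MIB // 256)

root_a = toy_root_digest(data, chunk_size=1 * MIB, width=1024)
root_a_again = toy_root_digest(data, chunk_size=1 * MIB, width=1024)
root_b = toy_root_digest(data, chunk_size=256 * KIB, width=174)

print(root_a == root_a_again)  # True: same settings, same root
print(root_a == root_b)        # False: different chunk size, different root
```

The same bytes produce different roots as soon as the chunk size differs, which is the situation a shared profile is meant to avoid.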

## Detailed design

We introduce a profile naming system.

Each profile must specify the following characteristics:

1. CID version (CIDv0 or CIDv1)
2. Hash algorithm
3. Chunk size
4. DAG width
5. DAG layout
6. Required
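
As a rough illustration, the first five characteristics in the list could be captured in a small record type. The sketch below is hypothetical: the class name `CidProfile`, its field names, and the validation rules are not part of this proposal. The example values are taken from the "test-cid-v1" column of the summary table in this document. The sixth list item is omitted because it is not fully specified yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CidProfile:
    """Hypothetical container for the parameters a profile must pin down."""

    name: str
    cid_version: int      # 0 for CIDv0, 1 for CIDv1
    hash_algorithm: str   # e.g. "sha-256"
    chunk_size: int       # bytes per chunk
    dag_width: int        # maximum links per DAG node
    dag_layout: str       # e.g. "balanced"

    def __post_init__(self) -> None:
        # Basic sanity checks; a real specification may require more.
        if self.cid_version not in (0, 1):
            raise ValueError("cid_version must be 0 or 1")
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if self.dag_width <= 0:
            raise ValueError("dag_width must be positive")


# Values from the "test-cid-v1" column of the summary table.
TEST_CID_V1 = CidProfile(
    name="test-cid-v1",
    cid_version=1,
    hash_algorithm="sha-256",
    chunk_size=1024 * 1024,  # 1MiB
    dag_width=174,
    dag_layout="balanced",
)
```

Making the record immutable (`frozen=True`) reflects the idea that a named profile should always mean the same set of parameters.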

Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves.

| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL |
|-------------|---------------|-----------------------------|------------------|-----------------------|---------------|
| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 |
| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified |
| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified |
| DAG layout | balanced | balanced | balanced | balanced | not specified |


Review comment (Member):

couple of other things to consider?

  • Directory wrapping at the top level (for just files, kubo has an option to wrap in a directory so you get file metadata)
  • Presence and accurate setting of Tsize - at one point we were going to deprecate this field for some cases, although I think all our encoders now do it properly, you could just mandate this in the spec though -- all valid profiles must properly encode Tsize.

Review comment (Member):

I've added this as a parameter.

According to the latest version of https://github.com/ipfs/specs/pull/331/files, the calculation is done as follows:

To compute the Tsize of a child DAG, sum the length of the dag-pb outside message binary length and the blocksizes of all nodes in the child DAG.

If Tsize is calculated this way, would that make it accurate?

Review comment (Member):

sounds about right, I remember there being some nuance in exactly what's included in the size calculation, making it not super stable if you get it slightly wrong (as we did for some variants in go-unixfsnode for a while)


This would be specified as a table in the forthcoming UnixFS spec.



## Design rationale

The profile names are chosen to be easy to pronounce.

Here is a summary table of current defaults, thanks to input & clarifications from @2color @achingbrain @lidel:

| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL |
|-------------|---------------|-----------------------------|------------------|-----------------------|---------------|
| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 |
| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified |
| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified |
| DAG layout | balanced | balanced | balanced | balanced | not specified |

\* Kubo has 2 different default DAG widths:

* For HAMT-sharded directories, the `DefaultShardWidth` [here](https://github.com/ipfs/boxo/blob/f1d5312e3be45d151bb9c8f11c9283820687bea3/ipld/unixfs/io/directory.go#L30) is 256.
* For files, `DefaultLinksPerBlock` [here](https://github.com/ipfs/boxo/blob/v0.29.0/ipld/unixfs/importer/helpers/helpers.go#L30) is ~174.
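
The "~174" for files is most likely approximate because the value is derived rather than chosen directly. The arithmetic below is a sketch based on my reading of the linked `helpers.go`, where the default appears to be a rough block-size budget divided by a rough per-link size. The two constants are assumptions and should be checked against the linked source before anyone relies on them.

```python
# Assumed constants, modeled on the linked boxo helpers.go (verify before use).
ROUGH_LINK_BLOCK_SIZE = 1 << 13   # 8192-byte budget for a node's links
ROUGH_LINK_SIZE = 34 + 8 + 5      # sha-256 multihash + size + protobuf framing

# Integer division gives the number of links that fit in the budget.
default_links_per_block = ROUGH_LINK_BLOCK_SIZE // ROUGH_LINK_SIZE

print(default_links_per_block)  # 174
```

If that reading is right, a profile should state the width as an explicit number such as 174 instead of pointing at the derivation, so that other implementations do not need to reproduce the estimate.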

See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/

### User benefit

Reliable, deterministic CIDs allow independent verification of content across tools and implementations.

### Compatibility

Implementations will need to (1) make CID generation settings configurable and (2) support user setting of profiles.

Kubo currently has no CLI / RPC / Config option to control DAG width. https://github.com/ipfs/kubo/issues/10751 is the starting point to add that ability.

### Security

TODO

### Alternatives

Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. v1-sha256-balanced-1mib-1024w-raw. This is longer and more convoluted.
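
For comparison, here is a minimal sketch of how such a parameter-based name could be derived mechanically. The function names, the size formatting, and the treatment of the trailing `raw` component as an opaque suffix are all assumptions. The only thing taken from this proposal is the example string itself.

```python
KIB = 1024
MIB = 1024 * KIB


def format_size(num_bytes: int) -> str:
    """Render a byte count as e.g. '1mib' or '256kib' (hypothetical format)."""
    if num_bytes % MIB == 0:
        return f"{num_bytes // MIB}mib"
    if num_bytes % KIB == 0:
        return f"{num_bytes // KIB}kib"
    return f"{num_bytes}b"


def descriptive_name(
    cid_version: int,
    hash_algorithm: str,
    dag_layout: str,
    chunk_size: int,
    dag_width: int,
    suffix: str = "raw",  # copied from the example; its meaning is not defined here
) -> str:
    """Join the key parameters into a single hyphen-separated name."""
    hash_part = hash_algorithm.replace("-", "")  # "sha-256" -> "sha256"
    return (
        f"v{cid_version}-{hash_part}-{dag_layout}-"
        f"{format_size(chunk_size)}-{dag_width}w-{suffix}"
    )


print(descriptive_name(1, "sha-256", "balanced", 1 * MIB, 1024))
# v1-sha256-balanced-1mib-1024w-raw
```

Such names are self-describing, but they grow with every parameter a profile has to pin down, which is the drawback noted above.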

## Test fixtures

TODO

List relevant CIDs. Describe how implementations can use them to determine
specification compliance. This section can be skipped if IPIP does not deal
with the way IPFS handles content-addressed data, or the modified specification
file already includes this information.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).