Commit 5d7b1bb

docs: link to official UnixFS specification

- add links to specs.ipfs.tech/unixfs throughout docs
- replace inline protobuf with link to spec
- update Go implementation link from kubo to boxo
- add HAMT spec link to glossary
- clarify raw leaves are recommended, not legacy

The UnixFS spec is now published at specs.ipfs.tech/unixfs, replacing the old GitHub location. This commit updates all references and reduces duplication by linking to the authoritative specification instead of maintaining inline technical details.

1 parent 3a46e03 commit 5d7b1bb

3 files changed: 11 additions and 50 deletions

docs/concepts/file-systems.md

Lines changed: 8 additions & 47 deletions
@@ -217,9 +217,9 @@ await ipfs.files.rm('/my/beautiful/directory')

 ## Unix File System (UnixFS)

-When you add a _file_ to IPFS, it might be too big to fit in a single block, so it needs metadata to link all its blocks together. UnixFS is a [protocol-buffers](https://developers.google.com/protocol-buffers/)-based format for describing files, directories, and symlinks in IPFS. This data format is used to represent files and all their links and metadata in IPFS. UnixFS creates a block (or a tree of blocks) of linked objects.
+When you add a _file_ to IPFS, it might be too big to fit in a single block, so it needs metadata to link all its blocks together. UnixFS is a [protocol-buffers](https://developers.google.com/protocol-buffers/)-based format for describing files, directories, and symlinks in IPFS. This data format is used to represent files and all their links and metadata in IPFS. UnixFS creates a block (or a tree of blocks) of linked objects. See the [UnixFS specification](https://specs.ipfs.tech/unixfs/) for the complete technical details.

-UnixFS currently has [Javascript](https://github.com/ipfs/helia/tree/main/packages/unixfs) and [Go](https://github.com/ipfs/kubo/tree/b3faaad1310bcc32dc3dd24e1919e9edf51edba8/unixfs) implementations. These implementations have modules written in to run different functions:
+UnixFS currently has [Javascript](https://github.com/ipfs/helia/tree/main/packages/unixfs) and [Go](https://github.com/ipfs/boxo/tree/v0.34.0/ipld/unixfs) implementations. These implementations have modules written in to run different functions:

 - **Data Formats**: manage the serialization/deserialization of UnixFS objects to protocol buffers

@@ -229,50 +229,11 @@ UnixFS currently has [Javascript](https://github.com/ipfs/helia/tree/main/packag

 ### Data Formats

-On UnixFS-v1 the data format is represented by this protobuf:
+UnixFS uses protocol buffers to define how files and directories are represented in IPFS. The data format includes fields for file types, sizes, permissions, and timestamps.

-```
-message Data {
-  enum DataType {
-    Raw = 0;
-    Directory = 1;
-    File = 2;
-    Metadata = 3;
-    Symlink = 4;
-    HAMTShard = 5;
-  }
-
-  required DataType Type = 1;
-  optional bytes Data = 2;
-  optional uint64 filesize = 3;
-  repeated uint64 blocksizes = 4;
-  optional uint64 hashType = 5;
-  optional uint64 fanout = 6;
-  optional uint32 mode = 7;
-  optional UnixTime mtime = 8;
-}
-
-message Metadata {
-  optional string MimeType = 1;
-}
-
-message UnixTime {
-  required int64 Seconds = 1;
-  optional fixed32 FractionalNanoseconds = 2;
-}
-```
-
-This `Data` object is used for all non-leaf nodes in UnixFS:
-
-- For files that are comprised of more than a single block, the `Type` field will be set to `File`, the `filesize` field will be set to the total number of bytes in the files, and `blocksizes` will contain a list of the filesizes of each child node.
-
-- For files comprised of a single block, the `Type` field will be set to `File`, `filesize` will be set to the total number of bytes in the file, and file data will be stored in the `Data` field.
-
-UnixFS also supports two optional metadata format fields:
-
-- `mode` - used for persisting the file permissions in [numeric notation](https://en.wikipedia.org/wiki/File_system_permissions#Numeric_notation). If unspecified, this field defaults to `0755` for directories/HAMT shards and `0644` for all the other types where applicable.
-
-- `mtime` - is a two-element structure (`Seconds`, `FractionalNanoseconds`) representing the modification time in seconds relative to the Unix epoch `1970-01-01T00:00:00Z`.
+::: tip Want to see the complete specification?
+For the full protobuf definitions, field descriptions, and technical details about how UnixFS nodes are structured, visit the [official UnixFS specification](https://specs.ipfs.tech/unixfs/#dag-pb-node).
+:::

 ### Importer

@@ -286,7 +247,7 @@ The leaf format takes two format options, UnixFS leaves and raw leaves:

 - The UnixFS leaves format adds a data wrapper on newly added objects to produce UnixFS leaves with additional data sizes. This wrapper is used to determine whether newly added objects are files or directories. This format is the default for CIDv0.

-- The raw leaves format on IPFS where nodes output from chunking will be raw data from the file with a CID codec of 'raw'. This is mainly configured for backward compatibility with formats that used a UnixFS Data object. This format is the default for CIDv1 created with `ipfs add --cid-version 1`, soon to become the global default.
+- The raw leaves format on IPFS where nodes output from chunking will be raw data from the file with a CID codec of 'raw' (0x55). This format provides canonical CIDs for single-block files and is recommended over dag-pb wrapped blocks. This format is the default for CIDv1 created with `ipfs add --cid-version 1`.

 The chunking strategy is used to determine the size options available during the chunking process. The strategy currently has two different options, 'fixed size' and 'rabin'.

@@ -315,5 +276,5 @@ You can find additional resources to familiarize with these file systems at:
 - [Protoschool MFS tutorial](https://proto.school/mutable-file-system)
 - [Understanding how the InterPlanetary File System deals with Files](https://github.com/ipfs/camp/tree/master/CORE_AND_ELECTIVE_COURSES/CORE_COURSE_A), from IPFS Camp 2019
 - [Jeromy Coffee Talks - Files API](https://www.youtube.com/watch?v=FX_AXNDsZ9k)
-- [UnixFS Specification](https://github.com/ipfs/specs/blob/master/UNIXFS.md)
+- [UnixFS Specification](https://specs.ipfs.tech/unixfs/)
 - [ResNetLab on Tour - Mutable Content](https://research.protocol.ai/tutorials/resnetlab-on-tour/mutable-content/)
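
Note on the raw leaves change in this file: the updated bullet says chunk nodes are the raw file bytes under the `raw` codec (0x55), which gives single-block files a canonical CID. The following is a minimal sketch of why that holds, not code taken from Kubo, Boxo, or Helia. It assumes a sha2-256 multihash and the lowercase base32 multibase, which this diff does not state, and the function name `raw_leaf_cid` is hypothetical.

```python
import base64
import hashlib

CID_VERSION = 0x01  # CIDv1
RAW_CODEC = 0x55    # multicodec code for 'raw', as stated in the diff
SHA2_256 = 0x12     # multihash code for sha2-256 (assumed hash function)


def raw_leaf_cid(data: bytes) -> str:
    """Derive a CIDv1 string for a block holding exactly `data` (hypothetical helper)."""
    digest = hashlib.sha256(data).digest()
    # Multihash layout: <hash code><digest length><digest>.
    multihash = bytes([SHA2_256, len(digest)]) + digest
    # Binary CIDv1 layout: <version><content codec><multihash>.
    # Every code used here is below 0x80, so each fits in a one-byte varint.
    cid_bytes = bytes([CID_VERSION, RAW_CODEC]) + multihash
    # Multibase: the 'b' prefix marks lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


if __name__ == "__main__":
    print(raw_leaf_cid(b"hello"))
```

Because no UnixFS wrapper is involved, the CID depends only on the file bytes and the chosen hash. The sketch applies only when the whole file fits in a single chunk; larger files still get a dag-pb root node that links the raw leaves together.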

docs/concepts/glossary.md

Lines changed: 2 additions & 2 deletions
@@ -230,7 +230,7 @@ Graphsync is a legacy content replication protocol, similar to [Bitswap](#bitswa

 ### HAMT-sharding

-The sharding technique used for [sharding](#sharding) big UnixFS directories. It leverages properties of hash array mapped tries (HAMT). [More about HAMT](https://en.wikipedia.org/wiki/Hash_array_mapped_trie).
+The sharding technique used for [sharding](#sharding) big UnixFS directories. It leverages properties of hash array mapped tries (HAMT). [UnixFS HAMT specification](https://specs.ipfs.tech/unixfs/#dag-pb-hamtdirectory) | [More about HAMT](https://en.wikipedia.org/wiki/Hash_array_mapped_trie).

 ### Hash

@@ -512,7 +512,7 @@ In [IPLD](#ipld), the act of walking across the [Data Model](#data-model). [More

 ### UnixFS

-The Unix File System (UnixFS) is the data format used to represent files and all their links and metadata in IPFS. It is loosely based on how files work in Unix. Adding a file to IPFS creates a block, or a _tree_ of blocks, in the UnixFS format and protects it from being garbage-collected. [More about UnixFS](file-systems.md#unix-file-system-unixfs)
+The Unix File System (UnixFS) is the data format used to represent files and all their links and metadata in IPFS. It is loosely based on how files work in Unix. Adding a file to IPFS creates a block, or a _tree_ of blocks, in the UnixFS format and protects it from being garbage-collected. [UnixFS specification](https://specs.ipfs.tech/unixfs/) | [More about UnixFS](file-systems.md#unix-file-system-unixfs)

 ### Urlstore

docs/concepts/lifecycle.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ description: Learn about the lifecycle of data in IPFS.

 The first stage in the lifecycle of data in IPFS is to address it by CID. This is a local operation that takes arbitrary data and encodes it so it can be addressed by a CID. This is also known as _merkleizing_ the data, because the input data is transformed into a [Merkle DAG](./merkle-dag.md).

-The exact process depends on the type of data. For files and directories, this is done by constructing a [UnixFS](./file-systems.md#unix-file-system-unixfs) [Merkle DAG](./merkle-dag.md). For other data types, such as dag-cbor, this is done by encoding the data with [dag-cbor](https://ipld.io/docs/codecs/known/dag-cbor/) which is hashed to produce a CID.
+The exact process depends on the type of data. For files and directories, this is done by constructing a [UnixFS](./file-systems.md#unix-file-system-unixfs) [Merkle DAG](./merkle-dag.md) ([specification](https://specs.ipfs.tech/unixfs/)). For other data types, such as dag-cbor, this is done by encoding the data with [dag-cbor](https://ipld.io/docs/codecs/known/dag-cbor/) which is hashed to produce a CID.

 For example, merkleizing a static web application into a UnixFS DAG looks like this, where the whole application is addressed by the CID in the top block (`bafy...jomu`):
