@@ -135,6 +135,27 @@ message PBNode {
 }
 ```
 
+::: note
+
+The `PBNode` definition above lists `Links` (field 2) before `Data` (field 1).
+Requiring this order on the wire is stricter than plain protobuf, which permits
+any field order, and departs from the convention of serializing by field number.
+
+Decoders MUST accept both field orderings, as existing IPFS data contains
+blocks encoded in either order.
+
+Encoders that want to be compliant with the `unixfs-v0-2015` and
+`unixfs-v1-2025` profiles from
+[IPIP-499](https://specs.ipfs.tech/ipips/ipip-0499/) SHOULD produce `Links`
+before `Data`, matching the [`dag-pb`][ipld-dag-pb] wire encoding order used
+by those profiles. A future IPIP introducing new profiles MAY adopt a
+different field order.
+
+See the "Protobuf Strictness" section of the [`dag-pb` spec][ipld-dag-pb]
+for the full set of encoding constraints.
+
+:::
+
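
The decoder requirement in the note above can be illustrated with a minimal sketch. It is not taken from any IPFS implementation: `read_varint` and `split_pbnode` are hypothetical helper names, link payloads are kept as opaque bytes rather than decoded `PBLink` messages, and rejecting unknown fields is a choice made for this sketch only.

```python
from __future__ import annotations


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_pbnode(block: bytes) -> tuple[list[bytes], bytes | None]:
    """Return (raw Links payloads, raw Data payload) for either field order."""
    links: list[bytes] = []
    data: bytes | None = None
    pos = 0
    while pos < len(block):
        key, pos = read_varint(block, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            raise ValueError("PBNode fields are length-delimited (wire type 2)")
        length, pos = read_varint(block, pos)
        payload = block[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated field")
        pos += length
        if field_number == 1:    # Data
            data = payload
        elif field_number == 2:  # Links (repeated)
            links.append(payload)
        else:
            # Sketch-only choice: reject anything that is not Data or Links.
            raise ValueError(f"unexpected field {field_number}")
    return links, data


# Placeholder payloads: b"AB" stands in for an encoded PBLink.
links_first = b"\x12\x02AB" + b"\x0a\x01\x01"
data_first = b"\x0a\x01\x01" + b"\x12\x02AB"
assert split_pbnode(links_first) == split_pbnode(data_first)
```

Because the loop dispatches on the field number rather than on position, the same function handles blocks written in either order.
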
 After decoding the node, we obtain a `PBNode`. This `PBNode` contains a field
 `Data` that contains the bytes that require the second decoding. This will also be
 a protobuf message specified in the UnixFSV1 format:
@@ -180,6 +201,23 @@ it is implied that the `PBNode.Data` field is protobuf-encoded.
 A `dag-pb` UnixFS node supports different types, which are defined in
 `decode(PBNode.Data).Type`. Every type is handled differently.
 
+::: warning
+
+**Streaming parser consideration:** In the [`dag-pb`][ipld-dag-pb] encoding
+order required by [IPIP-499](https://specs.ipfs.tech/ipips/ipip-0499/)
+profiles, all `PBNode.Links` entries are serialized before `PBNode.Data`.
+Since `DataType` (which determines how to interpret the node and its links) is
+encoded inside `PBNode.Data`, a streaming or incremental protobuf parser cannot
+determine the node type until after all links have been read.
+
+This affects implementations that attempt to interpret links during parsing.
+In particular, a streaming parser cannot determine whether link `Name` fields
+carry [HAMT hex-prefixed bucket indices](#hamt-structure-and-parameters) or
+plain [directory entry names](#dag-pb-directory) without first buffering all
+links.
+
+:::
+
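
The buffering described in the warning above might look like the following sketch. The `(kind, value)` event stream is a hypothetical stand-in for whatever an incremental parser emits, `entry_names` and `unixfs_type` are illustrative names, the 2-character prefix assumes a fanout of 256, and `unixfs_type` assumes `Type` is the first field of the UnixFS `Data` message. The `DataType` values are those of the UnixFS protobuf enum.

```python
from __future__ import annotations

from typing import Iterable

DIRECTORY = 1    # DataType.Directory
HAMT_SHARD = 5   # DataType.HAMTShard
HAMT_PREFIX_LEN = 2  # assumption: fanout 256, so 2 hex characters per bucket


def unixfs_type(data: bytes) -> int:
    """Read Type (field 1, varint) from UnixFS Data bytes.

    Assumes Type is encoded first and fits in one byte, which holds for
    the small enum values used by DataType.
    """
    if len(data) < 2 or data[0] != 0x08 or data[1] >= 0x80:
        raise ValueError("Type is not the first field of Data")
    return data[1]


def entry_names(events: Iterable[tuple[str, object]]) -> list[str]:
    """Buffer link names until Data is seen, then interpret them."""
    pending: list[str] = []
    node_type = None
    for kind, value in events:
        if kind == "link":
            # The node type may still be unknown here, so only buffer.
            pending.append(str(value))
        elif kind == "data":
            node_type = unixfs_type(bytes(value))
    if node_type == HAMT_SHARD:
        # Names longer than the prefix are leaf entries; names that are
        # only a prefix point at child shards and carry no entry name.
        return [n[HAMT_PREFIX_LEN:] for n in pending if len(n) > HAMT_PREFIX_LEN]
    return pending


names = ["00", "A1hello.txt"]
as_hamt = [("link", n) for n in names] + [("data", b"\x08\x05")]
as_dir = [("link", n) for n in names] + [("data", b"\x08\x01")]
assert entry_names(as_hamt) == ["hello.txt"]
assert entry_names(as_dir) == ["00", "A1hello.txt"]
```

The same two link names yield different results depending on a byte that arrives last, which is why the sketch defers all interpretation until the stream ends.
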
 ### `dag-pb` `File`
 
 A :dfn[File] is a container over an arbitrary sized amount of bytes. Files are either
@@ -851,6 +889,7 @@ Test vectors for UnixFS directory structures, progressing from simple flat direc
  ```
  - Purpose: Directory listing, link sorting, deduplication (ascii.txt and ascii-copy.txt share same CID)
  - Validation: Links sorted lexicographically by Name, each has valid Tsize
+ - Wire order: `Links`(x4) then `Data` ([`dag-pb`][ipld-dag-pb] field order per [IPIP-499](https://specs.ipfs.tech/ipips/ipip-0499/) profiles)
 
 ### Nested Directories
 
@@ -956,6 +995,7 @@ Test vectors for UnixFS directory structures, progressing from simple flat direc
  - Fanout field = 256
  - Link Names in HAMT have 2-character hex prefix (hash buckets)
  - Can retrieve any file by name through hash bucket calculation
+ - Wire order: `Links`(x252) then `Data` ([`dag-pb`][ipld-dag-pb] field order per [IPIP-499](https://specs.ipfs.tech/ipips/ipip-0499/) profiles)
 
 ## Special Cases and Advanced Features
 
@@ -1186,6 +1226,36 @@ Below section explains some of historical decisions. This is not part of specifi
 and is provided here only for extra context.
 :::
 
+## `PBNode` Field Order: Legacy Constraint and Compatibility Guidance
+
+The [`dag-pb`][ipld-dag-pb] encoding order required by
+[IPIP-499](https://specs.ipfs.tech/ipips/ipip-0499/) profiles (`unixfs-v0-2015`
+and `unixfs-v1-2025`) serializes `PBNode.Links` (field 2) before `PBNode.Data`
+(field 1). This is stricter than plain protobuf, which permits any field order,
+and departs from the common convention of encoding fields by field number.
+
+This ordering is a historical artifact: early protobuf serializers (notably
+the original JavaScript implementation) wrote fields in source declaration
+order rather than field number order. The original `.proto` definition listed
+`Links` before `Data` (while assigning them field numbers 2 and 1
+respectively). Once blocks with this byte ordering were written to the IPFS
+network, the encoding became permanent: changing it would produce different
+CIDs for the same logical content. The [`dag-pb` specification][ipld-dag-pb]
+codified this field order for existing profiles.
+
+Following the [Robustness Principle](https://specs.ipfs.tech/architecture/principles/#robustness),
+implementations writing backward- and forward-compatible software should be
+conservative in what they produce (use the field order expected by the target
+profile) and liberal in what they accept (decode blocks regardless of field
+order). A future IPIP introducing new profiles may adopt a different field
+order convention.
+
+A practical consequence of the current `Links`-before-`Data` order is that
+streaming protobuf parsers encounter all link entries before `PBNode.Data`.
+For UnixFS, this means the node type (`DataType`) and associated metadata
+(e.g., HAMT `fanout` and `hashType`) are not available until after all links
+have been parsed. See the [`dag-pb` Types](#dag-pb-types) section for details.
+
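
Why the byte order became permanent can be seen in a small sketch. The `varint`, `field`, and `encode_pbnode` helpers are hypothetical, the link payload is a placeholder rather than a real `PBLink`, and SHA-256 stands in for the multihash step of CID construction.

```python
from __future__ import annotations

import hashlib


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field(number: int, payload: bytes) -> bytes:
    """Encode one length-delimited (wire type 2) field."""
    return varint((number << 3) | 2) + varint(len(payload)) + payload


def encode_pbnode(links: list[bytes], data: bytes | None,
                  links_first: bool = True) -> bytes:
    """Serialize PBNode with Links (field 2) before or after Data (field 1)."""
    link_bytes = b"".join(field(2, link) for link in links)
    data_bytes = field(1, data) if data is not None else b""
    return link_bytes + data_bytes if links_first else data_bytes + link_bytes


links = [b"link"]   # placeholder for an encoded PBLink
data = b"\x08\x01"  # UnixFS Data with Type = Directory

profile_order = encode_pbnode(links, data, links_first=True)
number_order = encode_pbnode(links, data, links_first=False)

# Same logical node, different bytes, therefore different digests.
assert profile_order != number_order
assert hashlib.sha256(profile_order).digest() != hashlib.sha256(number_order).digest()
```

An encoder targeting the existing profiles would keep `links_first=True`, since any other order changes the block bytes and with them the resulting CID.
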
 ## Design Considerations: Extra Metadata
 
 Metadata support in UnixFSv1.5 has been expanded to increase the number of possible
@@ -1305,4 +1375,4 @@ the fractional part is represented as a 4-byte `fixed32`,
 [multicodec]: https://github.com/multiformats/multicodec
 [multihash]: https://github.com/multiformats/multihash
 [Bitswap]: https://specs.ipfs.tech/bitswap-protocol/
-[ipld-dag-pb]: https://ipld.io/specs/codecs/dag-pb/spec/
+[ipld-dag-pb]: https://web.archive.org/web/20260305020653/https://ipld.io/specs/codecs/dag-pb/spec/