Commit 9141da7

Merge pull request #328 from ipfs/feat/gateway-json-cbor
IPIP-328: JSON and CBOR Response Formats on HTTP Gateways
2 parents a2ce974 + e61c242 commit 9141da7

2 files changed

+254
-33
lines changed

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
1+
# IPIP-328: JSON and CBOR Response Formats on HTTP Gateways
2+
3+
- Start Date: 2022-10-07
4+
- Related Issues:
5+
- [ipfs/in-web-browsers/issues/182]
6+
- [ipfs/specs/pull/328]
7+
- [ipfs/kubo/issues/8823]
8+
- [ipfs/kubo/pull/9335]
9+
- [ipfs/go-ipfs/issues/7552]
10+
11+
## Summary
12+
13+
Add support for the [DAG-JSON], [DAG-CBOR], JSON and CBOR response formats in
14+
the [HTTP Gateway](../http-gateways/).
15+
16+
## Motivation
17+
18+
Currently, the gateway supports requesting data in the [DAG-PB], RAW, [CAR] and
19+
TAR formats. In addition, it allows for traversing of links encoded through CBOR
20+
Tag 42, as long as they are intermediate links, and not the final document.
21+
It works on both DAG-CBOR, and its JSON representation, DAG-JSON. However, it
22+
should be possible to download deserialized versions of the final JSON/CBOR document
23+
in raw format (not wrapped in UnixFS).
24+
25+
The main functional gap in the IPFS ecosystem is the lack of support for
26+
non-UnixFS DAGs on HTTP gateways. Users are able to create custom DAGs based on
27+
traversable DAG-CBOR thanks to [CBOR tag 42 being reserved for CIDs][cbor-42]
28+
and DAG-JSON documents, but they are unable to load deserialized documents from
29+
a local gateway, which severely limits the utility of non-UnixFS DAGs.
30+
31+
Adding JSON and CBOR response types will also benefit UnixFS. DAG-PB has a
32+
[logical format][dag-pb-format] which makes it possible to represent a DAG-PB
33+
directory as a [DAG-JSON] document. This means that, if we support DAG-JSON in
34+
the gateway, then we would support
35+
[JSON responses for directory listings][ipfs/go-ipfs/issues/7552], which has been
36+
requested by our users in the past.
37+
38+
In addition, this functionality is already present on the current Kubo CLI. By
39+
bringing it to the gateways, we provide users with more power when it comes
40+
to storing and fetching CBOR and JSON in IPFS.
41+
42+
## Detailed design
43+
44+
The solution is to allow the Gateway to support serializing data as [DAG-JSON],
45+
[DAG-CBOR], JSON and CBOR by requesting them using either the `Accept` HTTP header
46+
or the `format` URL query. In addition, if the resolved CID is of one of the
47+
aforementioned types, the gateway should be able to resolve them instead of
48+
failing with `node type unknown`.
49+
50+
## Test fixtures
51+
52+
- [`bafybeiegxwlgmoh2cny7qlolykdf7aq7g6dlommarldrbm7c4hbckhfcke`][f-dag-pb] is a
53+
DAG-PB directory.
54+
- [`bafkreidmwhhm6myajxlpu7kofe3aqwf4ezxxn46cp5fko7mb6x74g4k5nm`][f-dag-pb-json]
55+
is the aforementioned DAG-PB directory's [Logical DAG-JSON representation][dag-pb-format] that
56+
is expected to be returned when using `?format=dag-json`.
57+
58+
## Design rationale
59+
60+
The current gateway already supports different response formats via the
61+
`Accept` HTTP header and the `format` URL query. This IPIP proposes adding
62+
JSON and CBOR formats to that list.
63+
64+
In addition, the current gateway already supports traversing through DAG-CBOR
65+
and DAG-JSON links if they are intermediary documents. With this IPIP, we aim
66+
to be able to download the DAG-CBOR, DAG-JSON, JSON and CBOR documents
67+
themselves, with correct `Content-Type` headers.
68+
69+
### User benefit
70+
71+
The user benefits from this change as they will now be able to retrieve
72+
content encoded in the traversable DAG-JSON and DAG-CBOR formats. This is
73+
something that has been [requested before][ipfs/go-ipfs/issues/7552].
74+
75+
In addition, both UX and DX are significantly improved, since every UnixFS directory can
76+
now be inspected in a regular web browser via `?format=json`. This can remove the
77+
need for parsing HTML with directory listing.
78+
79+
### Compatibility
80+
81+
This IPIP adds new response types and does not modify existing ones,
82+
making it a backwards-compatible change.
83+
84+
### Security
85+
86+
Serializers and deserializers for JSON and CBOR must follow the security
87+
considerations of the original specifications, found in:
88+
89+
- [RFC 8259 (JSON), Section 12][rfc8259-sec12]
90+
- [RFC 8949 (CBOR), Section 10][rfc8949-sec10]
91+
92+
DAG-JSON and DAG-CBOR follow the same security considerations as JSON and CBOR.
93+
Note that DAG-JSON and DAG-CBOR are stricter subsets of JSON and CBOR, respectively.
94+
Therefore, implementations must follow the respective specifications and return
95+
an error if the payload does not conform:
96+
97+
- [DAG-JSON Spec][dag-json-spec]
98+
- [DAG-CBOR Spec][dag-cbor-spec]
99+
100+
### Alternatives
101+
102+
#### Why four content types?
103+
104+
If we do not introduce DAG-JSON, DAG-CBOR, JSON and CBOR response formats in
105+
the gateway, the usage of IPFS is constricted to files and directories represented
106+
by the UnixFS (DAG-PB) codec. Therefore, if a user wants to store JSON and/or CBOR
107+
in IPFS, they have to wrap it as a UnixFS file in order to be able to fetch it
108+
through the gateway. That adds size and processing overhead.
109+
110+
In addition, we could introduce only DAG-JSON and DAG-CBOR. However, not
111+
supporting the generic variants, JSON and CBOR, would lead to poor UX. The
112+
ability to retrieve DAG-JSON as `application/json` is an important step
113+
for the interoperability of the HTTP Gateway with web browsers and other tools
114+
that expect specific Content Types. Namely, `Content-Type: application/json` with
115+
`Content-Disposition: inline` allows for JSON preview to be rendered in a web browser
116+
and webdev tools.
117+
118+
#### Why is JSON/CBOR pathing limited to full blocks?
119+
120+
Finally, we considered supporting pathing within both DAG and non-DAG variants
121+
of the JSON and CBOR codecs. Pathing within these documents could lead to responses
122+
with extracts from the document. For example, if we have the document:
123+
124+
```json
{
  "link": {
    "to": {
      "some": {
        "cid2": <cbor tag 42 pointing at different CID>
      }
    }
  }
}
```
135+
136+
With CID `bafy`, and we navigate to `/ipfs/bafy/link/to`, we would be able to
137+
retrieve an extract from the document.
138+
139+
```json
{
  "some": {
    "cid2": <cbor tag 42 pointing at different CID>
  }
}
```
146+
147+
However, supporting this raises questions whose answers are not clearly defined
148+
or agreed upon yet. Right now, pathing is only supported over CID-based Links,
149+
such as Tag 42 in CBOR. In addition, some HTTP headers regarding caching are based
150+
on the CID, and it is unclear how they would apply to extracts from a document. Giving users the
151+
possibility to retrieve JSON, CBOR, DAG-JSON and DAG-CBOR documents through the
152+
gateway is, in itself, progress and will open the door to new tools and explorations.
153+
154+
### Copyright
155+
156+
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
157+
158+
[cbor-42]: https://github.com/core-wg/yang-cbor/issues/13#issuecomment-524378859
159+
[DAG-PB]: https://ipld.io/docs/codecs/known/dag-pb/
160+
[dag-pb-format]: https://ipld.io/specs/codecs/dag-pb/spec/#logical-format
161+
[DAG-JSON]: https://ipld.io/docs/codecs/known/dag-json/
162+
[DAG-CBOR]: https://ipld.io/docs/codecs/known/dag-cbor/
163+
[CAR]: https://ipld.io/specs/transport/car/
164+
[ipfs/in-web-browsers/issues/182]: https://github.com/ipfs/in-web-browsers/issues/182
165+
[ipfs/specs/pull/328]: https://github.com/ipfs/specs/pull/328
166+
[ipfs/kubo/issues/8823]: https://github.com/ipfs/kubo/issues/8823
167+
[ipfs/kubo/pull/9335]: https://github.com/ipfs/kubo/pull/9335
168+
[ipfs/go-ipfs/issues/7552]: https://github.com/ipfs/go-ipfs/issues/7552
169+
[f-dag-pb]: https://dweb.link/ipfs/bafybeiegxwlgmoh2cny7qlolykdf7aq7g6dlommarldrbm7c4hbckhfcke
170+
[f-dag-pb-json]: https://dweb.link/ipfs/bafkreidmwhhm6myajxlpu7kofe3aqwf4ezxxn46cp5fko7mb6x74g4k5nm
171+
[rfc8259-sec12]: https://datatracker.ietf.org/doc/html/rfc8259#section-12
172+
[rfc8949-sec10]: https://datatracker.ietf.org/doc/html/rfc8949#section-10
173+
[dag-json-spec]: https://ipld.io/specs/codecs/dag-json/spec/
174+
[dag-cbor-spec]: https://ipld.io/specs/codecs/dag-cbor/spec/

http-gateways/PATH_GATEWAY.md

Lines changed: 80 additions & 33 deletions
@@ -79,6 +79,8 @@ where client prefers to perform all validation locally.
7979
- [Content resolution](#content-resolution)
8080
- [Finding the content root](#finding-the-content-root)
8181
- [Traversing remaining path](#traversing-remaining-path)
82+
- [Traversing through UnixFS](#traversing-through-unixfs)
83+
- [Traversing through DAG-JSON and DAG-CBOR](#traversing-through-dag-json-and-dag-cbor)
8284
- [Handling traversal errors](#handling-traversal-errors)
8385
- [Best practices for HTTP caching](#best-practices-for-http-caching)
8486
- [Denylists](#denylists)
@@ -182,10 +184,10 @@ For example:
182184
- [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable raw [block](https://docs.ipfs.io/concepts/glossary/#block) to be returned
183185
- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable [CAR](https://docs.ipfs.io/concepts/glossary/#car) stream to be returned
184186
- [application/x-tar](https://en.wikipedia.org/wiki/Tar_(computing)) – returns UnixFS tree (files and directories) as a [TAR](https://en.wikipedia.org/wiki/Tar_(computing)) stream. Returned tree starts at a root item which name is the same as the requested CID. Produces 400 Bad Request for content that is not UnixFS.
185-
<!-- TODO: https://github.com/ipfs/go-ipfs/issues/8823
186-
- application/vnd.ipld.dag-json OR application/json – requests IPLD Data Model representation serialized into [DAG-JSON format](https://ipld.io/docs/codecs/known/dag-json/)
187-
- application/vnd.ipld.dag-cbor OR application/cbor - requests IPLD Data Model representation serialized into [DAG-CBOR format](https://ipld.io/docs/codecs/known/dag-cbor/)
188-
-->
187+
- [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-JSON format](https://ipld.io/docs/codecs/known/dag-json/). If the requested CID already has `dag-json` (0x0129) codec, data is validated as DAG-JSON before being returned as-is. Invalid DAG-JSON produces HTTP Error 500.
188+
- [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-CBOR format](https://ipld.io/docs/codecs/known/dag-cbor/). If the requested CID already has `dag-cbor` (0x71) codec, data is validated as DAG-CBOR before being returned as-is. Invalid DAG-CBOR produces HTTP Error 500.
189+
- [application/json](https://www.iana.org/assignments/media-types/application/json) – same as `application/vnd.ipld.dag-json`, unless the CID's codec already is `json` (0x0200). Then, the raw JSON block can be returned as-is without any conversion.
190+
- [application/cbor](https://www.iana.org/assignments/media-types/application/cbor) – same as `application/vnd.ipld.dag-cbor`, unless the CID's codec already is `cbor` (0x51). Then, the raw CBOR block can be returned as-is without any conversion.
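
The codec-dependent rules in the list above can be sketched as a small decision function. This is only an illustration under stated assumptions: `plan_response` and its action labels are hypothetical names that do not come from the spec or from any gateway implementation; the codec codes are the ones quoted in the list.

```python
# Multicodec codes quoted in the list above.
JSON, CBOR, DAG_JSON, DAG_CBOR = 0x0200, 0x51, 0x0129, 0x71


def plan_response(accept: str, cid_codec: int) -> str:
    """Return a hypothetical action label for a requested media type.

    The labels are illustrative only; a real gateway would perform the
    validation or conversion instead of naming it.
    """
    # Generic JSON/CBOR: a block that already uses the plain codec is
    # returned as-is, without any conversion.
    if accept == "application/json" and cid_codec == JSON:
        return "return-as-is"
    if accept == "application/cbor" and cid_codec == CBOR:
        return "return-as-is"

    # Otherwise the generic types behave like their DAG-* counterparts.
    if accept in ("application/json", "application/vnd.ipld.dag-json"):
        if cid_codec == DAG_JSON:
            # Validation failure would map to HTTP 500 per the text above.
            return "validate-then-return-as-is"
        return "convert-to-dag-json"
    if accept in ("application/cbor", "application/vnd.ipld.dag-cbor"):
        if cid_codec == DAG_CBOR:
            return "validate-then-return-as-is"
        return "convert-to-dag-cbor"

    raise ValueError(f"media type not covered by this sketch: {accept}")
```

For example, `plan_response("application/json", 0x71)` falls through to the DAG-JSON branch, because a `dag-cbor` block has to be converted before it can be served as JSON.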
189191

190192
### `Range` (request header)
191193

@@ -246,11 +248,14 @@ parameter, if present)
246248

247249
Optional, `format=<format>` can be used to request specific response format.
248250

249-
This is a URL-friendly alternative to sending
250-
`Accept: application/vnd.ipld.<format>` header, see [`Accept`](#accept-request-header)
251-
for more details.
252-
253-
In case of `Accept: application/x-tar`, the `?format=` equivalent is `tar`.
251+
This is a URL-friendly alternative to sending an [`Accept`](#accept-request-header) header. These are the equivalents:
252+
- `format=raw` → `Accept: application/vnd.ipld.raw`
253+
- `format=car` → `Accept: application/vnd.ipld.car`
254+
- `format=tar` → `Accept: application/x-tar`
255+
- `format=dag-json` → `Accept: application/vnd.ipld.dag-json`
256+
- `format=dag-cbor` → `Accept: application/vnd.ipld.dag-cbor`
257+
- `format=json` → `Accept: application/json`
258+
- `format=cbor` → `Accept: application/cbor`
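
As a rough illustration of the equivalence listed above, the mapping can be written as a lookup table. The helper `equivalent_requests` and the placeholder content path in the usage note are assumptions made for this sketch, not part of the spec.

```python
from urllib.parse import urlencode

# `format` query values and the `Accept` media types they stand for,
# copied from the list above.
FORMAT_TO_ACCEPT = {
    "raw": "application/vnd.ipld.raw",
    "car": "application/vnd.ipld.car",
    "tar": "application/x-tar",
    "dag-json": "application/vnd.ipld.dag-json",
    "dag-cbor": "application/vnd.ipld.dag-cbor",
    "json": "application/json",
    "cbor": "application/cbor",
}


def equivalent_requests(content_path: str, fmt: str):
    """Return two (path, headers) pairs that ask for the same format.

    Hypothetical helper: the first pair uses the `format` query parameter,
    the second uses the `Accept` header. Unknown formats raise KeyError.
    """
    accept = FORMAT_TO_ACCEPT[fmt]
    query_variant = (f"{content_path}?{urlencode({'format': fmt})}", {})
    header_variant = (content_path, {"Accept": accept})
    return query_variant, header_variant
```

Calling `equivalent_requests("/ipfs/bafyexample", "dag-json")`, where `bafyexample` is a placeholder rather than a real CID, yields one request with `?format=dag-json` and one with `Accept: application/vnd.ipld.dag-json`.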
254259

255260
<!-- TODO Planned: https://github.com/ipfs/go-ipfs/issues/8769
256261
- `selector=<cid>` can be used for passing a CID with [IPLD selector](https://ipld.io/specs/selectors)
@@ -584,24 +589,38 @@ A good practice is to always return it with HTTP error [status codes](#response-
584589

585590
## Response Payload
586591

587-
Data sent with HTTP response depends on the type of requested IPFS resource:
592+
Data sent with HTTP response depends on the type of the requested IPFS resource, and the requested response type.
593+
594+
By default, the implicit deserialized response type is based on the `Accept` header and the codec of the resolved CID:
588595

589-
- UnixFS (implicit default)
590-
- File
591-
- Bytes representing file contents
596+
- UnixFS, either `dag-pb` (0x70) or `raw` (0x55)
597+
- File or `raw` block
598+
- Bytes representing file/block contents
599+
- When `Range` is present, only the requested byte range is returned.
592600
- Directory
593601
- Generated HTML with directory index (see [additional notes here](#generated-html-with-directory-index))
594-
- When `index.html` is present, gateway can skip generating directory index and return it instead
595-
- Raw block
596-
- Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw)
597-
- CAR
598-
- Arbitrary DAG as a verifiable CAR file or a stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
599-
- TAR
600-
- Deserialized UnixFS files and directories as a TAR file or a stream, see [application/x-tar](https://en.wikipedia.org/wiki/Tar_(computing))
601-
<!-- TODO: https://github.com/ipfs/go-ipfs/issues/8823
602-
- dag-json / dag-cbor
603-
- See [https://github.com/ipfs/go-ipfs/issues/8823](https://github.com/ipfs/go-ipfs/issues/8823)
604-
-->
602+
- When `index.html` is present, gateway MUST skip generating directory index and return content from `index.html` instead.
603+
- JSON (0x0200)
604+
- Bytes representing a JSON file, see [application/json](https://www.iana.org/assignments/media-types/application/json).
605+
- Works exactly the same as `raw`, but returned `Content-Type` is `application/json`
606+
- CBOR (0x51)
607+
- Bytes representing a CBOR file, see [application/cbor](https://www.iana.org/assignments/media-types/application/cbor)
608+
- Works exactly the same as `raw`, but returned `Content-Type` is `application/cbor`
609+
- DAG-JSON (0x0129)
610+
- If the `Accept` header includes `text/html`, implementation should return a generated HTML with options to download DAG-JSON as-is, or converted to DAG-CBOR.
611+
- Otherwise, response works exactly the same as `raw` block, but returned `Content-Type` is [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json)
612+
- DAG-CBOR (0x71)
613+
- If the `Accept` header includes `text/html`, implementation should return a generated HTML with options to download DAG-CBOR as-is, or converted to DAG-JSON.
614+
- Otherwise, response works exactly the same as `raw` block, but returned `Content-Type` is [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor)
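
The implicit defaults above can be summarized in a short sketch. It assumes a hypothetical function name, `default_content_type`, and uses a plain substring check for `text/html`, which is a simplification of real `Accept` header parsing. UnixFS content types depend on the file or directory being served, so the sketch returns `None` for them.

```python
from typing import Optional

# Multicodec codes quoted in the list above.
DAG_PB, RAW = 0x70, 0x55
JSON, CBOR, DAG_JSON, DAG_CBOR = 0x0200, 0x51, 0x0129, 0x71


def default_content_type(cid_codec: int, accept: str = "") -> Optional[str]:
    """Pick the implicit Content-Type when no explicit format is requested."""
    if cid_codec in (DAG_PB, RAW):
        # UnixFS file, raw block, or directory listing: decided elsewhere.
        return None
    if cid_codec == JSON:
        return "application/json"
    if cid_codec == CBOR:
        return "application/cbor"
    if cid_codec in (DAG_JSON, DAG_CBOR):
        # Browsers get a generated HTML page with download options.
        if "text/html" in accept:
            return "text/html"
        if cid_codec == DAG_JSON:
            return "application/vnd.ipld.dag-json"
        return "application/vnd.ipld.dag-cbor"
    raise ValueError(f"codec not covered by this sketch: {cid_codec:#x}")
```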
615+
616+
The following response types require an explicit opt-in and can only be requested with the [`format`](#format-request-query-parameter) query parameter or the [`Accept`](#accept-request-header) header:
617+
618+
- Raw Block (`?format=raw`)
619+
- Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw).
620+
- CAR (`?format=car`)
621+
- Arbitrary DAG as a verifiable CAR file or a stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car).
622+
- TAR (`?format=tar`)
623+
- Deserialized UnixFS files and directories as a TAR file or a stream, see [IPIP-288](https://github.com/ipfs/specs/pull/288)
605624

606625
# Appendix: notes for implementers
607626

@@ -627,13 +646,32 @@ and [DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md)).
627646

628647
### Traversing remaining path
629648

630-
UnixFS pathing over files and directories is the implicit default used for
631-
resolving content paths that start with `/ipfs/` and `/ipns/`. It allows for
632-
traversal based on link names, which provides a better user experience than
633-
low level logical pathing from IPLD:
649+
After the content root CID is found, the remainder of the path should be traversed
650+
and resolved. Depending on the data type, that may occur through UnixFS pathing
651+
or through DAG-JSON and DAG-CBOR pathing.
652+
653+
### Traversing through UnixFS
654+
655+
UnixFS is an abstraction over the low level [logical DAG-PB pathing][dag-pb-format]
656+
from IPLD, providing a better user experience:
634657

635658
- Example of UnixFS pathing: `/ipfs/cid/dir-name/file-name.txt`
636659

660+
For more details regarding DAG-PB pathing, please read the "Path Resolution" section
661+
of [this document](https://ipld.io/design/tricky-choices/dag-pb-forms-impl-and-use/#path-resolution).
662+
663+
### Traversing through DAG-JSON and DAG-CBOR
664+
665+
Traversing through [DAG-JSON][dag-json] and [DAG-CBOR][dag-cbor] is possible
666+
through fields that encode a link:
667+
668+
- DAG-JSON: links are represented as a base-encoded CID under the `/` reserved
669+
namespace, see [specification](https://ipld.io/specs/codecs/dag-json/spec/#links).
670+
- DAG-CBOR: links are tagged with CBOR tag 42, indicating that they encode a CID,
671+
see [specification](https://ipld.io/specs/codecs/dag-cbor/spec/#links).
672+
673+
Note: pathing into [IPLD Kind](https://ipld.io/docs/data-model/kinds/) other than Link (CID) is not supported at the moment. Implementations should return HTTP 501 Not Implemented when fully resolved content path has any remainder left. This feature may be specified in a future [IPIP that introduces data onboarding](https://github.com/ipfs/in-web-browsers/issues/189) and [IPLD Patch](https://ipld.io/specs/patch/) semantics.
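
The link-only traversal described above can be sketched for DAG-JSON as follows. This is a minimal illustration, not how any gateway implements it: `resolve`, `as_link`, `PathRemainderError` and the `load_block` callback are hypothetical names, the CIDs in the usage note are placeholders, and only map fields are traversed.

```python
import json


class PathRemainderError(Exception):
    """Path ended inside a block; would map to HTTP 501 Not Implemented."""


def as_link(value):
    """Return the CID string if value is a DAG-JSON link, else None.

    A DAG-JSON link is a map with the single key "/" and a string value.
    """
    if isinstance(value, dict) and len(value) == 1 and isinstance(value.get("/"), str):
        return value["/"]
    return None


def resolve(root_cid, segments, load_block):
    """Follow path segments across DAG-JSON blocks and return the final CID.

    load_block(cid) is assumed to return the block's DAG-JSON text.
    """
    cid = root_cid
    node = json.loads(load_block(cid))
    pending = []  # segments consumed inside the current block since the last link
    for segment in segments:
        if not isinstance(node, dict) or segment not in node:
            raise LookupError(f"no field {segment!r} under {cid}")
        node = node[segment]
        pending.append(segment)
        link = as_link(node)
        if link is not None:
            # Crossing a link: continue from the root of the linked block.
            cid, node, pending = link, json.loads(load_block(link)), []
    if pending:
        raise PathRemainderError("/".join(pending))
    return cid
```

With a root block shaped like the `link/to/some/cid2` document from IPIP-328, the full path `link/to/some/cid2` resolves to the linked CID, while the shorter path `link/to` stops inside the block and raises `PathRemainderError`.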
674+
637675
### Handling traversal errors
638676

639677
Gateway MUST respond with HTTP error when it is not possible to traverse the requested content path:
@@ -693,15 +731,24 @@ It should be always fast, even when a directory has 10k of items.
693731
The usual optimizations involve:
694732

695733
- Skipping size and type resolution for child UnixFS items, and using `Tsize`
696-
from [logical format](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format)
697-
instead, allows gateway to respond much faster, as it no longer need to fetch
698-
root nodes of child items.
699-
- Additional information about child nodes can be fetched lazily
700-
with JS, but only for items in the browser's viewport.
734+
from [logical format][dag-pb-format] instead, allows gateway to respond much
735+
faster, as it no longer needs to fetch root nodes of child items.
736+
- Instead of showing "file size" GUIs should show "IPFS DAG size". This
737+
remains useful for quick inspection, but does not require fetching child
738+
blocks, making directory listing fast, even with tens of thousands of
739+
blocks. Example with 10k items:
740+
`bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm`.
741+
- Additional information about child nodes, such as exact file size without
742+
DAG overhead, can be fetched lazily with JS, but only for items in the
743+
browser's viewport.
701744

702745
- Alternative approach is resolving child items, but providing pagination UI.
703746
- Opening a big directory can return HTTP 302 to the current URL with
704747
additional query parameters (`?page=0&limit=100`),
705748
limiting the cost of a single page load.
706749
- The downside of this approach is that it will always be slower than
707750
skipping child block resolution.
751+
752+
[dag-pb-format]: https://ipld.io/specs/codecs/dag-pb/spec/#logical-format
753+
[dag-json]: https://ipld.io/specs/codecs/dag-json/spec/
754+
[dag-cbor]: https://ipld.io/specs/codecs/dag-cbor/spec/
