IPIP-332: Streaming Error Handling on Web Gateways #332
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # IPIP 0000: Streaming Error Handling in HTTP Gateways | ||
|
|
||
| - Start Date: 2022-10-12 | ||
| - Related Issues: | ||
| - [ipfs/kubo/pull/9333](https://github.com/ipfs/kubo/pull/9333) | ||
| - [mdn/browser-compat-data/issues/14703](https://github.com/mdn/browser-compat-data/issues/14703) | ||
|
|
||
| ## Summary | ||
|
|
||
| Ensure streaming error handling in web gateways is clear and consistent. | ||
|
|
||
| ## Motivation | ||
|
|
||
| Web gateways offer several ways for users to download files. | ||
| These files are streamed from the server to the client over HTTP. | ||
| However, there is no good way to present the client with an error that occurs | ||
| while the stream is in progress. | ||
|
|
||
| For example, if during the download of a TAR file, the server detects some error | ||
| and is not able to continue, the user can get a valid, yet incomplete TAR. However, | ||
| the user will not know that the TAR is incomplete. By introducing consistent error | ||
| handling, the server attempts to notify the user. | ||
|
|
||
| ## Detailed design | ||
|
|
||
| If the server encounters an error before streaming the contents to the client, | ||
| the server must fail with the respective `4xx` or `5xx` HTTP status code (no change). | ||
|
|
||
| If the server encounters an error while streaming the contents, the server must | ||
| force-close the HTTP stream to the user. This way, the user will receive a | ||
| network error, making it clear that the downloaded file is not valid. | ||
|
|
||
| ## Test fixtures | ||
|
|
||
| There are no relevant test fixtures for this IPIP. | ||
|
|
||
| ## Design rationale | ||
|
|
||
| Before starting to stream the body of the response, the server is able to set | ||
| an HTTP status code for the error. However, once the HTTP headers are sent | ||
| and the body has started streaming, the HTTP specification offers no clear | ||
| way to signal an error. Since the gateway is browser-first, it is | ||
| important to show an error and avoid users receiving an incomplete file. | ||
| Therefore, the server can force-close the HTTP stream, leading to a network | ||
| error. This tells the user that an error happened. | ||
|
|
||
| ### User benefit | ||
|
|
||
| The user will know that an error happened while receiving the file. Otherwise, | ||
| the user might receive incomplete, but still valid, files that could be mistaken | ||
| for the real file. | ||
|
|
||
| ### Compatibility | ||
|
|
||
| This IPIP is backwards compatible. | ||
|
|
||
| ### Alternatives | ||
|
|
||
| Using [`Trailer`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Trailer) HTTP headers | ||
| was considered. However, trailer headers are [not supported in browsers](https://github.com/mdn/browser-compat-data/issues/14703). | ||
| In addition, even if trailer headers were supported in browsers, there is no clear | ||
| standard for which header would be used to indicate errors. | ||
|
|
||
| ### Copyright | ||
|
|
||
| Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). |
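The two cases under "Detailed design" can be sketched in Go, the language of the related Kubo pull request. This is a minimal illustration, not the gateway's actual code: `streamHandler`, `fetch`, and the three `...Source` helpers are hypothetical names invented here. It relies on the documented `net/http` behaviour that panicking with `http.ErrAbortHandler` aborts the response, so the client sees an interrupted transfer rather than a clean end of body.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// failingReader always fails; it stands in for content generation that
// breaks partway through a download (hypothetical).
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) {
	return 0, errors.New("block not found")
}

// Three hypothetical content sources used by the sketch.
func goodSource() (io.Reader, error)   { return strings.NewReader("complete"), nil }
func absentSource() (io.Reader, error) { return nil, errors.New("cannot resolve path") }
func brokenSource() (io.Reader, error) {
	head := strings.NewReader(strings.Repeat("x", 64<<10))
	return io.MultiReader(head, failingReader{}), nil
}

// streamHandler answers with a status code when the error happens before
// streaming, and aborts the response when it happens during streaming.
func streamHandler(open func() (io.Reader, error)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := open()
		if err != nil {
			// Nothing has been sent yet: a regular 5xx still works.
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		if _, err := io.Copy(w, body); err != nil {
			// Part of the body may already be on the wire, so the status
			// code can no longer change. Abort the response instead.
			panic(http.ErrAbortHandler)
		}
	})
}

// fetch serves h on a throwaway test server and reports what a client sees.
func fetch(h http.Handler) (status, n int, err error) {
	srv := httptest.NewServer(h)
	defer srv.Close()
	resp, err := http.Get(srv.URL)
	if err != nil {
		return 0, 0, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	return resp.StatusCode, len(data), err
}

func main() {
	for _, src := range []func() (io.Reader, error){goodSource, absentSource, brokenSource} {
		status, n, err := fetch(streamHandler(src))
		fmt.Printf("status=%d bytes=%d err=%v\n", status, n, err)
	}
}
```

In the third case the client receives a non-nil error while reading the body, which is the signal this IPIP relies on: a caller that checks the error can discard the partial file instead of treating it as complete.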
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -1,6 +1,6 @@ | ||||||||
| # Path Gateway Specification | ||||||||
|
|
||||||||
|  | ||||||||
|  | ||||||||
|
|
||||||||
| **Authors**: | ||||||||
|
|
||||||||
|
|
@@ -83,6 +83,7 @@ where client prefers to perform all validation locally. | |||||||
| - [Best practices for HTTP caching](#best-practices-for-http-caching) | ||||||||
| - [Denylists](#denylists) | ||||||||
| - [Generated HTML with directory index](#generated-html-with-directory-index) | ||||||||
| - [Streaming errors](#streaming-errors) | ||||||||
|
|
||||||||
| # HTTP API | ||||||||
|
|
||||||||
|
|
@@ -194,7 +195,6 @@ blocks. | |||||||
| Gateway implementations SHOULD be smart enough to require only the minimal DAG subset | ||||||||
| necessary for handling the range request. | ||||||||
|
|
||||||||
|
|
||||||||
| NOTE: for more advanced use cases such as partial DAG/CAR streaming, or | ||||||||
| non-UnixFS data structures, see the `selector` query parameter | ||||||||
| [proposal](https://github.com/ipfs/go-ipfs/issues/8769). | ||||||||
|
|
@@ -256,7 +256,6 @@ for more details. | |||||||
| - This is a powerful primitive that allows for fetching subsets of data in specific order, either as raw bytes, or a CAR stream. Think “HTTP range requests”, but for IPLD, and more powerful. | ||||||||
| --> | ||||||||
|
|
||||||||
|
|
||||||||
| # HTTP Response | ||||||||
|
|
||||||||
| ## Response Status Codes | ||||||||
|
|
@@ -372,7 +371,6 @@ and CDNs, implementations should base it on both CID and response type: | |||||||
| should be based on requested range in addition to CID and response format: | ||||||||
| `Etag: "bafy..foo.0-42"` | ||||||||
|
|
||||||||
|
|
||||||||
| ### `Cache-Control` (response header) | ||||||||
|
|
||||||||
| Used for HTTP caching. | ||||||||
|
|
@@ -433,6 +431,7 @@ or optional [`filename`](#filename-request-query-parameter) parameter) | |||||||
| and magic bytes to improve the utility of produced responses. | ||||||||
|
|
||||||||
| For example: | ||||||||
|
|
||||||||
| - detect plain text file | ||||||||
| and return `Content-Type: text/plain` instead of `application/octet-stream` | ||||||||
| - detect SVG image | ||||||||
|
|
@@ -446,6 +445,7 @@ Returned when `download`, `filename` query parameter, or a custom response | |||||||
| The first parameter passed in this header indicates if content should be | ||||||||
| displayed `inline` by the browser, or sent as an `attachment` that opens the | ||||||||
| “Save As” dialog: | ||||||||
|
|
||||||||
| - `Content-Disposition: inline` is the default, returned when request was made | ||||||||
| with `download=false` or a custom `filename` was provided with the request | ||||||||
| without any explicit `download` parameter. | ||||||||
|
|
@@ -457,13 +457,14 @@ The remainder is an optional `filename` parameter that will be prefilled in the | |||||||
|
|
||||||||
| NOTE: when the `filename` includes non-ASCII characters, the header must | ||||||||
| include both ASCII and UTF-8 representations for compatibility with legacy user | ||||||||
| agents and existing web browsers. | ||||||||
|
|
||||||||
| To illustrate, `?filename=testтест.pdf` should produce: | ||||||||
| `Content-Disposition: inline; filename="test____.pdf"; filename*=UTF-8''test%D1%82%D0%B5%D1%81%D1%82.pdf` | ||||||||
|
|
||||||||
| - ASCII representation must have non-ASCII characters replaced with `_` | ||||||||
| - UTF-8 representation must be wrapped in Percent Encoding ([RFC 3986, Section 2.1](https://www.rfc-editor.org/rfc/rfc3986.html#section-2.1)). | ||||||||
| - NOTE: `UTF-8''` is not a typo – see [Examples in RFC5987](https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2) | ||||||||
|
|
||||||||
| `Content-Disposition` must be also set when a binary response format was requested: | ||||||||
|
|
||||||||
|
|
@@ -510,8 +511,9 @@ This header is more widely used in [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md | |||||||
|
|
||||||||
| Gateway MUST return a redirect when a valid UnixFS directory was requested | ||||||||
| without the trailing `/`, for example: | ||||||||
|
|
||||||||
| - response for `https://ipfs.io/ipns/en.wikipedia-on-ipfs.org/wiki` | ||||||||
| (no trailing slash) will be HTTP 301 redirect with | ||||||||
| `Location: /ipns/en.wikipedia-on-ipfs.org/wiki/` | ||||||||
|
|
||||||||
| ### `X-Ipfs-Path` (response header) | ||||||||
|
|
@@ -614,7 +616,7 @@ IPLD data, starting from that data which the CID identified. | |||||||
| **Note:** Other types of gateway may allow for passing CID by other means, such | ||||||||
| as `Host` header, changing the rules behind path splitting. | ||||||||
| (See [SUBDOMAIN_GATEWAY.md](./SUBDOMAIN_GATEWAY.md) | ||||||||
| and [DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md)). | ||||||||
|
|
||||||||
| ### Traversing remaining path | ||||||||
|
|
||||||||
|
|
@@ -628,6 +630,7 @@ low level logical pathing from IPLD: | |||||||
| ### Handling traversal errors | ||||||||
|
|
||||||||
| Gateway MUST respond with HTTP error when it is not possible to traverse the requested content path: | ||||||||
|
|
||||||||
| - [`404 Not Found`](#404-not-found) should be returned when the root CID is valid and traversable, but | ||||||||
| the DAG it represents does not include content path remainder. | ||||||||
| - Error response body should indicate which part of immutable content path (`/ipfs/{cid}/path/to/file`) is missing | ||||||||
|
|
@@ -655,6 +658,7 @@ Implementations are encouraged to support pluggable denylists to allow IPFS | |||||||
| node operators to opt into not hosting previously flagged content. | ||||||||
|
|
||||||||
| Gateway MUST respond with HTTP error when requested CID is on any of active denylists: | ||||||||
|
|
||||||||
| - [410 Gone](#410-gone) returned when CID is denied for non-legal reasons, or when the exact reason is unknown | ||||||||
| - [451 Unavailable For Legal Reasons](#451-unavailable-for-legal-reasons) returned when denylist indicates that content was blocked on legal basis | ||||||||
|
|
||||||||
|
|
@@ -694,3 +698,12 @@ The usual optimizations involve: | |||||||
| limiting the cost of a single page load. | ||||||||
| - The downside of this approach is that it will always be slower than | ||||||||
| skipping child block resolution. | ||||||||
|
|
||||||||
| ## Streaming errors | ||||||||
|
|
||||||||
| To avoid users receiving incomplete, yet valid, files, the gateway MUST | ||||||||
| close the HTTP stream if an error occurs while streaming a file to the client. | ||||||||
| This can be done via the following mechanisms: | ||||||||
|
|
||||||||
| - Resetting the underlying TCP connection (`RST`) for HTTP/1.1 | ||||||||
| - Sending a `RST_STREAM` (reset stream) frame for HTTP/2 | ||||||||
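The two mechanisms listed above do not necessarily require version-specific code in an implementation. As a hedged sketch, assuming a gateway built on Go's `net/http`: the documentation for that package states that a handler panic makes the server either close the network connection or send an HTTP/2 `RST_STREAM`, depending on the protocol in use. The handler names and the `fetch` helper below are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// abortMidStream sends part of a body, flushes it, then aborts the response.
func abortMidStream(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, "partial data")
	if f, ok := w.(http.Flusher); ok {
		f.Flush() // push headers and the first bytes to the client
	}
	// net/http turns this into a closed connection (HTTP/1.1) or a
	// RST_STREAM frame (HTTP/2) without logging a stack trace.
	panic(http.ErrAbortHandler)
}

// complete sends a whole body and returns normally.
func complete(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, "all data")
}

// fetch serves h over TLS on a throwaway test server, with or without
// HTTP/2, and reports the negotiated protocol and any client-side error.
func fetch(h http.HandlerFunc, useHTTP2 bool) (proto string, err error) {
	srv := httptest.NewUnstartedServer(h)
	srv.EnableHTTP2 = useHTTP2
	srv.StartTLS()
	defer srv.Close()

	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return resp.Proto, err
}

func main() {
	for _, h2 := range []bool{false, true} {
		proto, err := fetch(abortMidStream, h2)
		fmt.Printf("http2=%v proto=%q err=%v\n", h2, proto, err)
	}
}
```

The same handler produces a client-visible error on both protocol versions, which suggests the requirement can be stated in terms of the outcome (the client must observe a failed transfer) while the list of mechanisms serves as guidance for each HTTP version.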
I haven't read the entire spec, but according to BCP 56, HTTP APIs shouldn't go into this level of detail. HTTP is designed in a way that allows building applications without referring to the specific HTTP version.

@Jorropo Again, I haven't read the entire spec, but are we actually doing server push (which is the only context in which CANCEL_PUSH would be valid)? That feature of HTTP is going to be removed from major browser implementations very soon.

I didn't actually read the RFC. I know RST_STREAM on HTTP/2 is what we want; I just CTRL + F'd RST_STREAM in the HTTP/3 RFC and found this text:

I guess it should be whatever QUIC's frame is to close a stream unexpectedly.

Streaming errors is not a thing HTTP supports, you can set up trailer headers but they aren't supported by browsers. I don't know if we should add this to the gateway spec, because most HTTP server implementations give you as much control as the go std does.

@marten-seemann as @Jorropo mentioned, the big issue here is that HTTP doesn't have any error handling for streaming. One of my motivations behind this IPIP is the new TAR format for the gateway (ipfs/kubo#9029). The TAR creation can fail for many reasons while the file is being streamed to the client. However, if you just stop streaming the TAR, you still get a valid TAR file, but it is incomplete. There's no feedback. Sending a trailer header is useless since browsers are not able to parse them. The only way we have found so far to tell the user that something is wrong is to force-close the HTTP stream. I also have mixed feelings about having this in the spec since it is so specific. In addition, as @Jorropo mentioned, it may be worth having an opt-out of this behaviour through some header.

I think this is an important question.

@Jorropo please see the updates I've made to the IPIP.