
Commit ae80513

Onboard unencoded-digest to HTTP WG repo (#3129)
The I-D, for legacy reasons, was still using `identity-digest` in its handle https://datatracker.ietf.org/doc/draft-pardue-httpbis-identity-digest/, since we need to change the handle anyway, now seems a good time to change it away from the term everyone hated cc @mikewest
1 parent 42e157b commit ae80513

1 file changed: 337 additions and 0 deletions
@@ -0,0 +1,337 @@
1+
---
2+
title: "HTTP Unencoded Digest"
3+
abbrev: "HTTP Unencoded Digest"
4+
category: std
5+
6+
docname: draft-ietf-httpbis-unencoded-digest-latest
7+
submissiontype: IETF
8+
number:
9+
date: {DATE}
10+
11+
v: 3
12+
area: Web and Internet Transport
13+
workgroup: HTTP
14+
keyword:
15+
- next generation
16+
- unicorn
17+
- sparkling distributed ledger
18+
venue:
19+
group: HTTP
20+
type: Working Group
21+
home: https://httpwg.org/
22+
23+
arch: https://lists.w3.org/Archives/Public/ietf-http-wg/
24+
repo: https://github.com/httpwg/http-extensions/labels/unencoded-digest
25+
github-issue-label: unencoded-digest
26+
27+
author:
28+
-
29+
fullname: Lucas Pardue
30+
organization: Cloudflare
31+
32+
-
33+
fullname: Mike West
34+
organization: Google
35+
36+
37+
normative:
38+
39+
informative:
40+
41+
42+
--- abstract
43+
44+
The Repr-Digest and Content-Digest integrity fields are subject to HTTP content
45+
coding considerations. There are some use cases that benefit from the
46+
unambiguous exchange of integrity digests of the unencoded representation. The
47+
Unencoded-Digest and Want-Unencoded-Digest fields complement existing integrity
48+
fields for this purpose.
49+
50+
51+
--- middle
52+
53+
# Introduction
54+
55+
The `Repr-Digest` and `Content-Digest` integrity fields defined in
56+
{{!DIGEST-FIELDS=RFC9530}} are suitable for a range of use cases. However,
57+
because the fields are subject to HTTP content coding considerations, it is
58+
difficult to support use cases that could benefit from the exchange of integrity
59+
digests of the unencoded representation.
60+
61+
As a simple example, an application using HTTP might be presented with request
62+
or response representation data that has been transparently decoded. Attempting
63+
to verify the integrity of the data against the `Repr-Digest` would first require
64+
re-encoding that data using the same coding indicated by the Content-Encoding
65+
header field ({{Section 8.4 of !HTTP=RFC9110}}), which is not always possible
66+
(see {{Section 6.5 of DIGEST-FIELDS}}).
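
As an illustration of why re-encoding does not always reproduce the original
bytes, the following Python sketch builds two valid gzip encodings of the same
data. The `mtime` values are arbitrary assumptions chosen only to make the gzip
headers differ. It is an informal aid, not part of the specification.

```python
import gzip
import hashlib

data = b"An unexceptional string\n"

# Two valid gzip encodings of the same data; only the header timestamp differs.
encoded_a = gzip.compress(data, mtime=1)
encoded_b = gzip.compress(data, mtime=2)

# A hash computed over the encoded bytes differs between the two encodings...
encoded_hashes_differ = (
    hashlib.sha256(encoded_a).digest() != hashlib.sha256(encoded_b).digest()
)

# ...while both encodings decode to the same unencoded data.
decoded_equal = gzip.decompress(encoded_a) == gzip.decompress(encoded_b) == data
```

A receiver that only holds the decoded data cannot know which of the many valid
encodings the sender hashed, whereas a hash of the unencoded data is the same in
both cases.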
67+
68+
Although receivers could feasibly re-encode data in order to carry out
69+
`Repr-Digest` validation, it might be impractical for certain kinds of
70+
environments. For instance, browsers tend to provide built-in support for
71+
transparent decoding but little support for encoding; while this could be done
72+
via the use of additional libraries it would create work in JavaScript that
73+
could contend with other activities. Even on the server side, the re-encoding of
74+
received data might not be acceptable; some coding algorithms are optimized
75+
towards efficient decoding at the cost of complex encoding. A Content-Encoding
76+
field value that indicates a series of encodings adds further complexity.
77+
78+
A more complex example involves HTTP Range Requests ({{Section 14 of
79+
HTTP}}), where a client fetches multiple partial representations from
80+
different origins and "stitches" them back into a whole. Unfortunately, if the
81+
origins apply different content coding, the `Repr-Digest` field will vary by the
82+
server's selected encoding (i.e. the Content-Encoding header field, {{Section
83+
8.4 of HTTP}}). This presents a challenge for a client: in order to verify the
84+
integrity of the pieced-together whole it would need to remove the encoding of
85+
each part, combine them, and then encode the result in order to compare against
86+
one or more `Repr-Digest`s.
87+
88+
The Accept-Encoding header field ({{Section 12.5.3 of HTTP}}) provides the means
89+
to indicate preferences for content coding. It is possible for an endpoint to
90+
indicate a preference for no encoding, for example by sending the "identity"
91+
token. However, codings often provide data compression that is advantageous.
92+
Disabling content coding in order to simplify integrity checking is possibly an
93+
unacceptable trade-off.
94+
95+
For a variety of reasons, decoding and re-encoding content in order to benefit
96+
from HTTP integrity fields is not preferable. This specification defines the
97+
Unencoded-Digest and Want-Unencoded-Digest fields to support a simpler validation
98+
workflow in some scenarios where content coding is applied. These fields
99+
complement the other integrity fields defined in {{DIGEST-FIELDS}}.
100+
101+
102+
# Conventions and Definitions
103+
104+
{::boilerplate bcp14-tagged}
105+
106+
This document uses the Augmented BNF defined in {{!RFC5234}} and updated by
107+
{{!RFC7405}}. This includes the rule LF (line feed).
108+
109+
This document uses the following terminology from {{Section 3 of
110+
!STRUCTURED-FIELDS=RFC9651}} to specify syntax and parsing: Byte Sequence,
111+
Dictionary, and Integer.
112+
113+
The definitions "representation", "selected representation", "representation
114+
data", "representation metadata", and "content" in this document are to be
115+
interpreted as described in {{!HTTP=RFC9110}}.
116+
117+
"Integrity fields" is the collective term for `Content-Digest`, `Repr-Digest`,
118+
and `Unencoded-Digest`.
119+
120+
"Integrity preference fields" is the collective term for `Want-Repr-Digest`,
121+
`Want-Content-Digest`, and `Want-Unencoded-Digest`.
122+
123+
# The Unencoded-Digest Field {#unencoded-digest}
124+
125+
The `Unencoded-Digest` HTTP field can be used in requests and responses to
126+
communicate digests that are calculated using a hashing algorithm applied to the
127+
representation with no content coding ({{Section 8.4.1 of HTTP}}).
128+
129+
Apart from the content coding concerns, `Unencoded-Digest` behaves similarly
130+
to `Repr-Digest` ({{Section 3 of DIGEST-FIELDS}}). In the absence of content
131+
coding, `Unencoded-Digest` is identical to `Repr-Digest`.
132+
133+
`Unencoded-Digest` is a `Dictionary` (see {{Section 3.2 of STRUCTURED-FIELDS}})
134+
where each:
135+
136+
* key conveys the hashing algorithm (see {{Section 5 of DIGEST-FIELDS}}) used to
137+
compute the digest;
138+
* value is a `Byte Sequence` ({{Section 3.3.5 of STRUCTURED-FIELDS}}), that
139+
conveys an encoded version of the byte output produced by the digest
140+
calculation.
141+
142+
For example:
143+
144+
~~~ http-message
145+
NOTE: '\' line wrapping per RFC 8792
146+
147+
Unencoded-Digest: \
148+
sha-512=:YMAam51Jz/jOATT6/zvHrLVgOYTGFy1d6GJiOHTohq4yP+pgk4vf2aCs\
149+
yRZOtw8MjkM7iw7yZ/WkppmM44T3qg==:
150+
~~~
151+
152+
The `Dictionary` type can be used, for example, to attach multiple digests
153+
calculated using different hashing algorithms in order to support a population
154+
of endpoints with different or evolving capabilities. Such an approach could
155+
support transitions away from weaker algorithms (see
156+
{{Section 6.6 of DIGEST-FIELDS}}).
157+
158+
~~~ http-message
159+
NOTE: '\' line wrapping per RFC 8792
160+
161+
Unencoded-Digest: \
162+
sha-256=:d435Qo+nKZ+gLcUHn7GQtQ72hiBVAgqoLsZnZPiTGPk=:,\
163+
sha-512=:YMAam51Jz/jOATT6/zvHrLVgOYTGFy1d6GJiOHTohq4yP+pgk4vf2aCs\
164+
yRZOtw8MjkM7iw7yZ/WkppmM44T3qg==:
165+
~~~
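
The following Python sketch shows one way a sender might serialize such a field
value from unencoded representation data. The function name
`unencoded_digest_value` and the `HASHLIB_NAMES` mapping are hypothetical and
exist only for illustration; a production implementation would more likely use
a Structured Fields library for serialization.

```python
import base64
import hashlib

# Assumed mapping from digest algorithm keys to hashlib names (illustrative).
HASHLIB_NAMES = {"sha-256": "sha256", "sha-512": "sha512"}


def unencoded_digest_value(unencoded_data, algorithms=("sha-256", "sha-512")):
    """Build an Unencoded-Digest field value from unencoded representation data."""
    members = []
    for key in algorithms:
        digest = hashlib.new(HASHLIB_NAMES[key], unencoded_data).digest()
        # A Byte Sequence is base64 text delimited by colons.
        encoded = base64.b64encode(digest).decode("ascii")
        members.append(f"{key}=:{encoded}:")
    return ", ".join(members)


field_value = unencoded_digest_value(b"An unexceptional string\n")
```

The input is the representation data before any content coding is applied, so
the same value can be sent regardless of the Content-Encoding chosen for a
particular response.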
166+
167+
A recipient MAY ignore any or all digests. Application-specific behavior or
168+
local policy MAY set additional constraints on the processing and validation
169+
practices of the conveyed digests. The security considerations cover some of
170+
the issues related to ignoring digests (see {{Section 6.6 of DIGEST-FIELDS}})
171+
and validating multiple digests (see {{Section 6.7 of DIGEST-FIELDS}}).
172+
173+
A sender MAY send a digest without knowing whether the recipient supports a
174+
given hashing algorithm. A sender MAY send a digest if it knows the recipient
175+
will ignore it.
176+
177+
`Unencoded-Digest` can be sent in a trailer section. In this case,
178+
`Unencoded-Digest` MAY be merged into the header section; see {{Section 6.5.1 of
179+
HTTP}}.
180+
181+
# The Want-Unencoded-Digest Field {#want-unencoded-digest}
182+
183+
`Want-Unencoded-Digest` is an integrity preference field; see {{Section 4 of
184+
DIGEST-FIELDS}}. It indicates that the sender would like to receive (via the
185+
`Unencoded-Digest` field) a representation digest on messages associated with the
186+
request URI and representation metadata where no content coding is applied.
187+
188+
If `Want-Unencoded-Digest` is used in a response, it indicates that the server
189+
would like the client to provide the `Unencoded-Digest` field on future requests.
190+
191+
`Want-Unencoded-Digest` is only a hint. The receiver of the field can ignore it
192+
and send an `Unencoded-Digest` field using any algorithm or omit one entirely. It
193+
is not a protocol error if preferences are ignored. Applications that use
194+
`Unencoded-Digest` and `Want-Unencoded-Digest` can define expectations or
195+
constraints that operate in addition to this specification.
196+
197+
`Want-Unencoded-Digest` is of type `Dictionary` where each:
198+
199+
* key conveys the hashing algorithm;
200+
* value is an `Integer` ({{Section 3.3.1 of STRUCTURED-FIELDS}}) that conveys an
201+
ascending, relative, weighted preference. It MUST be in the range 0 to 10
202+
inclusive. 1 is the least preferred, 10 is the most preferred, and a value of
203+
0 means "not acceptable".
204+
205+
Examples:
206+
207+
~~~ http-message
208+
Want-Unencoded-Digest: sha-256=1
209+
Want-Unencoded-Digest: sha-512=3, sha-256=10, unixsum=0
210+
~~~
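
As a rough sketch of how a recipient might act on these preferences, the Python
below parses a field value and picks the most preferred algorithm that the
recipient supports. The function names are hypothetical, and the parser is a
deliberate simplification that handles only `key=integer` members; it is not a
complete Structured Fields parser.

```python
def parse_want_unencoded_digest(value):
    """Parse a simplified Want-Unencoded-Digest value into {algorithm: weight}."""
    preferences = {}
    for member in value.split(","):
        key, _, weight_text = member.strip().partition("=")
        weight = int(weight_text)  # raises ValueError for non-integer members
        if not 0 <= weight <= 10:
            raise ValueError(f"preference out of range for {key}: {weight}")
        preferences[key] = weight
    return preferences


def choose_algorithm(preferences, supported):
    """Return the most preferred supported algorithm, or None if there is none."""
    candidates = [
        (weight, key)
        for key, weight in preferences.items()
        if weight > 0 and key in supported  # a weight of 0 means "not acceptable"
    ]
    return max(candidates)[1] if candidates else None


preferences = parse_want_unencoded_digest("sha-512=3, sha-256=10, unixsum=0")
chosen = choose_algorithm(preferences, {"sha-256", "sha-512"})
```

Because the field is only a hint, a recipient that finds no acceptable
algorithm can simply omit `Unencoded-Digest` or use an algorithm of its own
choosing.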
211+
212+
# Messages containing both Unencoded-Digest and Content-Encoding {#encoding-and-unencoded}
213+
214+
Digests delivered through `Unencoded-Digest` apply to the unencoded representation. If a message is
215+
received with content coding, a recipient needs to decode the message in order
216+
to calculate the digest that can subsequently be used for validation. If
217+
multiple content codings are applied, the recipient needs to decode all
218+
encodings, in the reverse of the order in which they were applied, before validation.
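
A minimal Python sketch of this validation flow follows. It assumes that
codings are removed in the reverse of the order listed in the Content-Encoding
field value, since that field lists codings in the order they were applied. It
supports only `gzip` and `deflate`, and takes the expected SHA-256 digest as
raw bytes that have already been extracted from the `Unencoded-Digest` field.
All function names are hypothetical.

```python
import gzip
import hashlib
import hmac
import zlib

# Decoders for the content codings this sketch supports (illustrative subset).
DECODERS = {"gzip": gzip.decompress, "deflate": zlib.decompress}


def remove_content_codings(content, content_encoding):
    """Undo every listed coding, starting with the one applied last."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        content = DECODERS[coding](content)  # KeyError for unsupported codings
    return content


def validate_unencoded_sha256(content, content_encoding, expected_digest):
    """Compare the SHA-256 of the decoded content with the expected digest."""
    unencoded = remove_content_codings(content, content_encoding)
    actual = hashlib.sha256(unencoded).digest()
    return hmac.compare_digest(actual, expected_digest)


# Example: the sender applied deflate first and gzip second.
original = b"An unexceptional string\n"
received = gzip.compress(zlib.compress(original))
is_valid = validate_unencoded_sha256(
    received, "deflate, gzip", hashlib.sha256(original).digest()
)
```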
219+
220+
# Integrity Fields are Complementary
221+
222+
Integrity fields can be used in combination to address different and
223+
complementary needs, particularly the cases described in {{introduction}}.
224+
225+
In the following examples, the unencoded response data is the string "An
226+
unexceptional string" followed by an LF.
227+
228+
The first example demonstrates a request that uses content negotiation.
229+
230+
~~~ http-message
231+
GET /boringstring HTTP/1.1
232+
Host: example.org
233+
Accept-Encoding: gzip
234+
235+
~~~
236+
{: title="GET request with content negotiation"}
237+
238+
The server responds with the full GZIP-encoded representation. The `Repr-Digest`
239+
and `Unencoded-Digest` therefore differ.
240+
241+
~~~ http-message
242+
NOTE: '\' line wrapping per RFC 8792
243+
244+
HTTP/1.1 200 OK
245+
Content-Encoding: gzip
246+
Repr-Digest: \
247+
sha-256=:XyjvEuFb1P5rqc2le3vQm7M96DwZhvmOwqHLu2xVpY4=:
248+
Unencoded-Digest: \
249+
sha-256=:5Bv3NIx05BPnh0jMph6v1RJ5Q7kl9LKMtQxmvc9+Z7Y=:
250+
251+
1f 8b 08 00 79 1f 08 64 00 ff
252+
73 cc 53 28 cd 4b ad 48 4e 2d
253+
28 c9 cc cf 4b cc 51 28 2e 29
254+
ca cc 4b e7 02 00 7e af 07 44
255+
18 00 00 00
256+
257+
~~~
258+
{: title="GET response with GZIP-encoded content"}
259+
260+
The second example demonstrates a range request with content negotiation.
261+
262+
~~~ http-message
263+
GET /boringstring HTTP/1.1
264+
Host: example.org
265+
Accept-Encoding: gzip
266+
Range: bytes=0-9
267+
268+
~~~
269+
{: title="Range request with content negotiation"}
270+
271+
The server responds with a 206 Partial Content response using GZIP encoding. It
272+
has three different Integrity fields. The `Content-Digest` relates to the
273+
response message content that can be used to validate the integrity of the
274+
received part. `Repr-Digest` and `Unencoded-Digest` can be used later once the
275+
entire object is reconstructed. The choice of which to use is left to the
276+
application that would consider a range of factors outside the scope of
277+
this document.
278+
279+
~~~ http-message
280+
NOTE: '\' line wrapping per RFC 8792
281+
282+
HTTP/1.1 206 Partial Content
283+
Content-Encoding: gzip
284+
Content-Range: bytes 0-9/44
285+
Content-Digest: \
286+
sha-256=:SotB7Pa5A7iHSBdh9mg1Ev/ktAzrxU4Z8ldcCIUyfI4=:
287+
Repr-Digest: \
288+
sha-256=:XyjvEuFb1P5rqc2le3vQm7M96DwZhvmOwqHLu2xVpY4=:
289+
Unencoded-Digest: \
290+
sha-256=:5Bv3NIx05BPnh0jMph6v1RJ5Q7kl9LKMtQxmvc9+Z7Y=:
291+
292+
1f 8b 08 00 79 1f 08 64 00 ff
293+
~~~
294+
{: title="Partial response with GZIP encoding"}
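
One possible client-side flow for reconstructing the object from several
partial responses is sketched below in Python. It assumes that each part has
already been checked against its own `Content-Digest`, that the digests are
available as raw SHA-256 bytes, and that gzip is the only content coding. The
function names are hypothetical, and an application could equally choose to
check only one of the two representation-level digests.

```python
import gzip
import hashlib


def sha256(data):
    return hashlib.sha256(data).digest()


def stitch(parts, complete_length):
    """Join (first_byte_position, content) pairs into one contiguous byte string."""
    whole = bytearray()
    for position, content in sorted(parts):
        if position != len(whole):
            raise ValueError("parts overlap or leave a gap")
        whole += content
    if len(whole) != complete_length:
        raise ValueError("parts do not cover the complete representation")
    return bytes(whole)


def validate_stitched(parts, complete_length, repr_digest, unencoded_digest):
    """Check the reassembled encoded bytes, then the decoded bytes."""
    encoded = stitch(parts, complete_length)
    if sha256(encoded) != repr_digest:
        return False
    return sha256(gzip.decompress(encoded)) == unencoded_digest


# Example: two parts of a gzip-encoded representation, received out of order.
text = b"An unexceptional string\n"
encoded = gzip.compress(text)
parts = [(10, encoded[10:]), (0, encoded[:10])]
ok = validate_stitched(parts, len(encoded), sha256(encoded), sha256(text))
```

The complete length passed to `stitch` corresponds to the value after the slash
in the `Content-Range` field.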
295+
296+
297+
# Security Considerations
298+
299+
All the same considerations documented in {{DIGEST-FIELDS}} apply.
300+
301+
This document introduces a further consideration related to the process of
302+
validation when an HTTP message contains both Content-Encoding and
303+
Unencoded-Digest ({{encoding-and-unencoded}}). In order to validate the
304+
Unencoded-Digest, encoded content needs to be decoded. This provides an
305+
opportunity for an attacker to direct malicious data into a decoder. One
306+
possible mitigation would be to also provide a Content-Digest or Repr-Digest in
307+
the message, allowing for validation of the received bytes before further
308+
processing. An attacker that can substitute various parts of an HTTP message
309+
presents several risks, {{Sections 6.1, 6.2 and 6.3 of DIGEST-FIELDS}}
310+
describe relevant considerations and mitigations.
311+
312+
313+
# IANA Considerations
314+
315+
Should this document be adopted and achieve working group consensus, IANA is
316+
asked to update the "Hypertext Transfer Protocol (HTTP) Field Name Registry"
317+
{{?HTTP=RFC9110}} as shown in the table below:
318+
319+
|-----------------------|-----------|-----------------|--------------------------------------------|
320+
| Field Name | Status | Structured Type | Reference |
321+
|-----------------------|-----------|-----------------|--------------------------------------------|
322+
| Unencoded-Digest | permanent | Dictionary | {{unencoded-digest}} of this document |
323+
| Want-Unencoded-Digest | permanent | Dictionary | {{want-unencoded-digest}} of this document |
324+
|-----------------------|-----------|-----------------|--------------------------------------------|
325+
{: #iana-field-name-table title="Hypertext Transfer Protocol (HTTP) Field Name Registry Update"}
326+
327+
328+
--- back
329+
330+
# Acknowledgments
331+
{:numbered="false"}
332+
333+
Early drafts of {{DIGEST-FIELDS}} included a mechanism to support the exchange
334+
of digests where no content coding is applied, which was removed before
335+
publication. While the design here is different, it is motivated by discussion
336+
of the previous design in the HTTP WG. The motivating use cases still mostly
337+
apply identically.
