| 1 | +--- |
| 2 | +title: "HTTP Unencoded Digest" |
| 3 | +abbrev: "HTTP Unencoded Digest" |
| 4 | +category: std |
| 5 | + |
| 6 | +docname: draft-ietf-httpbis-unencoded-digest-latest |
| 7 | +submissiontype: IETF |
| 8 | +number: |
| 9 | +date: {DATE} |
| 10 | + |
| 11 | +v: 3 |
| 12 | +area: Web and Internet Transport |
| 13 | +workgroup: HTTP |
| 14 | +keyword: |
| 15 | + - next generation |
| 16 | + - unicorn |
| 17 | + - sparkling distributed ledger |
| 18 | +venue: |
| 19 | + group: HTTP |
| 20 | + type: Working Group |
| 21 | + home: https://httpwg.org/ |
| 22 | + |
| 23 | + arch: https://lists.w3.org/Archives/Public/ietf-http-wg/ |
| 24 | + repo: https://github.com/httpwg/http-extensions/labels/unencoded-digest |
| 25 | +github-issue-label: unencoded-digest |
| 26 | + |
| 27 | +author: |
| 28 | + - |
| 29 | + fullname: Lucas Pardue |
| 30 | + organization: Cloudflare |
| 31 | + |
| 32 | + - |
| 33 | + fullname: Mike West |
| 34 | + organization: Google |
| 35 | + |
| 36 | + |
| 37 | +normative: |
| 38 | + |
| 39 | +informative: |
| 40 | + |
| 41 | + |
| 42 | +--- abstract |
| 43 | + |
| 44 | +The Repr-Digest and Content-Digest integrity fields are subject to HTTP content |
| 45 | +coding considerations. There are some use cases that benefit from the |
| 46 | +unambiguous exchange of integrity digests of the unencoded representation. The |
| 47 | +Unencoded-Digest and Want-Unencoded-Digest fields complement existing integrity |
| 48 | +fields for this purpose. |
| 49 | + |
| 50 | + |
| 51 | +--- middle |
| 52 | + |
| 53 | +# Introduction |
| 54 | + |
| 55 | +The `Repr-Digest` and `Content-Digest` integrity fields defined in |
| 56 | +{{!DIGEST-FIELDS=RFC9530}} are suitable for a range of use cases. However, |
| 57 | +because the fields are subject to HTTP content coding considerations, it is |
| 58 | +difficult to support use cases that could benefit from the exchange of integrity |
| 59 | +digests of the unencoded representation. |
| 60 | + |
| 61 | +As a simple example, an application using HTTP might be presented with request |
| 62 | +or response representation data that has been transparently decoded. Attempting |
| 63 | +to verify the integrity of the data against the `Repr-Digest` would first require |
| 64 | +re-encoding that data using the same coding indicated by the Content-Encoding |
| 65 | +header field ({{Section 8.4 of !HTTP=RFC9110}}), which is not always possible |
| 66 | +(see {{Section 6.5 of DIGEST-FIELDS}}). |
| 67 | + |
| 68 | +Although receivers could feasibly re-encode data in order to carry out |
| 69 | +`Repr-Digest` validation, it might be impractical for certain kinds of |
| 70 | +environments. For instance, browsers tend to provide built-in support for |
| 71 | +transparent decoding but little support for encoding. While encoding could be |
| 72 | +done using additional libraries, it would create work in JavaScript that |
| 73 | +could contend with other activities. Even on the server side, the re-encoding of |
| 74 | +received data might not be acceptable; some coding algorithms are optimized |
| 75 | +towards efficient decoding at the cost of complex encoding. A Content-Encoding |
| 76 | +field value that indicates a series of encodings adds further complexity. |
| 77 | + |
| 78 | +A more complex example involves HTTP Range Requests ({{Section 14 of |
| 79 | +HTTP}}), where a client fetches multiple partial representations from |
| 80 | +different origins and "stitches" them back into a whole. Unfortunately, if the |
| 81 | +origins apply different content coding, the `Repr-Digest` field will vary by the |
| 82 | +server's selected encoding (i.e. the Content-Encoding header field, {{Section |
| 83 | +8.4 of HTTP}}). This presents a challenge for a client: in order to verify the |
| 84 | +integrity of the pieced-together whole, it would need to remove the encoding of |
| 85 | +each part, combine the parts, and then encode the result in order to compare it |
| 86 | +against one or more `Repr-Digest` values. |
| 87 | + |
| 88 | +The Accept-Encoding header field ({{Section 12.5.3 of HTTP}}) provides the means |
| 89 | +to indicate preferences for content coding. It is possible for an endpoint to |
| 90 | +indicate a preference for no encoding, for example by sending the "identity" |
| 91 | +token. However, codings often provide data compression that is advantageous. |
| 92 | +Disabling content coding in order to simplify integrity checking might be an |
| 93 | +unacceptable trade-off. |
| 94 | + |
| 95 | +For the reasons above, decoding and re-encoding content in order to benefit |
| 96 | +from HTTP integrity fields is often undesirable. This specification defines the |
| 97 | +Unencoded-Digest and Want-Unencoded-Digest fields to support a simpler validation |
| 98 | +workflow in some scenarios where content coding is applied. These fields |
| 99 | +complement the other integrity fields defined in {{DIGEST-FIELDS}}. |
| 100 | + |
| 101 | + |
| 102 | +# Conventions and Definitions |
| 103 | + |
| 104 | +{::boilerplate bcp14-tagged} |
| 105 | + |
| 106 | +This document uses the Augmented BNF defined in {{!RFC5234}} and updated by |
| 107 | +{{!RFC7405}}. This includes the rule LF (line feed). |
| 108 | + |
| 109 | +This document uses the following terminology from {{Section 3 of |
| 110 | +!STRUCTURED-FIELDS=RFC9651}} to specify syntax and parsing: Byte Sequence, |
| 111 | +Dictionary, and Integer. |
| 112 | + |
| 113 | +The definitions "representation", "selected representation", "representation |
| 114 | +data", "representation metadata", and "content" in this document are to be |
| 115 | +interpreted as described in {{!HTTP=RFC9110}}. |
| 116 | + |
| 117 | +"Integrity fields" is the collective term for `Content-Digest`, `Repr-Digest`, |
| 118 | +and `Unencoded-Digest`. |
| 119 | + |
| 120 | +"Integrity preference fields" is the collective term for `Want-Repr-Digest`, |
| 121 | +`Want-Content-Digest`, and `Want-Unencoded-Digest`. |
| 122 | + |
| 123 | +# The Unencoded-Digest Field {#unencoded-digest} |
| 124 | + |
| 125 | +The `Unencoded-Digest` HTTP field can be used in requests and responses to |
| 126 | +communicate digests that are calculated using a hashing algorithm applied to the |
| 127 | +representation with no content coding ({{Section 8.4.1 of HTTP}}). |
| 128 | + |
| 129 | +Apart from the content coding concerns, `Unencoded-Digest` behaves similarly |
| 130 | +to `Repr-Digest` ({{Section 3 of DIGEST-FIELDS}}). In the absence of content |
| 131 | +coding, `Unencoded-Digest` is identical to `Repr-Digest`. |
| 132 | + |
| 133 | +`Unencoded-Digest` is a `Dictionary` (see {{Section 3.2 of STRUCTURED-FIELDS}}) |
| 134 | +where each: |
| 135 | + |
| 136 | +* key conveys the hashing algorithm (see {{Section 5 of DIGEST-FIELDS}}) used to |
| 137 | + compute the digest; |
| 138 | +* value is a `Byte Sequence` ({{Section 3.3.5 of STRUCTURED-FIELDS}}) that |
| 139 | + conveys an encoded version of the byte output produced by the digest |
| 140 | + calculation. |
| 141 | + |
| 142 | +For example: |
| 143 | + |
| 144 | +~~~ http-message |
| 145 | +NOTE: '\' line wrapping per RFC 8792 |
| 146 | + |
| 147 | +Unencoded-Digest: \ |
| 148 | + sha-512=:YMAam51Jz/jOATT6/zvHrLVgOYTGFy1d6GJiOHTohq4yP+pgk4vf2aCs\ |
| 149 | + yRZOtw8MjkM7iw7yZ/WkppmM44T3qg==: |
| 150 | +~~~ |
| 151 | + |
| 152 | +The `Dictionary` type can be used, for example, to attach multiple digests |
| 153 | +calculated using different hashing algorithms in order to support a population |
| 154 | +of endpoints with different or evolving capabilities. Such an approach could |
| 155 | +support transitions away from weaker algorithms (see |
| 156 | +{{Section 6.6 of DIGEST-FIELDS}}). |
| 157 | + |
| 158 | +~~~ http-message |
| 159 | +NOTE: '\' line wrapping per RFC 8792 |
| 160 | + |
| 161 | +Unencoded-Digest: \ |
| 162 | + sha-256=:d435Qo+nKZ+gLcUHn7GQtQ72hiBVAgqoLsZnZPiTGPk=:,\ |
| 163 | + sha-512=:YMAam51Jz/jOATT6/zvHrLVgOYTGFy1d6GJiOHTohq4yP+pgk4vf2aCs\ |
| 164 | + yRZOtw8MjkM7iw7yZ/WkppmM44T3qg==: |
| 165 | +~~~ |
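
As an illustration of the field structure above, here is a minimal Python sketch of how a sender might serialize an `Unencoded-Digest` value over unencoded representation data. The function name `unencoded_digest_value` and the `ALGORITHMS` table are assumptions made for this example. The serializer handles only this one Dictionary shape and is not a general Structured Fields implementation.

```python
import base64
import hashlib

# Illustrative mapping from digest-field algorithm keys to hashlib names.
# Only the two algorithms used in the examples above are covered.
ALGORITHMS = {"sha-256": "sha256", "sha-512": "sha512"}


def unencoded_digest_value(unencoded, algorithms=("sha-256",)):
    """Serialize a Dictionary of Byte Sequences computed over unencoded data."""
    members = []
    for key in algorithms:
        digest = hashlib.new(ALGORITHMS[key], unencoded).digest()
        # A Byte Sequence is base64 text delimited by colons.
        encoded = base64.b64encode(digest).decode("ascii")
        members.append(f"{key}=:{encoded}:")
    return ", ".join(members)


if __name__ == "__main__":
    data = b"An unexceptional string\n"
    print("Unencoded-Digest:", unencoded_digest_value(data, ("sha-256", "sha-512")))
```

The input must be the representation data before any content coding is applied. Hashing the coded bytes instead would produce a `Repr-Digest` value.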
| 166 | + |
| 167 | +A recipient MAY ignore any or all digests. Application-specific behavior or |
| 168 | +local policy MAY set additional constraints on the processing and validation |
| 169 | +practices of the conveyed digests. The security considerations cover some of |
| 170 | +the issues related to ignoring digests (see {{Section 6.6 of DIGEST-FIELDS}}) |
| 171 | +and validating multiple digests (see {{Section 6.7 of DIGEST-FIELDS}}). |
| 172 | + |
| 173 | +A sender MAY send a digest without knowing whether the recipient supports a |
| 174 | +given hashing algorithm. A sender MAY send a digest if it knows the recipient |
| 175 | +will ignore it. |
| 176 | + |
| 177 | +`Unencoded-Digest` can be sent in a trailer section. In this case, |
| 178 | +`Unencoded-Digest` MAY be merged into the header section; see {{Section 6.5.1 of |
| 179 | +HTTP}}. |
| 180 | + |
| 181 | +# The Want-Unencoded-Digest Field {#want-unencoded-digest} |
| 182 | + |
| 183 | +`Want-Unencoded-Digest` is an integrity preference field; see {{Section 4 of |
| 184 | +DIGEST-FIELDS}}. It indicates that the sender would like to receive (via the |
| 185 | +`Unencoded-Digest` field) a representation digest on messages associated with the |
| 186 | +request URI and representation metadata where no content coding is applied. |
| 187 | + |
| 188 | +If `Want-Unencoded-Digest` is used in a response, it indicates that the server |
| 189 | +would like the client to provide the `Unencoded-Digest` field on future requests. |
| 190 | + |
| 191 | +`Want-Unencoded-Digest` is only a hint. The receiver of the field can ignore it |
| 192 | +and send an `Unencoded-Digest` field using any algorithm or omit one entirely. It |
| 193 | +is not a protocol error if preferences are ignored. Applications that use |
| 194 | +`Unencoded-Digest` and `Want-Unencoded-Digest` can define expectations or |
| 195 | +constraints that operate in addition to this specification. |
| 196 | + |
| 197 | +`Want-Unencoded-Digest` is of type `Dictionary` where each: |
| 198 | + |
| 199 | +* key conveys the hashing algorithm; |
| 200 | +* value is an `Integer` ({{Section 3.3.1 of STRUCTURED-FIELDS}}) that conveys an |
| 201 | + ascending, relative, weighted preference. It must be in the range 0 to 10 |
| 202 | + inclusive. 1 is the least preferred, 10 is the most preferred, and a value of |
| 203 | + 0 means "not acceptable". |
| 204 | + |
| 205 | +Examples: |
| 206 | + |
| 207 | +~~~ http-message |
| 208 | +Want-Unencoded-Digest: sha-256=1 |
| 209 | +Want-Unencoded-Digest: sha-512=3, sha-256=10, unixsum=0 |
| 210 | +~~~ |
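
The preference semantics above can be sketched in Python as follows. This is a simplified illustration, not a conforming Structured Fields parser: it accepts only bare `key=Integer` members, and the names `parse_want_digest` and `choose_algorithm` are hypothetical. Because the field is only a hint, a real endpoint is free to ignore the result.

```python
from typing import Optional


def parse_want_digest(value):
    """Parse e.g. 'sha-512=3, sha-256=10, unixsum=0' into {algorithm: preference}.

    Simplified: handles only key=Integer members, with no parameters.
    """
    preferences = {}
    for member in value.split(","):
        key, _, weight = member.strip().partition("=")
        preference = int(weight)
        if not 0 <= preference <= 10:
            raise ValueError(f"preference out of range for {key}: {preference}")
        preferences[key] = preference
    return preferences


def choose_algorithm(preferences, supported) -> Optional[str]:
    """Return the most preferred supported algorithm, or None if there is none."""
    candidates = [
        (preference, key)
        for key, preference in preferences.items()
        # A preference of 0 means "not acceptable", so it is never selected.
        if preference > 0 and key in supported
    ]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    prefs = parse_want_digest("sha-512=3, sha-256=10, unixsum=0")
    print(choose_algorithm(prefs, {"sha-256", "sha-512"}))
```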
| 211 | + |
| 212 | +# Messages containing both Unencoded-Digest and Content-Encoding {#encoding-and-unencoded} |
| 213 | + |
| 214 | +Digests delivered through `Unencoded-Digest` apply to the unencoded |
| 215 | +representation. If a message is received with content coding, a recipient needs |
| 216 | +to decode the content in order to calculate the digest that is used for |
| 217 | +validation. If multiple content codings are applied, the recipient needs to |
| 218 | +remove all of them, in the reverse of the order in which they were applied, before validation. |
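
A minimal Python sketch of this decode-then-validate flow is shown here. It assumes that only the `gzip` and `deflate` codings are in use and that a single `sha-256` digest has already been extracted from the field as base64 text. The names `DECODERS`, `decode_content`, and `validate_unencoded_sha256` are illustrative. Content-Encoding lists codings in the order they were applied, so the sketch removes them last to first.

```python
import base64
import gzip
import hashlib
import hmac
import zlib

# Decoders for two registered content codings. The "deflate" coding is
# zlib-wrapped data, so zlib.decompress handles it directly.
DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
}


def decode_content(content, content_encoding):
    """Remove every listed coding, last applied first."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        content = DECODERS[coding](content)
    return content


def validate_unencoded_sha256(content, content_encoding, expected_b64):
    """Decode the content, hash the result, and compare with the conveyed digest."""
    unencoded = decode_content(content, content_encoding)
    actual = hashlib.sha256(unencoded).digest()
    return hmac.compare_digest(actual, base64.b64decode(expected_b64))
```

An unknown coding raises `KeyError` in this sketch. An application would need to decide whether that case fails validation or simply skips it.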
| 219 | + |
| 220 | +# Integrity Fields are Complementary |
| 221 | + |
| 222 | +Integrity fields can be used in combination to address different and |
| 223 | +complementary needs, particularly the cases described in {{introduction}}. |
| 224 | + |
| 225 | +In the following examples, the unencoded response data is the string "An |
| 226 | +unexceptional string" followed by an LF. |
| 227 | + |
| 228 | +The first example demonstrates a request that uses content negotiation. |
| 229 | + |
| 230 | +~~~ http-message |
| 231 | +GET /boringstring HTTP/1.1 |
| 232 | +Host: example.org |
| 233 | +Accept-Encoding: gzip |
| 234 | + |
| 235 | +~~~ |
| 236 | +{: title="GET request with content negotiation"} |
| 237 | + |
| 238 | +The server responds with the full GZIP-encoded representation. The `Repr-Digest` |
| 239 | +and `Unencoded-Digest` therefore differ. |
| 240 | + |
| 241 | +~~~ http-message |
| 242 | +NOTE: '\' line wrapping per RFC 8792 |
| 243 | + |
| 244 | +HTTP/1.1 200 OK |
| 245 | +Content-Encoding: gzip |
| 246 | +Repr-Digest: \ |
| 247 | + sha-256=:XyjvEuFb1P5rqc2le3vQm7M96DwZhvmOwqHLu2xVpY4=: |
| 248 | +Unencoded-Digest: \ |
| 249 | + sha-256=:5Bv3NIx05BPnh0jMph6v1RJ5Q7kl9LKMtQxmvc9+Z7Y=: |
| 250 | + |
| 251 | +1f 8b 08 00 79 1f 08 64 00 ff |
| 252 | +73 cc 53 28 cd 4b ad 48 4e 2d |
| 253 | +28 c9 cc cf 4b cc 51 28 2e 29 |
| 254 | +ca cc 4b e7 02 00 7e af 07 44 |
| 255 | +18 00 00 00 |
| 256 | + |
| 257 | +~~~ |
| 258 | +{: title="GET response with GZIP-encoded content"} |
| 259 | + |
| 260 | +The second example demonstrates a range request with content negotiation. |
| 261 | + |
| 262 | +~~~ http-message |
| 263 | +GET /boringstring HTTP/1.1 |
| 264 | +Host: example.org |
| 265 | +Accept-Encoding: gzip |
| 266 | +Range: bytes=0-9 |
| 267 | + |
| 268 | +~~~ |
| 269 | +{: title="Range request with content negotiation"} |
| 270 | + |
| 271 | +The server responds with a 206 Partial Content response using GZIP encoding. The |
| 272 | +response has three different integrity fields. `Content-Digest` relates to the |
| 273 | +response message content and can be used to validate the integrity of the |
| 274 | +received part. `Repr-Digest` and `Unencoded-Digest` can be used later, once the |
| 275 | +entire object is reconstructed. The choice of which to use is left to the |
| 276 | +application, which would consider a range of factors that are outside the scope of |
| 277 | +this document. |
| 278 | + |
| 279 | +~~~ http-message |
| 280 | +NOTE: '\' line wrapping per RFC 8792 |
| 281 | + |
| 282 | +HTTP/1.1 206 Partial Content |
| 283 | +Content-Encoding: gzip |
| 284 | +Content-Range: bytes 0-9/44 |
| 285 | +Content-Digest: \ |
| 286 | + sha-256=:SotB7Pa5A7iHSBdh9mg1Ev/ktAzrxU4Z8ldcCIUyfI4=: |
| 287 | +Repr-Digest: \ |
| 288 | + sha-256=:XyjvEuFb1P5rqc2le3vQm7M96DwZhvmOwqHLu2xVpY4=: |
| 289 | +Unencoded-Digest: \ |
| 290 | + sha-256=:5Bv3NIx05BPnh0jMph6v1RJ5Q7kl9LKMtQxmvc9+Z7Y=: |
| 291 | + |
| 292 | +1f 8b 08 00 79 1f 08 64 00 ff |
| 293 | +~~~ |
| 294 | +{: title="Partial response with GZIP encoding"} |
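
To illustrate the reconstruction step, the following Python sketch joins the content of several partial responses and then checks the whole against both `Repr-Digest` and `Unencoded-Digest`. It assumes every part is a byte range of the same GZIP-coded representation and that the `sha-256` digests are available as base64 text. The helper names `sha256_b64`, `stitch`, and `verify_stitched` are hypothetical.

```python
import base64
import gzip
import hashlib


def sha256_b64(data):
    """Return the base64 text of the SHA-256 digest of data."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def stitch(parts, complete_length):
    """Join (first_byte_pos, content) pairs into the full coded representation."""
    whole = bytearray()
    for first, content in sorted(parts):
        if first != len(whole):
            raise ValueError(f"gap or overlap at byte {first}")
        whole += content
    if len(whole) != complete_length:
        raise ValueError("reassembled length does not match the complete length")
    return bytes(whole)


def verify_stitched(parts, complete_length, repr_b64, unencoded_b64):
    """Check the reassembled bytes against both representation-level digests."""
    encoded = stitch(parts, complete_length)
    # Repr-Digest covers the GZIP-coded bytes; Unencoded-Digest covers the
    # bytes obtained after the coding is removed.
    return (sha256_b64(encoded) == repr_b64
            and sha256_b64(gzip.decompress(encoded)) == unencoded_b64)
```

If the parts came from servers that applied different codings, the coded bytes could not be joined this way. Each part would have to be decoded separately, which leaves `Unencoded-Digest` as the only digest that covers the whole.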
| 295 | + |
| 296 | + |
| 297 | +# Security Considerations |
| 298 | + |
| 299 | +All of the considerations documented in {{DIGEST-FIELDS}} apply. |
| 300 | + |
| 301 | +This document introduces a further consideration related to the process of |
| 302 | +validation when an HTTP message contains both Content-Encoding and |
| 303 | +Unencoded-Digest ({{encoding-and-unencoded}}). In order to validate the |
| 304 | +Unencoded-Digest, encoded content needs to be decoded. This provides an |
| 305 | +opportunity for an attacker to direct malicious data into a decoder. One |
| 306 | +possible mitigation would be to also provide a Content-Digest or Repr-Digest in |
| 307 | +the message, allowing for validation of the received bytes before further |
| 308 | +processing. An attacker that can substitute various parts of an HTTP message |
| 309 | +presents several risks; {{Sections 6.1, 6.2 and 6.3 of DIGEST-FIELDS}} |
| 310 | +describe relevant considerations and mitigations. |
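
The ordering suggested by this mitigation can be sketched in Python as follows. The sketch assumes a single `gzip` coding and `sha-256` digests supplied as base64 text, and the names `digest_matches` and `checked_decode` are illustrative. It helps only when the attacker cannot also substitute the digest fields themselves.

```python
import base64
import gzip
import hashlib
import hmac


def digest_matches(data, expected_b64):
    """Compare the SHA-256 digest of data with a conveyed base64 digest."""
    actual = hashlib.sha256(data).digest()
    return hmac.compare_digest(actual, base64.b64decode(expected_b64))


def checked_decode(content, content_digest_b64, unencoded_digest_b64):
    """Return the decoded data, or raise ValueError if either digest fails."""
    # Step 1: validate the bytes as received, before they reach the decoder.
    if not digest_matches(content, content_digest_b64):
        raise ValueError("Content-Digest mismatch; content was not decoded")
    # Step 2: only now decode, then validate the unencoded data.
    unencoded = gzip.decompress(content)
    if not digest_matches(unencoded, unencoded_digest_b64):
        raise ValueError("Unencoded-Digest mismatch")
    return unencoded
```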
| 311 | + |
| 312 | + |
| 313 | +# IANA Considerations |
| 314 | + |
| 315 | +Should this document be adopted and achieve working group consensus, IANA is |
| 316 | +asked to update the "Hypertext Transfer Protocol (HTTP) Field Name Registry" |
| 317 | +{{?HTTP=RFC9110}} as shown in the table below: |
| 318 | + |
| 319 | +|-----------------------|-----------|-----------------|--------------------------------------------| |
| 320 | +| Field Name | Status | Structured Type | Reference | |
| 321 | +|-----------------------|-----------|-----------------|--------------------------------------------| |
| 322 | +| Unencoded-Digest | permanent | Dictionary | {{unencoded-digest}} of this document | |
| 323 | +| Want-Unencoded-Digest | permanent | Dictionary | {{want-unencoded-digest}} of this document | |
| 324 | +|-----------------------|-----------|-----------------|--------------------------------------------| |
| 325 | +{: #iana-field-name-table title="Hypertext Transfer Protocol (HTTP) Field Name Registry Update"} |
| 326 | + |
| 327 | + |
| 328 | +--- back |
| 329 | + |
| 330 | +# Acknowledgments |
| 331 | +{:numbered="false"} |
| 332 | + |
| 333 | +Early drafts of {{DIGEST-FIELDS}} included a mechanism to support the exchange |
| 334 | +of digests where no content coding is applied, which was removed before |
| 335 | +publication. While the design here is different, it is motivated by discussion |
| 336 | +of the previous design in the HTTP WG. The motivating use cases remain largely |
| 337 | +the same. |