Commit 679ec85

Improve contentEncoding explanations
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
1 parent fe0662f commit 679ec85

6 files changed: 57 additions and 13 deletions

assets/main.scss

Lines changed: 4 additions & 0 deletions
@@ -118,6 +118,10 @@ article table {
118118
@extend .table;
119119
}
120120

121+
article table td:first-child {
122+
word-break: keep-all;
123+
}
124+
121125
.table-light > th {
122126
@include color-mode(dark) {
123127
color: $gray-300;

content/2019-09/content/contentEncoding.markdown

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
---
22
keyword: "contentEncoding"
33
signature: "String"
4-
value: This keyword should be set to a standard (to increase interoperability) encoding name such as those defined in [RFC 4648](https://www.rfc-editor.org/info/rfc4686)
4+
value: This keyword should be set to a standard (to increase interoperability) encoding name such as those defined in [RFC 4648](https://www.rfc-editor.org/info/rfc4686) and [RFC 2045](https://www.rfc-editor.org/info/rfc2045.html)
55
summary: "The string instance should be interpreted as encoded binary data and decoded using the encoding named by this property."
66
kind: [ "annotation" ]
77
instance: [ "string" ]

content/2020-12/content/contentEncoding.markdown

Lines changed: 49 additions & 9 deletions
@@ -1,7 +1,7 @@
11
---
22
keyword: "contentEncoding"
33
signature: "String"
4-
value: This keyword should be set to a standard (to increase interoperability) encoding name such as those defined in [RFC 4648](https://www.rfc-editor.org/info/rfc4686)
4+
value: This keyword should be set to a standard (to increase interoperability) encoding name such as those defined in [RFC 4648](https://www.rfc-editor.org/info/rfc4686) and [RFC 2045](https://www.rfc-editor.org/info/rfc2045.html)
55
summary: "The string instance should be interpreted as encoded binary data and decoded using the encoding named by this property."
66
kind: [ "annotation" ]
77
instance: [ "string" ]
@@ -20,19 +20,59 @@ related:
2020
keyword: contentSchema
2121
---
2222

23-
The `contentEncoding` keyword is an annotation used to specify the encoding used to store the contents of a string, particularly when it represents binary data. It indicates how the string value should be interpreted and decoded. This keyword is not directly involved in the validation process but provides metadata about the content.
23+
The `contentEncoding` keyword signifies that an instance value (such as a
24+
specific object property) should be considered binary data encoded into a JSON
25+
string using the given encoding. This keyword does not affect validation, but
26+
the evaluator will collect its value as an annotation. The use of this and
27+
related keywords is a common technique to encode and describe arbitrary binary
28+
data (such as image, audio, and video) in JSON.
2429

25-
* `contentEncoding` doesn't enforce strict validation. However, it's recommended to use it correctly to ensure compatibility with applications that might interpret the encoding.
26-
* It represents the type of binary encoding used for the string under question. Some of the common encodings are listed [here](#common-encodings).
27-
* The JSON Schema specification doesn't publish a predefined list of possible encodings
30+
{{<best-practice>}}
2831

29-
## Common Encodings
32+
It is recommended to set this keyword along with the [`contentMediaType`]({{<
33+
ref "2020-12/content/contentmediatype" >}}) keyword to declare the type of data
34+
being encoded (for example, an image in PNG format). Otherwise, the receiver
35+
must treat the instance value as a binary blob without knowing for sure the
36+
type of information it represents.
37+
38+
{{</best-practice>}}
39+
40+
{{<common-pitfall>}}
41+
42+
The JSON Schema specification prohibits implementations, for security reasons,
43+
from automatically attempting to decode, parse, or validate encoded data
44+
without the consumer explicitly opting in to such behaviour. If you require
45+
this feature, consult the documentation of your tooling of choice to see if it
46+
supports content encoding/decoding and how to enable it.
47+
48+
{{</common-pitfall>}}
49+
50+
{{<learning-more>}}
51+
52+
This keyword is inspired by the
53+
[`Content-Transfer-Encoding`](https://www.rfc-editor.org/rfc/rfc2045.html#section-6)
54+
RFC 2045 MIME header used to transmit non-ASCII data over e-mail. For example,
55+
if you send a picture as an e-mail attachment, your e-mail client will likely
56+
send a multipart message that includes the Base64-encoded representation of
57+
such picture, while setting the `Content-Transfer-Encoding` header to `base64`.
58+
59+
{{</learning-more>}}
60+
61+
[RFC 2045](https://datatracker.ietf.org/doc/html/rfc2045) _(Format of Internet
62+
Message Bodies)_ defines the following standard encodings. In the interest of
63+
interoperability, avoid defining new content encodings. While the JSON Schema
64+
specification does not provide explicit guidance on this, [RFC 2045 Section
65+
6.3](https://datatracker.ietf.org/doc/html/rfc2045#section-6.3) suggests that
66+
if a custom content encoding is really needed, it must be prefixed with `x-`.
67+
For example, `x-my-new-encoding`.
3068

3169
| Encoding | Description | Reference |
3270
|------------|-------------------------------------------------------------------------------------------------|-----------|
33-
| `"base16"` | Encoding scheme for binary data using a 16-character hexadecimal alphabet | [RFC 4648 §8](https://datatracker.ietf.org/doc/html/rfc4648#section-8) |
34-
| `"base32"` | Encoding scheme for binary data using a 32-character hexadecimal alphabet | [RFC 4648 §6](https://datatracker.ietf.org/doc/html/rfc4648#section-6) |
35-
| `"base64"` | Encoding scheme for binary data using a 64-character hexadecimal alphabet | [RFC 4648 §4](https://datatracker.ietf.org/doc/html/rfc4648#section-4) |
71+
| `"7bit"` | Encoding scheme that constrains ASCII to disallow octets greater than 127, disallow `NUL`, and restricts `CR` and `LF` to `CRLF` sequences | [RFC 2045 Section 2.7](https://datatracker.ietf.org/doc/html/rfc2045#section-2.7) |
72+
| `"8bit"` | Encoding scheme that constrains ASCII to permit octets greater than 127, disallow `NUL`, and restrict `CR` and `LF` to `CRLF` sequences | [RFC 2045 Section 2.8](https://datatracker.ietf.org/doc/html/rfc2045#section-2.8) |
73+
| `"binary"` | Encoding scheme where any sequence of octets is allowed | [RFC 2045 Section 2.9](https://datatracker.ietf.org/doc/html/rfc2045#section-2.9) |
74+
| `"quoted-printable"` | Encoding scheme that preserves ASCII printable characters and escapes the rest using a simple algorithm based on an hexadecimal alphabet | [RFC 2045 Section 6.7](https://datatracker.ietf.org/doc/html/rfc2045#section-6.7) |
75+
| `"base64"` | Encoding scheme using a 64-character hexadecimal alphabet | [RFC 2045 Section 6.8](https://datatracker.ietf.org/doc/html/rfc2045#section-6.8) |
3676

3777
## Examples
3878
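Reviewer sketch (not part of the commit): the added text says that `contentEncoding` is collected as an annotation and that decoding must only happen when the consumer explicitly opts in. A minimal Python illustration of that idea follows. The `decode_content` helper, its `opt_in` parameter, and the `DECODERS` table are hypothetical names invented for this example and do not correspond to the API of any particular JSON Schema implementation. Only `base64` is wired up, using the Python standard library.

```python
import base64
import binascii

# Hypothetical registry mapping `contentEncoding` values to decoder functions.
# Only "base64" is covered here; other encodings would need their own entries.
DECODERS = {
    "base64": lambda text: base64.b64decode(text, validate=True),
}


def decode_content(schema, instance, *, opt_in=False):
    """Return decoded bytes only when the caller explicitly opts in.

    Without opt-in (the default), the instance is returned untouched,
    mirroring the rule that decoding must never happen automatically.
    """
    encoding = schema.get("contentEncoding")
    if not opt_in or encoding is None or not isinstance(instance, str):
        return instance

    decoder = DECODERS.get(encoding)
    if decoder is None:
        raise ValueError(f"unsupported contentEncoding: {encoding!r}")

    try:
        return decoder(instance)
    except (binascii.Error, ValueError) as error:
        raise ValueError(f"instance is not valid {encoding}") from error


# Example schema pairing the encoding with a media type, as the
# best-practice note in the diff recommends.
schema = {
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "image/png",
}

# The first eight bytes of any PNG file (its signature), Base64-encoded.
png_signature = "iVBORw0KGgo="

untouched = decode_content(schema, png_signature)
decoded = decode_content(schema, png_signature, opt_in=True)
```

With the default `opt_in=False` the string passes through unchanged. Only the explicit call yields the raw bytes, which the consumer would then interpret according to `contentMediaType`.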

content/2020-12/format-annotation/format.markdown

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ annotation, never enabling validation support at the implementation level (even
6666
if supported), and performing validation using the [`pattern`]({{< ref
6767
"2020-12/validation/pattern" >}}) keyword. {{</learning-more>}}
6868

69-
The supported formats are the following:
69+
The supported formats are the following.
7070

7171
| Format | Category | Specification |
7272
|---------------------------|----------------------|---------------|

content/2020-12/format-assertion/format.markdown

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ undefined behavior. As a best practice, stick to standardised formats. If
6161
needed, introduce a new keyword for custom string logical
6262
types.{{</best-practice>}}
6363

64-
The supported formats are the following:
64+
The supported formats are the following.
6565

6666
| Format | Category | Specification |
6767
|---------------------------|----------------------|---------------|

content/draft7/validation/contentEncoding.markdown

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
---
22
keyword: "contentEncoding"
33
signature: "String"
4-
value: This keyword should be set to a standard (to increase interoperability) encoding name such as those defined in [RFC 4648](https://www.rfc-editor.org/info/rfc4686)
4+
value: This keyword should be set to a standard (to increase interoperability) encoding name such as those defined in [RFC 4648](https://www.rfc-editor.org/info/rfc4686) and [RFC 2045](https://www.rfc-editor.org/info/rfc2045.html)
55
summary: "The string instance should be interpreted as encoded binary data and decoded using the encoding named by this property."
66
kind: [ "annotation" ]
77
instance: [ "string" ]
