draft-mcnally-deterministic-cbor.md: 40 additions & 32 deletions
@@ -107,15 +107,39 @@ This section defines requirements and practices falling in the purview of the dC
107
107
dCBOR encoders MUST only emit CBOR conforming to the requirements of {{-CBOR}} §4.2.1. To summarize:
108
108
109
109
* Variable-length integers MUST be as short as possible.
110
-
* Floating-point values MUST use the shortest form that preseves the value.
110
+
* Floating-point values MUST use the shortest form that preserves the value.
111
111
* Indefinite-length arrays and maps MUST NOT be used.
112
-
* Map keys MUST be sorted in bytewise lexicographic order of their deterministic encodings.
112
+
* Map keys MUST be sorted in byte-wise lexicographic order of their deterministic encodings.
113
113
114
114
dCBOR codecs MUST validate and return errors for any CBOR that is not conformant.
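The first and last rules above can be illustrated with a minimal Python sketch. The function names `encode_head` and `sort_map_keys` are hypothetical and not part of any dCBOR API; the sketch assumes map keys have already been serialized to their deterministic encodings.

```python
from typing import Iterable, List


def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head (major type plus argument) in the shortest form."""
    if not 0 <= major <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= value < 2**64:
        raise ValueError("argument out of range for a CBOR head")
    prefix = major << 5
    if value < 24:
        # Arguments 0..23 are carried directly in the initial byte.
        return bytes([prefix | value])
    # Otherwise pick the smallest of the 1, 2, 4, or 8 byte arguments.
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([prefix | additional]) + value.to_bytes(size, "big")
    raise AssertionError("unreachable")


def sort_map_keys(encoded_keys: Iterable[bytes]) -> List[bytes]:
    """Sort already-encoded keys in bytewise lexicographic order."""
    # Python compares bytes objects bytewise and lexicographically.
    return sorted(encoded_keys)
```

For example, the keys `10`, `100`, `-1`, `"z"`, and `"aa"` sort in that order because their encodings begin with `0x0a`, `0x18`, `0x20`, `0x61`, and `0x62` respectively.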
115
115
116
-
## Reduction of Floating Point Values to Integers
116
+
## Reduction of Floating Point Values
117
117
118
-
While there is no requirement that dCBOR codecs implement support for floating point numbers, dCBOR codecs that do support them MUST reduce floating point values with no fractional part to the integer value that can accurately represent it in the fewest bits. If a numeric value has a fractional part or an exponent that takes it out of the range of representable integers, then it SHALL be encoded as a floating point value. If it cannot be represented as a floating point value, then it SHALL be encoded as a BIGNUM by encoders that support them.
118
+
While there is no requirement that dCBOR codecs implement support for floating point numbers (CBOR major type 7), dCBOR codecs that do support them MUST reduce floating point values with a non-zero fractional part, or values that fall outside the range of integer encodings specified below, to the floating point encoding that can accurately represent them in the fewest bits. dCBOR codecs that support floating point MUST support {{IEEE754}} binary16, which is the most-preferred encoding for floating point values.
119
+
120
+
For floating point values, from most to least preferred:
121
+
122
+
~~~
123
+
binary16 (half-width)
124
+
binary32 (float)
125
+
binary64 (double)
126
+
~~~
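The preference order above can be sketched in Python as a round-trip test: try each width from narrowest to widest and keep the first one that reproduces the value exactly. The function name `encode_float_shortest` is hypothetical. The sketch covers only the shortest-form rule; it does not apply the integer, negative zero, NaN, or infinity reductions that dCBOR also requires, and it raises for NaN because NaN never compares equal to itself.

```python
import struct


def encode_float_shortest(value: float) -> bytes:
    """Encode a float in the narrowest IEEE 754 width that preserves it."""
    # Initial bytes 0xf9, 0xfa, 0xfb mark binary16, binary32, binary64.
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f"), (0xFB, ">d")):
        try:
            packed = struct.pack(fmt, value)
        except (OverflowError, struct.error):
            # Finite value too large for this width; try the next one.
            continue
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    raise ValueError("value has no exact round trip (for example, NaN)")
```

With this sketch, `1.5` fits in binary16, `100000.0` needs binary32, and `1.1` needs binary64.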
127
+
128
+
### Reduction of Negative Zero.
129
+
130
+
{{IEEE754}} defines a negative zero value `-0.0`. dCBOR encoders that support floating point MUST reduce all negative zero values to the integer value `0`. dCBOR decoders MUST reject any negative zero values.
131
+
132
+
### Reduction of NaNs and Infinities.
133
+
134
+
{{IEEE754}} defines the `NaN` (Not a Number) value {{NAN}}. This is usually divided into two types: *quiet NaNs* and *signalling NaNs*, and the most significant bit of the significand field is used to distinguish between these two types. However, the specification also includes a range of "payload" bits. These bit fields have no definite purpose and could be used to break CBOR determinism.
135
+
136
+
dCBOR encoders that support floating point MUST reduce all `NaN` values to the half-width quiet `NaN` value having the canonical bit pattern `0x7e00`.
137
+
138
+
Similarly, encoders that support floating point MUST reduce all `+INF` values to the half-width `+INF` having the canonical bit pattern `0x7c00` and likewise with `-INF` to `0xfc00`.
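A minimal Python sketch of these two rules follows. The function name `encode_special_float` is hypothetical; the byte `0xf9` is the CBOR initial byte for a half-width float, followed by the canonical bit patterns given above.

```python
import math
import struct
from typing import Optional


def encode_special_float(value: float) -> Optional[bytes]:
    """Return the canonical encoding for NaN or an infinity, else None."""
    if math.isnan(value):
        # Every NaN, whatever its sign or payload, collapses to 0x7e00.
        return bytes.fromhex("f97e00")
    if math.isinf(value):
        return bytes.fromhex("f97c00" if value > 0 else "f9fc00")
    return None  # Ordinary values are handled by the other rules.


# A NaN carrying a non-zero payload still maps to the same canonical bytes.
payload_nan = struct.unpack(">d", bytes.fromhex("7ff8000000000001"))[0]
```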
139
+
140
+
### Reduction of Floating Point Values to Integers
141
+
142
+
dCBOR codecs that support floating point values (major type 7) MUST reduce each floating point value with no fractional part to the integer value that can accurately represent it in the fewest bits. If a numeric value has a non-zero fractional part or an exponent that takes it out of the range of representable integers, then it SHALL be encoded as a floating point value as specified above.
119
143
120
144
For the unsigned integers, from most to least preferred:
Float: [… -2^63 - 1 U 2^63 …] [… -9223372036854775809 U 9223372036854775808 …]
139
-
BIGNUM: [… -2^63 - 1 U 2^63 …] [… -9223372036854775809 U 9223372036854775808 …]
140
162
~~~
141
163
142
164
This practice still produces well-formed CBOR according to the standard, and all existing implementations will be able to read it. It does exclude a map such as the following from being validated as dCBOR, as it would have a duplicate key:
@@ -148,22 +170,6 @@ This practice still produces well-formed CBOR according to the standard, and all
148
170
}
149
171
~~~
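The reduction that causes such keys to collide can be sketched in Python. The function name `reduce_numeric` is hypothetical, and the integer range used here, -2^63 through 2^64 - 1, is an assumption based on the 64-bit unsigned encoding and the rule disallowing 65-bit negative integers.

```python
from typing import Union


def reduce_numeric(value: Union[int, float]) -> Union[int, float]:
    """Reduce a float with no fractional part to an integer when it fits."""
    if isinstance(value, float) and value.is_integer():
        # is_integer() is False for NaN and infinities, so they stay floats.
        as_int = int(value)  # int(-0.0) is 0, which also reduces negative zero
        if -(2**63) <= as_int < 2**64:
            return as_int
    return value
```

Because `10.0` reduces to the integer `10`, both values would receive the same encoding and therefore count as the same map key.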
150
172
151
-
### Reduction of Negative Zero.
152
-
153
-
{{IEEE754}} defines a negative zero value `-0.0`. dCBOR encoders that support floating point MUST reduce all negative zero values to the integer value `0`. dCBOR decoders MUST reject any negative zero values.
154
-
155
-
## Reduction of NaNs and Infinities.
156
-
157
-
{{IEEE754}} defines the `NaN` (Not a Number) value {{NAN}}. This is usually divided into two types: *quiet NaNs* and *signalling NaNs*, and the sign bit is used to distinguish between these two types. However, the specification also includes a range of "payload" bits. These bit fields have no definite purpose and could be used to break CBOR determinism.
158
-
159
-
dCBOR encoders that support floating point MUST reduce all `NaN` values to the half-width quiet `NaN` value having the canonical bit pattern `0x7e00`.
160
-
161
-
Similarly, encoders that support floating point MUST reduce all `+INF` values to the half-width `+INF` having the canonical bit pattern `0x7c00` and likewise with `-INF` to `0xfc00`.
162
-
163
-
## Reduction of BigNums to Integers
164
-
165
-
While there is no requirement that dCBOR codecs implement support for BigNums ≥ 2^64 (tags 2 and 3), codecs that do support them MUST use regular integer encodings where integers can represent the value.
166
-
167
173
## 65-bit negative integers disallowed
168
174
169
175
The largest negative integer that can be represented in 64-bit two's complement (STANDARD_NEGATIVE_INT_MAX) is -2^63 (0x8000000000000000).
@@ -172,8 +178,6 @@ However, CBOR can encode negative integers as low as CBOR_NEGATIVE_INT_MAX, whic
172
178
173
179
Because of this incompatibility between the CBOR and standard representations, dCBOR disallows encoding negative integer values in the range [CBOR_NEGATIVE_INT_MAX ... STANDARD_NEGATIVE_INT_MAX - 1]: conformant encoders MUST NOT encode these values and conformant decoders MUST reject these values as invalid.
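A decoder-side check for this rule might look like the following Python sketch. The function name `decode_negative_argument` is hypothetical; it takes the unsigned argument `n` of a major type 1 item, which CBOR interprets as the value `-1 - n`.

```python
STANDARD_NEGATIVE_INT_MAX = -(2**63)  # lowest 64-bit two's complement value
CBOR_NEGATIVE_INT_MAX = -(2**64)      # lowest value CBOR major type 1 can hold


def decode_negative_argument(argument: int) -> int:
    """Convert a major type 1 argument to its value, rejecting 65-bit ones."""
    if not 0 <= argument < 2**64:
        raise ValueError("argument out of range for a CBOR head")
    value = -1 - argument
    if value < STANDARD_NEGATIVE_INT_MAX:
        raise ValueError("65-bit negative integer is not valid dCBOR")
    return value
```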
174
180
175
-
Implementations that support BIGNUM are able to encode and decode these values as BIGNUM.
176
-
177
181
# Application Level
178
182
179
183
## Optional/Default Values
@@ -201,15 +205,17 @@ The codec API SHOULD afford conveniences such as protocol conformances that allo
201
205
202
206
# Future Work
203
207
204
-
The following issues are currently left for future work:
208
+
The following issues are currently left for future work; they may become part of this draft or be left for future specifications. Community input is welcome.
205
209
206
-
* How to deal with subnormal floating point values {{SUBNORMAL}}.
210
+
* How to deal with subnormal floating point values {{SUBNORMAL}}, including whether there should be special handling for them, and if so, how.
211
+
* How to numerically reduce values aside from machine-sized integers and floating point values (major types 0, 1, and 7): For example, big integers, decimal fractions, and rational numbers.
212
+
* How to canonicalize other tagged CBOR constructs such as dates (tag 1).
207
213
208
214
# API-Level Recommendations
209
215
210
216
This section is informative.
211
217
212
-
Many existing CBOR implementations give little or no guidance at the API level as to whether the CBOR being read conforms to the CBOR specification for deterministic encoding {{-CBOR}} §4.2, for example by emitting errors or warnings at deserialization time. Conversely, many existing implementations do not carry any burden of ensuring that CBOR is serialized in conformance with the CBOR determinstic encoding specification, again putting that burden on developers.
218
+
Many existing CBOR implementations give little or no guidance at the API level as to whether the CBOR being read conforms to the CBOR specification for deterministic encoding {{-CBOR}} §4.2, for example by emitting errors or warnings at deserialization time. Conversely, many existing implementations do not carry any burden of ensuring that CBOR is serialized in conformance with the CBOR deterministic encoding specification, again putting that burden on developers.
213
219
214
220
The authors of this document believe that for applications where dCBOR correctness as specified in this document is important, the codec itself should carry as much of this burden as possible. This is important both to minimize cognitive load during development and to help ensure interoperability between implementations.
215
221
@@ -229,7 +235,7 @@ It is RECOMMENDED that dCBOR APIs provide a dCBOR `Map` structure or similar tha
229
235
* Supports iteration through entries in dCBOR canonical key order.
230
236
* Supports treating keys as duplicate that have identical dCBOR encodings, e.g., `10` and `10.0`.
231
237
232
-
The dCBOR decoder SHOULD return an error if it encounters misordered or duplicate map keys.
238
+
The dCBOR decoder SHOULD return an error if it encounters mis-ordered or duplicate map keys.
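One way a decoder could perform this check is sketched below in Python: compare each key's encoded bytes with the previous key's as the map is read. The function name `check_map_keys` is hypothetical, and the error label `duplicateMapKey` is an assumed name chosen to parallel `misorderedMapKey`.

```python
from typing import Iterable, Optional


def check_map_keys(encoded_keys: Iterable[bytes]) -> None:
    """Raise if encoded keys are not strictly ascending in bytewise order."""
    previous: Optional[bytes] = None
    for key in encoded_keys:
        if previous is not None:
            if key == previous:
                # Identical encodings mean the same key appeared twice.
                raise ValueError("duplicateMapKey")
            if key < previous:
                raise ValueError("misorderedMapKey")
        previous = key
```

Comparing encodings rather than decoded values means that keys such as `10` and `10.0`, which share one dCBOR encoding, are detected as duplicates.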
233
239
234
240
## API Handling of Numeric Values
235
241
@@ -238,15 +244,15 @@ The authors do make the following recommendations:
238
244
* The encoder API SHOULD accept any supported numeric type for insertion into the CBOR stream and decide the dCBOR-conformant form for its encoding.
239
245
* The API SHOULD allow any supported numeric type to be extracted, and return errors when the actual type encountered is not representable in the requested type. For example,
240
246
* If the encoded value is "1.5" then requesting extraction of the value as floating point will succeed but requesting extraction as an integer will fail.
241
-
* Similarly, if the value has a large exponent and therefore can be represented as either a floating point value or a BigNum, then attempting to extract it as a machine integer will fail.
247
+
* Similarly, if the value has a large exponent and therefore can be represented as a floating point value (or possibly another supported type such as BigNum), then attempting to extract it as a machine integer will fail.
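The extraction behavior described in these bullets can be sketched in Python. The function names `extract_int` and `extract_float` are hypothetical, and the sketch assumes the decoder has already produced a Python `int` or `float`, with machine integers spanning -2^63 through 2^64 - 1.

```python
from typing import Union

Number = Union[int, float]


def extract_int(value: Number) -> int:
    """Return the value as a machine integer, or raise if it cannot be one."""
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError("value is not an integer")
        value = int(value)
    if not -(2**63) <= value < 2**64:
        raise ValueError("value does not fit in a machine integer")
    return value


def extract_float(value: Number) -> float:
    """Return the value as a binary64 float, or raise if precision is lost."""
    as_float = float(value)
    if isinstance(value, int) and int(as_float) != value:
        raise ValueError("integer is not exactly representable as binary64")
    return as_float
```

Here `extract_float(1.5)` succeeds while `extract_int(1.5)` raises, and a value such as `1e300` can be extracted as a float but not as a machine integer.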
242
248
243
249
## Validation Errors
244
250
245
251
It is RECOMMENDED that a dCBOR decoder return errors when it encounters any of these conditions in the input stream:
* `nonCanonicalNumeric`: An integer, floating-point value, or BigNum was encoded in non-canonical form
255
+
* `nonCanonicalNumeric`: An integer, floating-point value, or other supported numeric type (e.g., BigNum) was encoded in non-canonical form
250
256
* `invalidString`: An invalid UTF-8 string was encountered
251
257
* `unusedData`: Unused data encountered past the expected end of the input stream
252
258
* `misorderedMapKey`: A map has keys not in canonical order
@@ -276,11 +282,13 @@ The first consideration is unlikely due to the Law of Identity (A is A). The sec
276
282
277
283
This document makes no requests of IANA.
278
284
279
-
We considered requesting a new media type {{-MIME}} for deterministic CBOR, e.g., `application/d+cbor`, but chose not to pursue this as all dCBOR is well-formed CBOR. Therefore, existing CBOR codecs can read dCBOR, and many existing codecs can also write dCBOR if the encoding rules are observed. Protocols that adopt dCBOR will simply have more stringent requirments for the CBOR they emit and ingest.
285
+
We considered requesting a new media type {{-MIME}} for deterministic CBOR, e.g., `application/d+cbor`, but chose not to pursue this as all dCBOR is well-formed CBOR. Therefore, existing CBOR codecs can read dCBOR, and many existing codecs can also write dCBOR if the encoding rules are observed. Protocols that adopt dCBOR will simply have more stringent requirements for the CBOR they emit and ingest.
280
286
281
287
--- back
282
288
283
289
# Acknowledgments
284
290
{:numbered="false"}
285
291
286
-
TODO acknowledge.
292
+
The authors are grateful for the contributions of Carsten Bormann and Anders Lundgren in the CBOR working group.