Commit 401ea1e

Corrections and minor clarifications.
1 parent c826aff commit 401ea1e

2 files changed

+49
-16
lines changed
README.md

Lines changed: 4 additions & 0 deletions
@@ -9,6 +9,10 @@ This is the working area for the individual Internet-Draft, "draft-mcnally-deter
99

1010
## Change History
1111

12+
### August 5, 2023 - 03
13+
14+
* Clarifications and minor corrections.
15+
1216
### August 4, 2023 - 02
1317

1418
* Updated to reflect feedback up to IETF 117.

draft-mcnally-deterministic-cbor.md

Lines changed: 45 additions & 16 deletions
@@ -60,9 +60,9 @@ CBOR (RFC 8949) defines "Deterministically Encoded CBOR" in its Section 4.2. The
6060

6161
# Introduction
6262

63-
CBOR has many advantages over other data serialization formats. One of its strengths is specifications and guidelines for serializing data deterministically, such that multiple agents serializing the same data automatically achieve consensus on the exact byte-level form of that serialized data. This is particularly useful when data must be compared for semantic equivalence by comparing the hash of its contents.
63+
CBOR {{-CBOR}} has many advantages over other data serialization formats. One of its strengths is specifications and guidelines for serializing data deterministically, such that multiple agents serializing the same data automatically achieve consensus on the exact byte-level form of that serialized data. This is particularly useful when data must be compared for semantic equivalence by comparing the hash of its contents.
6464

65-
Nonetheless, determinism is an opt-in feature of {{-CBOR}}, and most existing CBOR codecs put the primary burden of correct deterministic serialization and validation of deterministic encoding during deserialization on the engineer. This document specifies a set of requirements for the application profile "dCBOR" that MUST be implemented at the codec level. These requirements include but go beyond {{-CBOR}} §4.2.
65+
Nonetheless, determinism is an opt-in feature of CBOR, and most existing CBOR codecs put the primary burden of correct deterministic serialization and validation of deterministic encoding during deserialization on the engineer. This document specifies a set of requirements for the application profile "dCBOR" that MUST be implemented at the codec level. These requirements include but go beyond {{-CBOR}} §4.2.
6666

6767
## Conventions and Definitions
6868

@@ -80,24 +80,40 @@ This application profile is intended to be used in conjunction with an applicati
8080

8181
## Base Requirements
8282

83-
dCBOR encoders MUST only emit CBOR conforming to the requirements of {{-CBOR}} §4.2.1. To summarize:
83+
dCBOR encoders MUST only emit CBOR conforming to the "Core Deterministic Encoding Requirements" of {{-CBOR}} §4.2.1.
8484

85-
* Variable-length integers MUST be as short as possible.
86-
* Floating-point values MUST use the shortest form that preserves the value.
87-
* Indefinite-length arrays and maps MUST NOT be used.
88-
* Map keys MUST be sorted in byte-wise lexicographic order of their deterministic encodings.
85+
To summarize, dCBOR codecs:
8986

90-
dCBOR codecs MUST validate and return errors for any CBOR that is not conformant.
87+
1. MUST encode variable-length integers using the shortest form possible.
88+
2. MUST encode floating-point values using the shortest form that preserves the value.
89+
3. MUST NOT encode indefinite-length arrays or maps.
90+
4. MUST sort map keys in bytewise lexicographic order of their deterministic encodings.
91+
92+
In addition, dCBOR decoders:
93+
94+
 5\. MUST validate and return errors for any encoded CBOR that is not conformant to any part of this specification.
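
As an illustration of requirement 1, the following is a minimal Python sketch of shortest-form encoding for the head of a CBOR data item (major type plus argument). The function name `encode_head` is hypothetical and is not taken from the draft or from any reference implementation.

```python
import struct


def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head (major type + argument) in the shortest form.

    Hypothetical helper for illustration only.
    """
    if not 0 <= major <= 7:
        raise ValueError("major type must be in 0..7")
    if not 0 <= value < 2**64:
        raise ValueError("argument must fit in 64 bits")
    mt = major << 5
    if value < 24:
        # The argument fits directly in the initial byte.
        return bytes([mt | value])
    if value < 2**8:
        return bytes([mt | 24, value])
    if value < 2**16:
        return bytes([mt | 25]) + struct.pack(">H", value)
    if value < 2**32:
        return bytes([mt | 26]) + struct.pack(">I", value)
    return bytes([mt | 27]) + struct.pack(">Q", value)
```

With this sketch, `encode_head(0, 10)` yields the single byte `0x0a`, while `encode_head(0, 1000)` yields `0x1903e8`. A conformant decoder would reject longer encodings of the same values, such as `0x1800` for `0`.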
9195

9296
## Duplicate Map Keys
9397

94-
Standard CBOR {{-CBOR}} defines maps with duplicate keys as invalid, but leaves how to handle such cases to the implementor (§2.2, §3.1, §5.4, §5.6). dCBOR encoders MUST NOT emit CBOR that contains duplicate map keys, and dCBOR decoders MUST reject maps with duplicate keys.
98+
Standard CBOR {{-CBOR}} defines maps with duplicate keys as invalid, but leaves how to handle such cases to the implementor (§2.2, §3.1, §5.4, §5.6).
99+
100+
dCBOR codecs:
101+
102+
1. MUST NOT emit CBOR that contains duplicate map keys.
103+
2. MUST reject encoded maps with duplicate keys.
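
The two rules above, together with the key-ordering rule from the base requirements, could be sketched as follows. This is an illustrative Python sketch only: `encode_small_map` and `check_key_order` are hypothetical names, keys and values are assumed to be already deterministically encoded byte strings, and the encoder is limited to fewer than 24 entries to keep the map head to a single byte.

```python
def encode_small_map(pairs: list[tuple[bytes, bytes]]) -> bytes:
    """Encode (key, value) pairs, given as encoded bytes, as a CBOR map.

    Hypothetical helper: sorts keys bytewise and refuses duplicates.
    """
    if len(pairs) >= 24:
        raise ValueError("sketch only handles maps with fewer than 24 entries")
    # Python compares bytes objects bytewise and lexicographically.
    ordered = sorted(pairs, key=lambda kv: kv[0])
    for (k1, _), (k2, _) in zip(ordered, ordered[1:]):
        if k1 == k2:
            raise ValueError("duplicate map key")
    head = bytes([0xA0 | len(ordered)])  # major type 5, count in initial byte
    return head + b"".join(k + v for k, v in ordered)


def check_key_order(encoded_keys: list[bytes]) -> None:
    """Decoder side: keys in wire order must be strictly ascending.

    Strict ordering rejects both unsorted and duplicate keys.
    """
    for a, b in zip(encoded_keys, encoded_keys[1:]):
        if not a < b:
            raise ValueError("map keys unsorted or duplicated")
```

Checking for strictly ascending order on the decoder side covers both the sorting requirement and the duplicate-key requirement in one pass.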
95104

96105
## Numeric Reduction
97106

98-
While there is no requirement that dCBOR codecs implement support for floating point numbers (CBOR major type 7), dCBOR codecs that do support them MUST reduce floating point values with a non-zero fractional part to the floating point encoding that can accurately represent it in the fewest bits. For dCBOR codecs that support floating point {{IEEE754}} binary16 MUST be supported, and is the most-preferred encoding for floating point values, followed by binary32 then binary64.
107+
dCBOR codecs that support floating point numbers (CBOR major type 7):
108+
109+
1. MUST reduce floating point values with no fractional part to the shortest integer encoding that can accurately represent them.
110+
2. MUST reduce floating point values with a non-zero fractional part to the shortest floating point encoding that can accurately represent them.
111+
3. MUST support floating point {{IEEE754}} binary16 as the most-preferred encoding for floating point values, followed by binary32, then binary64.
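
A minimal Python sketch of these three rules for finite values follows. The names `reduce_number` and `shortest_float` are hypothetical. The integer range used here, from -2^63 to 2^64 - 1, is an assumption combining the CBOR major type 0 maximum with the lower bound this document sets for negative integers. NaN and infinities are left to the caller.

```python
import math
import struct


def reduce_number(x: float):
    """Return an int if x has no fractional part and is in range, else x.

    The range -2**63 .. 2**64 - 1 is an assumption of this sketch.
    """
    if math.isfinite(x) and x == int(x) and -(2**63) <= x < 2**64:
        return int(x)
    return x


def shortest_float(x: float) -> bytes:
    """Encode a finite float as the shortest of binary16/32/64 that round-trips."""
    for fmt, initial_byte in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([initial_byte]) + packed
    return bytes([0xFB]) + struct.pack(">d", x)
```

For example, `reduce_number(10.0)` returns the integer `10`, which is then encoded as an integer, while `shortest_float(1.5)` produces the binary16 encoding `0xf93e00`.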
99112

100-
This practice still produces well-formed CBOR according to the standard, and all existing generic codecs will be able to read it. It does exclude a map such as the following that would be allowed in standard CBOR from being validated as dCBOR, as `10.0` is an invalid numeric value in dCBOR, and using the unsigned integer value `10` more than once as a map key is not allowed:
113+
This practice still produces well-formed CBOR according to the standard, and all existing generic decoders will be able to read it. It does exclude a map such as the following from being validated as dCBOR, even though it would be allowed in standard CBOR, because:
114+
115+
* `10.0` is an invalid numeric value in dCBOR, and
116+
* using the unsigned integer value `10` more than once as a map key is not allowed.
101117

102118
~~~
103119
{
@@ -108,23 +124,36 @@ This practice still produces well-formed CBOR according to the standard, and all
108124

109125
### Reduction of Negative Zero
110126

111-
{{IEEE754}} defines a negative zero value `-0.0`. dCBOR encoders that support floating point MUST reduce all negative zero values to the integer value `0`. dCBOR decoders MUST reject any negative zero values. Therefore with dCBOR, `0.0`, `-0.0`, and `0` all encode to the same canonical single-byte value `0x00`.
127+
{{IEEE754}} defines a negative zero value `-0.0`. dCBOR codecs that support floating point:
128+
129+
1. MUST reduce all negative zero values to the integer value `0`.
130+
2. MUST reject any encoded negative zero values.
131+
132+
Therefore with dCBOR, `0.0`, `-0.0`, and `0` all encode to the same canonical single-byte value `0x00`.
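
The two negative-zero rules could be sketched in Python as below. The helper names are hypothetical, and the decoder-side check assumes it is handed the bytes of a single encoded data item.

```python
import math
import struct

# Initial bytes for binary16, binary32 and binary64 floats (major type 7).
_FLOAT_FORMATS = {0xF9: ">e", 0xFA: ">f", 0xFB: ">d"}


def is_negative_zero(x: float) -> bool:
    """True only for -0.0; copysign exposes the sign that == ignores."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def encode_zero(x: float) -> bytes:
    """Encoder side: every zero value reduces to the integer 0 (0x00)."""
    if x != 0:
        raise ValueError("not a zero value")
    return b"\x00"


def reject_negative_zero(item: bytes) -> None:
    """Decoder side: raise if the item is a float encoding of -0.0."""
    fmt = _FLOAT_FORMATS.get(item[0]) if item else None
    if fmt is None:
        return  # not a floating point item; nothing to check here
    (x,) = struct.unpack(fmt, item[1 : 1 + struct.calcsize(fmt)])
    if is_negative_zero(x):
        raise ValueError("encoded negative zero is not valid dCBOR")
```

Note that `-0.0 == 0.0` is true in Python, so the sign has to be inspected explicitly rather than by comparison.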
112133

113134
### Reduction of NaNs and Infinities
114135

115136
{{IEEE754}} defines the `NaN` (Not a Number) value {{NAN}}. This is usually divided into two types: *quiet NaNs* and *signalling NaNs*, and the most significant bit of the significand is used to distinguish between these two types. The specification also includes a range of "payload" bits. These bit fields have no definite purpose and could be used to break determinism or exfiltrate data.
116137

117-
dCBOR encoders that support floating point MUST reduce all `NaN` values to the binary16 quiet `NaN` value having the canonical bit pattern `0x7e00`.
138+
dCBOR codecs that support floating point:
118139

119-
Similarly, encoders that support floating point MUST reduce all `+INF` values to the binary16 `+INF` having the canonical bit pattern `0x7c00` and likewise with `-INF` to `0xfc00`.
140+
1. MUST reduce all `NaN` values to the binary16 quiet `NaN` value having the canonical bit pattern `0x7e00`.
141+
2. MUST reject any other encoded `NaN` values.
142+
3. MUST reduce all `+INF` values to the binary16 `+INF` having the canonical bit pattern `0x7c00` and likewise with `-INF` to `0xfc00`.
143+
4. MUST reject any encoded `+INF` or `-INF` values other than these.
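
The canonical bit patterns above can be illustrated with a short Python sketch. The constant and function names are hypothetical, and the validator assumes it receives the bytes of a single encoded floating point item.

```python
import math
import struct

# Canonical binary16 encodings, each prefixed with the 0xf9 initial byte.
CANONICAL_NAN = bytes.fromhex("f97e00")
CANONICAL_POS_INF = bytes.fromhex("f97c00")
CANONICAL_NEG_INF = bytes.fromhex("f9fc00")

_FLOAT_FORMATS = {0xF9: ">e", 0xFA: ">f", 0xFB: ">d"}


def encode_non_finite(x: float) -> bytes:
    """Encoder side: collapse every NaN and infinity to its canonical form."""
    if math.isnan(x):
        return CANONICAL_NAN
    if x == math.inf:
        return CANONICAL_POS_INF
    if x == -math.inf:
        return CANONICAL_NEG_INF
    raise ValueError("value is finite")


def validate_non_finite(item: bytes) -> None:
    """Decoder side: reject NaN or infinity in any non-canonical encoding."""
    fmt = _FLOAT_FORMATS.get(item[0]) if item else None
    if fmt is None or len(item) != 1 + struct.calcsize(fmt):
        raise ValueError("not a single floating point item")
    (x,) = struct.unpack(fmt, item[1:])
    if math.isnan(x) and item != CANONICAL_NAN:
        raise ValueError("non-canonical NaN")
    if math.isinf(x) and item not in (CANONICAL_POS_INF, CANONICAL_NEG_INF):
        raise ValueError("non-canonical infinity")
```

Comparing the raw bytes, rather than the decoded value, is what catches NaN payload bits and wider-than-necessary encodings.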
120144

121145
## 65-bit Negative Integers
122146

123147
The most negative integer that can be represented in 64-bit two's complement (STANDARD_NEGATIVE_INT_MAX) is -2^63 (0x8000000000000000).
124148

125-
However, standard CBOR major type 1 can encode negative integers as low as CBOR_NEGATIVE_INT_MAX, which is -2^64 (two's complement: 0x10000000000000000, CBOR: 0x3BFFFFFFFFFFFFFFFF). Negative integers in the range \[CBOR_NEGATIVE_INT_MAX ... STANDARD_NEGATIVE_INT_MAX - 1\] require 65 bits of precision, and are thus not representable in typical machine-sized integers.
149+
However, standard CBOR major type 1 can encode negative integers as low as CBOR_NEGATIVE_INT_MAX, which is -2^64 (two's complement: 0x10000000000000000, CBOR: 0x3BFFFFFFFFFFFFFFFF).
150+
151+
Negative integers in the range \[CBOR_NEGATIVE_INT_MAX ... STANDARD_NEGATIVE_INT_MAX - 1\] require 65 bits of precision, and are thus not representable in typical machine-sized integers.
152+
153+
Because of this incompatibility between standard CBOR and typical machine-size representations, dCBOR disallows encoding negative integer values in the range \[CBOR_NEGATIVE_INT_MAX ... STANDARD_NEGATIVE_INT_MAX - 1\]. dCBOR codecs:
126154

127-
Because of this incompatibility between standard CBOR and typical machine-size representations, dCBOR disallows encoding negative integer values in the range \[CBOR_NEGATIVE_INT_MAX ... STANDARD_NEGATIVE_INT_MAX - 1\]: conformant encoders MUST NOT encode these values as CBOR major type 1, and conformant decoders MUST reject these major type 1 CBOR values.
155+
1. MUST NOT encode these values as CBOR major type 1.
156+
2. MUST reject these encoded major type 1 CBOR values.
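
The range restriction can be sketched in Python as follows. In CBOR major type 1 the encoded argument is `-1 - n`, so the excluded range corresponds to arguments of 2^63 and above. The function names are hypothetical, and the constant name is borrowed from the text above.

```python
STANDARD_NEGATIVE_INT_MAX = -(2**63)


def encode_head(major: int, arg: int) -> bytes:
    """Shortest-form CBOR head; hypothetical helper for this sketch."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    if arg < 24:
        return bytes([major << 5 | arg])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | additional_info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_int(n: int) -> bytes:
    """Encoder side: refuse negative integers below -2**63."""
    if n >= 0:
        return encode_head(0, n)
    if n < STANDARD_NEGATIVE_INT_MAX:
        raise ValueError("negative integer needs 65 bits; not valid dCBOR")
    return encode_head(1, -1 - n)


def check_negative_argument(arg: int) -> int:
    """Decoder side: map a major type 1 argument to its value, or reject it."""
    if arg >= 2**63:
        raise ValueError("negative integer needs 65 bits; not valid dCBOR")
    return -1 - arg
```

Under this sketch, `encode_int(-2**63)` yields `0x3b7fffffffffffffff`, the lowest value dCBOR permits, while `encode_int(-2**63 - 1)` raises an error.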
128157

129158
# Reference Implementations
130159
