Commit 6689929

DRIVERS-3123 Relaxes requirement on reads of the ignored bits of PACKED_BIT vectors. (#1812)
1 parent db69351 commit 6689929

3 files changed

+72
-14
lines changed

source/bson-binary-vector/bson-binary-vector.md

Lines changed: 12 additions & 5 deletions
@@ -180,13 +180,18 @@ End Function
180180

181181
#### Validation
182182

183-
Drivers MUST validate vector metadata and raise an error if any invariant is violated:
183+
Drivers MUST validate vector metadata and raise an exception if any invariant is violated:
184184

185-
- Padding MUST be 0 for all dtypes where padding doesn’t apply, and MUST be within [0, 7] for PACKED_BIT.
186-
- A PACKED_BIT vector MUST NOT be empty if padding is in the range [1, 7].
187-
- For a PACKED_BIT vector, ignored bits must be zero.
188185
- When unpacking binary data into a FLOAT32 Vector structure, the length of the binary data following the dtype and
189186
padding MUST be a multiple of 4 bytes.
187+
- Padding MUST be 0 for all dtypes where padding doesn’t apply, and MUST be within [0, 7] for PACKED_BIT.
188+
- A PACKED_BIT vector MUST NOT be empty if padding is in the range [1, 7].
189+
- For a PACKED_BIT vector with non-zero padding, ignored bits SHOULD be zero.
190+
- When encoding, if ignored bits aren't zero, drivers SHOULD raise an exception, but drivers MAY leave them as-is if
191+
backwards-compatibility is a concern.
192+
- When decoding, drivers SHOULD raise an exception if decoding non-zero ignored bits, but drivers MAY choose not to
193+
for backwards compatibility.
194+
- Drivers SHOULD use the next major release to conform to ignored bits being zero.
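
The PACKED_BIT rules in this list can be sketched in a few lines of plain Python. This is a minimal illustration, not any driver's implementation: `validate_packed_bit` and its `strict` flag are hypothetical names, and the sketch assumes the ignored bits are the low-order `padding` bits of the final byte, as the `[0b10000000]` / `padding=7` example in the prose tests implies.

```python
import warnings


def validate_packed_bit(data: bytes, padding: int, strict: bool = True) -> None:
    """Check the PACKED_BIT invariants (hypothetical helper, not a driver API)."""
    if not 0 <= padding <= 7:
        raise ValueError("padding must be within [0, 7] for PACKED_BIT")
    if padding and not data:
        raise ValueError("an empty PACKED_BIT vector must have padding 0")
    if padding:
        # Assumption: the ignored bits are the low `padding` bits of the last byte.
        ignored_mask = (1 << padding) - 1
        if data[-1] & ignored_mask:
            message = "ignored bits of a PACKED_BIT vector should be zero"
            if strict:
                # Preferred (SHOULD) behavior: reject non-zero ignored bits.
                raise ValueError(message)
            # Lenient (MAY) behavior: keep the bits as-is, but tell the caller.
            warnings.warn(message, stacklevel=2)
```

With `strict=True` the helper models the preferred behavior; `strict=False` models a driver that keeps accepting existing data until its next major release.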
190195

191196
Drivers MUST perform this validation when a numeric vector and padding are provided through the API, and when unpacking
192197
binary data (BSON or similar) into a Vector structure.
@@ -249,13 +254,15 @@ See the [README](tests/README.md) for tests.
249254
example in Python, see
250255
[numpy.unpackbits](https://numpy.org/doc/2.0/reference/generated/numpy.unpackbits.html#numpy.unpackbits).
251256

252-
- In PACKED_BIT, why are ignored bits required to be zero?
257+
- In PACKED_BIT, why are ignored bits recommended to be zero?
253258

254259
- To ensure the same data representation has the same encoding. For drivers supporting comparison operations, this
255260
avoids comparing different unused bits.
256261

257262
## Changelog
258263

264+
- 2025-06-23: In PACKED_BIT vectors, ignored bits MAY be non-zero for backwards-compatibility. Prose tests added.
265+
259266
- 2025-04-08: In PACKED_BIT vectors, ignored bits must be zero.
260267

261268
- 2025-03-07: Update tests to use Extended JSON representation of +/-Infinity. (DRIVERS-3095)

source/bson-binary-vector/tests/README.md

Lines changed: 60 additions & 0 deletions
@@ -55,6 +55,66 @@ MUST assert that the input float array is the same after encoding and decoding.
5555
- if the canonical_bson field is present, raise an exception when attempting to deserialize it into the corresponding
5656
numeric values, as the field contains corrupted data.
5757

58+
## Prose Tests
59+
60+
### Treatment of non-zero ignored bits
61+
62+
All drivers MUST test encoding and decoding behavior according to their design and version. For drivers that haven't
63+
been completed, raise exceptions in both cases. For those that have, update to this behavior according to semantic
64+
versioning rules, and update tests accordingly.
65+
66+
In both cases, [255], a single byte PACKED_BIT vector of length 1 (hence padding of 7) provides a good example to use,
67+
as all of its bits are ones.
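
To see why `[255]` exercises the rule, the binary payload can be split by hand. The sketch below uses only values that appear in this change (dtype `0x10`, padding `7`, data byte `0xff`). It assumes the ignored bits are the low-order bits of the final byte. The variable names are illustrative only.

```python
payload = bytes([0x10, 0x07, 0xFF])  # dtype PACKED_BIT, padding 7, data [255]

dtype, padding, data = payload[0], payload[1], payload[2:]

ignored_mask = (1 << padding) - 1   # 0b01111111: the seven low-order bits
significant = data[-1] >> padding   # the single bit that carries the value
ignored = data[-1] & ignored_mask   # the seven bits that should be zero

print(f"{significant:b} {ignored:07b}")  # prints "1 1111111"
```

Every ignored bit is set, so a driver that validates ignored bits has something to reject.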
68+
69+
#### 1. Encoding
70+
71+
- Test encoding with non-zero ignored bits. Use the driver API that validates vector metadata.
72+
- If the driver validates that ignored bits are zero (preferred), expect an error. Otherwise, expect the ignored bits to be
73+
preserved.
74+
75+
```python
76+
with pytest.raises(ValueError):
77+
Binary.from_vector([0b11111111], BinaryVectorDtype.PACKED_BIT, padding=7)
78+
```
79+
80+
#### 2. Decoding
81+
82+
- Test the behavior of your driver when one attempts to decode from binary to vector.
83+
- e.g. As of pymongo 4.14, a warning is raised. From 5.0, it will be an exception.
84+
85+
```python
86+
b = Binary(b'\x10\x07\xff', subtype=9)
87+
with pytest.warns():
88+
Binary.as_vector(b)
89+
```
90+
91+
Drivers MAY skip this test if they choose not to implement a `Vector` type.
92+
93+
#### 3. Comparison
94+
95+
Once we can guarantee that all ignored bits are zero, then equality can be tested on the binary subtype. Until then,
96+
equality is ambiguous, and depends on whether one compares by bits (uint1), or uint8. Drivers SHOULD test equality
97+
behavior according to their design and version.
98+
99+
For example, in `pymongo < 5.0`, we define equality of a BinaryVector by matching padding, dtype, and integer. This
100+
means that two single bit vectors in which 7 bits are ignored do not match unless all bits match. This mirrors what the
101+
server does.
102+
103+
```python
104+
b1 = Binary.from_vector([0b10000000], BinaryVectorDtype.PACKED_BIT, padding=7)
105+
assert b1 == Binary(b'\x10\x07\x80', subtype=9) # This is effectively a roundtrip.
106+
v1 = Binary.as_vector(b1)
107+
108+
b2 = Binary.from_vector([0b11111111], BinaryVectorDtype.PACKED_BIT, padding=7)
109+
assert b2 == Binary(b'\x10\x07\xff', subtype=9)
110+
v2 = Binary.as_vector(b2)
111+
112+
assert b1 != b2 # Unequal at naive Binary level
113+
assert v2 != v1 # Also chosen to be unequal at BinaryVector level as [255] != [128]
114+
```
115+
116+
Drivers MAY skip this test if they choose not to implement a `Vector` type.
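
The ambiguity between comparing by bits and comparing by bytes can be shown without a driver. This is a hedged sketch: `significant_bits` is a hypothetical helper, not a pymongo function, and it assumes bits are stored most-significant first with the ignored bits at the low end of the final byte.

```python
def significant_bits(payload: bytes) -> list[int]:
    """Unpack a PACKED_BIT payload (dtype, padding, data) into its used bits."""
    padding, data = payload[1], payload[2:]
    # Expand each byte into eight bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    # Drop the trailing `padding` bits, which are ignored.
    return bits[: len(bits) - padding]


p1 = b"\x10\x07\x80"  # ignored bits are zero
p2 = b"\x10\x07\xff"  # ignored bits are all ones

assert p1 != p2                                      # unequal byte-for-byte (uint8 view)
assert significant_bits(p1) == significant_bits(p2)  # equal bit-for-bit (uint1 view)
```

Both payloads describe the same one-bit vector, but they differ as bytes. That difference goes away only when ignored bits are guaranteed to be zero.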
117+
58118
## FAQ
59119

60120
- What MongoDB Server version does this apply to?

source/bson-binary-vector/tests/packed_bit.json

Lines changed: 0 additions & 9 deletions
@@ -29,15 +29,6 @@
2929
"padding": 3,
3030
"canonical_bson": "1600000005766563746F7200040000000910037F0800"
3131
},
32-
{
33-
"description": "PACKED_BIT with inconsistent padding",
34-
"valid": false,
35-
"vector": [127, 7],
36-
"dtype_hex": "0x10",
37-
"dtype_alias": "PACKED_BIT",
38-
"padding": 3,
39-
"canonical_bson": "1600000005766563746F7200040000000910037F0700"
40-
},
4132
{
4233
"description": "Empty Vector PACKED_BIT",
4334
"valid": true,
