-
Notifications
You must be signed in to change notification settings - Fork 245
DRIVERS-3123 Require ignored bits in BSON binary vector PACKED_BITS to be zero. #1783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
c13784b
2d9565d
f270a25
d7ccb03
ea265ea
87f75e1
9461d62
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -184,6 +184,7 @@ Drivers MUST validate vector metadata and raise an error if any invariant is vio | |
|
||
- Padding MUST be 0 for all dtypes where padding doesn’t apply, and MUST be within \[0, 7\] for PACKED_BIT. | ||
- A PACKED_BIT vector MUST NOT be empty if padding is in the range \[1, 7\]. | ||
- For a PACKED_BIT vector, bits lower than the given padding (those ignored) must be zero. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this for Encoding and Decoding? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. That's what we came to, and exceptions are thrown in the Python implementation in both cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, it's specified a few lines after: |
||
- When unpacking binary data into a FLOAT32 Vector structure, the length of the binary data following the dtype and | ||
padding MUST be a multiple of 4 bytes. | ||
kevinAlbs marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
@@ -237,16 +238,26 @@ See the [README](tests/README.md) for tests. | |
## FAQ | ||
|
||
- What MongoDB Server version does this apply to? | ||
|
||
- Files in the "specifications" repository have no version scheme. They are not tied to a MongoDB server version. | ||
|
||
- In PACKED_BIT, why would one choose to use integers in \[0, 256)? | ||
|
||
- This follows a well-established precedent for packing binary-valued arrays into bytes (8 bits), This technique is | ||
widely used across different fields, such as data compression, communication protocols, and file formats, where | ||
you want to store or transmit binary data more efficiently by grouping 8 bits into a single byte (uint8). For an | ||
example in Python, see | ||
[numpy.unpackbits](https://numpy.org/doc/2.0/reference/generated/numpy.unpackbits.html#numpy.unpackbits). | ||
|
||
- In PACKED_BIT, why are ignored bits required to be zero? | ||
|
||
- To ensure the same data representation has the same encoding. For drivers supporting comparison operations, this | ||
avoids comparing different unused bits. | ||
|
||
## Changelog | ||
|
||
- 2025-04-08: In PACKED_BIT vectors, bits lower than the given padding (those ignored) must be zero. | ||
|
||
- 2025-03-07: Update tests to use Extended JSON representation of +/-Infinity. (DRIVERS-3095) | ||
|
||
- 2025-02-04: Update validation for decoding into a FLOAT32 vector. | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -18,26 +18,35 @@ | |||||
"dtype_hex": "0x10", | ||||||
"dtype_alias": "PACKED_BIT", | ||||||
"padding": 0, | ||||||
"canonical_bson": "1600000005766563746F7200040000000910007F0700" | ||||||
"canonical_bson": "1600000005766563746F7200040000000910007F0800" | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
vbabanin marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
}, | ||||||
{ | ||||||
"description": "Empty Vector PACKED_BIT", | ||||||
"description": "PACKED_BIT with padding", | ||||||
"valid": true, | ||||||
"vector": [], | ||||||
"vector": [127, 8], | ||||||
"dtype_hex": "0x10", | ||||||
"dtype_alias": "PACKED_BIT", | ||||||
"padding": 0, | ||||||
"canonical_bson": "1400000005766563746F72000200000009100000" | ||||||
"padding": 3, | ||||||
"canonical_bson": "1600000005766563746F7200040000000910037F0800" | ||||||
}, | ||||||
{ | ||||||
"description": "PACKED_BIT with padding", | ||||||
"valid": true, | ||||||
"description": "PACKED_BIT with inconsistent padding", | ||||||
"valid": false, | ||||||
"vector": [127, 7], | ||||||
"dtype_hex": "0x10", | ||||||
"dtype_alias": "PACKED_BIT", | ||||||
"padding": 3, | ||||||
"canonical_bson": "1600000005766563746F7200040000000910037F0700" | ||||||
}, | ||||||
{ | ||||||
"description": "Empty Vector PACKED_BIT", | ||||||
"valid": true, | ||||||
"vector": [], | ||||||
"dtype_hex": "0x10", | ||||||
"dtype_alias": "PACKED_BIT", | ||||||
"padding": 0, | ||||||
"canonical_bson": "1400000005766563746F72000200000009100000" | ||||||
}, | ||||||
{ | ||||||
"description": "Overflow Vector PACKED_BIT", | ||||||
"valid": false, | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, it could be helpful adding some specificity about bit order here since the description is correct only if bits are interpreted in the opposite order used for packed_bit encoding. (packed_bit is otherwise MSB first, but this is referring to the LSB as 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to. Would you please suggest a short alternative? Any longer description that you'd like to make should be done in a new PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest rewording to avoid "lower":
I expect the phrase "The least-significant bits are ignored." above clarifies which bits are ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kevinAlbs. Updated.