Commit 8bdee88

CSHARP-5202: BSON Binary Vector Subtype Support (#1581)
1 parent 52903dc commit 8bdee88

26 files changed: +2273 -129 lines changed
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
# Testing Binary subtype 9: Vector

The JSON files in this directory tree are platform-independent tests that drivers can use to prove their conformance to
the specification.

These tests focus on the roundtrip of the list of numbers as input/output, along with their data type and byte padding.

Additional tests exist in `bson-corpus/tests/binary.json` but do not sufficiently test the end-to-end process of
converting a Vector to BSON. For this reason, drivers must create a bespoke test runner for the vector subtype.

## Format

The test data corpus consists of a JSON file for each data type (dtype). Each file contains a number of test cases
under the top-level key "tests". Each test case pertains to a single vector. The keys provide the specification of the
vector. Valid cases also include the Canonical BSON format of a document {test_key: binary}. The "test_key" is common
to all cases in a file and is specified at the top level.

#### Top level keys

Each JSON file contains three top-level keys.

- `description`: human-readable description of what is in the file
- `test_key`: name used for key when encoding/decoding a BSON document containing the single BSON Binary for the test
  case. Applies to *every* case.
- `tests`: array of test case objects, each of which has the following keys. Valid cases will also contain additional
  binary and JSON encoding values.

#### Keys of individual test cases

- `description`: string describing the test.
- `valid`: boolean indicating if the vector, dtype, and padding should be considered a valid input.
- `vector`: (required if valid is true) list of numbers
- `dtype_hex`: string defining the data type in hex (e.g. "0x10", "0x27")
- `dtype_alias`: (optional) string defining the data type, perhaps as Enum.
- `padding`: (optional) integer for byte padding. Defaults to 0.
- `canonical_bson`: (required if valid is true) an (uppercase) big-endian hex representation of a BSON byte string.
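As a rough illustration of how a runner might apply these key rules before using a test case, the Python sketch below normalizes one case. The helper name `normalize_case` and the layout of the returned dictionary are hypothetical. Mapping the strings "inf" and "-inf" to floats is an assumption based on how the FLOAT32 test data spells infinities.

```python
def normalize_case(case):
    """Return one test case with defaults applied (hypothetical helper)."""
    out = {
        "description": case["description"],
        "valid": case["valid"],
        # dtype_hex is a string such as "0x27"; int(..., 16) accepts the 0x prefix.
        "dtype": int(case["dtype_hex"], 16),
        # padding is optional and defaults to 0.
        "padding": case.get("padding", 0),
    }
    if "vector" in case:
        # Assumption: infinities are written as the strings "inf" and "-inf".
        out["vector"] = [float(x) if isinstance(x, str) else x for x in case["vector"]]
    if "canonical_bson" in case:
        # The hex string becomes the raw BSON document bytes.
        out["bson"] = bytes.fromhex(case["canonical_bson"])
    return out


case = normalize_case({
    "description": "Insufficient vector data with 3 bytes FLOAT32",
    "valid": False,
    "dtype_hex": "0x27",
    "dtype_alias": "FLOAT32",
    "canonical_bson": "1700000005766563746F7200050000000927002A2A2A00",
})
print(case["dtype"], case["padding"], len(case["bson"]))
```

Cases that omit `vector` or `canonical_bson` simply lack the corresponding entry, so a runner can decide which assertions apply by checking which keys are present.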

## Required tests

#### To prove correct in a valid case (`valid: true`), one MUST

- encode a document from the numeric values, dtype, and padding, along with the "test_key", and assert this matches the
  canonical_bson string.
- decode the canonical_bson into its binary form, and then assert that the numeric values, dtype, and padding all match
  those provided in the JSON.

Note: For floating-point number types, exact numerical matches may not be possible. Drivers that natively support the
floating-point type being tested (e.g., when testing float32 vector values in a driver that natively supports float32)
MUST assert that the input float array is the same after encoding and decoding.
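The two assertions for a valid case can be sketched in Python as follows. This is a minimal illustration rather than a driver implementation: `encode_doc`, `decode_doc`, and `as_float32` are hypothetical names, the code builds the single-element BSON document by hand with `struct`, and it performs no validation. Because Python has no native float32, the sketch assumes it is acceptable to round the expected values through float32 before comparing.

```python
import struct

FLOAT32, INT8, PACKED_BIT = 0x27, 0x03, 0x10
# struct format character for one element of each dtype.
_FORMATS = {FLOAT32: "f", INT8: "b", PACKED_BIT: "B"}


def encode_doc(test_key, vector, dtype, padding=0):
    """Encode {test_key: Binary(subtype 9)} and return uppercase hex."""
    data = struct.pack(f"<{len(vector)}{_FORMATS[dtype]}", *vector)
    payload = bytes([dtype, padding]) + data
    element = (b"\x05" + test_key.encode("utf-8") + b"\x00"
               + struct.pack("<i", len(payload)) + b"\x09" + payload)
    # The document length counts its own 4 bytes plus the trailing NUL.
    doc = struct.pack("<i", len(element) + 5) + element + b"\x00"
    return doc.hex().upper()


def decode_doc(hex_doc, test_key):
    """Decode the single-element document back to (vector, dtype, padding)."""
    raw = bytes.fromhex(hex_doc)
    start = 4 + 1 + len(test_key.encode("utf-8")) + 1  # length, type, key, NUL
    (size,) = struct.unpack_from("<i", raw, start)
    payload = raw[start + 5:start + 5 + size]  # skip int32 length and subtype
    dtype, padding, data = payload[0], payload[1], payload[2:]
    width = struct.calcsize("<" + _FORMATS[dtype])
    vector = list(struct.unpack(f"<{len(data) // width}{_FORMATS[dtype]}", data))
    return vector, dtype, padding


def as_float32(values):
    """Round Python floats to float32 precision so comparisons are exact."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return list(struct.unpack(f"<{len(values)}f", packed))


expected = "1C00000005766563746F72000A0000000927006666FF426666F6C000"
assert encode_doc("vector", [127.7, -7.7], FLOAT32) == expected
vector, dtype, padding = decode_doc(expected, "vector")
assert (dtype, padding) == (FLOAT32, 0)
assert vector == as_float32([127.7, -7.7])
```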

#### To prove correct in an invalid case (`valid: false`), one MUST

- if the vector field is present, raise an exception when attempting to encode a document from the numeric values,
  dtype, and padding.
- if the canonical_bson field is present, raise an exception when attempting to deserialize it into the corresponding
  numeric values, as the field contains corrupted data.
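The checks that make these cases fail can be sketched as below. The rules are inferred from the invalid cases in the test files (value ranges, non-integer values for integer dtypes, padding on non-PACKED_BIT dtypes, padding outside 0 to 7, padding with no data, and FLOAT32 data that is not a multiple of 4 bytes). They are not a complete statement of the specification, and `check_vector`, `check_padding`, and `check_payload` are hypothetical names.

```python
FLOAT32, INT8, PACKED_BIT = 0x27, 0x03, 0x10


def check_padding(dtype, padding, length):
    """Raise ValueError for padding combinations the test files mark invalid."""
    if dtype not in (FLOAT32, INT8, PACKED_BIT):
        raise ValueError(f"unknown dtype 0x{dtype:02X}")
    if not 0 <= padding <= 7:
        raise ValueError(f"padding {padding} is outside 0..7")
    if padding and dtype != PACKED_BIT:
        raise ValueError("padding is only allowed for PACKED_BIT")
    if padding and length == 0:
        raise ValueError("padding given but there is no vector data")


def check_vector(vector, dtype, padding=0):
    """Validate inputs before encoding (the 'vector field is present' branch)."""
    check_padding(dtype, padding, len(vector))
    if dtype == FLOAT32:
        return
    low, high = (-128, 127) if dtype == INT8 else (0, 255)
    for value in vector:
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"non-integer value {value!r}")
        if not low <= value <= high:
            raise ValueError(f"value {value} is outside [{low}, {high}]")


def check_payload(payload):
    """Validate a decoded binary payload: dtype byte, padding byte, then data."""
    if len(payload) < 2:
        raise ValueError("payload is too short to hold dtype and padding")
    dtype, padding, data = payload[0], payload[1], payload[2:]
    check_padding(dtype, padding, len(data))
    if dtype == FLOAT32 and len(data) % 4:
        raise ValueError("FLOAT32 data length is not a multiple of 4 bytes")


try:
    check_vector([128], INT8)  # the "Overflow Vector INT8" case
except ValueError as exc:
    print("rejected:", exc)
```

A runner would call the encode-side check when `vector` is present and the decode-side check when `canonical_bson` is present, and assert that each raises.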

## FAQ

- What MongoDB Server version does this apply to?
  - Files in the "specifications" repository have no version scheme. They are not tied to a MongoDB server version.
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
{
  "description": "Tests of Binary subtype 9, Vectors, with dtype FLOAT32",
  "test_key": "vector",
  "tests": [
    {
      "description": "Simple Vector FLOAT32",
      "valid": true,
      "vector": [127.0, 7.0],
      "dtype_hex": "0x27",
      "dtype_alias": "FLOAT32",
      "padding": 0,
      "canonical_bson": "1C00000005766563746F72000A0000000927000000FE420000E04000"
    },
    {
      "description": "Vector with decimals and negative value FLOAT32",
      "valid": true,
      "vector": [127.7, -7.7],
      "dtype_hex": "0x27",
      "dtype_alias": "FLOAT32",
      "padding": 0,
      "canonical_bson": "1C00000005766563746F72000A0000000927006666FF426666F6C000"
    },
    {
      "description": "Empty Vector FLOAT32",
      "valid": true,
      "vector": [],
      "dtype_hex": "0x27",
      "dtype_alias": "FLOAT32",
      "padding": 0,
      "canonical_bson": "1400000005766563746F72000200000009270000"
    },
    {
      "description": "Infinity Vector FLOAT32",
      "valid": true,
      "vector": ["-inf", 0.0, "inf"],
      "dtype_hex": "0x27",
      "dtype_alias": "FLOAT32",
      "padding": 0,
      "canonical_bson": "2000000005766563746F72000E000000092700000080FF000000000000807F00"
    },
    {
      "description": "FLOAT32 with padding",
      "valid": false,
      "vector": [127.0, 7.0],
      "dtype_hex": "0x27",
      "dtype_alias": "FLOAT32",
      "padding": 3,
      "canonical_bson": "1C00000005766563746F72000A0000000927030000FE420000E04000"
    },
    {
      "description": "Insufficient vector data with 3 bytes FLOAT32",
      "valid": false,
      "dtype_hex": "0x27",
      "dtype_alias": "FLOAT32",
      "canonical_bson": "1700000005766563746F7200050000000927002A2A2A00"
    },
    {
      "description": "Insufficient vector data with 5 bytes FLOAT32",
      "valid": false,
      "dtype_hex": "0x27",
      "dtype_alias": "FLOAT32",
      "canonical_bson": "1900000005766563746F7200070000000927002A2A2A2A2A00"
    }
  ]
}
Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
{
  "description": "Tests of Binary subtype 9, Vectors, with dtype INT8",
  "test_key": "vector",
  "tests": [
    {
      "description": "Simple Vector INT8",
      "valid": true,
      "vector": [127, 7],
      "dtype_hex": "0x03",
      "dtype_alias": "INT8",
      "padding": 0,
      "canonical_bson": "1600000005766563746F7200040000000903007F0700"
    },
    {
      "description": "Empty Vector INT8",
      "valid": true,
      "vector": [],
      "dtype_hex": "0x03",
      "dtype_alias": "INT8",
      "padding": 0,
      "canonical_bson": "1400000005766563746F72000200000009030000"
    },
    {
      "description": "Overflow Vector INT8",
      "valid": false,
      "vector": [128],
      "dtype_hex": "0x03",
      "dtype_alias": "INT8",
      "padding": 0
    },
    {
      "description": "Underflow Vector INT8",
      "valid": false,
      "vector": [-129],
      "dtype_hex": "0x03",
      "dtype_alias": "INT8",
      "padding": 0
    },
    {
      "description": "INT8 with padding",
      "valid": false,
      "vector": [127, 7],
      "dtype_hex": "0x03",
      "dtype_alias": "INT8",
      "padding": 3,
      "canonical_bson": "1600000005766563746F7200040000000903037F0700"
    },
    {
      "description": "INT8 with float inputs",
      "valid": false,
      "vector": [127.77, 7.77],
      "dtype_hex": "0x03",
      "dtype_alias": "INT8",
      "padding": 0
    }
  ]
}
Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
{
  "description": "Tests of Binary subtype 9, Vectors, with dtype PACKED_BIT",
  "test_key": "vector",
  "tests": [
    {
      "description": "Padding specified with no vector data PACKED_BIT",
      "valid": false,
      "vector": [],
      "dtype_hex": "0x10",
      "dtype_alias": "PACKED_BIT",
      "padding": 1,
      "canonical_bson": "1400000005766563746F72000200000009100100"
    },
    {
      "description": "Simple Vector PACKED_BIT",
      "valid": true,
      "vector": [127, 7],
      "dtype_hex": "0x10",
      "dtype_alias": "PACKED_BIT",
      "padding": 0,
      "canonical_bson": "1600000005766563746F7200040000000910007F0700"
    },
    {
      "description": "Empty Vector PACKED_BIT",
      "valid": true,
      "vector": [],
      "dtype_hex": "0x10",
      "dtype_alias": "PACKED_BIT",
      "padding": 0,
      "canonical_bson": "1400000005766563746F72000200000009100000"
    },
    {
      "description": "PACKED_BIT with padding",
      "valid": true,
      "vector": [127, 7],
      "dtype_hex": "0x10",
      "dtype_alias": "PACKED_BIT",
      "padding": 3,
      "canonical_bson": "1600000005766563746F7200040000000910037F0700"
    },
    {
      "description": "Overflow Vector PACKED_BIT",
      "valid": false,
      "vector": [256],
      "dtype_hex": "0x10",
      "dtype_alias": "PACKED_BIT",
      "padding": 0
    },
    {
      "description": "Underflow Vector PACKED_BIT",
      "valid": false,
      "vector": [-1],
      "dtype_hex": "0x10",
      "dtype_alias": "PACKED_BIT",
      "padding": 0
    },
    {
      "description": "Vector with float values PACKED_BIT",
      "valid": false,
      "vector": [127.5],
      "dtype_hex": "0x10",
      "dtype_alias": "PACKED_BIT",
      "padding": 0
    },
    {
      "description": "Exceeding maximum padding PACKED_BIT",
      "valid": false,
      "vector": [1],
      "dtype_hex": "0x10",
      "dtype_alias": "PACKED_BIT",
      "padding": 8,
      "canonical_bson": "1500000005766563746F7200030000000910080100"
    },
    {
      "description": "Negative padding PACKED_BIT",
      "valid": false,
      "vector": [1],
      "dtype_hex": "0x10",
      "dtype_alias": "PACKED_BIT",
      "padding": -1
    }
  ]
}
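To show what `padding` means for the PACKED_BIT cases above, the Python sketch below expands the packed bytes into individual bits. It assumes the usual reading of the vector format: bits are stored most significant bit first, and `padding` is the number of trailing bits in the final byte to ignore. The function name `unpack_bits` is hypothetical, and the test files themselves compare byte values, not bits.

```python
def unpack_bits(data, padding=0):
    """Expand PACKED_BIT bytes into 0/1 values, dropping `padding` trailing bits."""
    if not 0 <= padding <= 7:
        raise ValueError("padding must be between 0 and 7")
    if padding and not data:
        raise ValueError("padding given but there is no vector data")
    # Walk each byte from its most significant bit down to its least.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    return bits[:len(bits) - padding]


# The "PACKED_BIT with padding" case: bytes [127, 7] with padding 3.
bits = unpack_bits(bytes([127, 7]), padding=3)
print(len(bits), bits)
```

With two bytes and a padding of 3, the vector holds 13 bits, not 16.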

specifications/bson-corpus/tests/binary.json

Lines changed: 30 additions & 0 deletions
@@ -74,6 +74,36 @@
             "description": "$type query operator (conflicts with legacy $binary form with $type field)",
             "canonical_bson": "180000000378001000000010247479706500020000000000",
             "canonical_extjson": "{\"x\" : { \"$type\" : {\"$numberInt\": \"2\"}}}"
+        },
+        {
+            "description": "subtype 0x09 Vector FLOAT32",
+            "canonical_bson": "170000000578000A0000000927000000FE420000E04000",
+            "canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"JwAAAP5CAADgQA==\", \"subType\": \"09\"}}}"
+        },
+        {
+            "description": "subtype 0x09 Vector INT8",
+            "canonical_bson": "11000000057800040000000903007F0700",
+            "canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"AwB/Bw==\", \"subType\": \"09\"}}}"
+        },
+        {
+            "description": "subtype 0x09 Vector PACKED_BIT",
+            "canonical_bson": "11000000057800040000000910007F0700",
+            "canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"EAB/Bw==\", \"subType\": \"09\"}}}"
+        },
+        {
+            "description": "subtype 0x09 Vector (Zero-length) FLOAT32",
+            "canonical_bson": "0F0000000578000200000009270000",
+            "canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"JwA=\", \"subType\": \"09\"}}}"
+        },
+        {
+            "description": "subtype 0x09 Vector (Zero-length) INT8",
+            "canonical_bson": "0F0000000578000200000009030000",
+            "canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"AwA=\", \"subType\": \"09\"}}}"
+        },
+        {
+            "description": "subtype 0x09 Vector (Zero-length) PACKED_BIT",
+            "canonical_bson": "0F0000000578000200000009100000",
+            "canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"EAA=\", \"subType\": \"09\"}}}"
         }
     ],
     "decodeErrors": [

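As a quick consistency check on the corpus entries added above, the Python sketch below confirms that the base64 payload in `canonical_extjson` is the same byte string embedded in `canonical_bson` for the "subtype 0x09 Vector FLOAT32" entry. The offsets assume the single-character key "x" used by that entry. This is an illustration only, not part of the corpus runner.

```python
import base64

canonical_bson = bytes.fromhex("170000000578000A0000000927000000FE420000E04000")
payload = base64.b64decode("JwAAAP5CAADgQA==")

# Layout for key "x": int32 document length (4), element type (1), "x" plus NUL (2),
# int32 binary length (4), subtype (1), then the payload and a trailing NUL.
offset = 4 + 1 + 2 + 4 + 1
subtype = canonical_bson[offset - 1]
embedded = canonical_bson[offset:-1]

assert subtype == 0x09
assert embedded == payload
print(payload.hex().upper())
```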