Commit bca252e

REP-6129 Accommodate type bracketing for pre-v5 find queries (#118)
Pre-v5 server versions don’t recognize indexes in $expr-using partition queries. As a result, such queries entail a collection scan. To prevent the resulting performance problems, migration-verifier avoids $expr when querying partitions against these server versions.

Consequently, migration-verifier’s pre-v5 partition queries have been type-bracketed; i.e., they only return documents whose `_id` has the same BSON type as a value in the query itself.

For example, consider a query for all documents with `_id` between MinKey and ObjectID(...). An $expr query will return any document with an `_id` that sorts between those two values. A non-$expr query, though, will only return documents with an `_id` of one of those two types, despite the fact that many types (including strings and all numeric types) sort between MinKey and ObjectID.

As a result of its non-$expr queries, migration-verifier has been quietly skipping documents when querying pre-v5 servers. In migrations to v5+ it’s a minor concern because whatever documents migration-verifier skips on the source will be recorded as “missing” in generation 0 but will “magically” appear in generation 1. (The only problem arises when there are large numbers of such “missing” documents, which is what caused REP-6088.) In migrations to pre-v5 versions, though, migration-verifier will simply skip documents.

This changeset fixes this problem in pre-v5 partition queries by:
- determining the partition bounds’ BSON types
- determining the BSON types that type bracketing will exclude
- adding an $or clause that includes documents with `_id`s of those types

The above example, then, becomes:
```
(_id >= MinKey OR type(_id) > MinKey)
AND (_id <= ObjectID(...) OR type(_id) < ObjectID)
```

NB: mongosync solves this problem differently. It uses $expr but adds hint, min, and max to the query. This properly constrains the query, but it excludes the partition’s upper bound. To compensate for that, mongosync adds a separate, single-value partition to check the partition’s upper bound.
This changeset uses the above-described approach instead because it seems simpler and less disruptive.
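
The three steps listed above can be illustrated with a minimal, standard-library-only sketch. It is not the changeset’s actual query builder: `boundClause` and `partitionFilter` are hypothetical names, plain maps stand in for the driver’s BSON types, and the type lists in `main` are abbreviated. It assumes the extra clause is expressed with the server’s `$type` query operator, which accepts an array of type aliases.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// boundClause (hypothetical helper) builds
//   {$or: [{_id: {<op>: bound}}, {_id: {$type: [...]}}]}
// so that documents whose _id type was excluded by type bracketing
// are matched explicitly. With no excluded types, the plain
// comparison is returned unchanged.
func boundClause(op string, bound any, types []string) map[string]any {
	cmp := map[string]any{"_id": map[string]any{op: bound}}
	if len(types) == 0 {
		return cmp
	}
	return map[string]any{
		"$or": []any{
			cmp,
			map[string]any{"_id": map[string]any{"$type": types}},
		},
	}
}

// partitionFilter (hypothetical helper) ANDs the lower- and upper-bound
// clauses. typesAfterLower are the types that sort after the lower
// bound's type; typesBeforeUpper are those that sort before the upper
// bound's type.
func partitionFilter(
	lower any, typesAfterLower []string,
	upper any, typesBeforeUpper []string,
) map[string]any {
	return map[string]any{
		"$and": []any{
			boundClause("$gte", lower, typesAfterLower),
			boundClause("$lte", upper, typesBeforeUpper),
		},
	}
}

func main() {
	// Illustrative bounds: an int lower bound and a string upper bound.
	// The type lists are abbreviated here for brevity.
	filter := partitionFilter(
		10, []string{"string", "symbol", "object"},
		"m", []string{"minKey", "null", "int", "long", "double", "decimal"},
	)
	out, _ := json.MarshalIndent(filter, "", "  ")
	fmt.Println(string(out))
}
```

Note that numeric types are absent from the lower bound’s list and string-like types from the upper bound’s: a document with `_id: 5` must still be rejected by `_id >= 10` rather than re-admitted by the `$type` clause.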
1 parent 089607f commit bca252e

5 files changed

+737
-37
lines changed

internal/partitions/bson.go

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
package partitions

import (
	"slices"

	"github.com/10gen/migration-verifier/mslices"
	"github.com/pkg/errors"
	"github.com/samber/lo"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/bsontype"
)

// -------------------------------------------------------------------------
// The sort order defined here derives from:
//
// https://www.mongodb.com/docs/manual/reference/bson-type-comparison-order/
//
// Everything there was verified empirically as part of writing this code.
// Note that, at least as of this writing, that page erroneously omits
// JS Code with Scope in the sort order. The observed behavior in MongoDB 4.4
// is that that type sorts between JavaScript Code and MaxKey.
// -------------------------------------------------------------------------

var numericTypes = mslices.Of(
	bson.TypeInt32,
	bson.TypeInt64,
	bson.TypeDouble,
	bson.TypeDecimal128,
)

var stringTypes = mslices.Of(
	bson.TypeString,
	bson.TypeSymbol,
)

// NB: The server forbids arrays, undefined, and regexes as _id values.
// That simplifies the following greatly because all of those behave
// weirdly at various times:
// - 0-element arrays sort as null.
// - 1-element arrays sort as their contained element.
// - undefined & null sometimes match strangely. (This changed in 8.0.)
// - simple matches against regex actually run the regex.
var bsonTypeSortOrder = lo.Flatten(mslices.Of(
	mslices.Of(
		bson.TypeMinKey,
		bson.TypeNull,
	),
	numericTypes,
	stringTypes,
	mslices.Of(
		bson.TypeEmbeddedDocument,
		bson.TypeBinary,
		bson.TypeObjectID,
		bson.TypeBoolean,
		bson.TypeDateTime,
		bson.TypeTimestamp,
		bson.TypeDBPointer,
		bson.TypeJavaScript,
		bson.TypeCodeWithScope,
		bson.TypeMaxKey,
	),
))

var bsonTypeString = map[bsontype.Type]string{
	bson.TypeMinKey:           "minKey",
	bson.TypeNull:             "null",
	bson.TypeBoolean:          "bool",
	bson.TypeInt32:            "int",
	bson.TypeInt64:            "long",
	bson.TypeDouble:           "double",
	bson.TypeDecimal128:       "decimal",
	bson.TypeString:           "string",
	bson.TypeSymbol:           "symbol",
	bson.TypeObjectID:         "objectId",
	bson.TypeDateTime:         "date",
	bson.TypeTimestamp:        "timestamp",
	bson.TypeJavaScript:       "javascript",
	bson.TypeCodeWithScope:    "javascriptWithScope",
	bson.TypeEmbeddedDocument: "object",
	bson.TypeBinary:           "binData",
	bson.TypeDBPointer:        "dbPointer",
	bson.TypeMaxKey:           "maxKey",
}

// This returns BSON types that the server’s type bracketing excludes from
// query results when matching against the given value.
//
// The returned slices are types before & after, respectively. They are
// strings rather than bsontype.Type to facilitate easy insertion into queries.
//
// This is kind of like strings.Cut() but against the sort-ordered list of BSON
// types, except that if the given value is a number or string-like, then other
// “like” types will not be in the returned slices.
func getTypeBracketExcludedBSONTypes(val any) ([]string, []string, error) {
	bsonType, _, err := bson.MarshalValue(val)
	if err != nil {
		return nil, nil, errors.Wrapf(err, "marshaling min value (%v)", val)
	}

	curSortOrder := slices.Index(bsonTypeSortOrder, bsonType)
	if curSortOrder < 0 {
		return nil, nil, errors.Errorf("go value (%T: %v) marshaled to BSON %s, which is invalid", val, val, bsonType)
	}

	earlier := bsonTypeSortOrder[:curSortOrder]
	later := bsonTypeSortOrder[1+curSortOrder:]

	// If the given value is, e.g., an int, then we need to omit
	// other numeric types from the returned slices. If we don’t, then we’re
	// telling the caller to query on, e.g., [_id >= 123 OR type is double],
	// which would match something like float64(12), which, of course, we
	// don’t want.
	//
	// For the same reason, we need the same exclusion for string vs. symbol.
	if slices.Contains(numericTypes, bsonType) {
		earlier = lo.Without(earlier, numericTypes...)
		later = lo.Without(later, numericTypes...)
	} else if slices.Contains(stringTypes, bsonType) {
		earlier = lo.Without(earlier, stringTypes...)
		later = lo.Without(later, stringTypes...)
	}

	return typesToStrings(earlier), typesToStrings(later), nil
}

func typesToStrings(in []bsontype.Type) []string {
	return lo.Map(
		in,
		func(t bsontype.Type, _ int) string {
			return bsonTypeString[t]
		},
	)
}
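
To make the “cut, then drop like types” behavior of `getTypeBracketExcludedBSONTypes` easier to follow, here is a simplified stand-alone sketch that operates on type aliases rather than Go values. It is an illustration only: `sortOrder`, `likeGroups`, and `excluded` are hypothetical names, and it uses only the standard library instead of `lo`, `mslices`, and the driver’s `bson` package.

```go
package main

import (
	"fmt"
	"slices"
)

// sortOrder lists type aliases in the same order as bsonTypeSortOrder above.
var sortOrder = []string{
	"minKey", "null",
	"int", "long", "double", "decimal",
	"string", "symbol",
	"object", "binData", "objectId", "bool", "date", "timestamp",
	"dbPointer", "javascript", "javascriptWithScope", "maxKey",
}

// likeGroups holds the types that compare against one another
// (numbers with numbers, string with symbol).
var likeGroups = [][]string{
	{"int", "long", "double", "decimal"},
	{"string", "symbol"},
}

// excluded splits sortOrder around typ and omits every type in typ's
// "like" group from both halves. ok is false for an unknown type.
func excluded(typ string) (earlier, later []string, ok bool) {
	idx := slices.Index(sortOrder, typ)
	if idx < 0 {
		return nil, nil, false
	}

	// Find the like group, if any, that typ belongs to.
	var group []string
	for _, g := range likeGroups {
		if slices.Contains(g, typ) {
			group = g
		}
	}

	for i, t := range sortOrder {
		if i == idx || slices.Contains(group, t) {
			continue
		}
		if i < idx {
			earlier = append(earlier, t)
		} else {
			later = append(later, t)
		}
	}
	return earlier, later, true
}

func main() {
	earlier, later, _ := excluded("long")
	fmt.Println(earlier) // [minKey null]
	fmt.Println(later)   // starts at "string"; no numeric types appear
}
```

For a `long` bound, `int`, `double`, and `decimal` are left out of both slices because an ordinary comparison already considers them; adding them to a `$type` clause would match numeric `_id`s outside the partition.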
