This replaces the current `--ignoreFieldOrder` flag with a new `--docCompareMethod` parameter that accepts three values: the two existing modes and a new one that compares documents via `$toHashedIndexKey`.
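As a rough illustration of what hash-based comparison could look like on the server side, the sketch below builds an aggregation pipeline that returns only each document's `_id`, a hash of the whole document, and its BSON size. `$toHashedIndexKey` and `$bsonSize` are standard aggregation operators, but the function name, the output field names, and the pipeline shape are assumptions made for illustration; the verifier's actual pipeline is not shown in this description.

```python
def build_hash_pipeline():
    """Return a pipeline that emits a compact fingerprint per document.

    Hypothetical helper: the field names "hash" and "len" are illustrative.
    """
    return [
        {
            "$project": {
                "_id": 1,
                # Hash of the entire document, computed server-side.
                "hash": {"$toHashedIndexKey": "$$ROOT"},
                # Size of the document in bytes.
                "len": {"$bsonSize": "$$ROOT"},
            }
        }
    ]


pipeline = build_hash_pipeline()
```

With a driver such as PyMongo, a pipeline like this could be passed to `collection.aggregate(pipeline)` on both clusters, and the per-`_id` results compared in place of the full documents.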
The new method is not the default because of the precision issues described in the README’s new section. The performance gains, though, probably outweigh those concerns for most migrations.
In a reference test, an initial scan that took 37 minutes with `binary` comparison took under 15 minutes with the new method.
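The precision caveat can be made concrete with a toy model. The sketch below is not MongoDB's hash function: it assumes, for illustration only, a fingerprint made of (a) a stand-in hash that sees numeric values but not their types and (b) the document's BSON length, using the numeric payload sizes from the BSON specification (Int 4 bytes, Long 8, Double 8, Decimal 16). The `fingerprint` and `detects` names are hypothetical.

```python
import hashlib

# Payload sizes in bytes for BSON numeric types (per the BSON spec).
NUMERIC_SIZES = {"int": 4, "long": 8, "double": 8, "decimal": 16}


def fingerprint(doc):
    """doc maps field name -> (bson_type, numeric_value).

    Toy assumption: the hash depends on numeric values only, not types.
    """
    canonical = [(name, float(value)) for name, (_type, value) in doc.items()]
    digest = hashlib.sha256(repr(canonical).encode()).hexdigest()
    # BSON document: int32 length + elements + trailing NUL (5 bytes overhead).
    # Each element: type byte + NUL-terminated name + payload.
    length = 5 + sum(
        1 + len(name.encode()) + 1 + NUMERIC_SIZES[bson_type]
        for name, (bson_type, _value) in doc.items()
    )
    return digest, length


def detects(src, dst):
    """True if the toy fingerprint notices a difference between src and dst."""
    return fingerprint(src) != fingerprint(dst)


# Long -> Double, same value: same hash, same length, so it goes unnoticed.
print(detects({"n": ("long", 5)}, {"n": ("double", 5.0)}))  # False
# Long -> Decimal: the length grows by 8 bytes, so it is noticed.
print(detects({"n": ("long", 5)}, {"n": ("decimal", 5)}))  # True
# Int -> Long in one field, Long -> Int in another: total length is unchanged.
print(detects(
    {"a": ("int", 1), "b": ("long", 2)},
    {"a": ("long", 1), "b": ("int", 2)},
))  # False
```

Under these assumptions the model reproduces the three cases the README section describes: a same-length type change is missed, a different-length change is caught, and offsetting changes cancel out.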
README.md: 33 additions & 1 deletion
@@ -131,7 +131,7 @@ The verifier will now check to completion to make sure that there are no inconsi
| `--srcNamespace <namespaces>` | source namespaces to check |
| `--dstNamespace <namespaces>` | destination namespaces to check |
| `--metaDBName <name>` | name of the database in which to store verification metadata (default: "migration_verification_metadata") |
-| `--ignoreFieldOrder` | Whether or not field order is ignored in documents |
+| `--docCompareMethod` | How to compare documents. See below for details. |
| `--verifyAll` | If set, verify all user namespaces |
| `--clean` | If set, drop all previous verification metadata before starting |
| `--readPreference <value>` | Read preference for reading data from clusters. May be 'primary', 'secondary', 'primaryPreferred', 'secondaryPreferred', or 'nearest' (default: "primary") |
@@ -312,6 +312,38 @@ The migration-verifier optimizes for the case where a migration’s initial sync
The migration-verifier is also rather resource-hungry. To mitigate this, try limiting its number of workers (i.e., `--numWorkers`), its partition size (`--partitionSizeMB`), and/or its process group’s resource limits (see the `ulimit` command in POSIX OSes).

+# Document comparison methods
+
+## `binary`
+
+The default. This establishes full binary equivalence, including field order and all types.
+
+## `ignoreFieldOrder`
+
+Like `binary` but ignores the ordering of fields. Incurs extra overhead on this host.
+
+## `toHashedIndexKey`
+
+Compares document hashes (and lengths) rather than full documents. This minimizes the data sent to migration-verifier, which can dramatically shorten verification time.
+
+It carries a few downsides, though:
+
+### Lost precision
+
+This method ignores certain type changes if the underlying value remains the same. For example, if a Long changes to a Double, and the two values are identical, `toHashedIndexKey` will not notice the discrepancy.
+
+The discrepancy _will_, though, usually be seen if the BSON types are of different lengths. For example, if a Long changes to Decimal, `toHashedIndexKey` will notice that.
+
+If, however, _multiple_ numeric type changes happen, then `toHashedIndexKey` will only notice the discrepancy if the total document length changes. For example, if an Int changes to a Long, but elsewhere a Long changes to an Int, that will evade notice.
+
+The above are all, of course, **highly** unlikely in real-world migrations.
+
+### Lost reporting
+
+Full-document verification methods allow migration-verifier to diagnose mismatches, e.g., by identifying specific changed fields. The only such detail that `toHashedIndexKey` can discern, though, is a change in document length.
+
+Additionally, because the amount of data sent to migration-verifier doesn’t actually reflect the documents’ size, no meaningful statistics are shown concerning the collection data size. Document counts, of course, are still shown.
+
# Known Issues

- The verifier may report missing documents on the destination that don’t actually appear to be missing (i.e., a nonexistent problem). This has been hard to reproduce. If missing documents are reported, it is good practice to check for false positives.