README.md
44 additions & 44 deletions
@@ -1,32 +1,44 @@
1
1
# Verify Migrations!
2
2
3
-
_If verifying a migration done via [mongosync](https://www.mongodb.com/docs/cluster-to-cluster-sync/current/), please check if it is possible to use the
3
+
_If verifying a migration done via [mongosync](https://www.mongodb.com/docs/cluster-to-cluster-sync/current/), please check if it is possible to use the
4
4
[embedded verifier](https://www.mongodb.com/docs/cluster-to-cluster-sync/current/reference/verification/embedded/#std-label-c2c-embedded-verifier) as that is the preferred approach for verification._
5
5
6
-
# Obtaining
7
-
To fetch the latest release:
6
+
# Quick Start
7
+
8
+
Download the verifier’s latest release:
8
9
```
9
10
curl -sSL https://raw.githubusercontent.com/mongodb-labs/migration-verifier/refs/heads/main/download_latest.sh | sh
10
11
```
11
-
… or, if you prefer to build locally, just do:
12
+
(Alternatively, you can check out this repository then `./build.sh` to build from source.)
13
+
14
+
Then start a local replica set to store verification metadata:
12
15
```
13
-
./build.sh
16
+
docker run -it -p27017:27017 -v ./verifier_db:/data/db --entrypoint bash mongodb/mongodb-community-server -c 'mongod --bind_ip_all --replSet rs & mpid=$! && until mongosh --eval "rs.initiate()"; do sleep 1; done && wait $mpid'
14
17
```
18
+
(This will create a local `verifier_db` directory so that you can resume verification if needed.)
15
19
16
-
# Operational UX Once Running
17
-
18
-
_Assumes no port set, default port for operation webserver is 27020_
19
-
20
-
# Recommendations
20
+
Finally, run verification:
21
+
```
22
+
./migration_verifier \
23
+
--srcURI mongodb://your.source.cluster \
24
+
--dstURI mongodb://your.destination.cluster \
25
+
--serverPort 0 \
26
+
--verifyAll \
27
+
--start
28
+
```
29
+
The above will stream verification logs to standard output. Once writes stop,
30
+
watch for change stream lag to hit 0. The log will report either the found
31
+
mismatches or a confirmation of exact match between the clusters.
21
32
22
33
# Verifier Metadata Considerations
23
34
24
-
migration-verifier needs a database to store its state. This database SHOULD be on its own cluster.
35
+
migration-verifier needs a MongoDB cluster to store its state. This cluster *must* support transactions (i.e., either a replica set or sharded cluster, NOT a standalone instance). By default, this is assumed to run on localhost:27017.
25
36
26
-
The verifier _can_ instead store its metadata on the destination cluster. This can severely degrade performance, though.
27
-
It also requires either disabling mongosync’s destination write blocking or giving the `bypassWriteBlockingMode` to the verifier’s `--metaURI` user.
37
+
See [above](#Quick-Start) for a one-line command to start up a local, single-node replica set that you can use for this purpose.
28
38
29
-
## Launch the Verifier Binary
39
+
The verifier can alternatively store its metadata on the destination cluster. This can severely degrade performance, though. Also, if you’re using mongosync, it requires either disabling mongosync’s destination write blocking or granting the `bypassWriteBlockingMode` privilege to the verifier’s `--metaURI` user.
@@ -70,7 +82,7 @@ To set a port, use `--serverPort <port number>`. The default is 27020. Note that
70
82
71
83
If you give 0 as the port, a random ephemeral port will be chosen. The log will show the chosen port, and you may also query the OS to learn it (e.g., `lsof -a -iTCP -sTCP:LISTEN -p <pid>`).
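As a rough sketch of the `lsof` approach mentioned above, the helpers below pull the listening port out of `lsof` output. The function names are hypothetical (they are not part of the verifier), and the parsing assumes `lsof`'s default column layout; `-P` and `-n` are standard `lsof` flags that keep ports and addresses numeric.

```shell
# Extract listening TCP port numbers from `lsof` output read on stdin.
# Assumes lsof's default columns, where the 9th column (NAME)
# looks like "*:54321" or "127.0.0.1:54321".
parse_listen_ports() {
  awk 'NR > 1 && /LISTEN/ { n = split($9, parts, ":"); print parts[n] }'
}

# Print the TCP ports on which the process with the given PID is listening.
verifier_ports() {
  lsof -a -iTCP -sTCP:LISTEN -P -n -p "$1" | parse_listen_ports
}
```

Call it as `verifier_ports <pid>`, substituting the verifier's process ID.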
72
84
73
-
###Using a configuration file
85
+
## Using a configuration file
74
86
75
87
To load configuration options from a YAML configuration file, use the `--configFile` parameter.
1. After launching the verifier (see above), you can send it requests to get it to start verifying. The verification process is started by using the `check`command. An [optional `filter` parameter](#document-filtering) can be passed within the `check` request body to only check documents within that filter. The verification process will keep running until you tell the verifier to stop. It will keep track of the inconsistencies it has found and will keep checking those inconsistencies hoping that eventually they will resolve.
100
+
1. After launching the verifier (see above), you can send it requests to get it to start verifying. If you don’t pass the `--start` parameter, verification is started by using the `check` command. An [optional `filter` parameter](#document-filtering) can be passed within the `check` request body to only check documents within that filter. The verification process will keep running until you tell the verifier to stop. It will keep track of the inconsistencies it has found and will keep checking those inconsistencies hoping that eventually they will resolve.
2. Once mongosync has committed the replication, you can tell the verifier that writes have stopped. You can see the state of mongosync’s replication by hitting mongosync’s `progress` endpoint and checking that the state is `COMMITTED`. See the documentation [here](https://www.mongodb.com/docs/cluster-to-cluster-sync/current/reference/api/progress/#response). \
98
-
The verifier will now check to completion to make sure that there are no inconsistencies. The command you need to send the verifier to tell it that the replication is committed is `writesOff`. The command doesn’t block. This means that you will have to poll the verifier to see the status of the verification (see `progress`).
107
+
2. Once writes on the source cluster have stopped, you can tell the verifier so. (If you’re using mongosync, you can confirm this by hitting mongosync’s `progress` endpoint and checking that the state is `COMMITTED`. See the documentation [here](https://www.mongodb.com/docs/cluster-to-cluster-sync/current/reference/api/progress/#response).) \
108
+
The verifier will now check to completion to make sure that there are no inconsistencies. The command you need to send the verifier here is `writesOff`. The command doesn’t block. This means that you will have to poll the verifier, or watch its logs, to see the status of the verification (see `progress`).
99
109
100
110
```
101
111
curl -H "Content-Type: application/json" -X POST -d '{}' http://127.0.0.1:27020/api/v1/writesOff
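# Hedged sketch, not from the original README: after sending writesOff,
# poll the verifier until it reports that it has gone idle.
# Assumptions: the `progress` command is served at /api/v1/progress, and its
# JSON response carries a "phase" field that reads "idle" once checking is done.
# Confirm both against your verifier version. Function names are hypothetical.
verifier_is_idle() {
  # Reads a progress response on stdin; succeeds if it reports the idle phase.
  grep -q '"phase"[[:space:]]*:[[:space:]]*"idle"'
}
wait_for_idle() {
  # $1 = verifier base URL, e.g. http://127.0.0.1:27020
  until curl -sS "$1/api/v1/progress" | verifier_is_idle; do
    sleep 5
  done
}
# Usage: wait_for_idle http://127.0.0.1:27020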
@@ -135,6 +145,7 @@ The verifier will now check to completion to make sure that there are no inconsi
135
145
| `--dstNamespace <namespaces>` | destination namespaces to check |
136
146
| `--metaDBName <name>` | name of the database in which to store verification metadata (default: "migration_verification_metadata") |
137
147
| `--docCompareMethod` | How to compare documents. See below for details. |
148
+
| `--start` | Start checking documents right away rather than waiting for a `/check` API request. |
138
149
| `--verifyAll` | If set, verify all user namespaces |
139
150
| `--clean` | If set, drop all previous verification metadata before starting |
140
151
| `--readPreference <value>` | Read preference for reading data from clusters. May be 'primary', 'secondary', 'primaryPreferred', 'secondaryPreferred', or 'nearest' (default: "primary") |
@@ -171,19 +182,6 @@ generation’s mismatches, aggregate like this on the metadata cluster:
171
182
Note that each mismatch includes timestamps. You can cross-reference
172
183
these with the clusters’ oplogs to diagnose problems.
173
184
174
-
# Benchmarking Results
175
-
176
-
Ran on m6id.metal + M40 with 3 replica sets
177
-
178
-
Command run python3 ./test/benchmark.py --way=recheck remote
179
-
180
-
When running with 1TB of random data on 3 collections
181
-
182
-
**In recheck and normal mode it runs at 1.5-2.5gbps per replica** and is **disk bound on each node** (meaning there are not a lot of easy optimizations to make this faster) \
183
-
On default settings it used about **200GB of RAM on m6id.metal machine when using all the cores**
184
-
185
-
**This means it does about 1TB/20min but it is HIGHLY dependent on the source and dest machines**
186
-
187
185
# Tests
188
186
189
187
This project’s tests run as normal Go tests, with `go test`.
@@ -311,9 +309,11 @@ The migration-verifier periodically persists its change stream’s resume token
311
309
312
310
# Performance
313
311
314
-
The migration-verifier optimizes for the case where a migration’s initial sync is completed **and** change events are relatively infrequent. If you start verification before initial sync finishes, or if the source cluster is too busy, the verification may freeze.
312
+
The verifier has been observed handling test source write loads of 15,000 writes per second. Real-world performance will vary according to several factors, including network latency, cluster resources, and the verifier node’s resources.
313
+
314
+
## Per-shard verification
315
315
316
-
The migration-verifier is also rather resource-hungry. To mitigate this, try limiting its number of workers (i.e., `--numWorkers`), its partition size (`--partitionSizeMB`), and/or its process group’s resource limits (see the `ulimit` command in POSIX OSes).
316
+
If migrating shard-to-shard, you can also verify shard-to-shard to scale verification horizontally. Run one verifier per source shard. You can colocate all verifiers’ metadata on the same metadata cluster, but each verifier must use its own database (e.g., `verify0`, `verify1`, …). If that metadata cluster buckles under the load, consider splitting verification across multiple hosts.
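A minimal dry-run sketch of that layout follows. It only prints one command line per source shard so you can review the commands before running anything. The shard URIs and the `verifier_cmd` helper are hypothetical placeholders; the flags are the ones documented in this README, with `--metaDBName` giving each verifier its own metadata database.

```shell
# Print the command line for one per-shard verifier.
# $1 = shard index, $2 = source shard URI, $3 = destination shard URI,
# $4 = metadata cluster URI (shared by all verifiers).
verifier_cmd() {
  printf './migration_verifier --srcURI %s --dstURI %s --metaURI %s --metaDBName verify%s --serverPort 0 --verifyAll --start\n' \
    "$2" "$3" "$4" "$1"
}

# Hypothetical two-shard example; replace the URIs with your own.
i=0
for src in mongodb://src-shard0.example mongodb://src-shard1.example; do
  verifier_cmd "$i" "$src" "mongodb://dst-shard$i.example" mongodb://localhost:27017
  i=$((i + 1))
done
```

Using `--serverPort 0` lets several verifiers share a host without port collisions, since each picks its own ephemeral port.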
317
317
318
318
# Document comparison methods
319
319
@@ -323,11 +323,11 @@ The default. This establishes full binary equivalence, including field order and
323
323
324
324
## `ignoreFieldOrder`
325
325
326
-
Like `binary` but ignores the ordering of fields. Incurs extra overhead on this host.
326
+
Like `binary` but ignores the ordering of fields. Incurs extra overhead on the verifier host.
327
327
328
328
## `toHashedIndexKey`
329
329
330
-
Compares document hashes (and lengths) rather than full documents. This minimizes the data sent to migration-verifier, which can dramatically shorten verification time.
330
+
Compares document hashes (and lengths) rather than full documents. This minimizes the data sent to migration-verifier, which can dramatically increase performance.
331
331
332
332
It carries a few downsides, though:
333
333
@@ -339,7 +339,7 @@ The discrepancy _will_, though, usually be seen if the BSON types are of differe
339
339
340
340
If, however, _multiple_ numeric type changes happen, then `toHashedIndexKey` will only notice the discrepancy if the total document length changes. For example, if an Int changes to a Long, but elsewhere a Long changes to an Int, that will evade notice.
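To make the arithmetic concrete, here is a small sketch that computes BSON document sizes by hand. It relies only on the BSON layout (a 4-byte length prefix, one type byte plus a NUL-terminated name per element, 4 bytes for an Int, 8 bytes for a Long, and a 1-byte terminator); the field names `a` and `b` and the helper functions are made up for illustration.

```shell
# Size of one BSON element: type byte + field name + NUL + value bytes.
# $1 = field name, $2 = value size in bytes.
elem_size() {
  echo $((1 + ${#1} + 1 + $2))
}

# Size of a BSON document: 4-byte length prefix + elements + 1-byte terminator.
# Arguments are element sizes.
doc_size() {
  total=5
  for s in "$@"; do total=$((total + s)); done
  echo "$total"
}

INT32=4
INT64=8
# Source document: a is an Int, b is a Long.
before=$(doc_size "$(elem_size a $INT32)" "$(elem_size b $INT64)")
# Destination document: a became a Long, b became an Int.
after=$(doc_size "$(elem_size a $INT64)" "$(elem_size b $INT32)")
echo "before=$before after=$after"
```

Both documents come out the same length, so a length comparison alone cannot tell them apart.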
341
341
342
-
The above are all, of course, **highly** unlikely in real-world migrations.
342
+
The above are all **highly** unlikely in real-world migrations.
343
343
344
344
### Lost reporting
345
345
@@ -359,6 +359,6 @@ Additionally, because the amount of data sent to migration-verifier doesn’t ac
359
359
360
360
# Limitations
361
361
362
-
- The verifier’s iterative process can handle data changes while it is running, until you hit the writesOff endpoint. However, it cannot handle DDL commands. If the verifier receives a DDL change stream event (drop, dropDatabase, rename), the verification will fail. If an untracked DDL event (create, createIndexes, dropIndexes, modify) occurs, the verifier may miss the change.
362
+
- The verifier’s iterative process can handle data changes while it is running, until you hit the writesOff endpoint. However, it cannot handle DDL commands. If the verifier receives a DDL change stream event, the verification will fail.
363
363
364
364
- The verifier crashes if it tries to compare time-series collections. The error will include a phrase like “Collection has nil UUID (most probably is a view)” and also mention “timeseries”.