
Conversation

@rkistner (Contributor) commented on Mar 3, 2025

In some cases, MongoDB initial replication ran at a rate as low as 150 documents/second, while we're aiming for 5-10k documents/second. After some investigation, the main bottleneck appears to be reading the documents from the source database, rather than writing to bucket storage.

I can't figure out why, but reading the documents using a MongoDB session appears to be the main culprit. Using the session, it could take as long as 10-20s to read 10k documents in some cases. This is not consistent, which makes it difficult to test. Without the session, that drops down to 0.5-1s for the same 10k documents. The latter is fairly consistent.

This PR changes the query to no longer use a session. We initially used a session for snapshot reads, but that is no longer required, so there is no good reason to keep using sessions here.
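
A minimal sketch of the shape of this change, using the MongoDB Node.js driver (illustrative only, not the actual PR diff; `snapshotCollection` and the empty filter are placeholders):

```ts
import { MongoClient } from 'mongodb';

async function snapshotCollection(client: MongoClient, dbName: string, collName: string) {
  const collection = client.db(dbName).collection(collName);

  // Before (simplified): the snapshot query ran inside an explicit session.
  // const session = client.startSession();
  // const cursor = collection.find({}, { session });

  // After (simplified): a plain cursor with no session attached.
  const cursor = collection.find({});

  for await (const doc of cursor) {
    // ... process doc (e.g. save to bucket storage)
  }
}
```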

The performance both before and after is fairly inconsistent, and I could not find clear patterns. Even with the changes here, replication is slow in some cases. However, the median performance does appear to be better with these changes.

Other smaller changes:

  1. Use readBufferedDocuments() to read documents in batches, rather than an async iterator (see the sketch after this list).
  2. Increase the read batch size from the default 1000 to 6000. This has no effect if the cursor is limited due to the size of the documents, so does not increase memory in the worst case scenario. This improves read latency in some cases.
  3. Log the duration of each read batch (up to 6k documents), as well as the flushing of each PersistedBatch (up to 2k documents).
  4. Filter out empty "resumeBatch" batches to avoid the empty "Flushed 0 + 0 + 0 updates" messages after each actual batch.
  5. Start reading the next batch while processing the current batch.
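
To make the batching, prefetching, and timing changes above more concrete, here is a hypothetical sketch using a plain MongoDB Node.js driver cursor; `readInBatches()`, `processBatch()`, and the database/collection names are placeholders, not the actual replication code:

```ts
import { MongoClient, type Document } from 'mongodb';

async function readInBatches(
  client: MongoClient,
  processBatch: (docs: Document[]) => Promise<void>
) {
  const cursor = client
    .db('example_db') // placeholder database name
    .collection('example') // placeholder collection name
    .find({}, { batchSize: 6_000 }); // larger batches reduce getMore round trips

  // hasNext() triggers the next getMore and fills the driver's internal buffer;
  // readBufferedDocuments() then drains that buffer without further network calls.
  let hasMore = await cursor.hasNext();
  while (hasMore) {
    const docs = cursor.readBufferedDocuments();

    // Start fetching the next batch before processing this one, so the network
    // read overlaps with the processing work.
    const nextBatch = cursor.hasNext();

    const start = Date.now();
    await processBatch(docs);
    console.log(`Processed ${docs.length} documents in ${Date.now() - start}ms`);

    hasMore = await nextBatch;
  }
}
```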

@rkistner requested a review from stevensJourney on March 3, 2025 at 10:21
changeset-bot commented on Mar 3, 2025

🦋 Changeset detected

Latest commit: 5695410

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
| Name                                        | Type  |
| ------------------------------------------- | ----- |
| @powersync/service-module-mongodb-storage   | Patch |
| @powersync/service-module-mongodb           | Patch |
| @powersync/service-module-mysql             | Patch |
| @powersync/service-module-postgres          | Patch |
| @powersync/service-image                    | Patch |
| @powersync/service-core                     | Patch |
| @powersync/service-core-tests               | Patch |
| @powersync/service-module-postgres-storage  | Patch |
| test-client                                 | Patch |


@rkistner marked this pull request as ready for review on March 3, 2025 at 10:23

@stevensJourney previously approved these changes on Mar 3, 2025
@rkistner (Contributor, Author) commented on Mar 3, 2025

Update: The changes do not appear to completely solve the read issue in all cases; I still get slightly inconsistent results when testing. However, in my main test case, the time to replicate 100k documents is reduced from over 2 minutes (800 docs/s) to around 22s (4,500 docs/s). This is much more in line with the performance we expect, and similar to what we get for Postgres.

At this point, replication becomes CPU-bound in the replication process. We can eventually implement concurrent replication threads to improve replication performance further.

@rkistner requested a review from stevensJourney on March 3, 2025 at 12:17

@rkistner merged commit 0dd746a into main on Mar 3, 2025 (21 checks passed)

@rkistner deleted the mongo-initial-replication-performance branch on March 3, 2025 at 12:52