@rkistner rkistner commented Feb 7, 2025

This refactors large parts of the sync flow to allow better optimizations.

Refactoring:

  1. watchWriteCheckpoint is moved from BucketStorageFactory to SyncRulesBucketStorage, making it operate on a single sync-rules instance/version only. This simplifies the implementation and will make it much easier to do optimizations here, since we don't have to deal with sync rule changes.
    • When the active sync rules change, this is detected and the stream is stopped, which closes the connection to the client. The client then has to reconnect to sync against the new sync rules version.
    • This differs from the previous behavior, where we kept connections open during sync rule changes. However, since we have to re-sync all data anyway when sync rules change, this is not a significant difference.
  2. In sync.ts, the bucket parameter state management is moved to a separate class, keeping the main loop simpler.
  3. Looking up bucket parameters is now split into two steps:
    1. Pre-compute the static buckets and lookup values. This is now done only once per connection, instead of for each new checkpoint.
    2. Query the "dynamic buckets" (parameter queries on tables). This is now done only if there are dynamic buckets.

    This does not have a big impact by itself, but will allow big optimizations in the next phase.
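
As a rough illustration of the two-step bucket parameter lookup in item 3, here is a minimal sketch. All names (`BucketParameterState`, `DynamicBucketQuery`, `bucketsForCheckpoint`) are hypothetical and are not taken from the actual implementation; the storage query is synchronous only to keep the example short.

```typescript
// All names below are hypothetical and for illustration only.
interface BucketDescription {
  bucket: string;
  priority: number;
}

// Stand-in for the storage query that resolves parameter lookups at a checkpoint.
// Synchronous here for brevity; a real storage query would be asynchronous.
type DynamicBucketQuery = (lookups: readonly string[], checkpoint: number) => BucketDescription[];

class BucketParameterState {
  private readonly staticBuckets: readonly BucketDescription[];
  private readonly lookups: readonly string[];
  private readonly queryDynamicBuckets: DynamicBucketQuery;

  // Step 1: static buckets and lookup values are computed once per connection
  // and passed in here, before the first checkpoint is processed.
  constructor(
    staticBuckets: readonly BucketDescription[],
    lookups: readonly string[],
    queryDynamicBuckets: DynamicBucketQuery
  ) {
    this.staticBuckets = staticBuckets;
    this.lookups = lookups;
    this.queryDynamicBuckets = queryDynamicBuckets;
  }

  get hasDynamicBuckets(): boolean {
    return this.lookups.length > 0;
  }

  // Step 2: called for each new checkpoint.
  bucketsForCheckpoint(checkpoint: number): BucketDescription[] {
    if (!this.hasDynamicBuckets) {
      // No parameter queries on tables: skip the storage query entirely.
      return [...this.staticBuckets];
    }
    const dynamic = this.queryDynamicBuckets(this.lookups, checkpoint);
    return [...this.staticBuckets, ...dynamic];
  }
}
```

In this sketch, a connection with only static buckets never touches the storage query, which is the property the refactoring aims for.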

The biggest actual optimization in this phase is removing fullDocument: 'updateLookup' from the change stream for watching checkpoints. Since the watched documents can be updated very frequently, the extra lookup per update can add significant overhead on the bucket storage database. We now instead merge the incremental updates ourselves.
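
To illustrate what merging the incremental updates ourselves can look like, here is a minimal sketch. The event fields (`operationType`, `documentKey`, `fullDocument`, `updateDescription.updatedFields`, `updateDescription.removedFields`) follow the MongoDB change event format; everything else is an assumption: `applyChange` is a hypothetical helper, document keys are assumed to be strings, and only top-level fields are handled (no dotted paths for nested updates).

```typescript
// Hypothetical document shape: plain top-level fields only.
type Doc = Record<string, unknown>;

// Subset of a MongoDB change event, limited to the fields used here.
interface ChangeEvent {
  operationType: 'insert' | 'update';
  documentKey: { _id: string }; // assumption: string keys
  fullDocument?: Doc; // present on inserts, absent on updates without 'updateLookup'
  updateDescription?: {
    updatedFields?: Doc;
    removedFields?: string[];
  };
}

// Applies one change event to an in-memory copy of the watched documents.
// Returns the resulting document, or undefined if it cannot be derived.
function applyChange(state: Map<string, Doc>, event: ChangeEvent): Doc | undefined {
  const id = event.documentKey._id;

  if (event.operationType === 'insert') {
    if (event.fullDocument === undefined) {
      return undefined;
    }
    const doc: Doc = { ...event.fullDocument };
    state.set(id, doc);
    return doc;
  }

  const existing = state.get(id);
  if (existing === undefined) {
    // No baseline to merge into; a caller would need to read the document once.
    return undefined;
  }

  // Overlay the changed fields, then drop the removed ones.
  const merged: Doc = { ...existing, ...(event.updateDescription?.updatedFields ?? {}) };
  for (const field of event.updateDescription?.removedFields ?? []) {
    delete merged[field];
  }
  state.set(id, merged);
  return merged;
}
```

The trade-off in this sketch is that the watcher has to hold a baseline copy of each watched document, in exchange for the database not performing a lookup on every update.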

Builds on some API refactoring in #192.


I added a new test-client command to test concurrent connections:

pnpm concurrent-connections -n 20 -c ../service/powersync.yaml

This just opens connections and keeps them open to simulate sync load. No direct reporting of performance is done yet, but you can compare the timestamps of the first checkpoint_diff and the last checkpoint_complete operations to get an idea of performance.
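
If the timestamps are captured somewhere, the comparison could be automated along these lines. This is only a sketch: the `SyncLogEntry` shape and the `checkpointSpanMs` helper are hypothetical, and the test client does not necessarily emit its output in this form.

```typescript
// Hypothetical log entry shape; the test client does not necessarily log in this format.
interface SyncLogEntry {
  timestampMs: number;
  op: string; // e.g. 'checkpoint_diff' or 'checkpoint_complete'
}

// Time between the first checkpoint_diff and the last checkpoint_complete,
// or undefined if either operation is missing from the entries.
function checkpointSpanMs(entries: readonly SyncLogEntry[]): number | undefined {
  const firstDiff = entries.find((e) => e.op === 'checkpoint_diff');

  let lastComplete: SyncLogEntry | undefined;
  for (const entry of entries) {
    if (entry.op === 'checkpoint_complete') {
      lastComplete = entry;
    }
  }

  if (firstDiff === undefined || lastComplete === undefined) {
    return undefined;
  }
  return lastComplete.timestampMs - firstDiff.timestampMs;
}
```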


changeset-bot bot commented Feb 7, 2025

🦋 Changeset detected

Latest commit: bc5326c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 15 packages
Name Type
@powersync/service-module-postgres-storage Minor
@powersync/service-module-mongodb-storage Minor
@powersync/service-core-tests Minor
@powersync/service-module-postgres Minor
@powersync/service-module-mongodb Minor
@powersync/service-core Minor
@powersync/service-module-mysql Minor
@powersync/service-sync-rules Minor
@powersync/service-errors Patch
@powersync/service-image Patch
test-client Patch
@powersync/lib-services-framework Patch
@powersync/service-rsocket-router Patch
@powersync/lib-service-mongodb Patch
@powersync/lib-service-postgres Patch


Remove queryBucketDescriptions().

Refactor logic to keep track of checksums per connection.

Refactor BucketParameterQuerier implementation.

Refactor watchWriteCheckpoint.

Refactor bucket position tracking.

For static updates, do diff of individual buckets if available.

Start by always sending an invalidation event.

Remove the need for fullDocument: 'updateLookup' for watching
checkpoints.

Move watchWriteCheckpoint to SyncRulesBucketStorage.

We now always watch checkpoints per sync rules instance.

Fix tests to use the new APIs.

Type fixes.

Fix tests.

Test storage on github actions.
@rkistner rkistner force-pushed the optimize-bucket-lookups branch from 5e745e3 to 2b024ff Compare February 10, 2025 12:26
@rkistner rkistner force-pushed the optimize-bucket-lookups branch from 6c02b7b to 5df24a0 Compare February 10, 2025 16:19
@rkistner rkistner marked this pull request as ready for review February 17, 2025 08:38
@rkistner rkistner changed the title from "[WIP] Optimize incremental sync: Phase 1" to "Optimize incremental sync: Phase 1" Feb 17, 2025
We just let the SyncRulesBucketStorage instances get garbage collected -
no need for active disposing of listeners. The lifecycle of
SyncRulesBucketStorage is not well-defined, so active disposing could
lead to more confusion down the line.

BucketStorageFactory and BucketStorageBatch are still AsyncDisposable.
stevensJourney previously approved these changes Feb 18, 2025

@stevensJourney stevensJourney left a comment

Looks good to me. Haven't tested anything.

simolus3 previously approved these changes Feb 18, 2025

@simolus3 simolus3 left a comment

I've mostly looked at the sync changes, and they look good to me 👍

Base automatically changed from feat/bucket-priorities to main February 18, 2025 12:41
@rkistner rkistner dismissed stale reviews from simolus3 and stevensJourney February 18, 2025 12:41

The base branch was changed.

An error occurred while trying to automatically change base from feat/bucket-priorities to main February 18, 2025 12:41
@rkistner rkistner merged commit 963e7be into main Feb 19, 2025
20 checks passed
@rkistner rkistner deleted the optimize-bucket-lookups branch February 19, 2025 08:01