
@rkistner

Builds on #129.

With MongoDB change streams, very large updates can produce change stream event documents that exceed the 16MB document size limit. This mainly affects updates to large fields, because the same field can appear twice in a single change stream event (for example, once in the update description and again in the full document).

To handle this, MongoDB provides the $changeStreamSplitLargeEvent pipeline stage, which splits one logical change into multiple change stream events (fragments). This PR uses that stage to fix the issue.
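A minimal TypeScript sketch of the idea, assuming the pipeline format accepted by the MongoDB Node.js driver's `watch()`: the `$changeStreamSplitLargeEvent` stage is placed last in the change stream pipeline, and fragments (each carrying a `splitEvent: { fragment, of }` field) are buffered and merged back into one logical event. The names `buildPipeline` and `SplitEventAssembler`, the `$match` on `ns.coll`, and keeping the last fragment's `_id` as the resume token are illustrative assumptions, not the code from this PR.

```typescript
// Fragment metadata MongoDB attaches when $changeStreamSplitLargeEvent splits an event.
interface SplitEventInfo {
  fragment: number; // 1-based index of this fragment
  of: number; // total number of fragments for the logical event
}

// Minimal event shape for this sketch; real change stream events have more structure.
type ChangeEventDoc = Record<string, unknown> & {
  _id: unknown; // resume token
  splitEvent?: SplitEventInfo;
};

// Hypothetical helper: $changeStreamSplitLargeEvent must be the final pipeline stage.
function buildPipeline(collections: string[]): Record<string, unknown>[] {
  return [
    { $match: { 'ns.coll': { $in: collections } } },
    { $changeStreamSplitLargeEvent: {} }
  ];
}

// Hypothetical helper that buffers fragments and emits one merged event.
class SplitEventAssembler {
  private pending: ChangeEventDoc[] = [];

  // Returns the complete event, or null while more fragments are still expected.
  push(event: ChangeEventDoc): ChangeEventDoc | null {
    const split = event.splitEvent;
    if (split === undefined) {
      if (this.pending.length > 0) {
        throw new Error('Received an unsplit event while fragments were pending');
      }
      return event;
    }
    // Assumption: fragments of one event arrive consecutively and in order.
    const expected = this.pending.length + 1;
    if (split.fragment !== expected) {
      throw new Error(`Expected fragment ${expected}, got ${split.fragment}`);
    }
    this.pending.push(event);
    if (split.fragment < split.of) {
      return null;
    }
    // Shallow merge of top-level fields. Later fragments overwrite _id, so the
    // merged event keeps the last fragment's resume token.
    const merged: Record<string, unknown> = {};
    for (const fragment of this.pending) {
      for (const [key, value] of Object.entries(fragment)) {
        if (key !== 'splitEvent') {
          merged[key] = value;
        }
      }
    }
    this.pending = [];
    return merged as ChangeEventDoc;
  }
}
```

In a real stream, each document yielded by the `watch()` cursor would be passed to `push()`, and only non-null results processed. The merge is shallow on the assumption that MongoDB splits events at top-level field boundaries.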

@changeset-bot commented Nov 13, 2024

⚠️ No Changeset found

Latest commit: 62b085f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets


@rkistner rkistner marked this pull request as ready for review November 13, 2024 12:13
Base automatically changed from mongodb-skip-currentdata to feat/modular-replication-architecture November 13, 2024 13:23
@rkistner rkistner requested a review from Rentacookie November 13, 2024 13:23
@rkistner rkistner merged commit 28e2269 into feat/modular-replication-architecture Nov 13, 2024
15 checks passed
@rkistner rkistner deleted the mongodb-splitevents branch November 13, 2024 14:26
stevensJourney added a commit that referenced this pull request Nov 27, 2024
* added table debug info calls

* Update WalStream classes to use the factories from the service context
Simplified configuration of Replication module

* added connection schema api method and initial workings for replication lag

* update replication lag API. Cleanup queries

* cleanup

* cleanup retried queries. Add SSL config for MySQL config

* Use framework errors. Update DataSourceConfig and ResolvedDataSourceConfig. Added DatabaseSchemaV2 type to remove Postgres fields.

* more mysql cleanup

* update failing Github action name to be clearer

* share some utilities. Playing around with replication

* unfortunate implementation of replication lag

* track binlog positions in replicated GTID identifier

* Introduces the AbstractReplicator concept as a more flexible alternative to the ReplicationAdapter

* wip

* Cleaned up WalStream Replicator and ReplicationJobs
Fixed test imports

* Removed no longer used Teardown function
Moved Replication Module definition to the replication package

* Fixed outdated comments and exporting only used Replication classes

* Updated and re-enabled postgres tests

* Updated lock file

* more hacks to get writerows to work

* wip: before implementing update

* added zongji types

* update typescript

* shuffle files around

* update after shuffle

* Test: Try changing order of filter expression in github test action

* Replication job start no longer immediately cancels

* use new replicator interfaces

* Move connection factory shutdown
Pass along publication name

* move toSQLite function to common

* update types

* move

* Made column type optional since for Postgres the column type is not propagated on WalStream schema updates.

* Fixed (hopefully) the test filtering for the test github action

* Corrected exclusion filter for test workflow

* Added build:tests directive to postgres module

* Export populate_test_data

* Export the test utils from postgres module

* Moved data generation method to test utils

* Moved data generation utils to tests

* Removed unused export

* Updated test tsconfig to include jpgwire

* Try with outDir specified

* Import from dist

* Revert file move and now using import from dist

* cleanup

* fix build error

* Ensure collections are created for tests

* Added dedicated BucketStorageFactory for Postgres tests
Reverted auto collection creation for BucketStorageFactory in the service-core tests

* Moved custom factory into the test it is being used at

* Revert factory imports

* Revert noEmit for postgres tests

* Trying moved factory again after tsconfig fix

* Create test collection conditionally

* Added a few more TODOs
Cleanup
Updated comments and docs where appropriate

* Some more cleanup
Renamed StorageFactoryProvider to StorageEngine

* Consolidated getDebugTableInfo
Fixed route storage references post rename

* Removed Postgres ZeroLSN from BucketStorage

* Writecheckpoint fix after conflict

* fix: align debug_api field for compatibility (#6)

* Implement powersync instance teardown functionality
Reworked replication config resource cleanup
Streamlined storage provider and engine a bit.
Exposed option to dispose of storage.
Made some engines in the service context optional

* add some helpers

* Adjusted checks in routes for the RouterEngine

* Updated lockfile

* Renamed storage provider storage dispose method.

* Update lockfile

* Basic MongoDB replication structure.

* Support resuming; multi-document operations.

* Workaround for transaction boundaries; filter ns in pipeline.

* Proper initial snapshot + streaming afterwards.

* Use _id directly as replica identity.

* Configurable defaultSchema.

* Use _id directly as replica identity.

* Configurable defaultSchema.

* Fix tests.

* Use 'test_schema' instead of 'public' for tests.

* Expand comment.

* Tweaks for ReplicaId.

* Fix subkey calculation & tests.

* Fix more tests.

* Fix merge conflicts.

* Fix Dockerfile.

* Fix and test mongo data type conversion.

* More mongo format tests.

* Fix initial snapshot; initial change-stream tests.

* Fixed missing ErrorRateLimiter for WalStreamReplicationJob

* Record keepalive events.

* Another fix; basic replication tests passing.

* Added more logging for compact action

* Handle collection drop and rename events.

* Handle replace event.

* Handle missing fullDocument correctly.

* Improve keepalive stability.

* Added start up and shutdown logging to engines.
Consolidated shutdown functions of engines.

* Use checkpoints for standard replication.

* Support wildcard table patterns.

* Fix merge issue.

* Added changeset

* Made Postgres Module package public
Removed mongodb test dependency in Postgres module

* Updated lockfile

* Update typescript, vitest, prettier.

* Apply prettier changes.

* Fix type issues.

* Fix tests.

* Add changeset.

* Fix aborting logic.

* Normalize mongo connection parameters.

* Remove error message when closing a ChangeStreamReplicationJob.

* Support M0 clusters; better error message on unsupported clusters.

* Fix sharded cluster check.

* Fix initial replication regression.

* Refactor schema definitions to not be postgres-specific.

* Rename original_type -> internal_type.

* Include source types in the generated schema comments (optional).

* Explicit TableSchema type.

* Add changeset.

* remove mongodb publish config restriction in order to publish dev packages

* need to explicitly set MongoDB package access to public for publish.

* Post-merge test fixes.

* Fix pnpm-lock.

* Implement basic diagnostics apis for MongoDB.

* Add test for getConnectionSchema.

* Improve schema filtering; return defaultSchema in API.

* Initialize tag variable before use in nested function call.

* Add comment for tag default value

* Changeset

* Comment update

* list all databases (#96)

* [MongoDB] Schema Endpoint Fix (#97)

* test: populate schema type field

* pg_type restore

* Lockfile update and merge conflict fixes

* Added dev docker compose for mysql

* Added MySQL connection manager and use appropriate connections

* Added stricter type definition for checkpoints
Moved BinlogStream
Added type mapper for mysql -> sqlite types
Removed schemaV2 since it is no longer needed

* Fixed diagnostics route merge conflict
Renamed binlog replication job

* Add MySQLConnection management
Updated config for the Zongji binlog listener

* Made streamable mysql connections available

* Updated BinlogStream to use appropriate connections for snapshot streaming.
Lots of cleanup and consolidation

* Lockfile update

* [Modules] Feat: Replication Events (#105)

* Ignore auth errors on collections for mongodb schema.

* Add changeset.

* Filter out views.

* Updated dev/test mysql docker compose
Renamed MysqlBinLogStream
Some naming cleanup
Fixed connection usage in BinlogStream

* Added BinlogStream tests (WiP)

* Initializing batch in constructor of MongoBucketBatch

* Some MongoModule merge conflict handling and cleanup

* Made mysql module publishing public

* Updated dockerfile

* Added mysql module to service tsconfig

* Fixed dockerfile mysql module copy

* Updated zongji dev package version
Some cleanup of modules package.json

* updates after pulling in main branch

* Updated vitest

* Added configuration error handling
Filter out tables not in sync rules
Improve abort logic handling

* Don't start replication if already aborted

* Exposed DateStrings connection option in Zongji constructor
Using updated binlog listener package

* Lockfile

* Added ConnectionTester interface
Replication Modules now implement this interface.
Removed singleton module exports

* [Modules] Move Write Checkpoint APIs (#110)

* Made it possible to specify timezone on zongji listener configuration to stop unwanted timezone skew
Added serverId configuration
Added check for binlog existence before starting replication.

* Correctly handle date parsing on binlog events and table snapshotting

* Some cleanup

* fix esm stuff

* Using syncrule id for MySQL serverId

* Changeset

* Prevent mysql connection manager throwing errors on shutdown.

* Improved shutdown logic of binlog stream

* Renamed Checkpoint to ReplicationCheckpoint

* Reduce required permissions for replicating from a single database.

* Add changeset.

* fix: Cached Parsed Sync Rules (#114)

* MySQL Type Consolidation (#115)

MySQL:
- Added generator function for MySQL serverId
- Logging cleanup
- Using JsonContainer for Json values in MySQL
- Unified type mappings for replicated and snapshot mysql data types
- Added test running config to github actions
- Mapping all integer types to bigint
- Handling Set data types as json array
- Added test for float mapping

* Properly replicate additional MongoDB types:

Decimal, RegExp, MinKey, MaxKey.

* Test mongodb on GA.

* Also test with MongoDB 8.0.

* Support compacting specific buckets when compacting manually.

* Add configuration for using mongodb postimages.

* Increase timeout between writes in tests.

* Use "await using" to simplify tests.

* Add postImage tests.

* Test and fix wildcard collections.

* Invalidate changestream if postImage is not available.

* Rename post_images config option.

* Parse returned mysql schema result (#122)

* Bugfix: Parse returned schema info as an object. Previously this was the default, but since a recent change to the mysql connection options, all JSON fields are now returned as strings.

* More MongoDB type fixes and tests.

* Handle improperly formatted mysql version strings (#124)

Handled MySQL version checks better and enabled tests for MySQL 5.7

* Automatically clear errors when restarting replication.

* Use commit instead of keepalive.

* Fix initial snapshot implementation.

* Avoid collMod permission on _powersync_checkpoints.

* Minor cleanup.

* Validate changeStreamPreAndPostImages on existing collections.

* Avoid current_data document storage for MongoDB.

* Fix tests.

* Use $changeStreamSplitLargeEvent to handle large updates. (#130)

* Fix tests.

* Fix managed write checkpoint filtering. (#134)

* Support json_each in parameter queries (#126)

* Proof-of-concept: json_each support in parameter queries.

* Extract out json_each.

* Fixes.

* Tweak usesDangerousRequestParameters check.

* Fix handling of nested json.

* Fix tableName regression.

* Add changeset.

---------

Co-authored-by: stevensJourney <[email protected]>

* Modular Architecture Prep for Merge (#137)

* remove postgres errors from mysql+mongodb rate limiters

* use common typescript version

* update changeset

* update zongji package

---------

Co-authored-by: Steven Ontong <[email protected]>
Co-authored-by: Roland Teichert <[email protected]>
Co-authored-by: stevensJourney <[email protected]>
Co-authored-by: Roland Teichert <[email protected]>
Co-authored-by: Dylan Vorster <[email protected]>