
@rkistner rkistner commented May 6, 2025

Primary issue

With Postgres bucket storage, there were scenarios where a data line would have has_more: false even though the bucket had more data. This led to a checksum failure for the bucket, and the entire bucket being re-downloaded.

This only occurred when multiple buckets were returned in a single getBucketDataBatch() call; the first bucket returned was never affected. This meant the client still made progress on each sync attempt, and did not repeatedly run into the same checksum failure. So in most cases users would not see the issue directly, but sync would take longer than it should.

This fixes the has_more logic to cover this case, and adds new tests for it.
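The general constraint behind the primary issue can be illustrated with a minimal sketch. This is not the code from this PR: it assumes rows are ordered by (bucket, op id) and that the query fetches one row beyond the limit, and the names `Row`, `BucketChunk` and `groupBatch` are hypothetical. Under that ordering only the last bucket in a batch can be incomplete, so every earlier bucket gets `hasMore: false` and the last one is conservatively marked whenever the batch was truncated.

```typescript
// Hypothetical row shape. Rows are assumed to be ordered by (bucket, opId).
interface Row {
  bucket: string;
  opId: number;
}

// Hypothetical per-bucket slice of one batch.
interface BucketChunk {
  bucket: string;
  rows: Row[];
  hasMore: boolean;
}

/**
 * Group one batch of rows into per-bucket chunks and set hasMore on each.
 *
 * Assumes the query requested `limit + 1` rows: an extra row shows that the
 * batch was truncated, and it is dropped so the batch never exceeds `limit`.
 */
function groupBatch(fetched: Row[], limit: number): BucketChunk[] {
  const truncated = fetched.length > limit;
  const rows = truncated ? fetched.slice(0, limit) : fetched;

  const chunks: BucketChunk[] = [];
  for (const row of rows) {
    const current = chunks[chunks.length - 1];
    if (current !== undefined && current.bucket === row.bucket) {
      current.rows.push(row);
    } else {
      chunks.push({ bucket: row.bucket, rows: [row], hasMore: false });
    }
  }

  // With rows ordered by bucket, every bucket before the last one is complete.
  // Only the last bucket can have been cut off, so it is marked whenever the
  // batch was truncated. This is conservative: if the dropped row belongs to
  // the next bucket, the last bucket is over-reported as having more data.
  const last = chunks[chunks.length - 1];
  if (truncated && last !== undefined) {
    last.hasMore = true;
  }
  return chunks;
}

const example = groupBatch(
  [
    { bucket: "a", opId: 1 },
    { bucket: "a", opId: 2 },
    { bucket: "b", opId: 3 },
    { bucket: "b", opId: 4 },
  ],
  3,
);
// example: bucket "a" with 2 rows (hasMore false), bucket "b" with 1 row (hasMore true)
```

Dropping the extra row before grouping is what keeps the batch at `limit` rows; forgetting that step is one simple way a batch could end up one row over its limit.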

Additional fixed issues

The new tests also picked up a few other consistency issues with batch metadata. None of these caused any visible symptoms on the client.

  1. MongoDB storage: has_more could be true in some cases where it shouldn't be. At worst this resulted in some duplicate work, with no consistency issues in the data sent.
  2. Postgres storage: a batch could contain 1001 rows, where the limit should be 1000.
  3. The after value for a batch could be 0, or the next_after value of a different bucket. These values are not used by any client, so they did not cause actual issues despite being incorrect.
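Item 3 can be illustrated with a minimal sketch, assuming each bucket's after should be the position that bucket was requested from, and next_after the last op id returned for it. The names `Row`, `BucketMeta` and `bucketMetadata` are hypothetical and not taken from the PowerSync codebase; the point is only that `after` is looked up per bucket instead of defaulting to 0 or carrying over from the previous bucket.

```typescript
// Hypothetical row shape. Rows are assumed ordered by (bucket, opId).
interface Row {
  bucket: string;
  opId: number;
}

// Hypothetical metadata for one bucket's data in a batch.
interface BucketMeta {
  bucket: string;
  after: number; // position this bucket was requested from
  nextAfter: number; // last opId included for this bucket
  rowCount: number;
}

/**
 * Compute after / nextAfter for each bucket in one batch.
 *
 * `requested` maps each bucket name to the position the client asked to
 * start after. Each bucket's `after` comes from its own entry, rather than
 * defaulting to 0 or reusing the previous bucket's nextAfter.
 */
function bucketMetadata(requested: Record<string, number>, rows: Row[]): BucketMeta[] {
  const result: BucketMeta[] = [];
  for (const row of rows) {
    const current = result[result.length - 1];
    if (current !== undefined && current.bucket === row.bucket) {
      // Same bucket as the previous row: only advance nextAfter.
      current.nextAfter = row.opId;
      current.rowCount += 1;
      continue;
    }
    // First row of a new bucket: look up that bucket's own starting position.
    const after: number | undefined = requested[row.bucket];
    if (after === undefined) {
      throw new Error(`row for unrequested bucket ${row.bucket}`);
    }
    result.push({ bucket: row.bucket, after, nextAfter: row.opId, rowCount: 1 });
  }
  return result;
}
```

For a request of `{ a: 5, b: 2 }` returning rows a:6, a:7 and b:3, this yields after 5 / next_after 7 for bucket a and after 2 / next_after 3 for bucket b.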

Naming things

This updates the variable naming inside the method to distinguish between the entire "batch" and the individual "chunks" yielded from it. The distinction between length and byte size limits should also be clearer now.
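The batch/chunk distinction can be sketched as a generator. This is a hypothetical illustration, not the method changed in this PR: `batchRowLimit` bounds the number of rows in the whole batch (a length limit), while `chunkByteLimit` bounds the approximate payload size of each yielded chunk (a byte size limit). The names `Row`, `Limits` and `chunkBatch` are assumptions, and string length stands in for byte size.

```typescript
// Hypothetical row shape; `data` stands in for the serialized row payload.
interface Row {
  bucket: string;
  opId: number;
  data: string;
}

// Hypothetical limits: one for the whole batch, one for each yielded chunk.
interface Limits {
  batchRowLimit: number; // length limit: maximum rows across the entire batch
  chunkByteLimit: number; // byte size limit: approximate payload per chunk
}

/**
 * Yield chunks from one batch. A chunk never spans two buckets, and a new
 * chunk starts once the current one has reached the byte limit. The batch
 * as a whole stops after batchRowLimit rows.
 */
function* chunkBatch(rows: Iterable<Row>, limits: Limits): Generator<Row[]> {
  let batchRows = 0;
  let chunk: Row[] = [];
  let chunkBytes = 0;
  let chunkBucket: string | undefined;

  for (const row of rows) {
    if (batchRows >= limits.batchRowLimit) {
      break; // length limit for the whole batch reached
    }
    const bucketChanged = chunkBucket !== undefined && chunkBucket !== row.bucket;
    const chunkFull = chunkBytes >= limits.chunkByteLimit;
    if (chunk.length > 0 && (bucketChanged || chunkFull)) {
      yield chunk;
      chunk = [];
      chunkBytes = 0;
    }
    chunk.push(row);
    chunkBucket = row.bucket;
    // The size is checked before adding a row, so a chunk may overshoot
    // chunkByteLimit by at most one row.
    chunkBytes += row.data.length;
    batchRows += 1;
  }
  if (chunk.length > 0) {
    yield chunk;
  }
}
```

Keeping the two counters separate makes it explicit which limit ends a chunk (bytes or a bucket boundary) and which ends the batch (rows).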

@rkistner rkistner requested a review from simolus3 May 6, 2025 12:25

changeset-bot bot commented May 6, 2025

🦋 Changeset detected

Latest commit: 17cd9f5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
Name Type
@powersync/service-module-postgres-storage Patch
@powersync/service-module-mongodb-storage Patch
@powersync/service-core-tests Patch
@powersync/service-core Patch
@powersync/service-image Patch
@powersync/service-module-mongodb Patch
@powersync/service-module-mysql Patch
@powersync/service-module-postgres Patch
test-client Patch



@simolus3 simolus3 left a comment


The changes look good to me. I also like the clear distinction between batches and chunks now. I'm wondering if we can be more consistent with that naming scheme though.

Also, I think the docs on SyncBucketData.has_more could be clearer now. At the moment, it says "True if the response does not contain all the data for this bucket, and another request must be made". But it actually sounds like has_more is true if there could be more data, meaning that we can't guarantee that the batch is complete.

@rkistner rkistner force-pushed the fix-batch-hasmore branch from 1ea4eb4 to 17cd9f5 Compare May 6, 2025 13:33

@simolus3 simolus3 left a comment


Looks good to me 👍

@rkistner rkistner merged commit 23ec406 into main May 7, 2025
21 checks passed
@rkistner rkistner deleted the fix-batch-hasmore branch May 7, 2025 09:13