Fix and test "has_more" for data batches #255
🦋 Changeset detected. Latest commit: 17cd9f5. The changes in this PR will be included in the next version bump. This PR includes changesets to release 9 packages.
The changes look good to me. I also like the clear distinction between batches and chunks now. I'm wondering if we can be more consistent with that naming scheme though.
Also, I think the docs on SyncBucketData.has_more could be clearer now. At the moment, it says "True if the response does not contain all the data for this bucket, and another request must be made". But it actually sounds like has_more is true if there could be more data, meaning that we can't guarantee that the batch is complete.
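To make the suggested semantics concrete, here is a minimal sketch of how the doc comment could read, assuming a simplified SyncBucketData-style shape. The field names follow the discussion above, not the project's actual type definitions:

```ts
// Sketch only: a simplified shape to illustrate the proposed wording,
// not the actual powersync-service type definition.
export interface SyncBucketData {
  bucket: string;
  data: unknown[];
  /**
   * True if this response may not contain all remaining data for this bucket,
   * in which case another request must be made before the bucket can be
   * treated as complete. False means the bucket is complete up to the
   * requested checkpoint.
   */
  has_more: boolean;
  /** Cursor to use as `after` in the follow-up request when `has_more` is true. */
  next_after: string;
}
```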
Three resolved review threads on modules/module-mongodb-storage/src/storage/implementation/MongoSyncBucketStorage.ts (two marked outdated).
Force-pushed from 1ea4eb4 to 17cd9f5.
Looks good to me 👍
Primary issue
With Postgres bucket storage, there are scenarios where a data line would have has_more: false even though the bucket has more data. This leads to a checksum failure for the bucket, and the entire bucket being re-downloaded.

This only occurred when multiple different buckets were returned in a single getBucketDataBatch() call, and the first bucket returned was not affected. This means the client did make progress on each sync attempt and did not repeatedly run into the same checksum failure. So in most cases users would not directly see the issue, but sync would take longer than it should.

This fixes the has_more logic to cover this case, with new tests.
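As a rough illustration of the principle behind the fix, not the actual storage code: when one query serves several buckets and hits a batch-level limit, has_more has to be decided per bucket, since only the bucket that was cut off mid-way is incomplete. A minimal sketch, assuming rows arrive ordered by bucket and op_id, with illustrative shapes:

```ts
// Sketch only: illustrative types, not the real getBucketDataBatch() internals.
interface OplogRow {
  bucket: string;
  op_id: string;
}

interface BucketDataLine {
  bucket: string;
  data: OplogRow[];
  has_more: boolean;
}

function splitBatch(rows: OplogRow[], batchTruncated: boolean): BucketDataLine[] {
  const lines: BucketDataLine[] = [];
  for (const row of rows) {
    const last = lines[lines.length - 1];
    if (last !== undefined && last.bucket === row.bucket) {
      last.data.push(row);
    } else {
      lines.push({ bucket: row.bucket, data: [row], has_more: false });
    }
  }
  if (batchTruncated && lines.length > 0) {
    // Only the bucket containing the final returned row can have been cut off.
    // Earlier buckets are complete: rows for a different bucket followed them,
    // so all of their data up to the checkpoint was already returned.
    lines[lines.length - 1].has_more = true;
  }
  return lines;
}
```

One way the described symptom can arise is if has_more is derived from batch-level rather than per-bucket state; the sketch decides it per data line instead.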
Additional fixed issues
The tests picked up another couple of consistency issues with batch metadata. None of these caused any visible symptoms on the client.
- has_more could be true in some cases where it shouldn't be. This resulted in some duplicate work at worst, but no consistency issues in the data sent.
- after for a batch could be 0, or the next_after value of a different bucket. These values are not used by any client, so the incorrect values did not cause actual issues (see the sketch below).
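For the second point, a small sketch of the per-bucket cursor bookkeeping this implies; the helper and names here are hypothetical, not part of the actual module:

```ts
// Sketch only: hypothetical helper, not the real implementation.
interface BucketCursor {
  after: string;      // the cursor the client supplied for this bucket
  next_after: string; // the op_id of the last row returned for this bucket so far
}

// Keep cursors per bucket, so one bucket's progress never leaks into another
// bucket's metadata (e.g. an `after` of 0, or another bucket's next_after).
const cursors = new Map<string, BucketCursor>();

function recordRow(bucket: string, requestedAfter: string, opId: string): BucketCursor {
  const cursor = cursors.get(bucket) ?? { after: requestedAfter, next_after: requestedAfter };
  cursor.next_after = opId;
  cursors.set(bucket, cursor);
  return cursor;
}
```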
Naming things
This updates the variable naming inside the method to distinguish between the entire "batch" and the individual "chunks" yielded inside it. The distinction between the length and byte-size limits should also be clearer now (see the sketch below).
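To make the batch/chunk distinction concrete, here is a minimal sketch of the yielding pattern with both limits, under assumed names (chunkLimit counts rows, byteLimit counts serialized bytes); the real method streams rows from storage and carries more metadata:

```ts
// Sketch only: a "batch" for one bucket being yielded as multiple "chunks",
// cut by either a row-count limit or a byte-size limit.
interface OplogRow {
  bucket: string;
  op_id: string;
  size: number; // serialized size in bytes
}

function* chunkBucketBatch(
  rows: Iterable<OplogRow>, // all rows of one bucket within the batch
  chunkLimit: number,       // maximum number of rows per chunk
  byteLimit: number         // maximum serialized bytes per chunk
): Generator<{ data: OplogRow[]; has_more: boolean }> {
  let chunk: OplogRow[] = [];
  let chunkBytes = 0;

  for (const row of rows) {
    if (chunk.length > 0 && (chunk.length >= chunkLimit || chunkBytes + row.size > byteLimit)) {
      // A limit was hit and more rows follow, so this chunk is not the last one.
      yield { data: chunk, has_more: true };
      chunk = [];
      chunkBytes = 0;
    }
    chunk.push(row);
    chunkBytes += row.size;
  }
  if (chunk.length > 0) {
    // Final chunk of this bucket's batch. A truncated batch query can still
    // mean the bucket has more data in storage (see the earlier sketch).
    yield { data: chunk, has_more: false };
  }
}
```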