Sync bucket priorities #112
LGTM, I've found two minor things that stood out to me. Maybe we also want to mention bucket priorities on the page introducing buckets in the future, so that it can serve as a complete reference on bucket definitions.
<Info>
Note that this strategy is specifically to prioritize data on initial sync, and cannot be used for incremental sync after that.
</Info>
# Overview
We probably don't have to do that now, but maybe it's helpful to also expand the "Organize Data Into Buckets" page to mention that buckets can have metadata attached to them (like the priority YAML key or the _priority parameter of a parameter query).
That's a great catch, thanks - I'll work on that next 👍
Note:
- If multiple parameter queries specify different priorities for the same bucket, the highest priority (lowest number) is used.
I realize that it says the same thing in the paper doc, but I think this may be confusing for users.
When multiple parameter queries are involved, we would create multiple internal buckets for that. These internal buckets will have independent priorities based on the parameter row defining them. Considering something like this:

    bucket_definitions:
      projects:
        parameters:
          - SELECT id AS project_id, 2 AS _priority FROM projects WHERE NOT important AND ...;
          - SELECT id AS project_id, 1 AS _priority FROM projects WHERE important AND ...;

Then, if there are 2 projects with important = TRUE and 3 projects with important = FALSE, we'll create 5 buckets of which 2 have the higher priority.
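To make the counting concrete, here is a minimal Python sketch of that behavior. It is only a model, not the service's implementation: the `Bucket` class, the `evaluate_parameter_queries` helper, the sample rows, and the bucket-name format are all made up for illustration, and the elided `AND ...` conditions are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    name: str
    priority: int  # lower number = higher priority


# Hypothetical rows standing in for the projects table.
projects = [
    {"id": "p1", "important": True},
    {"id": "p2", "important": True},
    {"id": "p3", "important": False},
    {"id": "p4", "important": False},
    {"id": "p5", "important": False},
]

# Each entry models one parameter query: (row filter, _priority value).
parameter_queries = [
    (lambda row: not row["important"], 2),
    (lambda row: row["important"], 1),
]


def evaluate_parameter_queries(rows, queries):
    """Every row returned by a parameter query defines its own bucket,
    carrying the _priority of the query that produced it."""
    buckets = []
    for matches, priority in queries:
        for row in rows:
            if matches(row):
                name = 'projects["' + row["id"] + '"]'  # illustrative name format
                buckets.append(Bucket(name=name, priority=priority))
    return buckets


buckets = evaluate_parameter_queries(projects, parameter_queries)
high_priority = [b for b in buckets if b.priority == 1]

# Assumed sync order: buckets with the lowest priority number go first.
sync_order = sorted(buckets, key=lambda b: b.priority)
```

In this model the priorities never need to be merged, because no two parameter rows define the same bucket: each bucket keeps the priority of the row that created it.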
Oh, so that would behave as expected then - i.e. both priorities are considered and data is synced accordingly. So is the paper doc outdated on this note or is there a different nuance here that I'm missing?
I think the doc is just outdated, I don't see any nuance here either.
Mostly from this doc: https://paper.dropbox.com/doc/2024-10-Bucket-Priorities-PowerSync-Proposal--CgTUkogvSRhCpmq2FOwMuMOiAg-DsdphMZ0JVN6mkEmY8cbh
And this PR: powersync-ja/powersync-service#192