Collaborator

@benitav commented Feb 13, 2025

Contributor

@simolus3 left a comment

LGTM. Two minor things stood out to me. Maybe we also want to mention bucket priorities on the page introducing buckets in the future, so that it can serve as a complete reference on bucket definitions.

<Info>
Note that this strategy is specifically to prioritize data on initial sync, and cannot be used for incremental sync after that.
</Info>
# Overview
Contributor

We probably don't have to do that now, but maybe it's helpful to also expand the "Organize Data Into Buckets" page to mention that buckets can have metadata attached to them (like the `priority` YAML key or the `_priority` parameter of a parameter query).

Collaborator Author

That's a great catch, thanks - I'll work on that next 👍

```

Note:
- If multiple parameter queries specify different priorities for the same bucket, the highest priority (lowest number) is used.
Contributor

I realize that it says the same thing in the paper doc, but I think this may be confusing for users.

When multiple parameter queries are involved, we create multiple internal buckets. These internal buckets have independent priorities based on the parameter row defining them. Consider something like this:

```yaml
bucket_definitions:
  projects:
    parameters:
      - SELECT id AS project_id, 2 AS _priority FROM projects WHERE NOT important AND ...;
      - SELECT id AS project_id, 1 AS _priority FROM projects WHERE important AND ...;
```

Then, if there are 2 projects with `important = TRUE` and 3 projects with `important = FALSE`, we'll create 5 buckets, of which 2 have the higher priority (1) and 3 have the lower priority (2).
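
To make the counting concrete, here is a rough Python sketch of that behavior. It is only a model for illustration, not the service's actual implementation: the `Bucket` class, the `run_parameter_queries` helper, and the sample rows are all hypothetical, and the elided `AND ...` conditions from the queries are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Hypothetical stand-in for one internal bucket."""
    definition: str
    project_id: int
    priority: int


# Made-up rows standing in for the `projects` table:
# 2 important projects and 3 non-important ones.
projects = [
    {"id": 1, "important": True},
    {"id": 2, "important": True},
    {"id": 3, "important": False},
    {"id": 4, "important": False},
    {"id": 5, "important": False},
]


def run_parameter_queries(rows):
    """Model the two parameter queries: each matching row yields one bucket,
    and the bucket's priority comes from the row that defined it."""
    buckets = []
    for row in rows:
        # `WHERE important` -> _priority 1, `WHERE NOT important` -> _priority 2
        priority = 1 if row["important"] else 2
        buckets.append(Bucket("projects", row["id"], priority))
    return buckets


buckets = run_parameter_queries(projects)
high = [b for b in buckets if b.priority == 1]

# Lower number means higher priority, so those buckets come first.
sync_order = sorted(buckets, key=lambda b: b.priority)

print(len(buckets), len(high))  # 5 2
```

Each project ends up in its own bucket with its own priority, so no single bucket ever has two competing priorities in this example.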

Collaborator Author

Oh, so that would behave as expected then - i.e. both priorities are considered and data is synced accordingly. So is the paper doc outdated on this note or is there a different nuance here that I'm missing?

Contributor

I think the doc is just outdated; I don't see any nuance here either.

@benitav changed the title from "WIP: Sync bucket priorities" to "Sync bucket priorities" on Mar 3, 2025
@benitav merged commit 889089b into docs on Mar 3, 2025
3 checks passed
@benitav deleted the bucket-priorities branch on March 3, 2025 15:24

3 participants