-
Notifications
You must be signed in to change notification settings - Fork 351
implement stageOnly Commit #2269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
pyiceberg/table/update/snapshot.py
Outdated
branch: str = MAIN_BRANCH, | ||
stage_only: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think API wise, this makes more sense. In the case branch is None
then we don't set the ref, if it is not None
then we set the ref.
branch: str = MAIN_BRANCH, | |
stage_only: bool = False, | |
branch: Optional[str] = MAIN_BRANCH |
@kevinjqliu WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that make sense to me. Instead of setting stage_only=True
, it would be branch=None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, thanks for taking a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it a bit more, I think that's the way forward.
iceberg-python/pyiceberg/table/__init__.py
Lines 434 to 441 in 4b961f7
def update_snapshot(self, snapshot_properties: Dict[str, str] = EMPTY_DICT, branch: Optional[str] = None) -> UpdateSnapshot: | |
"""Create a new UpdateSnapshot to produce a new snapshot for the table. | |
Returns: | |
A new UpdateSnapshot | |
""" | |
if branch is None: | |
branch = MAIN_BRANCH |
We would change that into:
def update_snapshot(self, snapshot_properties: Dict[str, str] = EMPTY_DICT, branch: Optional[str] = MAIN_BRANCH) -> UpdateSnapshot:
"""Create a new UpdateSnapshot to produce a new snapshot for the table.
Returns:
A new UpdateSnapshot
"""
There are a couple more places where we need to change the default to MAIN_BRANCH
. Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense.
I will take another deeper look to make sure it is backward compatible.
9bec165
to
884eca9
Compare
@stevie9868 Thanks for resuming the work on this. I did a quick check over the changes, and it looks good to me. Could you resolve the conflicts? For some reason, the CI did not trigger. |
dad5a1e
to
0250e24
Compare
8e1a846
to
08dee72
Compare
Thanks for the quick review, and I have rebased my changes to resolve the conflicts. |
@@ -807,7 +806,7 @@ def upsert( | |||
case_sensitive=case_sensitive, | |||
) | |||
|
|||
if branch is not None: | |||
if branch in self.table_metadata.refs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we silently ignore the branch if it doesn't exists. I think it would be better to raise a ValueError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I was thinking for upsert, we also might want to support staging commit as well so I remove the exception.
Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one minor comment, but apart from that, this looks good to me! 🙌 Thanks for working on this @stevie9868
Rationale for this change
In java, snapshotProducer can create stageOnly snapshot meaning only the snapshot is created but the snapshot is not set to a ref.
This is a prerequisite to support wap.id in py-iceberg
Are these changes tested?
Yes, tests are added.
Are there any user-facing changes?
By default, it will stay with the current existing behavior.