-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[python] introduce update_columns in python api. #6861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
JingsongLi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document this too.
|
@zhoulii Can you find a rough reference for other systems? This API definition provides a global RowID. |
|
Maybe new API: |
rust/lance/src/dataset/fragment.rs:1605 lance provides update_columns api that can accept data with rowids, but it's more complicated, it can update columns based on other conditions. |
Thanks for the review ! @JingsongLi Comments are addressed, please take a look again, thanks. |
| self._partial_column_write = PartialColumnWrite(self.table, self.commit_user) | ||
|
|
||
| def write_arrow(self, table: pa.Table): | ||
| if self._partial_column_write is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you abstract file_store_write to avoid so many if else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't figured out the exact approach yet, so I just deleted the else branch for now.
| file_store_write.write(partition_tuple, 0, batch) | ||
|
|
||
| # Prepare commit and assign first_row_id | ||
| commit_messages = file_store_write.prepare_commit(BATCH_COMMIT_IDENTIFIER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should set sequence number to define the version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sequence number will be set in commit stage.
| """Find the partition for a given first_row_id using pre-built partition map.""" | ||
| return self.first_row_id_to_partition_map.get(first_row_id) | ||
|
|
||
| def _write_group(self, partition: GenericRow, first_row_id: int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should I do if all the data in the file is not included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some row counts validations, if not match, an error would be raised.
99bacc9 to
6ae9bff
Compare
JingsongLi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Purpose
Linked issue: close #6859
Tests
API and Format
Documentation