-
Notifications
You must be signed in to change notification settings - Fork 2.5k
feat(lance): Remove extra buffering in Lance writer #17916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(lance): Remove extra buffering in Lance writer #17916
Conversation
| root = VectorSchemaRoot.create(getArrowSchema(), allocator); | ||
| } | ||
| // Finalize the arrow writer (sets row count on VectorSchemaRoot) | ||
| arrowWriter.finish(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of finish() should the interface method be renamed to something more clear like setRowCount() or is there other functionality outside of setting rowCount as well that finish() handles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will do some operation on the fields as well, I just made it match the implementation name but I think something like closeBatch or finishBatch may be more telling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense either of those names sounds good to me.
65fea30 to
c16af1e
Compare
c16af1e to
3b56819
Compare
| private final BufferAllocator allocator; | ||
| private final int batchSize; | ||
| /** | ||
| * -- GETTER -- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Do we need this comment here? Or is the @Getter annotation already and field name already self explanatory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -- GETTER -- will add the javadoc to the getter method with lombok but I agree this field is self explanatory
Describe the issue this Pull Request addresses
The lance writer is maintaining a list of records in a list as well as writing out to a buffer within the arrow writer. We can remove this intermediate buffer and directly write the values out to the arrow writer. This should avoid extra row copy operations flagged in #17768 (comment)
Summary and Changelog
Impact
Reduces memory overhead
Risk Level
Low, existing test coverage exists for the writer
Documentation Update
Contributor's checklist