Conversation
| :param lazy_load: when true the table metadata isn't loaded | ||
| """ | ||
| self._storage_options = storage_options | ||
| self._table = RawDeltaTable( |
The table and metadata initialization are already below, I think we should remove them from here?
Removed, and updated the docstring and functions.
Force-pushed from a1cd31d to e77089f.
@dsandesari do you think we should add unit tests for this? On the Rust side and/or the Python side?
Force-pushed from 6f9704c to b59a982.
python/src/lib.rs
Outdated
| #[classmethod] | ||
| #[pyo3(signature = (table_uri, version = None, storage_options = None, without_files = false))] | ||
| fn load_lazy( |
What is the difference between load_lazy and new above?
load_lazy uses builder.build(), while new uses builder.load(). builder.load() builds the DeltaTable and loads its state; builder.build() only builds the DeltaTable. See details here: https://github.com/delta-io/delta-rs/blob/main/crates/deltalake-core/src/table/builder.rs#L269-L293
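To make the build-vs-load distinction concrete, here is a toy model in Python. It is not the delta-rs code: `ToyBuilder`, `ToyTable`, and `fake_log` are hypothetical stand-ins. It only illustrates that `load()` is `build()` plus reading the table state, which is why `build()` alone is cheap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table handle (not the real DeltaTable)."""

    uri: str
    version: Optional[int] = None  # stays None until state is loaded
    files: List[str] = field(default_factory=list)

    def load_state(self, log: Dict[int, List[str]]) -> None:
        # In a real table this is the expensive part: replaying the log.
        self.version = max(log)
        self.files = [name for v in sorted(log) for name in log[v]]


class ToyBuilder:
    """Hypothetical builder that mirrors the build()/load() split."""

    def __init__(self, uri: str, log: Dict[int, List[str]]):
        self.uri = uri
        self._log = log

    def build(self) -> ToyTable:
        # Only constructs the handle; the log is never read.
        return ToyTable(self.uri)

    def load(self) -> ToyTable:
        # build() followed by loading the table state.
        table = self.build()
        table.load_state(self._log)
        return table


fake_log = {0: ["part-0.parquet"], 1: ["part-1.parquet"]}
lazy = ToyBuilder("memory://t", fake_log).build()
eager = ToyBuilder("memory://t", fake_log).load()
```

In this sketch `lazy` has no version and no file list, while `eager` has both, which is the behavioral difference being asked about.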
Not sure if we need to include all the options in load_lazy
we can add a test with options and without options for load_lazy to ensure it works as expected.
builder.build doesn't use version option, so I removed it. https://github.com/delta-io/delta-rs/blob/main/crates/deltalake-core/src/table/builder.rs#L269-L280
@PengLiVectra for line links like these, anchor the GitHub URL to a tag so the highlighted stanza stays consistent. Right now the highlight lands on something other than build and load. Looks good otherwise, though. We could consider making this DRY by pulling out the first part of each function (everything before the call to build vs load). Up to you.
we should consider it, most of the code seems repetitive.
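A rough sketch of what that DRY refactor could look like, written in Python for brevity rather than in the Rust of python/src/lib.rs. `StubBuilder` and `make_builder` are hypothetical names invented for this illustration, and the stub records options instead of touching storage; the point is only that both constructors share the setup and differ in the final call.

```python
from typing import Dict, Optional


class StubBuilder:
    """Hypothetical builder; records options instead of touching storage."""

    def __init__(self, table_uri: str):
        self.table_uri = table_uri
        self.options: Dict[str, object] = {}

    def with_storage_options(self, opts: Dict[str, str]) -> "StubBuilder":
        self.options["storage_options"] = dict(opts)
        return self

    def without_files(self) -> "StubBuilder":
        self.options["without_files"] = True
        return self

    def build(self) -> dict:
        # Handle only, no state.
        return {"uri": self.table_uri, "loaded": False, **self.options}

    def load(self) -> dict:
        # Same handle, marked as having its state loaded.
        return {**self.build(), "loaded": True}


def make_builder(
    table_uri: str,
    storage_options: Optional[Dict[str, str]] = None,
    without_files: bool = False,
) -> StubBuilder:
    """Shared setup that both constructors would otherwise repeat."""
    builder = StubBuilder(table_uri)
    if storage_options:
        builder = builder.with_storage_options(storage_options)
    if without_files:
        builder = builder.without_files()
    return builder


def new(table_uri, storage_options=None, without_files=False):
    return make_builder(table_uri, storage_options, without_files).load()


def load_lazy(table_uri, storage_options=None, without_files=False):
    return make_builder(table_uri, storage_options, without_files).build()
```

With the shared part in one helper, any option added later only needs to be wired up once.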
How is this work related to github/delta-io/delta-rs/issues/1361?
Do we know why the tests are not passing?
Force-pushed from 9ca9531 to c717f5d.
python/deltalake/table.py
Outdated
| log_buffer_size=log_buffer_size, | ||
| ) | ||
| self._metadata = None | ||
| return |
We can add a test, using lazy_load.
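One possible shape for such a test, sketched against a stand-in class rather than the real `DeltaTable`. `LazyTable` and `fake_loader` are hypothetical, and the sketch assumes metadata is fetched on first access when `lazy_load` is true, which this thread does not confirm; it would need adapting to the actual behavior in python/deltalake/table.py.

```python
from typing import Callable, Dict, List, Optional


class LazyTable:
    """Hypothetical wrapper: with lazy_load=True, metadata is fetched on first use."""

    def __init__(self, loader: Callable[[], Dict[str, str]], lazy_load: bool = False):
        self._loader = loader
        # Assumption: eager mode loads metadata up front, lazy mode leaves it unset.
        self._metadata: Optional[Dict[str, str]] = None if lazy_load else loader()

    def metadata(self) -> Dict[str, str]:
        if self._metadata is None:
            self._metadata = self._loader()
        return self._metadata


calls: List[int] = []


def fake_loader() -> Dict[str, str]:
    # Counts invocations so the tests can see when loading happens.
    calls.append(1)
    return {"name": "demo"}


def test_lazy_load_defers_metadata() -> None:
    calls.clear()
    table = LazyTable(fake_loader, lazy_load=True)
    assert calls == []  # nothing loaded at construction
    assert table.metadata() == {"name": "demo"}
    table.metadata()
    assert len(calls) == 1  # loaded once, then cached


def test_eager_load_reads_metadata_immediately() -> None:
    calls.clear()
    LazyTable(fake_loader, lazy_load=False)
    assert len(calls) == 1
```

Both functions follow the pytest naming convention, so they could be dropped into a test module once the stand-in is replaced with the real class.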
Force-pushed from ad082dd to e4e074e.
Force-pushed from d127edf to b6fba58.
Force-pushed from 83478f5 to 6ba7634.
Force-pushed from 6ba7634 to b5acd83.
Description
Add lazy loading of tables. When performing streaming operations, we don't need any version of the table loaded. Large tables are slow to load, so we see a huge performance boost by avoiding the CPU time spent loading the table metadata.
Related Issue(s)
Partly with delta-io#1361
Testing
Breaking Change
Not a breaking change.
Documentation