feat: add lazy loading of tables #3

Open
PengLiVectra wants to merge 2 commits into main from add-lazy-loading

PengLiVectra commented Nov 8, 2023

Description

Add lazy loading of tables. When performing streaming operations, we don't need any version of the table loaded. Large tables are slow to load, so we see a huge performance boost by avoiding the CPU time spent loading the table metadata.
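
For readers unfamiliar with the pattern, here is a minimal sketch of lazy loading, assuming a stand-in class. `LazyTable`, `load_count`, and the `memory://t` URI are hypothetical and are not part of the deltalake API; loading on first access is also an assumption of this sketch, not something this PR is shown to do.

```python
class LazyTable:
    """Hypothetical stand-in for a table handle; not the deltalake API."""

    def __init__(self, uri, lazy_load=False):
        self.uri = uri
        self._metadata = None
        self.load_count = 0  # counts how often the expensive load ran
        if not lazy_load:
            self._load_metadata()

    def _load_metadata(self):
        # Stand-in for the expensive step (for example, replaying a log).
        self.load_count += 1
        self._metadata = {"uri": self.uri, "version": 0}

    def metadata(self):
        # Assumption of this sketch: load on first access, then reuse.
        if self._metadata is None:
            self._load_metadata()
        return self._metadata


eager = LazyTable("memory://t")                 # pays the load cost up front
lazy = LazyTable("memory://t", lazy_load=True)  # pays nothing until asked
```

A caller that only streams and never asks for metadata never triggers `_load_metadata` on the lazy handle.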

Related Issue(s)

Partly related to delta-io#1361

Testing

Breaking Change

Not a breaking change.

Documentation

:param lazy_load: when true the table metadata isn't loaded
"""
self._storage_options = storage_options
self._table = RawDeltaTable(

ginevragaudioso commented Nov 8, 2023

The table and metadata initialization already happen below. I think we should remove them from here?

Author

Removed, and updated the docstring and functions.

PengLiVectra force-pushed the add-lazy-loading branch 2 times, most recently from a1cd31d to e77089f (November 9, 2023 12:38)
@ginevragaudioso

@dsandesari do you think we should add unit tests for this? On the Rust side, the Python side, or both?


#[classmethod]
#[pyo3(signature = (table_uri, version = None, storage_options = None, without_files = false))]
fn load_lazy(

What is the difference between load_lazy and new above?

Author

load_lazy uses builder.build(), while new uses builder.load(). builder.load() builds the DeltaTable and loads its state; builder.build() only builds the DeltaTable. See details here: https://github.com/delta-io/delta-rs/blob/main/crates/deltalake-core/src/table/builder.rs#L269-L293
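
To make the distinction concrete, here is a rough Python model of the two builder methods. It is only a sketch under stated assumptions: `Table`, `TableBuilder`, `load_state`, and the state dictionary are hypothetical stand-ins, not the Rust types in delta-rs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Table:
    """Hypothetical table handle; `state` stays None until it is loaded."""

    uri: str
    state: Optional[dict] = None

    def load_state(self) -> None:
        # Stand-in for reading the log and materializing table state.
        self.state = {"version": 0, "files": []}


@dataclass
class TableBuilder:
    """Hypothetical builder that mirrors the build/load split described above."""

    uri: str
    options: dict = field(default_factory=dict)

    def build(self) -> Table:
        # Construct the handle only; no state is read.
        return Table(self.uri)

    def load(self) -> Table:
        # Construct the handle, then load its state.
        table = self.build()
        table.load_state()
        return table
```

In this model, `TableBuilder("memory://t").build().state` is `None`, while the table returned by `load()` carries a populated state.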

Author

Not sure if we need to include all the options in load_lazy

Collaborator

We can add tests for load_lazy, both with and without options, to ensure it works as expected.
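
One possible shape for those tests, as a sketch only. The `load_lazy` function here is a hypothetical stand-in that echoes its arguments (using the parameter names from the signature under review); the real binding would be swapped in, and the `state_loaded` flag and the option values are assumptions made purely for illustration.

```python
def load_lazy(table_uri, version=None, storage_options=None, without_files=False):
    """Hypothetical stand-in for the binding under review; swap in the real one."""
    return {
        "table_uri": table_uri,
        "version": version,
        "storage_options": dict(storage_options or {}),
        "without_files": without_files,
        "state_loaded": False,  # lazy: no table state is read
    }


def test_load_lazy_without_options():
    table = load_lazy("memory://t")
    assert table["version"] is None
    assert table["storage_options"] == {}
    assert table["without_files"] is False
    assert table["state_loaded"] is False


def test_load_lazy_with_options():
    table = load_lazy(
        "memory://t",
        version=2,
        storage_options={"region": "us-east-1"},
        without_files=True,
    )
    assert table["version"] == 2
    assert table["storage_options"] == {"region": "us-east-1"}
    assert table["without_files"] is True
    assert table["state_loaded"] is False
```

Functions named `test_*` are collected by pytest, so the file can be run as is once the stand-in is replaced.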

syedashrafulla commented Nov 30, 2023

@PengLiVectra for line links like the one above, make sure you anchor the GitHub URL to a tag so the highlighted stanza stays consistent. Right now the highlight lands on something other than build and load. Looks good otherwise, though. We could consider making this DRY by pulling out the first part of each function (before we call build vs. load). Up to you.

Collaborator

We should consider it; most of the code seems repetitive.
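
A sketch of what that refactor could look like, modeled in Python rather than in the Rust binding. `SketchBuilder`, `with_option`, and `_configure` are hypothetical names introduced for illustration; only the parameter names come from the signature under review.

```python
class SketchBuilder:
    """Hypothetical builder; records options and whether state was loaded."""

    def __init__(self, table_uri):
        self.config = {"table_uri": table_uri}

    def with_option(self, key, value):
        self.config[key] = value
        return self

    def build(self):
        # Handle only, no state.
        return {"config": dict(self.config), "state_loaded": False}

    def load(self):
        # Handle plus state.
        table = self.build()
        table["state_loaded"] = True
        return table


def _configure(table_uri, version=None, storage_options=None, without_files=False):
    # Shared first part: everything before the build-vs-load decision.
    builder = SketchBuilder(table_uri)
    if version is not None:
        builder.with_option("version", version)
    if storage_options is not None:
        builder.with_option("storage_options", storage_options)
    if without_files:
        builder.with_option("without_files", True)
    return builder


def new(table_uri, **options):
    return _configure(table_uri, **options).load()


def load_lazy(table_uri, **options):
    return _configure(table_uri, **options).build()
```

With the option handling in one place, the two entry points differ only in their final call, which also keeps their accepted options from drifting apart.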

@syedashrafulla

How is this work related to github/delta-io/delta-rs/issues/1361?

@ginevragaudioso

Do we know why the tests are not passing?

PengLiVectra force-pushed the add-lazy-loading branch 2 times, most recently from 9ca9531 to c717f5d (November 28, 2023 15:57)
log_buffer_size=log_buffer_size,
)
self._metadata = None
return
Collaborator

We can add a test, using lazy_load.


PengLiVectra force-pushed the add-lazy-loading branch 2 times, most recently from 83478f5 to 6ba7634 (December 5, 2023 16:57)
4 participants