Skip to content

feat: use pyo3-asyncio to get a fresh tokio runtime#2

Open
PengLiVectra wants to merge 1 commit intomainfrom
use-pyo3-asynocio
Open

feat: use pyo3-asyncio to get a fresh tokio runtime#2
PengLiVectra wants to merge 1 commit intomainfrom
use-pyo3-asynocio

Conversation

@PengLiVectra
Copy link

@PengLiVectra PengLiVectra commented Nov 8, 2023

Description

This PR greatly reduces network connections and dns request volume by the delta-rs library when using Python bindings. The approach here is to utilize pyo3-asyncio's tokio-runtime feature as the source of the Runtime. This yields the same runtime across function calls which preserves connections in the connection pool. The previous code created a new runtime per python function call, which established all new socket connections and issued new DNS requests.

Related Issue(s)

Partly delta-io#1315

Testing:

Ran a script that called hundreds of delta operations, and watched tcpdump. Only saw one dns request.

Documentation

@ginevragaudioso
Copy link

Expand on description and add how we tested it. We can take this info from our internal PR that implemented this feature.

mod filesystem;
mod schema;
mod utils;
extern crate pyo3;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this crate used?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably remove this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@syedashrafulla
Copy link

Should we have a GitHub issue in github/delta-io/delta-rs/issues for this feature request?

Also how might this relate to github/delta-io/delta-rs/1315?

Ok(filesystem::DeltaFileSystemHandler {
inner: self._table.object_store(),
rt: Arc::new(rt()?),
rt: Arc::new(rt_pyo3()?),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeltaFileSystemHandler (https://github.com/delta-io/delta-rs/blob/main/python/src/filesystem.rs#L58) uses the runtime in utils.rs (https://github.com/delta-io/delta-rs/blob/main/python/src/utils.rs#L10-L13), which is not pyo3-asyncio. So I keep the original rt but change name to rt_pyo3 to avoid errors.
Is there a better way to handle this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so utils.rs and lib.rs both have a runtime that they create? I don't actually know what is the right approach here without more knowledge of runtimes and tokio. Should we get help or learn ourselves?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://sourcecode.vectra.io/projects/DP/repos/delta-rs/pull-requests/26/overview I think this provides the answer. This change seems worthy of pushing upstream, and would require a separate PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That answers the question of why we're changing lib.rs. However, my question is related specifically to our having two runtime functions: should we also be using py03-asyncio's tokio-runtime in utils.rs?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this out and have the community decide. Frankly @tsh56 and I have higher-priority issues to resolve.

@PengLiVectra
Copy link
Author

Should we have a GitHub issue in github/delta-io/delta-rs/issues for this feature request?

Also how might this relate to github/delta-io/delta-rs/1315?

I think it's related to delta-io#1315.

@dsandesari
Copy link
Collaborator

Should we have a GitHub issue in github/delta-io/delta-rs/issues for this feature request?
Also how might this relate to github/delta-io/delta-rs/1315?

I think it's related to delta-io#1315.

yeah and especially, felt like we did implement alternative to delta-io#1361

@ginevragaudioso
Copy link

do we know why some tests are failing?

@dsandesari
Copy link
Collaborator

do we know why some tests are failing?
due to linting issues, probably can be fixed by syncing with delta-io/delta-rs or use suggestions based on build failure report here: https://github.com/vectra-ai-research/delta-rs/actions/runs/6926886646/job/18839764113?pr=2

@PengLiVectra PengLiVectra force-pushed the use-pyo3-asynocio branch 2 times, most recently from 556c778 to 46d4f7b Compare November 28, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants