- 
                Notifications
    You must be signed in to change notification settings 
- Fork 133
Use OnceLock to store TokioRuntime #895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use OnceLock to store TokioRuntime #895
Conversation
6fa05ba    to
    ff1e758      
    Compare
  
    | Because the change was performance related, we should probably benchmark the change. I did a quick & dirty one (making no effort to isolate resources / stop other processes on my laptop), but the results tell me I should run a cleaner one tomorrow before merging. Using  
 
 | 
| From the benchmarking above it appears to be a mostly negative impact. Am I reading that right? | 
| Yes. My ~dirty benchmark had a pretty consistently negative performance impact. Hence, the need for further investigation. That result surprises me, though. | 
18c6098    to
    030ec20      
    Compare
  
    | Alright, second benchmark running after clean boot look like this has basically no impact. I wouldn't mind if @andygrove or anyone else has a nice benchmarking server setup to run their own, but I'm more comfortable with this now. Reminder:  
 | 
efd2046    to
    37d3356      
    Compare
  
    37d3356    to
    85f3ca3      
    Compare
  
    I also included a reference comment in case future users experience problems with using datafusion-python behind a forking app server l ike `gunicorn`.
| Ran it one more time with the  
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @Michael-J-Ward !
| // If we run into that problem, in the future we can look to `delta-rs` | ||
| // which adds a check in that disallows calls from a forked process | ||
| // https://github.com/delta-io/delta-rs/blob/87010461cfe01563d91a4b9cd6fa468e2ad5f283/python/src/utils.rs#L10-L31 | ||
| static RUNTIME: OnceLock<TokioRuntime> = OnceLock::new(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I'm understanding it correctly (please correct me if I'm not making sense).
So now it use OnceLock<Runtime> than OnceLock<Arc<Runtime>> to get rid of Python heap thoroughly. And it will not be thread-safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend looking at the PR diff to see the real change.
There is an intermediate commit that went from  OnceLock<Arc<Runtime>> to OnceLock<Runtime>, but that's because the Arc was superfluous.
OnceLock is already Send + Sync and explicitly thread-safe.
So, if you discover that it is not actually thread-safe, please share!
Which issue does this PR close?
Closes #737.
Rationale for this change
To avoid repeatedly recreating the tokio runtime, #341 began storing the tokio runtime on the python heap at
datafusion.runtime.This created the requirement that the
datafusionpython package must be available to any user of the rust cratedatafusion-python.However, some users (#737) want to leverage the rust-crate
datafusion-pythonto create their own custom python package without depending on or using the pythondatafusionpackage.What changes are included in this PR?
This PR uses
std::sync::OnceLockinstead of the python heap to ensure that only one tokio runtime gets created once.This both removes the implicit dependency and avoids
rust --> python --> rustroundtrip every time the runtime is used.Are there any user-facing changes?
datafusion.runtimehas been removed, but that was already unusable for users of the python package.