Making Spark Connect Rust more thread-safe friendly #13
edmondop wants to merge 7 commits into sjrusso8:main from
Conversation
Thanks for creating this PR! I played around with something similar when I was rewriting the client implementation. I just didn't like having the user wrap the SparkSession with Arc to create the session. It felt clunky, but that is probably me overthinking the interface. Do you know of a way to allow for friendlier Send and Sync without requiring the user to create an Arc? I was loosely mirroring the SessionContext from DataFusion (https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html) as a model for how users of a similar DataFrame library interact with a session object. The SessionContext does not get wrapped in Arc.

Will look into it. I think the biggest argument for Arc is to have a cheap copy rather than an expensive one. What do you think? I will look into DataFusion.
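The cheap-copy argument for Arc can be sketched with a stand-in session type (the `Session` struct, its `new` constructor, and the `endpoint` field below are hypothetical, not the crate's actual API). `Arc::clone` copies a pointer and bumps a reference count rather than deep-copying session state, so handing a clone to each thread is inexpensive:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for SparkSession: Send + Sync because its
// fields are (a String here; real sessions would hold shareable handles).
struct Session {
    endpoint: String,
}

impl Session {
    fn new(endpoint: &str) -> Arc<Self> {
        Arc::new(Session {
            endpoint: endpoint.to_string(),
        })
    }
}

fn main() {
    let session = Session::new("sc://localhost:15002");

    let handles: Vec<_> = (0..4)
        .map(|i| {
            // A cheap pointer copy, not a copy of the session state.
            let session = Arc::clone(&session);
            thread::spawn(move || {
                // Every thread shares the same underlying session.
                format!("task {} using {}", i, session.endpoint)
            })
        })
        .collect();

    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```

The trade-off discussed above is exactly this: the caller sees `Arc<Session>` in signatures, but sharing across threads needs no further ceremony.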
Add setCatalog and setDatabase to SparkSession
Review comment on the proposed method:

pub fn setCatalog(self: Arc<Self>, catalog: &str) -> DataFrame {
For both setCatalog and setDatabase, these should be implemented on the spark.catalog object as setCurrentCatalog and setCurrentDatabase, since we want to mirror the existing Spark API.
For these actions to take effect on the existing session, the plan has to be submitted to the server via client.execute_and_fetch and receive a successful response. Both of these execution plans return nothing from the server. The code might look like this:
pub async fn setCurrentCatalog(self, catalog: &str) -> Result<(), SparkError> {
    let cat_type = Some(spark::catalog::CatType::SetCurrentCatalog(
        spark::SetCurrentCatalog {
            catalog_name: catalog.to_string(),
        },
    ));
    let rel_type = spark::relation::RelType::Catalog(spark::Catalog { cat_type });
    let plan = LogicalPlanBuilder::plan_root(LogicalPlanBuilder::from(rel_type));
    self.spark_session.client().execute_and_fetch(plan).await
}
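To make the nesting of that plan concrete, here is a self-contained mock of the message shapes involved. These struct and enum definitions are simplified stand-ins for the protobuf-generated `spark::catalog` types, not the real generated code:

```rust
// Simplified stand-ins for the generated spark catalog messages;
// the real protobuf-generated types carry more fields and derives.
#[derive(Debug, PartialEq)]
struct SetCurrentCatalog {
    catalog_name: String,
}

#[derive(Debug, PartialEq)]
enum CatType {
    SetCurrentCatalog(SetCurrentCatalog),
}

#[derive(Debug, PartialEq)]
struct Catalog {
    cat_type: Option<CatType>,
}

// Mirrors how setCurrentCatalog wraps the catalog name: the name goes
// into a SetCurrentCatalog message, which becomes the Catalog relation's
// cat_type variant.
fn build_catalog_relation(catalog: &str) -> Catalog {
    Catalog {
        cat_type: Some(CatType::SetCurrentCatalog(SetCurrentCatalog {
            catalog_name: catalog.to_string(),
        })),
    }
}

fn main() {
    let rel = build_catalog_relation("my_catalog");
    println!("{:?}", rel);
}
```

In the real client, this relation would then be wrapped by LogicalPlanBuilder::plan_root and sent over the wire; the server's empty response is what signals success.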
Description
Enabling send_guard and using Arc makes the crate much friendlier to use in a multi-threaded environment, making the session Send and Sync.
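Whether a session type actually ends up Send and Sync can be checked at compile time with a trait-bound helper. The `Session` struct below is a hypothetical stand-in for SparkSession; the technique is that `assert_send_sync::<T>()` only compiles when `T: Send + Sync`:

```rust
use std::sync::Arc;

// Hypothetical minimal session type standing in for SparkSession.
struct Session {
    endpoint: String,
}

// Compile-time check: this call only compiles if T is Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    // Arc<T> is Send + Sync whenever T is, so a session whose fields
    // are themselves Send + Sync can be shared across threads or tasks.
    assert_send_sync::<Arc<Session>>();

    let s = Arc::new(Session {
        endpoint: "sc://localhost:15002".to_string(),
    });
    println!("session ok: {}", s.endpoint);
}
```

A check like this is cheap to keep in a test module: if a later change adds a non-Sync field (say, a bare Rc or RefCell), the build fails instead of a downstream user's code.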