-
Notifications
You must be signed in to change notification settings - Fork 7
[SPARK-51472] Add gRPC SparkConnectClient actor
#7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Client actorSparkConnectClient actor
|
Could you review this PR when you have some time, @HyukjinKwon ? |
|
Could you review this too when you have some time, @viirya , please? |
|
Looking into this. |
|
Thank you so much, @viirya ! |
| init(remote: String, user: String) { | ||
| self.url = URL(string: remote)! | ||
| self.host = url.host() ?? "localhost" | ||
| self.port = self.url.port ?? 15002 | ||
| self.userContext = user.toUserContext | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Scala Configuration, there are a lot configs other than remote and user, e.g., userAgent, isSslEnabled, etc. Are you going to add more configurations later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, this project aims to achieve the feature parity eventually. But, each language clients have different development speeds like we did in SparkR/PySpark/Scala Spark. For Spark Connect example, Spark Connect Go doesn't support isSslEnabled.
| var request = AnalyzePlanRequest() | ||
| request.clientType = clientType | ||
| request.userContext = userContext | ||
| request.sessionID = sessionID | ||
| request.analyze = .sparkVersion(version) | ||
| let response = try await service.analyzePlan(request) | ||
| return response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this AnalyzePlanRequest only used for test for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes in this PR. #1 has SparkSession and DataFrame implementation and I'll make PRs soon after this.
For example, this is the implementation for SparkSession.version API.
| /// Request the server to set a map of configurations for this session. | ||
| /// - Parameter map: A map of key-value pairs to set. | ||
| /// - Returns: Always return true. | ||
| func setConf(map: [String: String]) async throws -> Bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why don't return ConfigResponse like Scala but a Bool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a different implementation choice to encapsulate those details.
Technically, each language client maintains its SparkConnectClient in the private scope like private[sql].
| var request = ExecutePlanRequest() | ||
| request.clientType = clientType | ||
| request.userContext = userContext | ||
| request.sessionID = sessionID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be self.sessionID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, @viirya . Let me fix this.
| /// - sessionID: A string for the existing session ID. | ||
| /// - plan: A plan to analyze. | ||
| /// - Returns: An ``AnalyzePlanRequest`` instance | ||
| func getAnalyzePlanRequest(_ sessionID: String, _ plan: Plan) async |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getAnalyzePlanRequest and getExecutePlanRequest are not used in this PR, right? Will they be used in later PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's going to used in SparkSession and DataFrame in the next PRs.
Co-authored-by: Liang-Chi Hsieh <[email protected]>
viirya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Looks good for the initial import of SparkConnectClient.
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
|
Thank you so much for your review, Liang-Chi! Merged to main. |
What changes were proposed in this pull request?
This PR aims to add a Swift
SparkConnectClientactor encapsulatinggRPCconnections which is similar to other language clients.This is a part of the following.
Why are the changes needed?
To use
gRPCin the upperSparkSessionandDataFramelayers easily.Does this PR introduce any user-facing change?
No, this is not released.
How was this patch tested?
Pass the CIs.
Was this patch authored or co-authored using generative AI tooling?
No.