[SPARK-51058][PYTHON] Avoid using jvm.SparkSession
### What changes were proposed in this pull request?
Avoids `jvm.SparkSession`-style attribute access and instead looks the class up by its fully qualified name with `getattr`, to improve Py4J performance, similar to #49312, #49313, and #49412.
### Why are the changes needed?
To reduce the overhead of Py4J calls.
```py
import time

def benchmark(f, _n=10, *args, **kwargs):
    start = time.time()
    for i in range(_n):
        f(*args, **kwargs)
    print(time.time() - start)
```
```py
from pyspark.context import SparkContext

jvm = SparkContext._jvm

def f():
    return jvm.SparkSession

benchmark(f, 10000)  # -> 3.578310251235962
```
```py
from pyspark.context import SparkContext

jvm = SparkContext._jvm

def g():
    return getattr(jvm, "org.apache.spark.sql.classic.SparkSession")

benchmark(g, 10000)  # -> 0.254807710647583
```
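For readers unfamiliar with the `getattr(jvm, "a.b.C")` form: Python passes the whole dotted string to `__getattr__` as a single name, whereas `jvm.a.b.C` triggers one lookup per segment. The sketch below illustrates only this Python behavior. `RecordingView` is a hypothetical stand-in, not a Py4J or PySpark class, and it says nothing about how Py4J resolves names on the JVM side.

```py
class RecordingView:
    """Hypothetical stand-in for a JVM view; records every name it is asked for."""

    def __init__(self, prefix="", log=None):
        self._prefix = prefix
        self._log = [] if log is None else log

    def __getattr__(self, name):
        # Invoked only when normal lookup fails; `name` arrives verbatim.
        if name.startswith("_"):
            raise AttributeError(name)
        full = f"{self._prefix}.{name}" if self._prefix else name
        self._log.append(full)
        return RecordingView(full, self._log)


# One lookup: the dotted string is passed through as a single name.
view = RecordingView()
getattr(view, "org.apache.spark.sql.classic.SparkSession")
single = list(view._log)

# Chained access: one lookup per path segment.
view2 = RecordingView()
view2.org.apache.spark.sql.classic.SparkSession
chained = list(view2._log)

print(len(single), len(chained))
```

In this stand-in, the `getattr` form records one lookup and the chained form records six, one per segment of the path.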
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The existing tests should pass.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #49760 from ueshin/issues/SPARK-51058/spark_session.
Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>