
Commit ba7849e

[SPARK-51130][YARN][TESTS] Run the test cases related to connect in the YarnClusterSuite on Github Actions only
### What changes were proposed in this pull request?

The main change in this PR is the addition of two `assume` conditions to ensure that the test cases related to `connect` in `YarnClusterSuite` are only executed on GitHub Actions.
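For context, the added guard relies on ScalaTest's `assume`, which throws `TestCanceledException` when its condition is false, so a guarded test is reported as canceled rather than failed. A minimal, self-contained sketch of the same pattern (the suite name below is illustrative and not part of Spark):

```scala
import org.scalatest.funsuite.AnyFunSuite

// Minimal sketch of the guard pattern used in this commit (illustrative suite name).
class EnvGuardedSuite extends AnyFunSuite {

  test("connect-style test that only runs on GitHub Actions") {
    // When GITHUB_ACTIONS is absent, `assume` cancels the test instead of failing it,
    // so the run still finishes with this test counted as canceled.
    assume(sys.env.contains("GITHUB_ACTIONS"))

    // The real work would go here (e.g. submitting the Python Connect app to YARN).
    assert(1 + 1 == 2)
  }
}
```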
### Why are the changes needed?

Running these two test cases successfully locally is overly complicated.

Firstly, it is necessary to install the required Python packages:
https://github.com/apache/spark/blob/f5f7c365d519c4f9d4b7a5dce2c8a047cf051899/.github/workflows/build_and_test.yml#L363
Otherwise, local test execution will fail due to missing Python modules.

Secondly, before running tests locally, a packaging operation must be performed to ensure that all dependencies are collected in the `assembly/target/scala-2.13/jars` directory, for example by executing `build/sbt package -Pyarn`. Failing to do so will result in the following error during local test execution:

```
Traceback (most recent call last):
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/resource-managers/yarn/target/test/data/org.apache.spark.deploy.yarn.YarnClusterSuite/yarn-264663/org.apache.spark.deploy.yarn.YarnClusterSuite-localDir-nm-0_0/usercache/yangjie01/appcache/application_1738914482522_0019/container_1738914482522_0019_01_000001/test.py", line 13, in <module>
    "spark.api.mode", "connect").master("yarn").getOrCreate()
    ^^^^^^^^^^^^^
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/session.py", line 511, in getOrCreate
    RemoteSparkSession._start_connect_server(url, opts)
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/connect/session.py", line 1073, in _start_connect_server
    PySparkSession(SparkContext.getOrCreate(conf))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/core/context.py", line 523, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/core/context.py", line 207, in __init__
    self._do_init(
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/core/context.py", line 300, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/core/context.py", line 429, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py", line 1627, in __call__
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/lib/py4j-0.10.9.9-src.zip/py4j/protocol.py", line 327, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.ClassNotFoundException: org.apache.spark.sql.connect.SparkConnectPlugin
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:467)
	at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
	at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:99)
	at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2828)
	at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:118)
	at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:105)
	at scala.collection.immutable.ArraySeq.flatMap(ArraySeq.scala:35)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2826)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:210)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:196)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:588)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:184)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:108)
	at java.base/java.lang.Thread.run(Thread.java:840)
```

Lastly, when running tests locally, the `clean` command should not be added. For instance, executing the following command

```
build/sbt "yarn/testOnly org.apache.spark.deploy.yarn.YarnClusterSuite" -Pyarn
```

will result in successful tests. However, if the `clean` command is included, as in

```
build/sbt clean "yarn/testOnly org.apache.spark.deploy.yarn.YarnClusterSuite" -Pyarn
```

the same test failure will occur.

Additionally, adding `assume` conditions that check the local environment would also be relatively complex (see the sketch below):

1. It is necessary to check that at least five essential Python modules are installed: pandas, pyarrow, grpc, grpcio, googleapis_common_protos.
2. It must be confirmed that the contents of `assembly/target/scala-2.13/jars` are fresh and usable.

Given these circumstances, this PR proposes that the test cases related to `connect` in `YarnClusterSuite` should only be executed in the GitHub pipeline.
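For reference, a purely hypothetical sketch of the kind of local-environment probing such `assume` conditions would require; the object and helper names below do not exist in the suite, and the checks are deliberately simplified:

```scala
import java.nio.file.{Files, Paths}
import scala.sys.process._

// Hypothetical helper (not part of YarnClusterSuite): what an environment-aware
// `assume` guard would have to verify before running the connect tests locally.
object LocalConnectEnvCheck {

  // Module names taken from the PR description; the importable module name can
  // differ from the pip package name, so a real check would need to map them.
  private val requiredModules =
    Seq("pandas", "pyarrow", "grpc", "grpcio", "googleapis_common_protos")

  // A module counts as available if `python3 -c "import <module>"` exits with 0.
  def pythonModulesAvailable(): Boolean =
    requiredModules.forall(m => Seq("python3", "-c", s"import $m").! == 0)

  // A crude freshness check: the assembly jars directory exists and is non-empty.
  // Confirming that the jars are actually up to date would be harder still.
  def assemblyJarsPresent(sparkHome: String): Boolean = {
    val jarsDir = Paths.get(sparkHome, "assembly", "target", "scala-2.13", "jars")
    Files.isDirectory(jarsDir) && Files.list(jarsDir).findAny().isPresent
  }
}
```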
### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Pass GitHub Actions: https://github.com/LuciferYang/spark/actions/runs/13196611264/job/36839274825
  ![image](https://github.com/user-attachments/assets/6159b7b5-ab67-4698-a26c-9b4adfd10665)
- Locally check:

```
build/sbt clean "yarn/testOnly org.apache.spark.deploy.yarn.YarnClusterSuite" -Pyarn
```

we can see:

```
[info] YarnClusterSuite:
...
[info] - run Python application with Spark Connect in yarn-client mode !!! CANCELED !!! (9 milliseconds)
[info]   Map("JIRA_PASSWORD" -> "JackBaidu2020", "RUBYOPT" -> "", "HOME" -> "/Users/yangjie01", "JAVA_MAIN_CLASS_33082" -> "xsbt.boot.Boot", "HOMEBREW_BOTTLE_DOMAIN" -> ... did not contain key "GITHUB_ACTIONS" (YarnClusterSuite.scala:269)
...
[info] - run Python application with Spark Connect in yarn-cluster mode !!! CANCELED !!! (1 millisecond)
[info]   Map("JIRA_PASSWORD" -> "JackBaidu2020", "RUBYOPT" -> "", "HOME" -> "/Users/yangjie01", "JAVA_MAIN_CLASS_33082" -> "xsbt.boot.Boot", "HOMEBREW_BOTTLE_DOMAIN" -> ... did not contain key "GITHUB_ACTIONS" (YarnClusterSuite.scala:275)
...
[info] Run completed in 4 minutes, 33 seconds.
[info] Total number of tests run: 28
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 28, failed 0, canceled 2, ignored 0, pending 0
[info] All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49848 from LuciferYang/SPARK-51130.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
1 parent af92420 commit ba7849e

1 file changed: +2 -0 lines changed

resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala

Lines changed: 2 additions & 0 deletions
```diff
@@ -266,11 +266,13 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
   }
 
   test("run Python application with Spark Connect in yarn-client mode") {
+    assume(sys.env.contains("GITHUB_ACTIONS"))
     testPySpark(
       true, extraConf = Map(SPARK_API_MODE.key -> "connect"), script = TEST_CONNECT_PYFILE)
   }
 
   test("run Python application with Spark Connect in yarn-cluster mode") {
+    assume(sys.env.contains("GITHUB_ACTIONS"))
     testPySpark(
       false, extraConf = Map(SPARK_API_MODE.key -> "connect"), script = TEST_CONNECT_PYFILE)
   }
```
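Note that the added guard only checks for the presence of the `GITHUB_ACTIONS` key, not its value, so these two tests are reported as canceled in any environment where that key is absent, matching the local run shown above. Conversely, a developer who has prepared the local environment as described in this PR could still opt in by exporting that variable before invoking sbt.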
