Commit ba7849e
[SPARK-51130][YARN][TESTS] Run the test cases related to connect in the YarnClusterSuite on Github Actions only
### What changes were proposed in this pull request?
The main change in this PR is the addition of two `assume` conditions to ensure that the test cases related to `connect` in `YarnClusterSuite` are only executed on GitHub Actions.
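Concretely, each of the two tests now starts with a guard along the following lines. This is a minimal sketch rather than the exact patch: it assumes ScalaTest's `assume` and the `GITHUB_ACTIONS` environment variable (which the canceled-test output quoted below does reference); the suite name here is a hypothetical stand-in for `YarnClusterSuite`.
```
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical stand-in for YarnClusterSuite; only the guard is the point.
class ConnectGuardSketch extends AnyFunSuite {
  test("run Python application with Spark Connect in yarn-client mode") {
    // GitHub Actions always exports GITHUB_ACTIONS=true, so this condition
    // only holds in CI. When it does not hold, ScalaTest's assume() cancels
    // the test rather than failing it, which matches the "CANCELED" entries
    // in the local run shown below.
    assume(sys.env.contains("GITHUB_ACTIONS"))
    // ... the original test body would follow here ...
  }
}
```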
### Why are the changes needed?
Running these two test cases successfully on a local machine is overly complicated.
Firstly, it is necessary to install the required Python packages:
https://github.com/apache/spark/blob/f5f7c365d519c4f9d4b7a5dce2c8a047cf051899/.github/workflows/build_and_test.yml#L363
Otherwise, local test execution will fail due to missing Python modules.
Secondly, before running tests locally, a packaging operation must be performed so that all dependencies are collected in the `assembly/target/scala-2.13/jars` directory, for example by executing `build/sbt package -Pyarn`. Failing to do so will result in the following error during local test execution:
```
Traceback (most recent call last):
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/resource-managers/yarn/target/test/data/org.apache.spark.deploy.yarn.YarnClusterSuite/yarn-264663/org.apache.spark.deploy.yarn.YarnClusterSuite-localDir-nm-0_0/usercache/yangjie01/appcache/application_1738914482522_0019/container_1738914482522_0019_01_000001/test.py", line 13, in <module>
    "spark.api.mode", "connect").master("yarn").getOrCreate()
                                                ^^^^^^^^^^^^^
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/session.py", line 511, in getOrCreate
    RemoteSparkSession._start_connect_server(url, opts)
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/sql/connect/session.py", line 1073, in _start_connect_server
    PySparkSession(SparkContext.getOrCreate(conf))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/core/context.py", line 523, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/core/context.py", line 207, in __init__
    self._do_init(
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/core/context.py", line 300, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/pyspark/core/context.py", line 429, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py", line 1627, in __call__
  File "/Users/yangjie01/SourceCode/git/spark-mine-sbt/python/lib/py4j-0.10.9.9-src.zip/py4j/protocol.py", line 327, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.ClassNotFoundException: org.apache.spark.sql.connect.SparkConnectPlugin
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:467)
    at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
    at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:99)
    at org.apache.spark.util.Utils$.$anonfun$loadExtensions$1(Utils.scala:2828)
    at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:118)
    at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:105)
    at scala.collection.immutable.ArraySeq.flatMap(ArraySeq.scala:35)
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2826)
    at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:210)
    at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:196)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:588)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:184)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:108)
    at java.base/java.lang.Thread.run(Thread.java:840)
```
Lastly, when running tests locally, the `clean` command must not be included. For instance, executing the following command
```
build/sbt "yarn/testOnly org.apache.spark.deploy.yarn.YarnClusterSuite" -Pyarn
```
results in successful tests. However, if the `clean` command is included, as in
```
build/sbt clean "yarn/testOnly org.apache.spark.deploy.yarn.YarnClusterSuite" -Pyarn
```
the same test failure will occur, because `clean` wipes the previously assembled jars from `assembly/target/scala-2.13/jars`.
Additionally, writing `assume` conditions that would validate a local environment instead is also relatively complex:
1. It would be necessary to check that at least five essential Python packages are installed: pandas, pyarrow, grpc, grpcio, googleapis_common_protos (see the sketch after this list).
2. It would also have to be confirmed that the contents of `assembly/target/scala-2.13/jars` are fresh and usable.
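To illustrate point 1, a local-environment guard for the Python modules alone might look roughly like the sketch below. Everything here is hypothetical (none of these names come from the patch), and it already has to paper over the mismatch between pip package names and importable module names, e.g. the `grpcio` package is imported as `grpc`:
```
import scala.sys.process._
import scala.util.Try

object PythonEnvCheckSketch {
  // Probe a Python module by spawning the interpreter and checking the exit
  // code: 0 means the import succeeded; any other outcome (including a
  // missing python3 binary, caught by Try) counts as unavailable.
  def pythonModuleAvailable(module: String): Boolean =
    Try(Seq("python3", "-c", s"import $module").! == 0).getOrElse(false)

  // Importable names, not pip package names (grpcio provides grpc, and
  // googleapis_common_protos is imported through the google.* namespace).
  val requiredModules = Seq("pandas", "pyarrow", "grpc", "google.rpc")

  def main(args: Array[String]): Unit = {
    requiredModules.foreach { m =>
      println(s"$m available: ${pythonModuleAvailable(m)}")
    }
    // In a test one would then write:
    //   assume(requiredModules.forall(pythonModuleAvailable))
    // and point 2 (freshness of assembly/target/scala-2.13/jars) would
    // still need a separate, even more fragile check.
  }
}
```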
Given these circumstances, this PR proposes that the test cases related to `connect` in `YarnClusterSuite` should only be executed in the GitHub Actions pipeline.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass GitHub Actions
https://github.com/LuciferYang/spark/actions/runs/13196611264/job/36839274825

- Local check:
```
build/sbt clean "yarn/testOnly org.apache.spark.deploy.yarn.YarnClusterSuite" -Pyarn
```
we can see:
```
[info] YarnClusterSuite:
...
[info] - run Python application with Spark Connect in yarn-client mode !!! CANCELED !!! (9 milliseconds)
[info] Map("JIRA_PASSWORD" -> "JackBaidu2020", "RUBYOPT" -> "", "HOME" -> "/Users/yangjie01", "JAVA_MAIN_CLASS_33082" -> "xsbt.boot.Boot", "HOMEBREW_BOTTLE_DOMAIN" -> ... did not contain key "GITHUB_ACTIONS" (YarnClusterSuite.scala:269)
...
[info] - run Python application with Spark Connect in yarn-cluster mode !!! CANCELED !!! (1 millisecond)
[info] Map("JIRA_PASSWORD" -> "JackBaidu2020", "RUBYOPT" -> "", "HOME" -> "/Users/yangjie01", "JAVA_MAIN_CLASS_33082" -> "xsbt.boot.Boot", "HOMEBREW_BOTTLE_DOMAIN" -> ... did not contain key "GITHUB_ACTIONS" (YarnClusterSuite.scala:275)
...
[info] Run completed in 4 minutes, 33 seconds.
[info] Total number of tests run: 28
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 28, failed 0, canceled 2, ignored 0, pending 0
[info] All tests passed.
```
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #49848 from LuciferYang/SPARK-51130.
Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
1 file changed, +2 -0: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (the two added lines are at lines 269 and 275 of the updated file)