
Commit 34ae166

[SPARK-52836] Fix sparkapps.sh to let Spark driver determine pod memory
### What changes were proposed in this pull request?

This PR aims to fix the benchmark script `sparkapps.sh` to let the Spark driver determine the pod request memory.

### Why are the changes needed?

The Apache Spark driver's `spark.driver.memory` defaults to 1g, and Spark adds a memory overhead on top of it when it creates the pod. So, we had better use Spark's built-in logic instead of setting the pod request memory directly.

### Does this PR introduce _any_ user-facing change?

No, this is a benchmark script change.

### How was this patch tested?

Manual run, since the benchmark script is not covered by the CI.

```
$ cd tests/benchmark

# The default value is 1k, but we need to use a small value on a laptop.
$ ./sparkapps.sh 50
CLEAN UP NAMESPACE FOR BENCHMARK
START BENCHMARK WITH 50 JOBS
FINISHED 50 JOBS IN 52 SECONDS.
DELETED 50 JOBS IN 16 SECONDS.
```

```
# While running the benchmark, we can check the memory.
$ kubectl get pod -l spark-role=driver -oyaml | grep memory | sort | uniq -c
    100       memory: 256Mi
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #289 from dongjoon-hyun/SPARK-52836.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
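The sizing logic the commit relies on can be sketched as follows. This is a hedged approximation of Spark's documented behavior (not the actual Spark source): the driver pod's memory request is `spark.driver.memory` plus `spark.driver.memoryOverhead`, where the overhead, if unset, defaults to a factor of the driver memory with a 384 MiB floor for JVM jobs. The function name and MiB units are illustrative.

```python
# Approximate Spark-on-Kubernetes driver pod memory sizing (sketch, not
# the real implementation). Per the Spark configuration docs, the default
# overhead is max(factor * driverMemory, 384 MiB) for JVM workloads.
MIN_OVERHEAD_MIB = 384   # documented minimum overhead
DEFAULT_FACTOR = 0.10    # documented default overhead factor for JVM jobs

def pod_memory_mib(driver_memory_mib, overhead_mib=None, factor=DEFAULT_FACTOR):
    """Approximate pod memory request (MiB) Spark computes for the driver."""
    if overhead_mib is None:
        overhead_mib = max(int(driver_memory_mib * factor), MIN_OVERHEAD_MIB)
    return driver_memory_mib + overhead_mib

# Defaults: 1g driver memory -> 1024 + 384 = 1408 MiB pod request.
print(pod_memory_mib(1024))                  # 1408
# Benchmark settings: 256m memory, 0m overhead -> 256 MiB,
# matching the `kubectl get pod` output above.
print(pod_memory_mib(256, overhead_mib=0))   # 256
```

Setting `spark.driver.memoryOverhead` to `"0m"` in the benchmark explicitly disables the 384 MiB floor, which is why the pods land at exactly 256Mi.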
1 parent dc6a424 commit 34ae166

File tree

1 file changed: +2 −1 lines changed

tests/benchmark/sparkapps.sh

Lines changed: 2 additions & 1 deletion

```diff
@@ -38,8 +38,9 @@ spec:
       jars: "local:///opt/spark/examples/jars/spark-examples.jar"
       driverArgs: ["0"]
       sparkConf:
+        spark.driver.memory: "256m"
+        spark.driver.memoryOverhead: "0m"
         spark.kubernetes.driver.request.cores: "100m"
-        spark.kubernetes.driver.request.memory: "100Mi"
         spark.kubernetes.driver.master: "local[1]"
         spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
         spark.kubernetes.container.image: "apache/spark:4.0.0-java21-scala"
```
