executor_per_core is fixed to 1 vCores in spark-sql-perf on EMR #205

@Rastogii

Description

I have installed spark-sql-perf using:

  1. sudo yum install -y gcc make flex bison byacc git
  2. cd /tmp/
  3. git clone https://github.com/databricks/tpcds-kit.git
  4. cd tpcds-kit/tools
  5. make OS=LINUX
  6. curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
  7. sudo yum install sbt
  8. cd /home/hadoop/
  9. git clone https://github.com/databricks/spark-sql-perf
  10. mkdir -p /home/hadoop/.sbt/preloaded/org/spark-packages/sbt-spark-package_2.10_0.13/0.1.1/
  11. cd /home/hadoop/.sbt/preloaded/org/spark-packages/sbt-spark-package_2.10_0.13/0.1.1/
  12. wget https://repos.spark-packages.org/org/spark-packages/sbt-spark-package/0.1.1/sbt-spark-package-0.1.1.pom
  13. wget https://repos.spark-packages.org/org/spark-packages/sbt-spark-package/0.1.1/sbt-spark-package-0.1.1.jar
  14. cd /home/hadoop/spark-sql-perf
  15. sbt +package

and the Spark configuration is set in /usr/lib/spark/conf/spark-defaults.conf,
where spark.executor.memory=19650M, spark.executor.cores=5, and spark.executor.memoryOverhead=2184.
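For reference, this is what the relevant lines in spark-defaults.conf would look like with the values above (the surrounding file contents are whatever EMR generates by default):

```properties
# /usr/lib/spark/conf/spark-defaults.conf (excerpt, values from this report)
spark.executor.memory          19650M
spark.executor.cores           5
spark.executor.memoryOverhead  2184
```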

In another case, I tried to set executor cores at run time by passing --executor-cores to spark-submit...
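A sketch of the spark-submit invocation I mean; the application jar and main class here are hypothetical placeholders, only the resource flags mirror the settings above:

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores 5 \
  --executor-memory 19650M \
  --conf spark.executor.memoryOverhead=2184 \
  --class com.example.Benchmark \
  app.jar
```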

Yet, in the YARN UI, I see this:

```
Container State: COMPLETE
Mon Jun 21 06:12:54 +0000 2021
Elapsed Time: 7mins, 16sec
Resource: 21856 Memory, 1 VCores
```
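The 21856 Memory figure is consistent with YARN rounding the requested container size (executor memory plus overhead) up to its allocation increment. A small sketch of that arithmetic, assuming a 32 MB increment (the increment is cluster-specific, so this is an assumption):

```python
import math

# Figures from the spark-defaults.conf settings in this report
executor_memory_mb = 19650
memory_overhead_mb = 2184

# Total container size requested from YARN
requested_mb = executor_memory_mb + memory_overhead_mb  # 21834

# Assumed YARN allocation increment of 32 MB (an assumption, not verified)
increment_mb = 32
allocated_mb = math.ceil(requested_mb / increment_mb) * increment_mb

print(requested_mb, allocated_mb)  # 21834 21856
```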

And there are 5 executors on each node, even though each node has 32 vCores.
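One common explanation for the "1 VCores" reading (not confirmed for this cluster, so treat this as a hypothesis) is that the CapacityScheduler's default DefaultResourceCalculator schedules on memory only, and the YARN UI then reports one vCore per container regardless of --executor-cores. Switching to DominantResourceCalculator in capacity-scheduler.xml makes YARN account for vCores as well:

```xml
<!-- capacity-scheduler.xml: make YARN account for (and display) vCores -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```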
