Issue submitting Spark Job from code when Spark Job is a Python program. #4

@sehunley

Description

When submitting a Java or Scala program, everything works fine. When submitting a Python program, it gets to the ACCEPTED state and then stalls; it eventually times out without ever being picked up to run. Is this interface only for Java/Scala programs/jobs, or should it be able to submit PySpark/Python jobs as well?

I am trying to invoke the pi.py sample program that comes with Spark 1.6.0.

Below is the Java program I am testing with. I'm new to Spark, so apologies for any "newbie" errors.

import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.apache.hadoop.conf.Configuration;
// import org.apache.log4j.Logger;

/**
 * This class submits a SparkPi job to YARN from a Java client (as opposed
 * to submitting a Spark job from a shell command line using spark-submit).
 *
 * To accomplish submitting a Spark job from a Java client, we use
 * the org.apache.spark.deploy.yarn.Client class described below:
 *
 * Usage: org.apache.spark.deploy.yarn.Client [options]
 * Options:
 *   --jar JAR_PATH          Path to your application's JAR file (required in yarn-cluster mode)
 *   --class CLASS_NAME      Name of your application's main class (required)
 *   --primary-py-file       A main Python file
 *   --arg ARG               Argument to be passed to your application's main class.
 *                           Multiple invocations are possible, each will be passed in order.
 *   --num-executors NUM     Number of executors to start (Default: 2)
 *   --executor-cores NUM    Number of cores per executor (Default: 1).
 *   --driver-memory MEM     Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
 *   --driver-cores NUM      Number of cores used by the driver (Default: 1).
 *   --executor-memory MEM   Memory per executor (e.g. 1000M, 2G) (Default: 1G)
 *   --name NAME             The name of your application (Default: Spark)
 *   --queue QUEUE           The hadoop queue to use for allocation requests (Default: 'default')
 *   --addJars JARS          Comma separated list of local jars that want SparkContext.addJar to work with.
 *   --py-files PY_FILES     Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.
 *   --files FILES           Comma separated list of files to be distributed with the job.
 *   --archives ARCHIVES     Comma separated list of archives to be distributed with the job.
 *
 * How to call this program example:
 *
 *   export SPARK_HOME="/Users/mparsian/spark-1.6.0"
 *   java -DSPARK_HOME="$SPARK_HOME" org.dataalgorithms.client.SubmitSparkPiToYARNFromJavaCode 10
 */
public class SubmitSparkPiToYARNFromJavaCode {

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();

        // this is passed to the SparkPi program
        //THE_LOGGER.info("Slices Passed=" + args[0]);
        String slices = args[0];
        // String slices = "10";
        //
        // String SPARK_HOME = System.getProperty("SPARK_HOME");
        String SPARK_HOME = "/opt/spark/spark-1.6.0";
        // THE_LOGGER.info("SPARK_HOME=" + SPARK_HOME);

        //
        pi(SPARK_HOME, slices); // ... the code being measured ...
        //
        long elapsedTime = System.currentTimeMillis() - startTime;
        // THE_LOGGER.info("elapsedTime (millis)=" + elapsedTime);
    }

    static void pi(String SPARK_HOME, String slices) throws Exception {
        //
        String[] args = new String[]{
            "--name",
            "Submit-SparkPi-To-Yarn",
            //
            "--driver-memory",
            "512MB",
            //
            "--jar",
            SPARK_HOME + "/examples/target/spark-examples_2.11-1.6.0.jar",
            //
            "--class",
            "org.apache.spark.examples.JavaSparkPi",

            // argument 1 to my Spark program
            "--arg",
            slices,

            // argument 2 to my Spark program (helper argument to create a proper JavaSparkContext object)
            "--arg",
            "yarn-cluster"
        };

        Configuration config = new Configuration();
        //
        System.setProperty("SPARK_YARN_MODE", "true");
        //
        SparkConf sparkConf = new SparkConf();
        ClientArguments clientArgs = new ClientArguments(args, sparkConf);
        Client client = new Client(clientArgs, config, sparkConf);

        client.run();
        // done!
    }
}
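
For reference, here is roughly what I expect the Python variant of pi(...) to look like, as a method alongside pi() in the same class. This is an untested sketch on my part: the --primary-py-file option comes from the Client usage listed above, but using org.apache.spark.deploy.PythonRunner as the --class value, setting the spark.yarn.isPython property, and the pi.py path are my assumptions based on how spark-submit appears to drive the yarn Client for Python apps.

    // Untested sketch: same Client-based submission, but for pi.py instead of a JAR.
    // Assumptions (not confirmed): --primary-py-file replaces --jar, the driver class
    // is org.apache.spark.deploy.PythonRunner, spark.yarn.isPython must be set so the
    // Client ships the PySpark archives, and pi.py lives under examples/src/main/python.
    static void piPython(String SPARK_HOME, String slices) throws Exception {
        String[] args = new String[]{
            "--name",
            "Submit-PySpark-Pi-To-Yarn",
            //
            "--driver-memory",
            "512MB",
            //
            // main Python file instead of --jar for a JVM application
            "--primary-py-file",
            SPARK_HOME + "/examples/src/main/python/pi.py",
            //
            // assumption: for Python apps the application master runs PythonRunner
            "--class",
            "org.apache.spark.deploy.PythonRunner",
            //
            // argument passed to pi.py (number of slices/partitions)
            "--arg",
            slices
        };

        Configuration config = new Configuration();
        System.setProperty("SPARK_YARN_MODE", "true");

        SparkConf sparkConf = new SparkConf();
        // assumption: without this the yarn Client does not distribute the PySpark archives
        sparkConf.set("spark.yarn.isPython", "true");

        ClientArguments clientArgs = new ClientArguments(args, sparkConf);
        Client client = new Client(clientArgs, config, sparkConf);
        client.run();
    }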

Thanks,

-Scott
