Description
When submitting a Java or Scala program, everything works fine. When submitting a Python program, it gets to the ACCEPTED state and then stalls. It eventually times out, but it is never picked up to run. Is this interface only for Java/Scala programs/jobs, or should it be able to submit PySpark/Python jobs as well?
I am trying to invoke the pi.py sample program that comes with Spark 1.6.0.
Below is the Java program that I am testing with. I'm new to Spark, so apologies for any "newbie" errors.
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.apache.hadoop.conf.Configuration;
// import org.apache.log4j.Logger;
/**
 * This class submits SparkPi to YARN from a Java client (as opposed
 * to submitting a Spark job from a shell command line using spark-submit).
 *
 * To accomplish submitting a Spark job from a Java client, we use
 * the org.apache.spark.deploy.yarn.Client class described below:
 *
 * Usage: org.apache.spark.deploy.yarn.Client [options]
 * Options:
 *   --jar JAR_PATH          Path to your application's JAR file (required in yarn-cluster mode)
 *   --class CLASS_NAME      Name of your application's main class (required)
 *   --primary-py-file       A main Python file
 *   --arg ARG               Argument to be passed to your application's main class.
 *                           Multiple invocations are possible, each will be passed in order.
 *   --num-executors NUM     Number of executors to start (Default: 2)
 *   --executor-cores NUM    Number of cores per executor (Default: 1).
 *   --driver-memory MEM     Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
 *   --driver-cores NUM      Number of cores used by the driver (Default: 1).
 *   --executor-memory MEM   Memory per executor (e.g. 1000M, 2G) (Default: 1G)
 *   --name NAME             The name of your application (Default: Spark)
 *   --queue QUEUE           The hadoop queue to use for allocation requests (Default: 'default')
 *   --addJars jars          Comma separated list of local jars that want SparkContext.addJar to work with.
 *   --py-files PY_FILES     Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.
 *   --files files           Comma separated list of files to be distributed with the job.
 *   --archives archives     Comma separated list of archives to be distributed with the job.
 *
 * Example of how to call this program:
 *   export SPARK_HOME="/Users/mparsian/spark-1.6.0"
 *   java -DSPARK_HOME="$SPARK_HOME" org.dataalgorithms.client.SubmitSparkPiToYARNFromJavaCode 10
 */
public class SubmitSparkPiToYARNFromJavaCode {

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();

        // this is passed to the SparkPi program
        // THE_LOGGER.info("Slices Passed=" + args[0]);
        String slices = args[0];
        // String slices = "10";

        // String SPARK_HOME = System.getProperty("SPARK_HOME");
        String SPARK_HOME = "/opt/spark/spark-1.6.0";
        // THE_LOGGER.info("SPARK_HOME=" + SPARK_HOME);

        // ... the code being measured ...
        pi(SPARK_HOME, slices);

        long elapsedTime = System.currentTimeMillis() - startTime;
        // THE_LOGGER.info("elapsedTime (millis)=" + elapsedTime);
    }

    static void pi(String SPARK_HOME, String slices) throws Exception {
        String[] args = new String[]{
            "--name",
            "Submit-SparkPi-To-Yarn",

            "--driver-memory",
            "512MB",

            "--jar",
            SPARK_HOME + "/examples/target/spark-examples_2.11-1.6.0.jar",

            "--class",
            "org.apache.spark.examples.JavaSparkPi",

            // argument 1 to my Spark program
            "--arg",
            slices,

            // argument 2 to my Spark program (helper argument to create a proper JavaSparkContext object)
            "--arg",
            "yarn-cluster"
        };

        Configuration config = new Configuration();
        System.setProperty("SPARK_YARN_MODE", "true");
        SparkConf sparkConf = new SparkConf();
        ClientArguments clientArgs = new ClientArguments(args, sparkConf);
        Client client = new Client(clientArgs, config, sparkConf);
        client.run();
        // done!
    }
}
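For the Python case, here is roughly what I have been trying through the same Client interface. This is an untested sketch: the org.apache.spark.deploy.PythonRunner main class and the pyspark.zip / py4j paths below are my guesses based on the usage text above and on what spark-submit appears to pass internally for Python apps in yarn-cluster mode, not anything I found documented.

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

public class SubmitPySparkPiToYARNFromJavaCode {

    public static void main(String[] args) throws Exception {
        String SPARK_HOME = "/opt/spark/spark-1.6.0";

        // NOTE: untested sketch. PythonRunner as the --class and the
        // pyspark.zip / py4j-0.9-src.zip paths are assumptions and may
        // differ per installation.
        String[] clientArgs = new String[]{
            "--name", "Submit-PySparkPi-To-Yarn",

            // the main Python file (pi.py that ships with Spark 1.6.0)
            "--primary-py-file", SPARK_HOME + "/examples/src/main/python/pi.py",

            // make pyspark and py4j available on the PYTHONPATH of the driver/executors
            "--py-files", SPARK_HOME + "/python/lib/pyspark.zip,"
                        + SPARK_HOME + "/python/lib/py4j-0.9-src.zip",

            // spark-submit seems to use this runner class for Python apps in yarn-cluster mode
            "--class", "org.apache.spark.deploy.PythonRunner",

            // application argument: number of slices passed to pi.py
            "--arg", "10"
        };

        Configuration hadoopConf = new Configuration();
        System.setProperty("SPARK_YARN_MODE", "true");
        SparkConf sparkConf = new SparkConf();
        ClientArguments arguments = new ClientArguments(clientArgs, sparkConf);
        Client client = new Client(arguments, hadoopConf, sparkConf);
        client.run();
    }
}

With this variant the application reaches ACCEPTED and then stalls as described above, so I may be passing the wrong main class or missing something the Python path requires.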
Thanks,
-Scott