
Commit 2289020

Merge pull request #234 from yahoo/leewyang_pipeline_notebook

add instructions for starting up the pipeline notebook

2 parents bae53cd + bb55a0f


examples/mnist/TFOS_pipeline.ipynb

Lines changed: 54 additions & 2 deletions
@@ -21,6 +21,44 @@
 "In addition, there is a new [dfutil](https://yahoo.github.io/TensorFlowOnSpark/tensorflowonspark.dfutil.html) module which provides helper functions to convert from TensorFlow TFRecords to Spark DataFrames and vice versa.\n"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Start a Spark Standalone Cluster\n",
+"\n",
+"First, in a terminal/shell window, start a single-machine Spark Standalone Cluster with three workers:\n",
+"```\n",
+"export MASTER=spark://$(hostname):7077\n",
+"export SPARK_WORKER_INSTANCES=3\n",
+"export CORES_PER_WORKER=1\n",
+"export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))\n",
+"${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}\n",
+"```"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Launch the Spark Jupyter Notebook\n",
+"\n",
+"Now, in the same window, launch a PySpark Jupyter notebook:\n",
+"```\n",
+"cd ${TFoS_HOME}/examples/mnist\n",
+"PYSPARK_DRIVER_PYTHON=\"jupyter\" \\\n",
+"PYSPARK_DRIVER_PYTHON_OPTS=\"notebook --ip=`hostname`\" \\\n",
+"pyspark --master ${MASTER} \\\n",
+"--conf spark.cores.max=${TOTAL_CORES} \\\n",
+"--conf spark.task.cpus=${CORES_PER_WORKER} \\\n",
+"--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist_pipeline.py \\\n",
+"--conf spark.executorEnv.JAVA_HOME=\"$JAVA_HOME\"\n",
+"```\n",
+"\n",
+"This should open a Jupyter browser pointing to the directory where this notebook is hosted.\n",
+"Click on the TFOS_pipeline.ipynb file, and begin executing the steps of the notebook."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -293,7 +331,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call to save the output to disk."
+"Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call below to save the output to disk."
 ]
 },
 {
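The lazy-evaluation note above is worth a concrete illustration. The following sketch uses a hypothetical stand-in DataFrame rather than the notebook's actual `TFModel` output, but the transformation/action split works identically for the `preds` DataFrame returned by `TFModel.transform()`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(10)                              # hypothetical input DataFrame
preds = df.selectExpr("id * 2 AS prediction")     # transformation: builds a plan, runs nothing
preds.write.mode("overwrite").json("preds_demo")  # action: only now does Spark execute the job
```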
@@ -316,6 +354,20 @@
 "print(subprocess.check_output([\"ls\", \"-l\", output]))"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Shutdown\n",
+"\n",
+"In your terminal/shell window, you can type `<ctrl-C>` to exit the Notebook server.\n",
+"\n",
+"Then, stop the Standalone Cluster via:\n",
+"```\n",
+"${SPARK_HOME}/sbin/stop-slave.sh; ${SPARK_HOME}/sbin/stop-master.sh\n",
+"```"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -340,7 +392,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython2",
-"version": "2.7.12"
+"version": "2.7.13"
 }
 },
 "nbformat": 4,
