
Commit 2289020

Merge pull request #234 from yahoo/leewyang_pipeline_notebook

add instructions for starting up the pipeline notebook

2 parents bae53cd + bb55a0f


examples/mnist/TFOS_pipeline.ipynb

Lines changed: 54 additions & 2 deletions
@@ -21,6 +21,44 @@
 "In addition, there is a new [dfutil](https://yahoo.github.io/TensorFlowOnSpark/tensorflowonspark.dfutil.html) module which provides helper functions to convert from TensorFlow TFRecords to Spark DataFrames and vice versa.\n"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Start a Spark Standalone Cluster\n",
+"\n",
+"First, in a terminal/shell window, start a single-machine Spark Standalone Cluster with three workers:\n",
+"```\n",
+"export MASTER=spark://$(hostname):7077\n",
+"export SPARK_WORKER_INSTANCES=3\n",
+"export CORES_PER_WORKER=1\n",
+"export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))\n",
+"${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}\n",
+"```"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Launch the Spark Jupyter Notebook\n",
+"\n",
+"Now, in the same window, launch a PySpark Jupyter notebook:\n",
+"```\n",
+"cd ${TFoS_HOME}/examples/mnist\n",
+"PYSPARK_DRIVER_PYTHON=\"jupyter\" \\\n",
+"PYSPARK_DRIVER_PYTHON_OPTS=\"notebook --ip=`hostname`\" \\\n",
+"pyspark --master ${MASTER} \\\n",
+"--conf spark.cores.max=${TOTAL_CORES} \\\n",
+"--conf spark.task.cpus=${CORES_PER_WORKER} \\\n",
+"--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist_pipeline.py \\\n",
+"--conf spark.executorEnv.JAVA_HOME=\"$JAVA_HOME\"\n",
+"```\n",
+"\n",
+"This should open a Jupyter browser pointing to the directory where this notebook is hosted.\n",
+"Click on the TFOS_pipeline.ipynb file, and begin executing the steps of the notebook."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -293,7 +331,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call to save the output to disk."
+"Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call below to save the output to disk."
 ]
 },
 {
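The lazy-evaluation note above is worth a concrete illustration. The following sketch uses a hypothetical stand-in DataFrame rather than the notebook's actual `TFModel` output, but the transformation/action split works identically for the `preds` DataFrame returned by `TFModel.transform()`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(10)                              # hypothetical input DataFrame
preds = df.selectExpr("id * 2 AS prediction")     # transformation: builds a plan, runs nothing
preds.write.mode("overwrite").json("preds_demo")  # action: only now does Spark execute the job
```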
@@ -316,6 +354,20 @@
 "print(subprocess.check_output([\"ls\", \"-l\", output]))"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Shutdown\n",
+"\n",
+"In your terminal/shell window, you can type `<ctrl-C>` to exit the Notebook server.\n",
+"\n",
+"Then, stop the Standalone Cluster via:\n",
+"```\n",
+"${SPARK_HOME}/sbin/stop-slave.sh; ${SPARK_HOME}/sbin/stop-master.sh\n",
+"```"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -340,7 +392,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython2",
-"version": "2.7.12"
+"version": "2.7.13"
 }
 },
 "nbformat": 4,
