|
21 | 21 | "In addition, there is a new [dfutil](https://yahoo.github.io/TensorFlowOnSpark/tensorflowonspark.dfutil.html) module which provides helper functions to convert from TensorFlow TFRecords to Spark DataFrames and vice versa.\n" |
22 | 22 | ] |
23 | 23 | }, |
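| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "For example, the conversion is a one-liner in each direction. This is a minimal sketch (the DataFrame `df` and the `mnist/tfr` path are illustrative, and `sc` is the notebook's SparkContext):\n",
| | + "```python\n",
| | + "from tensorflowonspark import dfutil\n",
| | + "\n",
| | + "# serialize each DataFrame row as a tf.train.Example and save as TFRecords\n",
| | + "dfutil.saveAsTFRecords(df, \"mnist/tfr\")\n",
| | + "\n",
| | + "# load the TFRecords back into a DataFrame, inferring the schema from the records\n",
| | + "df2 = dfutil.loadTFRecords(sc, \"mnist/tfr\")\n",
| | + "```"
| | + ]
| | + },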
| 24 | + { |
| 25 | + "cell_type": "markdown", |
| 26 | + "metadata": {}, |
| 27 | + "source": [ |
| 28 | + "### Start a Spark Standalone Cluster\n", |
| 29 | + "\n", |
| 30 | + "First, in a terminal/shell window, start a single-machine Spark Standalone Cluster with three workers:\n", |
| 31 | + "```\n", |
| 32 | + "export MASTER=spark://$(hostname):7077\n", |
| 33 | + "export SPARK_WORKER_INSTANCES=3\n", |
| 34 | + "export CORES_PER_WORKER=1\n", |
| 35 | + "export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES})) \n", |
| 36 | + "${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}\n", |
| 37 | + "```" |
| 38 | + ] |
| 39 | + }, |
| 40 | + { |
| 41 | + "cell_type": "markdown", |
| 42 | + "metadata": {}, |
| 43 | + "source": [ |
| 44 | + "### Launch the Spark Jupyter Notebook\n", |
| 45 | + "\n", |
| 46 | + "Now, in the same window, launch a Pyspark Jupyter notebook:\n", |
| 47 | + "```\n", |
| 48 | + "cd ${TFoS_HOME}/examples/mnist\n", |
| 49 | + "PYSPARK_DRIVER_PYTHON=\"jupyter\" \\\n", |
| 50 | + "PYSPARK_DRIVER_PYTHON_OPTS=\"notebook --ip=`hostname`\" \\\n", |
| 51 | + "pyspark --master ${MASTER} \\\n", |
| 52 | + "--conf spark.cores.max=${TOTAL_CORES} \\\n", |
| 53 | + "--conf spark.task.cpus=${CORES_PER_WORKER} \\\n", |
| 54 | + "--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist_pipeline.py \\\n", |
| 55 | + "--conf spark.executorEnv.JAVA_HOME=\"$JAVA_HOME\"\n", |
| 56 | + "```\n", |
| 57 | + "\n", |
| 58 | + "This should open a Jupyter browser pointing to the directory where this notebook is hosted.\n", |
| 59 | + "Click on the TFOS_pipeline.ipynb file, and begin executing the steps of the notebook." |
| 60 | + ] |
| 61 | + }, |
24 | 62 | { |
25 | 63 | "cell_type": "code", |
26 | 64 | "execution_count": null, |
|
293 | 331 | "cell_type": "markdown", |
294 | 332 | "metadata": {}, |
295 | 333 | "source": [ |
296 | | - "Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call to save the output to disk." |
| 334 | + "Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call below to save the output to disk." |
297 | 335 | ] |
298 | 336 | }, |
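| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "As a hedged two-line sketch of the lazy/eager split (assuming a `TFModel` instance `model`, an input DataFrame `df`, and an `output` path):\n",
| | + "```python\n",
| | + "preds = model.transform(df)  # transformation: only builds the execution plan\n",
| | + "preds.write.json(output)     # action: triggers the actual inferencing and write\n",
| | + "```"
| | + ]
| | + },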
299 | 337 | { |
|
316 | 354 | "print(subprocess.check_output([\"ls\", \"-l\", output]))" |
317 | 355 | ] |
318 | 356 | }, |
| 357 | + { |
| 358 | + "cell_type": "markdown", |
| 359 | + "metadata": {}, |
| 360 | + "source": [ |
| 361 | + "### Shutdown\n", |
| 362 | + "\n", |
| 363 | + "In your terminal/shell window, you can type `<ctrl-C>` to exit the Notebook server.\n", |
| 364 | + "\n", |
| 365 | + "Then, stop the Standalone Cluster via:\n", |
| 366 | + "```\n", |
| 367 | + "${SPARK_HOME}/sbin/stop-slave.sh; ${SPARK_HOME}/sbin/stop-master.sh\n", |
| 368 | + "```" |
| 369 | + ] |
| 370 | + }, |
319 | 371 | { |
320 | 372 | "cell_type": "code", |
321 | 373 | "execution_count": null, |
|
340 | 392 | "name": "python", |
341 | 393 | "nbconvert_exporter": "python", |
342 | 394 | "pygments_lexer": "ipython2", |
343 | | - "version": "2.7.12" |
| 395 | + "version": "2.7.13" |
344 | 396 | } |
345 | 397 | }, |
346 | 398 | "nbformat": 4, |
|