Commit 3ac0437
Merge pull request #88414 from dagiro/cats84
2 parents e43ba75 + fec5edb

File tree: 7 files changed (+8, -6 lines)

articles/hdinsight/spark/apache-spark-deep-learning-caffe.md

Lines changed: 8 additions & 6 deletions
@@ -3,6 +3,7 @@ title: Use Caffe on Azure HDInsight Spark for distributed deep learning
 description: Use Caffe on Apache Spark for distributed deep learning in Azure HDInsight.
 author: hrasheed-msft
 ms.author: hrasheed
+ms.reviewer: jasonh
 ms.service: hdinsight
 ms.custom: hdinsightactive
 ms.topic: conceptual
@@ -60,7 +61,7 @@ The second step is to download, compile, and install protobuf 2.5.0 for Caffe du
 
 To get started, run this script action against all the worker nodes and head nodes of your cluster (for HDInsight 3.5). You can either run the script action on an existing cluster or use it during cluster creation. For more information, see the [script actions documentation](https://docs.microsoft.com/azure/hdinsight/hdinsight-hadoop-customize-cluster-linux).
 
-![Script Actions to Install Dependencies](./media/apache-spark-deep-learning-caffe/Script-Action-1.png)
+![Script Actions to Install Dependencies](./media/apache-spark-deep-learning-caffe/submit-script-action.png)
 
 
 ## Step 2: Build Caffe on Apache Spark for HDInsight on the head node
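A script action like the one referenced above can also be submitted from the command line. The sketch below uses the Azure CLI's `az hdinsight script-action execute` command; the resource group, cluster name, and script URI are placeholders, not values from this commit. `DRYRUN=1` (the default here) only prints the command, so the sketch runs without a cluster.

```shell
# Hypothetical values -- replace with your resource group, cluster, and script URL.
SCRIPT_URI="https://example.com/scripts/install-caffe-dependencies.sh"

CMD=(az hdinsight script-action execute \
  --resource-group my-rg \
  --cluster-name my-spark-cluster \
  --name install-caffe-deps \
  --script-uri "$SCRIPT_URI" \
  --roles headnode workernode)

# DRYRUN=1 prints the command instead of calling Azure.
if [ "${DRYRUN:-1}" = "1" ]; then
  echo "${CMD[@]}"
else
  "${CMD[@]}"
fi
```

Targeting both `headnode` and `workernode` roles matches the article's instruction to run the script on all head and worker nodes.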
@@ -170,7 +171,7 @@ For this example, since you are using CPU rather than GPU, you should change the
 # solver mode: CPU or GPU
 solver_mode: CPU
 
-![Caffe Config1](./media/apache-spark-deep-learning-caffe/Caffe-1.png)
+![Caffe Config1](./media/apache-spark-deep-learning-caffe/caffe-configuration1.png)
 
 You can change other lines as needed.
 
@@ -179,7 +181,7 @@ The second file (${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt) define
 - change "file:/Users/mridul/bigml/demodl/mnist_train_lmdb" to "wasb:///projects/machine_learning/image_dataset/mnist_train_lmdb"
 - change "file:/Users/mridul/bigml/demodl/mnist_test_lmdb/" to "wasb:///projects/machine_learning/image_dataset/mnist_test_lmdb"
 
-![Caffe Config2](./media/apache-spark-deep-learning-caffe/Caffe-2.png)
+![Caffe Config2](./media/apache-spark-deep-learning-caffe/caffe-configuration2.png)
 
 For more information on how to define the network, see the [Caffe documentation on the MNIST dataset](https://caffe.berkeleyvision.org/gathered/examples/mnist.html).
@@ -199,15 +201,15 @@ If you want to know what happened, you usually need to get the Spark driver's lo
 
 https://yourclustername.azurehdinsight.net/yarnui
 
-![YARN UI](./media/apache-spark-deep-learning-caffe/YARN-UI-1.png)
+![YARN UI](./media/apache-spark-deep-learning-caffe/apache-yarn-window-1.png)
 
 You can see how many resources are allocated for this particular application. Click the "Scheduler" link, and you will see nine containers running for this application: you asked YARN to provide eight executors, and one more container is for the driver process.
 
-![YARN Scheduler](./media/apache-spark-deep-learning-caffe/YARN-Scheduler.png)
+![YARN Scheduler](./media/apache-spark-deep-learning-caffe/apache-yarn-scheduler.png)
 
 You may want to check the driver logs or container logs if there are failures. For driver logs, click the application ID in the YARN UI, then click the "Logs" button. The driver logs are written to stderr.
 
-![YARN UI 2](./media/apache-spark-deep-learning-caffe/YARN-UI-2.png)
+![YARN UI 2](./media/apache-spark-deep-learning-caffe/apache-yarn-window-2.png)
 
 For example, you might see some of the errors below in the driver logs, indicating that you allocated too many executors.
 
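The container count in the scheduler view follows directly from the submission parameters: a worked version of the arithmetic the context lines describe (the variable names are illustrative):

```python
# Executors requested for the Caffe-on-Spark job, per the article's example.
num_executors = 8
# YARN allocates one additional container to host the Spark driver.
driver_containers = 1

total_containers = num_executors + driver_containers
print(total_containers)
```

This is why the Scheduler page shows nine running containers for an eight-executor application; if the cluster cannot satisfy that total, the job queues or fails with allocation errors like those mentioned above.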
