Commit 1ce5b66

Merge pull request #101253 from dagiro/freshness177
freshness177
2 parents 9aa8179 + dcc3e98 commit 1ce5b66

1 file changed: +37 -45 lines changed

articles/hdinsight/spark/apache-spark-microsoft-cognitive-toolkit.md

Lines changed: 37 additions & 45 deletions
@@ -2,28 +2,27 @@
 title: Microsoft Cognitive Toolkit with Apache Spark - Azure HDInsight
 description: Learn how a trained Microsoft Cognitive Toolkit deep learning model can be applied to a dataset using the Spark Python API in an Azure HDInsight Spark cluster.
 author: hrasheed-msft
+ms.author: hrasheed
 ms.reviewer: jasonh
-
 ms.service: hdinsight
-ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 11/28/2017
-ms.author: hrasheed
-
+ms.custom: hdinsightactive
+ms.date: 01/14/2020
 ---
+
 # Use Microsoft Cognitive Toolkit deep learning model with Azure HDInsight Spark cluster

 In this article, you do the following steps.

-1. Run a custom script to install [Microsoft Cognitive Toolkit](https://www.microsoft.com/en-us/cognitive-toolkit/) on an Azure HDInsight Spark cluster.
+1. Run a custom script to install [Microsoft Cognitive Toolkit](https://docs.microsoft.com/cognitive-toolkit/) on an Azure HDInsight Spark cluster.

-2. Upload a [Jupyter Notebook](https://jupyter.org/) to the [Apache Spark](https://spark.apache.org/) cluster to see how to apply a trained Microsoft Cognitive Toolkit deep learning model to files in an Azure Blob Storage Account using the [Spark Python API (PySpark)](https://spark.apache.org/docs/0.9.0/python-programming-guide.html)
+2. Upload a [Jupyter Notebook](https://jupyter.org/) to the [Apache Spark](https://spark.apache.org/) cluster to see how to apply a trained Microsoft Cognitive Toolkit deep learning model to files in an Azure Blob Storage Account using the [Spark Python API (PySpark)](https://spark.apache.org/docs/latest/api/python/index.html)

 ## Prerequisites

-* **An Azure subscription**. Before you begin this article, you must have an Azure subscription. See [Create your free Azure account today](https://azure.microsoft.com/free).
+* An Apache Spark cluster on HDInsight. See [Create an Apache Spark cluster](./apache-spark-jupyter-spark-sql-use-portal.md).

-* **Azure HDInsight Spark cluster**. For this article, create a Spark 2.0 cluster. For instructions, see [Create Apache Spark cluster in Azure HDInsight](apache-spark-jupyter-spark-sql.md).
+* Familiarity with using Jupyter Notebooks with Spark on HDInsight. For more information, see [Load data and run queries with Apache Spark on HDInsight](./apache-spark-load-data-run-query.md).

 ## How does this solution flow?

@@ -34,68 +33,69 @@ This solution is divided between this article and a Jupyter notebook that you up

 The following remaining steps are covered in the Jupyter notebook.

-- Load sample images into a Spark Resiliant Distributed Dataset or RDD.
-- Load modules and define presets.
-- Download the dataset locally on the Spark cluster.
-- Convert the dataset into an RDD.
-- Score the images using a trained Cognitive Toolkit model.
-- Download the trained Cognitive Toolkit model to the Spark cluster.
-- Define functions to be used by worker nodes.
-- Score the images on worker nodes.
-- Evaluate model accuracy.
-
+* Load sample images into a Spark Resilient Distributed Dataset or RDD.
+* Load modules and define presets.
+* Download the dataset locally on the Spark cluster.
+* Convert the dataset into an RDD.
+* Score the images using a trained Cognitive Toolkit model.
+* Download the trained Cognitive Toolkit model to the Spark cluster.
+* Define functions to be used by worker nodes.
+* Score the images on worker nodes.
+* Evaluate model accuracy.

 ## Install Microsoft Cognitive Toolkit

-You can install Microsoft Cognitive Toolkit on a Spark cluster using script action. Script action uses custom scripts to install components on the cluster that are not available by default. You can use the custom script from the Azure portal, by using HDInsight .NET SDK, or by using Azure PowerShell. You can also use the script to install the toolkit either as part of cluster creation, or after the cluster is up and running.
+You can install Microsoft Cognitive Toolkit on a Spark cluster using script action. Script action uses custom scripts to install components on the cluster that aren't available by default. You can use the custom script from the Azure portal, by using HDInsight .NET SDK, or by using Azure PowerShell. You can also use the script to install the toolkit either as part of cluster creation, or after the cluster is up and running.

 In this article, we use the portal to install the toolkit, after the cluster has been created. For other ways to run the custom script, see [Customize HDInsight clusters using Script Action](../hdinsight-hadoop-customize-cluster-linux.md).

 ### Using the Azure portal

-For instructions on how to use the Azure portal to run script action, see [Customize HDInsight clusters using Script Action](../hdinsight-hadoop-customize-cluster-linux.md#use-a-script-action-during-cluster-creation). Make sure you provide the following inputs to install Microsoft Cognitive Toolkit.
-
-* Provide a value for the script action name.
-
-* For **Bash script URI**, enter `https://raw.githubusercontent.com/Azure-Samples/hdinsight-pyspark-cntk-integration/master/cntk-install.sh`.
+For instructions on how to use the Azure portal to run script action, see [Customize HDInsight clusters using Script Action](../hdinsight-hadoop-customize-cluster-linux.md#use-a-script-action-during-cluster-creation). Make sure you provide the following inputs to install Microsoft Cognitive Toolkit. Use the following values for your script action:

-* Make sure you run the script only on the head and worker nodes and clear all the other checkboxes.
-
-* Click **Create**.
+|Property |Value |
+|---|---|
+|Script type|- Custom|
+|Name| Install MCT|
+|Bash script URI|`https://raw.githubusercontent.com/Azure-Samples/hdinsight-pyspark-cntk-integration/master/cntk-install.sh`|
+|Node type(s):|Head, Worker|
+|Parameters|None|

 ## Upload the Jupyter notebook to Azure HDInsight Spark cluster

 To use the Microsoft Cognitive Toolkit with the Azure HDInsight Spark cluster, you must load the Jupyter notebook **CNTK_model_scoring_on_Spark_walkthrough.ipynb** to the Azure HDInsight Spark cluster. This notebook is available on GitHub at [https://github.com/Azure-Samples/hdinsight-pyspark-cntk-integration](https://github.com/Azure-Samples/hdinsight-pyspark-cntk-integration).

-1. Clone the GitHub repository [https://github.com/Azure-Samples/hdinsight-pyspark-cntk-integration](https://github.com/Azure-Samples/hdinsight-pyspark-cntk-integration). For instructions to clone, see [Cloning a repository](https://help.github.com/articles/cloning-a-repository/).
-
-2. From the Azure portal, open the Spark cluster blade that you already provisioned, click **Cluster Dashboard**, and then click **Jupyter notebook**.
+1. Download and unzip [https://github.com/Azure-Samples/hdinsight-pyspark-cntk-integration](https://github.com/Azure-Samples/hdinsight-pyspark-cntk-integration).

-You can also launch the Jupyter notebook by going to the URL `https://<clustername>.azurehdinsight.net/jupyter/`. Replace \<clustername> with the name of your HDInsight cluster.
+1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/jupyter`, where `CLUSTERNAME` is the name of your cluster.

-3. From the Jupyter notebook, click **Upload** in the top-right corner and then navigate to the location where you cloned the GitHub repository.
+1. From the Jupyter notebook, select **Upload** in the top-right corner and then navigate to the download and select file `CNTK_model_scoring_on_Spark_walkthrough.ipynb`.

-![Upload Jupyter notebook to Azure HDInsight Spark cluster](./media/apache-spark-microsoft-cognitive-toolkit/hdinsight-microsoft-cognitive-toolkit-load-jupyter-notebook.png "Upload Jupyter notebook to Azure HDInsight Spark cluster")
+![Upload Jupyter notebook to Azure HDInsight Spark cluster](./media/apache-spark-microsoft-cognitive-toolkit/hdinsight-microsoft-cognitive-toolkit-load-jupyter-notebook.png "Upload Jupyter notebook to Azure HDInsight Spark cluster")

-4. Click **Upload** again.
+1. Select **Upload** again.

-5. After the notebook is uploaded, click the name of the notebook and then follow the instructions in the notebook itself on how to load the data set and perform the article.
+1. After the notebook is uploaded, click the name of the notebook and then follow the instructions in the notebook itself on how to load the data set and perform the article.

 ## See also
+
 * [Overview: Apache Spark on Azure HDInsight](apache-spark-overview.md)

 ### Scenarios
+
 * [Apache Spark with BI: Perform interactive data analysis using Spark in HDInsight with BI tools](apache-spark-use-bi-tools.md)
 * [Apache Spark with Machine Learning: Use Spark in HDInsight for analyzing building temperature using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
 * [Apache Spark with Machine Learning: Use Spark in HDInsight to predict food inspection results](apache-spark-machine-learning-mllib-ipython.md)
 * [Website log analysis using Apache Spark in HDInsight](apache-spark-custom-library-website-log-analysis.md)
 * [Application Insight telemetry data analysis using Apache Spark in HDInsight](apache-spark-analyze-application-insight-logs.md)

 ### Create and run applications
+
 * [Create a standalone application using Scala](apache-spark-create-standalone-application.md)
 * [Run jobs remotely on an Apache Spark cluster using Apache Livy](apache-spark-livy-rest-interface.md)

 ### Tools and extensions
+
 * [Use HDInsight Tools Plugin for IntelliJ IDEA to create and submit Spark Scala applications](apache-spark-intellij-tool-plugin.md)
 * [Use HDInsight Tools Plugin for IntelliJ IDEA to debug Apache Spark applications remotely](apache-spark-intellij-tool-plugin-debug-jobs-remotely.md)
 * [Use Apache Zeppelin notebooks with an Apache Spark cluster on HDInsight](apache-spark-zeppelin-notebook.md)
@@ -104,14 +104,6 @@ To use the Microsoft Cognitive Toolkit with the Azure HDInsight Spark cluster, y
 * [Install Jupyter on your computer and connect to an HDInsight Spark cluster](apache-spark-jupyter-notebook-install-locally.md)

 ### Manage resources
+
 * [Manage resources for the Apache Spark cluster in Azure HDInsight](apache-spark-resource-manager.md)
 * [Track and debug jobs running on an Apache Spark cluster in HDInsight](apache-spark-job-debugging.md)
-
-[hdinsight-versions]: hdinsight-component-versioning.md
-[hdinsight-upload-data]: hdinsight-upload-data.md
-[hdinsight-storage]: hdinsight-hadoop-use-blob-storage.md
-
-[azure-purchase-options]: https://azure.microsoft.com/pricing/purchase-options/
-[azure-member-offers]: https://azure.microsoft.com/pricing/member-offers/
-[azure-free-trial]: https://azure.microsoft.com/pricing/free-trial/
-[azure-create-storageaccount]:../../storage/common/storage-create-storage-account.md
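
Note on the notebook steps listed in the changed article (define functions for worker nodes, score the images on worker nodes, evaluate model accuracy): the shape of that flow can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the walkthrough notebook. The names `score_partition`, `accuracy`, and `stub_model_loader` are hypothetical, and the stub model is a stand-in that does not use the Cognitive Toolkit API.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# Hypothetical types for illustration: an "image" is a flat sequence of pixel
# values and a "model" is any callable that maps an image to a predicted label.
Image = Sequence[float]
Model = Callable[[Image], int]


def score_partition(
    records: Iterable[Tuple[Image, int]],
    load_model: Callable[[], Model],
) -> Iterator[Tuple[int, int]]:
    """Score one partition of (image, true_label) records.

    The model is loaded once per partition rather than once per image.
    Yields (true_label, predicted_label) pairs.
    """
    model = load_model()
    for image, true_label in records:
        yield true_label, model(image)


def accuracy(pairs: Iterable[Tuple[int, int]]) -> float:
    """Return the fraction of pairs whose predicted label matches the true label."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no predictions to evaluate")
    correct = sum(1 for true_label, predicted in pairs if true_label == predicted)
    return correct / len(pairs)


def stub_model_loader() -> Model:
    # Assumption: a toy threshold rule standing in for the trained model.
    return lambda image: int(sum(image) > 1.0)


# Two in-memory "partitions" of (image, true_label) records.
partitions = [
    [([0.9, 0.9], 1), ([0.1, 0.2], 0)],
    [([0.6, 0.7], 1), ([0.4, 0.3], 1)],
]
predictions = [
    pair
    for partition in partitions
    for pair in score_partition(partition, stub_model_loader)
]
print(accuracy(predictions))
```

On a Spark cluster, a function shaped like `score_partition` could presumably be handed to PySpark's `RDD.mapPartitions`, so that each worker loads the model once and scores only its own records.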

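Note on the upload steps in the changed article (navigate to `https://CLUSTERNAME.azurehdinsight.net/jupyter`, then select `CNTK_model_scoring_on_Spark_walkthrough.ipynb` from the download): a small helper can build the URL and locate the notebook in the unzipped folder. This is a minimal sketch. The function names `jupyter_url` and `find_notebook` are hypothetical, and the location of the notebook inside the archive is not assumed, so the folder is searched recursively.

```python
from pathlib import Path
from typing import Optional

# File name taken from the article text.
NOTEBOOK_NAME = "CNTK_model_scoring_on_Spark_walkthrough.ipynb"


def jupyter_url(cluster_name: str) -> str:
    """Build the Jupyter URL for a cluster, following the pattern in the article."""
    name = cluster_name.strip()
    if not name:
        raise ValueError("cluster name must not be empty")
    return f"https://{name}.azurehdinsight.net/jupyter"


def find_notebook(download_dir: Path) -> Optional[Path]:
    """Return the first matching notebook under download_dir, or None if absent."""
    matches = sorted(download_dir.rglob(NOTEBOOK_NAME))
    return matches[0] if matches else None


# "mycluster" is a placeholder cluster name.
print(jupyter_url("mycluster"))
```

The upload itself still happens through the Jupyter **Upload** button; the helper only saves hunting for the file and typing the URL by hand.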