Commit 55eddbe

spark1
1 parent 8740e45 commit 55eddbe

5 files changed, +36 -29 lines changed

articles/hdinsight/TOC.yml

Lines changed: 2 additions & 1 deletion
@@ -272,7 +272,8 @@
     href: ./spark/apache-spark-jupyter-spark-sql-use-powershell.md
   - name: Create Apache Spark cluster - Azure CLI
     href: ./spark/apache-spark-create-cluster-cli.md
-  - name: Create Apache Spark cluster - Template
+  - name: Create Apache Spark cluster - ARM Template
+    displayName: Resource Manager
     href: ./spark/apache-spark-jupyter-spark-sql.md
   - name: Tutorials
     items:

articles/hdinsight/spark/apache-spark-jupyter-spark-sql.md

Lines changed: 34 additions & 28 deletions
@@ -6,57 +6,63 @@ ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: quickstart
-ms.custom: mvc
-ms.date: 03/05/2020
+ms.custom: subject-armqs
+ms.date: 03/13/2020

 #Customer intent: As a developer new to Apache Spark on Azure, I need to see how to create a Spark cluster and query some data.
 ---

 # Quickstart: Create Apache Spark cluster in Azure HDInsight using Resource Manager template

-In this quickstart, you use an Azure Resource Manager template to create an Apache Spark cluster in Azure HDInsight. You then create a Jupyter notebook, and use it to run Spark SQL queries against Apache Hive tables. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight enables fast data analytics and cluster computing using in-memory processing. Jupyter notebook lets you interact with your data, combine code with markdown text, and do simple visualizations.
+In this quickstart, you use an Azure Resource Manager template to create an [Apache Spark](./apache-spark-overview.md) cluster in Azure HDInsight. You then create a Jupyter notebook, and use it to run Spark SQL queries against Apache Hive tables. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight enables fast data analytics and cluster computing using in-memory processing. Jupyter notebook lets you interact with your data, combine code with markdown text, and do simple visualizations.

-[Overview: Apache Spark on Azure HDInsight](apache-spark-overview.md) | [Apache Spark](https://spark.apache.org/) | [Apache Hive](https://hive.apache.org/) | [Jupyter Notebook](https://jupyter.org/) | [Azure quickstart templates](https://azure.microsoft.com/resources/templates/?resourceType=Microsoft.Hdinsight&pageNumber=1&sort=Popular)
+[!INCLUDE [About Azure Resource Manager](../../../includes/resource-manager-quickstart-introduction.md)]

 If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.

 ## Create an Apache Spark cluster

-Create an Apache Spark cluster in HDInsight using an Azure Resource Manager template. The template can be found in [GitHub](https://azure.microsoft.com/resources/templates/101-hdinsight-spark-linux/). For the JSON syntax and properties of the cluster, see [Microsoft.HDInsight/clusters](/azure/templates/microsoft.hdinsight/clusters).
+### Review the template

-The cluster uses Azure Storage Blobs as the cluster storage. For more information on using Data Lake Storage Gen2, see [Quickstart: Set up clusters in HDInsight](../../storage/data-lake-storage/quickstart-create-connect-hdi-cluster.md).
+The template used in this quickstart is from [Azure Quickstart templates](https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-spark-linux).

-> [!IMPORTANT]
-> Billing for HDInsight clusters is prorated per minute, whether you are using them or not. Be sure to delete your cluster after you have finished using it. For more information, see the [Clean up resources](#clean-up-resources) section of this article.
+:::code language="json" source="~/quickstart-templates/101-hdinsight-spark-linux/azuredeploy.json" range="1-143":::

-1. Select the following link to open the template in the Azure portal in a new browser tab:
+The mapping is defined in the `openpublishing.publish.config` file.

-<a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Fazure-quickstart-templates%2Fmaster%2F101-hdinsight-spark-linux%2Fazuredeploy.json" target="_blank">Deploy to Azure</a>
+Two Azure resources are defined in the template:

-2. Enter the following values:
+* [Microsoft.Storage/storageAccounts](https://docs.microsoft.com/azure/templates/microsoft.storage/storageaccounts): create an Azure Storage Account.
+* [Microsoft.HDInsight/cluster](https://docs.microsoft.com/azure/templates/microsoft.hdinsight/clusters): create an HDInsight cluster.

-| Property | Value |
-|---|---|
-|Subscription|Select your Azure subscription used for creating this cluster. |
-| Resource group|Create a resource group or select an existing one. Resource group is used to manage Azure resources for your projects. The new resource group name used for this quickstart is **myspark20180403rg**.|
-| Location|Select a location for the resource group. The template uses this location for creating the cluster, and the default cluster storage. The location used for this quickstart is **East US 2**.|
-| ClusterName|Enter a name for the cluster that you want to create. The new cluster name used for this quickstart is **myspark20180403**.|
-| Cluster login name and password|The default login name is admin. Choose a password for the cluster login. The login name used for this quickstart is **admin**.|
-| SSH user name and password|Choose a password for the SSH user. The SSH user name used for this quickstart is **sshuser**.|
+### Deploy the template

-![Create Spark cluster in HDInsight using Azure Resource Manager template](./media/apache-spark-jupyter-spark-sql/create-spark-cluster-in-hdinsight-using-azure-resource-manager-template.png "Create Spark cluster in HDInsight using an Azure Resource Manager template")
+1. Select the **Deploy to Azure** button below to sign in to Azure and open the Resource Manager template.

-3. Select **I agree to the terms and conditions stated above**, and then select **Purchase**. You can see a new tile titled **Deploying Template deployment**. It takes about 20 minutes to create the cluster. The cluster must be created before you can proceed to the next session.
+[![Deploy to Azure](./media/apache-spark-jupyter-spark-sql/deploy-to-azure.png)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Fazure-quickstart-templates%2Fmaster%2F101-hdinsight-spark-linux%2Fazuredeploy.json)

-If you run into an issue with creating HDInsight clusters, it could be that you don't have the right permissions to do so. For more information, see [Access control requirements](../hdinsight-hadoop-customize-cluster-linux.md#access-control).
+1. Enter or select the following values:
+
+|Property |Description |
+|---|---|
+|Subscription|From the drop-down list, select the Azure subscription that's used for the cluster.|
+|Resource group|From the drop-down list, select your existing resource group, or select **Create new**.|
+|Location|The value will autopopulate with the location used for the resource group.|
+|Cluster Name|Enter a globally unique name. For this template, use only lowercase letters, and numbers.|
+|Cluster Login User Name|Provide the username, default is **admin**.|
+|Cluster Login Password|Provide a password. The password must be at least 10 characters in length and must contain at least one digit, one uppercase, and one lower case letter, one non-alphanumeric character (except characters ' " ` ). |
+|Ssh User Name|Provide the username, default is **sshuser**|
+|Ssh Password|Provide the password.|
+
+![Create Spark cluster in HDInsight using Azure Resource Manager template](./media/apache-spark-jupyter-spark-sql/resource-manager-template-spark.png "Create Spark cluster in HDInsight using an Azure Resource Manager template")

-## Install IntelliJ/Eclipse for Spark applications
+1. Review the **TERMS AND CONDITIONS**. Then select **I agree to the terms and conditions stated above**, then **Purchase**. You'll receive a notification that your deployment is in progress. It takes about 20 minutes to create a cluster.

-Use the Azure Toolkit for IntelliJ/Eclipse plug-in to develop Spark applications written in [Scala](https://www.scala-lang.org/), and then submit them to an Azure HDInsight cluster directly from the IntelliJ/Eclipse integrated development environment (IDE). For more information, see [Use IntelliJ to author/submit Spark application](./apache-spark-intellij-tool-plugin.md) and [Use Eclipse to author/submit Spark application](./apache-spark-eclipse-tool-plugin.md).
+If you run into an issue with creating HDInsight clusters, it could be that you don't have the right permissions to do so. For more information, see [Access control requirements](../hdinsight-hadoop-customize-cluster-linux.md#access-control).

-## Install VSCode for PySpark/Hive applications
+## Review deployed resources

-Learn how to use the Azure HDInsight Tools for Visual Studio Code (VSCode) to create and submit Hive batch jobs, interactive Hive queries, PySpark batch, and PySpark interactive scripts. The Azure HDInsight Tools can be installed on the platforms that are supported by VSCode. These include Windows, Linux, and macOS. For more information, see [Use VSCode to author/submit PySpark application](../hdinsight-for-vscode.md).
+Once the cluster is created, you'll receive a **Deployment succeeded** notification with a **Go to resource** link. Your Resource group page will list your new HDInsight cluster and the default storage associated with the cluster. Each cluster has an [Azure Storage](../hdinsight-hadoop-use-blob-storage.md) account or an [Azure Data Lake Storage account](../hdinsight-hadoop-use-data-lake-store.md) dependency. It's referred as the default storage account. The HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting clusters doesn't delete the storage account.

 ## Create a Jupyter notebook

@@ -116,9 +122,9 @@ SQL (Structured Query Language) is the most common and widely used language for

 ## Clean up resources

-HDInsight saves your data and Jupyter notebooks in Azure Storage or Azure Data Lake Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use. If you plan to work on the tutorial listed in [Next steps](#next-steps) immediately, you might want to keep the cluster.
+After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use.

-Switch back to the Azure portal, and select **Delete**.
+From the Azure portal, navigate to your cluster, and select **Delete**.

 ![Azure portal delete an HDInsight cluster](./media/apache-spark-jupyter-spark-sql/hdinsight-azure-portal-delete-cluster.png "Delete HDInsight cluster")
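
Both the removed `<a href>` link and the added **Deploy to Azure** button in this diff point at the same portal URL: the raw template address, percent-encoded and appended to `https://portal.azure.com/#create/Microsoft.Template/uri/`. A minimal Python sketch of how such a link could be assembled is shown here. The function names are hypothetical and chosen for illustration; only the URL pattern itself is taken from the diff.

```python
from urllib.parse import quote, unquote

# Portal prefix and raw template URL as they appear in the link in this commit.
PORTAL_PREFIX = "https://portal.azure.com/#create/Microsoft.Template/uri/"
TEMPLATE_URL = (
    "https://raw.githubusercontent.com/Azure/azure-quickstart-templates/"
    "master/101-hdinsight-spark-linux/azuredeploy.json"
)


def deploy_to_azure_url(template_url: str) -> str:
    """Build a portal link for a template (hypothetical helper name).

    safe="" makes quote() encode ':' and '/' as well, which matches the
    %3A and %2F sequences seen in the link in the diff.
    """
    return PORTAL_PREFIX + quote(template_url, safe="")


def template_url_from_link(link: str) -> str:
    """Recover the raw template URL from a portal link (hypothetical helper)."""
    return unquote(link[len(PORTAL_PREFIX):])


if __name__ == "__main__":
    link = deploy_to_azure_url(TEMPLATE_URL)
    print(link)
    print(template_url_from_link(link))
```

Encoding the whole template URL keeps it a single opaque segment inside the portal's fragment, so a different quickstart template could be linked by changing only `TEMPLATE_URL`.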
