Commit daf8d5b

Merge pull request #107680 from dagiro/spark1
spark1
2 parents 91b6b0a + a677014 commit daf8d5b

5 files changed: +34 -29 lines changed

articles/hdinsight/TOC.yml

Lines changed: 2 additions & 1 deletion
@@ -274,7 +274,8 @@
274274
href: ./spark/apache-spark-jupyter-spark-sql-use-powershell.md
275275
- name: Create Apache Spark cluster - Azure CLI
276276
href: ./spark/apache-spark-create-cluster-cli.md
277-
- name: Create Apache Spark cluster - Template
277+
- name: Create Apache Spark cluster - ARM Template
278+
displayName: Resource Manager
278279
href: ./spark/apache-spark-jupyter-spark-sql.md
279280
- name: Tutorials
280281
items:

articles/hdinsight/spark/apache-spark-jupyter-spark-sql.md

Lines changed: 32 additions & 28 deletions
@@ -6,57 +6,61 @@ ms.author: hrasheed
66
ms.reviewer: jasonh
77
ms.service: hdinsight
88
ms.topic: quickstart
9-
ms.custom: mvc
10-
ms.date: 03/05/2020
9+
ms.custom: subject-armqs
10+
ms.date: 03/13/2020
1111

1212
#Customer intent: As a developer new to Apache Spark on Azure, I need to see how to create a Spark cluster and query some data.
1313
---
1414

1515
# Quickstart: Create Apache Spark cluster in Azure HDInsight using Resource Manager template
1616

17-
In this quickstart, you use an Azure Resource Manager template to create an Apache Spark cluster in Azure HDInsight. You then create a Jupyter notebook, and use it to run Spark SQL queries against Apache Hive tables. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight enables fast data analytics and cluster computing using in-memory processing. Jupyter notebook lets you interact with your data, combine code with markdown text, and do simple visualizations.
17+
In this quickstart, you use an Azure Resource Manager template to create an [Apache Spark](./apache-spark-overview.md) cluster in Azure HDInsight. You then create a Jupyter notebook, and use it to run Spark SQL queries against Apache Hive tables. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight enables fast data analytics and cluster computing using in-memory processing. Jupyter notebook lets you interact with your data, combine code with markdown text, and do simple visualizations.
1818

19-
[Overview: Apache Spark on Azure HDInsight](apache-spark-overview.md) | [Apache Spark](https://spark.apache.org/) | [Apache Hive](https://hive.apache.org/) | [Jupyter Notebook](https://jupyter.org/) | [Azure quickstart templates](https://azure.microsoft.com/resources/templates/?resourceType=Microsoft.Hdinsight&pageNumber=1&sort=Popular)
19+
[!INCLUDE [About Azure Resource Manager](../../../includes/resource-manager-quickstart-introduction.md)]
2020

2121
If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.
2222

2323
## Create an Apache Spark cluster
2424

25-
Create an Apache Spark cluster in HDInsight using an Azure Resource Manager template. The template can be found in [GitHub](https://azure.microsoft.com/resources/templates/101-hdinsight-spark-linux/). For the JSON syntax and properties of the cluster, see [Microsoft.HDInsight/clusters](/azure/templates/microsoft.hdinsight/clusters).
25+
### Review the template
2626

27-
The cluster uses Azure Storage Blobs as the cluster storage. For more information on using Data Lake Storage Gen2, see [Quickstart: Set up clusters in HDInsight](../../storage/data-lake-storage/quickstart-create-connect-hdi-cluster.md).
27+
The template used in this quickstart is from [Azure Quickstart templates](https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-spark-linux).
2828

29-
> [!IMPORTANT]
30-
> Billing for HDInsight clusters is prorated per minute, whether you are using them or not. Be sure to delete your cluster after you have finished using it. For more information, see the [Clean up resources](#clean-up-resources) section of this article.
29+
:::code language="json" source="~/quickstart-templates/101-hdinsight-spark-linux/azuredeploy.json" range="1-143":::
3130

32-
1. Select the following link to open the template in the Azure portal in a new browser tab:
31+
Two Azure resources are defined in the template:
3332

34-
<a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Fazure-quickstart-templates%2Fmaster%2F101-hdinsight-spark-linux%2Fazuredeploy.json" target="_blank">Deploy to Azure</a>
33+
* [Microsoft.Storage/storageAccounts](https://docs.microsoft.com/azure/templates/microsoft.storage/storageaccounts): create an Azure Storage Account.
34+
* [Microsoft.HDInsight/clusters](https://docs.microsoft.com/azure/templates/microsoft.hdinsight/clusters): create an HDInsight cluster.
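
As a rough illustration of the two resources listed above, the following Python sketch reads the resource types out of a template. The template text here is a hypothetical, heavily trimmed stand-in for `azuredeploy.json`; only the two resource type names come from the quickstart, and the helper name `resource_types` is made up for this example.

```python
import json
from typing import List

# Hypothetical, trimmed stand-in for azuredeploy.json. Only the two resource
# type names are taken from the quickstart; the real template has many more
# properties (parameters, variables, cluster configuration, and so on).
TEMPLATE_TEXT = """
{
  "resources": [
    {"type": "Microsoft.Storage/storageAccounts"},
    {"type": "Microsoft.HDInsight/clusters"}
  ]
}
"""


def resource_types(template_text: str) -> List[str]:
    """Return the `type` of each top-level resource in an ARM template."""
    template = json.loads(template_text)
    return [resource["type"] for resource in template.get("resources", [])]


if __name__ == "__main__":
    for resource_type in resource_types(TEMPLATE_TEXT):
        print(resource_type)
```

Running the same helper against the full template text should, assuming the template matches the description above, list the storage account and the cluster.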
3535

36-
2. Enter the following values:
36+
### Deploy the template
3737

38-
| Property | Value |
39-
|---|---|
40-
|Subscription|Select your Azure subscription used for creating this cluster. |
41-
| Resource group|Create a resource group or select an existing one. Resource group is used to manage Azure resources for your projects. The new resource group name used for this quickstart is **myspark20180403rg**.|
42-
| Location|Select a location for the resource group. The template uses this location for creating the cluster, and the default cluster storage. The location used for this quickstart is **East US 2**.|
43-
| ClusterName|Enter a name for the cluster that you want to create. The new cluster name used for this quickstart is **myspark20180403**.|
44-
| Cluster login name and password|The default login name is admin. Choose a password for the cluster login. The login name used for this quickstart is **admin**.|
45-
| SSH user name and password|Choose a password for the SSH user. The SSH user name used for this quickstart is **sshuser**.|
38+
1. Select the **Deploy to Azure** button below to sign in to Azure and open the Resource Manager template.
4639

47-
![Create Spark cluster in HDInsight using Azure Resource Manager template](./media/apache-spark-jupyter-spark-sql/create-spark-cluster-in-hdinsight-using-azure-resource-manager-template.png "Create Spark cluster in HDInsight using an Azure Resource Manager template")
40+
[![Deploy to Azure](./media/apache-spark-jupyter-spark-sql/deploy-to-azure.png)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Fazure-quickstart-templates%2Fmaster%2F101-hdinsight-spark-linux%2Fazuredeploy.json)
4841

49-
3. Select **I agree to the terms and conditions stated above**, and then select **Purchase**. You can see a new tile titled **Deploying Template deployment**. It takes about 20 minutes to create the cluster. The cluster must be created before you can proceed to the next session.
42+
1. Enter or select the following values:
5043

51-
If you run into an issue with creating HDInsight clusters, it could be that you don't have the right permissions to do so. For more information, see [Access control requirements](../hdinsight-hadoop-customize-cluster-linux.md#access-control).
44+
|Property |Description |
45+
|---|---|
46+
|Subscription|From the drop-down list, select the Azure subscription that's used for the cluster.|
47+
|Resource group|From the drop-down list, select your existing resource group, or select **Create new**.|
48+
|Location|The value will autopopulate with the location used for the resource group.|
49+
|Cluster Name|Enter a globally unique name. For this template, use only lowercase letters and numbers.|
50+
|Cluster Login User Name|Provide the username. The default is **admin**.|
51+
|Cluster Login Password|Provide a password. The password must be at least 10 characters long and contain at least one digit, one uppercase letter, one lowercase letter, and one non-alphanumeric character (except the characters ' " ` ). |
52+
|Ssh User Name|Provide the username. The default is **sshuser**.|
53+
|Ssh Password|Provide the password.|
5254

53-
## Install IntelliJ/Eclipse for Spark applications
55+
![Create Spark cluster in HDInsight using Azure Resource Manager template](./media/apache-spark-jupyter-spark-sql/resource-manager-template-spark.png "Create Spark cluster in HDInsight using an Azure Resource Manager template")
5456

55-
Use the Azure Toolkit for IntelliJ/Eclipse plug-in to develop Spark applications written in [Scala](https://www.scala-lang.org/), and then submit them to an Azure HDInsight cluster directly from the IntelliJ/Eclipse integrated development environment (IDE). For more information, see [Use IntelliJ to author/submit Spark application](./apache-spark-intellij-tool-plugin.md) and [Use Eclipse to author/submit Spark application](./apache-spark-eclipse-tool-plugin.md).
57+
1. Review the **TERMS AND CONDITIONS**. Then select **I agree to the terms and conditions stated above**, and then select **Purchase**. You'll receive a notification that your deployment is in progress. It takes about 20 minutes to create a cluster.
58+
59+
If you run into an issue with creating HDInsight clusters, it could be that you don't have the right permissions to do so. For more information, see [Access control requirements](../hdinsight-hadoop-customize-cluster-linux.md#access-control).
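
The deployment steps above can be sketched in Python as a pre-flight check. The sketch builds the **Deploy to Azure** link by percent-encoding the raw template URL, and it checks a cluster name and login password against the rules stated in the table. All function names are hypothetical. The password check assumes the excluded characters (' " `) may not appear anywhere in the password, which is one reading of the rule; the portal's own validation is authoritative.

```python
import re
from urllib.parse import quote

# Portal prefix and template URL as they appear in the Deploy to Azure link.
PORTAL_PREFIX = "https://portal.azure.com/#create/Microsoft.Template/uri/"
TEMPLATE_URL = (
    "https://raw.githubusercontent.com/Azure/azure-quickstart-templates/"
    "master/101-hdinsight-spark-linux/azuredeploy.json"
)

# Assumption: these characters are rejected anywhere in the password.
EXCLUDED_CHARACTERS = "'\"`"


def deploy_url(template_url: str) -> str:
    """Percent-encode the template URL and append it to the portal prefix."""
    return PORTAL_PREFIX + quote(template_url, safe="")


def is_valid_cluster_name(name: str) -> bool:
    """Accept only lowercase letters and numbers, as this template requires."""
    return re.fullmatch(r"[a-z0-9]+", name) is not None


def is_valid_login_password(password: str) -> bool:
    """Check the length and character-class rules from the table."""
    if len(password) < 10:
        return False
    if any(ch in EXCLUDED_CHARACTERS for ch in password):
        return False
    return (
        any(ch.isdigit() for ch in password)
        and any(ch.isupper() for ch in password)
        and any(ch.islower() for ch in password)
        and any(not ch.isalnum() for ch in password)
    )


if __name__ == "__main__":
    print(deploy_url(TEMPLATE_URL))
    print(is_valid_cluster_name("myspark20180403"))
    print(is_valid_login_password("Passw0rd!xyz"))
```

Global uniqueness of the cluster name can't be checked locally, so the name check covers only the character rule.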
5660

57-
## Install VSCode for PySpark/Hive applications
61+
## Review deployed resources
5862

59-
Learn how to use the Azure HDInsight Tools for Visual Studio Code (VSCode) to create and submit Hive batch jobs, interactive Hive queries, PySpark batch, and PySpark interactive scripts. The Azure HDInsight Tools can be installed on the platforms that are supported by VSCode. These include Windows, Linux, and macOS. For more information, see [Use VSCode to author/submit PySpark application](../hdinsight-for-vscode.md).
63+
Once the cluster is created, you'll receive a **Deployment succeeded** notification with a **Go to resource** link. Your Resource group page will list your new HDInsight cluster and the default storage associated with the cluster. Each cluster has an [Azure Storage](../hdinsight-hadoop-use-blob-storage.md) account or an [Azure Data Lake Storage account](../hdinsight-hadoop-use-data-lake-store.md) dependency. It's referred to as the default storage account. The HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting clusters doesn't delete the storage account.
6064

6165
## Create a Jupyter notebook
6266

@@ -116,9 +120,9 @@ SQL (Structured Query Language) is the most common and widely used language for
116120

117121
## Clean up resources
118122

119-
HDInsight saves your data and Jupyter notebooks in Azure Storage or Azure Data Lake Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use. If you plan to work on the tutorial listed in [Next steps](#next-steps) immediately, you might want to keep the cluster.
123+
After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use.
120124

121-
Switch back to the Azure portal, and select **Delete**.
125+
From the Azure portal, navigate to your cluster, and select **Delete**.
122126

123127
![Azure portal delete an HDInsight cluster](./media/apache-spark-jupyter-spark-sql/hdinsight-azure-portal-delete-cluster.png "Delete HDInsight cluster")
124128

