
Commit 222364f

Merge pull request #102823 from v-thepet/quickstarts9
Four HDInsight quickstarts
2 parents e096ba6 + bf9795c commit 222364f

17 files changed (+100, -89 lines changed)

articles/hdinsight/spark/apache-spark-create-cluster-cli.md

Lines changed: 6 additions & 8 deletions
@@ -7,25 +7,23 @@ ms.reviewer: jasonh
77
ms.service: hdinsight
88
ms.topic: quickstart
99
ms.date: 02/03/2020
10-
11-
#Customer intent: As a developer new to Apache Spark on Azure, I need to see how to create a spark cluster.
10+
#Customer intent: As a developer new to Apache Spark on Azure, I need to see how to create a Spark cluster.
1211
---
1312

1413
# Quickstart: Create Apache Spark cluster in Azure HDInsight using Azure CLI
1514

16-
In this quickstart, you learn how to create an Apache Spark cluster in Azure HDInsight using Azure CLI. Apache Spark enables fast data analytics and cluster computing using in-memory processing. The [Azure command-line interface (CLI)](https://docs.microsoft.com/cli/azure/?view=azure-cli-latest) is Microsoft's cross-platform command-line experience for managing Azure resources.
17-
18-
If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.
15+
In this quickstart, you learn how to create an Apache Spark cluster in Azure HDInsight using the Azure command-line interface (CLI). Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight enables fast data analytics and cluster computing using in-memory processing. The Azure CLI is Microsoft's cross-platform command-line experience for managing Azure resources.
1916

2017
## Prerequisites
2118

22-
Azure CLI. If you haven't installed the Azure CLI, see [Install the Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli) for steps.
19+
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?ref=microsoft.com&utm_source=microsoft.com&utm_medium=docs&utm_campaign=visualstudio).
20+
- [Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli), if you don't want to use Azure Cloud Shell.
2321

2422
[!INCLUDE [cloud-shell-try-it.md](../../../includes/cloud-shell-try-it.md)]
2523

2624
## Create an Apache Spark cluster
2725

28-
1. Sign in to your Azure subscription. If you plan to use Azure Cloud Shell, select **Try it** in the upper-right corner of the code block. Else, enter the command below:
26+
1. Sign in to your Azure subscription. If you plan to use Azure Cloud Shell, select **Try it** in the upper-right corner of the following code block. Else, enter the following command:
2927

3028
```azurecli-interactive
3129
az login
@@ -138,7 +136,7 @@ az group delete \
138136

139137
## Next steps
140138

141-
In this quickstart, you learned how to create an Apache Spark cluster in Azure HDInsight using Azure CLI. Advance to the next tutorial to learn how to use an HDInsight Spark cluster to run interactive queries on sample data.
139+
In this quickstart, you learned how to create an Apache Spark cluster in Azure HDInsight using Azure CLI. Advance to the next tutorial to learn how to use an HDInsight cluster to run interactive queries on sample data.
142140

143141
> [!div class="nextstepaction"]
144142
> [Run interactive queries on Apache Spark](./apache-spark-load-data-run-query.md)
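The hunks shown for this file skip everything between `az login` and `az group delete`, so the cluster-creation commands themselves are not visible in this commit view. As a rough orientation only, the following is a minimal sketch of what such a flow could look like with the Azure CLI. It is not the article's actual script: the storage account name `mysparkstorage2019` and the `PASSWORD` placeholders are hypothetical, and the other values are borrowed from the portal quickstart changed in the same commit. The `DRY_RUN` switch and the `run` helper are illustration-only additions that print each command instead of calling Azure.

```bash
#!/usr/bin/env bash
# Sketch only: every value below is an assumed placeholder, not taken from the elided hunk.
resourceGroupName="${resourceGroupName:-myResourceGroup}"
location="${location:-eastus}"
clusterName="${clusterName:-myspark2019}"
storageAccount="${storageAccount:-mysparkstorage2019}"   # hypothetical name
httpCredential="${httpCredential:-PASSWORD}"              # replace before a real run
sshCredential="${sshCredential:-PASSWORD}"                # replace before a real run
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only; 0 = actually call az

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_spark_cluster() {
    # 1. Resource group that holds the cluster and its storage.
    run az group create --name "$resourceGroupName" --location "$location"

    # 2. Storage account used as the default cluster storage.
    run az storage account create \
        --name "$storageAccount" \
        --resource-group "$resourceGroupName" \
        --location "$location" \
        --sku Standard_LRS \
        --kind StorageV2

    # 3. The HDInsight cluster itself, with Spark as the cluster type.
    run az hdinsight create \
        --name "$clusterName" \
        --resource-group "$resourceGroupName" \
        --location "$location" \
        --type spark \
        --http-user admin \
        --http-password "$httpCredential" \
        --ssh-user sshuser \
        --ssh-password "$sshCredential" \
        --storage-account "$storageAccount"
}

create_spark_cluster
```

Run as written, the script only prints the three `az` commands so they can be reviewed. Setting `DRY_RUN=0` after `az login` would execute them, which starts per-minute billing for the cluster.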

articles/hdinsight/spark/apache-spark-jupyter-spark-sql-use-portal.md

Lines changed: 27 additions & 19 deletions
@@ -1,48 +1,56 @@
11
---
22
title: 'Quickstart: Create Spark cluster in HDInsight using Azure portal'
3-
description: This quickstart shows how to use the Azure portal to create an Apache Spark cluster in Azure HDInsight, and run a Spark SQL.
3+
description: This quickstart shows how to use the Azure portal to create an Apache Spark cluster in Azure HDInsight, and run a Spark SQL query.
44
author: hrasheed-msft
55
ms.author: hrasheed
66
ms.reviewer: jasonh
77
ms.service: hdinsight
88
ms.topic: quickstart
99
ms.date: 09/27/2019
1010
ms.custom: mvc
11-
#Customer intent: As a developer new to Apache Spark on Azure, I need to see how to create a spark cluster and query some data.
11+
#Customer intent: As a developer new to Apache Spark on Azure, I need to see how to create a Spark cluster and query some data.
1212
---
1313

1414
# Quickstart: Create Apache Spark cluster in Azure HDInsight using Azure portal
1515

16-
Learn how to create Apache Spark cluster in Azure HDInsight, and how to run Spark SQL queries against Hive tables. Apache Spark enables fast data analytics and cluster computing using in-memory processing. For information on Spark on HDInsight, see [Overview: Apache Spark on Azure HDInsight](apache-spark-overview.md).
16+
In this quickstart, you use the Azure portal to create an Apache Spark cluster in Azure HDInsight. You then create a Jupyter notebook, and use it to run Spark SQL queries against Apache Hive tables. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight enables fast data analytics and cluster computing using in-memory processing. Jupyter notebook lets you interact with your data, combine code with markdown text, and do simple visualizations.
1717

18-
In this quickstart, you use the Azure portal to create an HDInsight Spark cluster. The cluster uses Azure Storage Blobs as the cluster storage. For more information on using Data Lake Storage Gen2, see [Quickstart: Set up clusters in HDInsight](../../storage/data-lake-storage/quickstart-create-connect-hdi-cluster.md).
18+
[Overview: Apache Spark on Azure HDInsight](apache-spark-overview.md) | [Apache Spark](https://spark.apache.org/) | [Apache Hive](https://hive.apache.org/) | [Jupyter Notebook](https://jupyter.org/)
19+
20+
## Prerequisites
21+
22+
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?ref=microsoft.com&utm_source=microsoft.com&utm_medium=docs&utm_campaign=visualstudio).
23+
24+
## Create an Apache Spark cluster in HDInsight
25+
26+
You use the Azure portal to create an HDInsight cluster that uses Azure Storage Blobs as the cluster storage. For more information on using Data Lake Storage Gen2, see [Quickstart: Set up clusters in HDInsight](../../storage/data-lake-storage/quickstart-create-connect-hdi-cluster.md).
1927

2028
> [!IMPORTANT]
2129
> Billing for HDInsight clusters is prorated per minute, whether you are using them or not. Be sure to delete your cluster after you have finished using it. For more information, see the [Clean up resources](#clean-up-resources) section of this article.
2230
23-
If you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free/) before you begin.
31+
1. In the Azure portal, select **Create a resource**.
2432

25-
## Create an HDInsight Spark cluster
33+
![Azure portal create a resource](./media/apache-spark-jupyter-spark-sql-use-portal/azure-portal-create.png "Create a resource in Azure portal")
2634

27-
1. In the Azure portal, select **Create a resource** > **Analytics** > **HDInsight**.
35+
1. On the **New** page, select **Analytics** > **HDInsight**.
2836

29-
![Azure portal create a resource HDInsight](./media/apache-spark-jupyter-spark-sql-use-portal/azure-portal-create-hdinsight-spark-cluster.png "HDInsight on Azure portal")
37+
![Azure portal create HDInsight](./media/apache-spark-jupyter-spark-sql-use-portal/azure-portal-create-hdinsight-spark-cluster.png "HDInsight on Azure portal")
3038

3139
1. Under **Basics**, provide the following values:
3240

3341
|Property |Description |
3442
|---------|---------|
3543
|Subscription | From the drop-down, select an Azure subscription used for this cluster. The subscription used for this quickstart is **Azure**. |
3644
|Resource group | Specify whether you want to create a new resource group or use an existing one. A resource group is a container that holds related resources for an Azure solution. The resource group name used for this quickstart is **myResourceGroup**. |
37-
|Cluster name | Give a name to your HDInsight Spark cluster. The cluster name used for this quickstart is **myspark2019**.|
45+
|Cluster name | Give a name to your HDInsight cluster. The cluster name used for this quickstart is **myspark2019**.|
3846
|Location | Select a location for the resource group. The template uses this location for creating the cluster as well as for the default cluster storage. The location used for this quickstart is **East US**. |
3947
|Cluster type| Select **Spark** as the cluster type.|
4048
|Cluster version|This field will auto-populate with the default version once the cluster type has been selected.|
4149
|Cluster login username| Enter the cluster login username. The default name is *admin*. You use this account to log in to the Jupyter notebook later in the quickstart. |
4250
|Cluster login password| Enter the cluster login password. |
4351
|Secure Shell (SSH) username| Enter the SSH username. The SSH username used for this quickstart is **sshuser**. By default, this account shares the same password as the *Cluster Login username* account. |
4452

45-
![Create HDInsight Spark cluster basic configurations](./media/apache-spark-jupyter-spark-sql-use-portal/azure-portal-cluster-basics-spark.png "Create Spark cluster in HDInsight the Basic configurations")
53+
![Create HDInsight cluster basic configurations](./media/apache-spark-jupyter-spark-sql-use-portal/azure-portal-cluster-basics-spark.png "Create Spark cluster in HDInsight the Basic configurations")
4654

4755
Select **Next: Storage >>** to continue to the **Storage** page.
4856

@@ -55,13 +63,13 @@ If you don't have an Azure subscription, [create a free account](https://azure.m
5563
|Primary storage account|Use the auto-populated value.|
5664
|Container|Use the auto-populated value.|
5765

58-
![Create HDInsight Spark cluster basic configurations](./media/apache-spark-jupyter-spark-sql-use-portal/azure-portal-cluster-storage.png "Create Spark cluster in HDInsight the Basic configurations")
66+
![Create HDInsight cluster basic configurations](./media/apache-spark-jupyter-spark-sql-use-portal/azure-portal-cluster-storage.png "Create Spark cluster in HDInsight the Basic configurations")
5967

6068
Select **Review + create** to continue.
6169

6270
1. Under **Review + create**, select **Create**. It takes about 20 minutes to create the cluster. The cluster must be created before you can proceed to the next section.
6371

64-
If you run into an issue with creating HDInsight clusters, it could be that you do not have the right permissions to do so. For more information, see [Access control requirements](../hdinsight-hadoop-create-linux-clusters-portal.md).
72+
If you run into an issue with creating HDInsight clusters, it could be that you don't have the right permissions to do so. For more information, see [Access control requirements](../hdinsight-hadoop-create-linux-clusters-portal.md).
6573

6674
## Create a Jupyter notebook
6775

@@ -83,13 +91,13 @@ Jupyter Notebook is an interactive notebook environment that supports various pr
8391

8492
A new notebook is created and opened with the name Untitled (Untitled.ipynb).
8593

86-
## Run Spark SQL statements
94+
## Run Apache Spark SQL statements
8795

8896
SQL (Structured Query Language) is the most common and widely used language for querying and defining data. Spark SQL functions as an extension to Apache Spark for processing structured data, using the familiar SQL syntax.
8997

9098
1. Verify the kernel is ready. The kernel is ready when you see a hollow circle next to the kernel name in the notebook. A solid circle means that the kernel is busy.
9199

92-
![Apache Hive query in HDInsight Spark1](./media/apache-spark-jupyter-spark-sql/jupyter-spark-kernel-status.png "Hive query in HDInsight Spark1")
100+
![Apache Hive query in HDInsight](./media/apache-spark-jupyter-spark-sql/jupyter-spark-kernel-status.png "Hive query in HDInsight")
93101

94102
When you start the notebook for the first time, the kernel performs some tasks in the background. Wait for the kernel to be ready.
95103

@@ -100,9 +108,9 @@ SQL (Structured Query Language) is the most common and widely used language for
100108
SHOW TABLES
101109
```
102110
103-
When you use a Jupyter Notebook with your HDInsight Spark cluster, you get a preset `sqlContext` that you can use to run Hive queries using Spark SQL. `%%sql` tells Jupyter Notebook to use the preset `sqlContext` to run the Hive query. The query retrieves the top 10 rows from a Hive table (**hivesampletable**) that comes with all HDInsight clusters by default. It takes about 30 seconds to get the results. The output looks like:
111+
When you use a Jupyter Notebook with your HDInsight cluster, you get a preset `sqlContext` that you can use to run Hive queries using Spark SQL. `%%sql` tells Jupyter Notebook to use the preset `sqlContext` to run the Hive query. The query retrieves the top 10 rows from a Hive table (**hivesampletable**) that comes with all HDInsight clusters by default. It takes about 30 seconds to get the results. The output looks like:
104112
105-
![Apache Hive query in HDInsight Spark2](./media/apache-spark-jupyter-spark-sql/hdinsight-spark-get-started-hive-query.png "Hive query in HDInsight Spark2")
113+
![Apache Hive query in HDInsight](./media/apache-spark-jupyter-spark-sql/hdinsight-spark-get-started-hive-query.png "Hive query in HDInsight")
106114
107115
Every time you run a query in Jupyter, your web browser window title shows a **(Busy)** status along with the notebook title. You also see a solid circle next to the **PySpark** text in the top-right corner.
108116
@@ -115,7 +123,7 @@ SQL (Structured Query Language) is the most common and widely used language for
115123
116124
The screen refreshes to show the query output.
117125
118-
![Hive query output in HDInsight Spark](./media/apache-spark-jupyter-spark-sql/hdinsight-spark-get-started-hive-query-output.png "Hive query output in HDInsight Spark")
126+
![Hive query output in HDInsight](./media/apache-spark-jupyter-spark-sql/hdinsight-spark-get-started-hive-query-output.png "Hive query output in HDInsight")
119127
120128
1. From the **File** menu on the notebook, select **Close and Halt**. Shutting down the notebook releases the cluster resources.
121129
@@ -127,11 +135,11 @@ Switch back to the Azure portal, and select **Delete**.
127135
128136
![Azure portal delete an HDInsight cluster](./media/apache-spark-jupyter-spark-sql/hdinsight-azure-portal-delete-cluster.png "Delete HDInsight cluster")
129137
130-
You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight Spark cluster, and the default storage account.
138+
You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight cluster, and the default storage account.
131139
132140
## Next steps
133141
134-
In this quickstart, you learned how to create an HDInsight Spark cluster and run a basic Spark SQL query. Advance to the next tutorial to learn how to use an HDInsight Spark cluster to run interactive queries on sample data.
142+
In this quickstart, you learned how to create an Apache Spark cluster in HDInsight and run a basic Spark SQL query. Advance to the next tutorial to learn how to use an HDInsight cluster to run interactive queries on sample data.
135143
136144
> [!div class="nextstepaction"]
137145
> [Run interactive queries on Apache Spark](./apache-spark-load-data-run-query.md)
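The clean-up step in this article is described for the Azure portal only. For readers who prefer the command line, a possible CLI equivalent of **Delete resource group** is sketched below. It assumes the resource group is named `myResourceGroup`, as in the **Basics** table. The `DRY_RUN` switch and `run` helper are hypothetical additions for illustration, so the command is printed rather than executed by default.

```bash
#!/usr/bin/env bash
# Sketch only: assumes the resource group name used in this quickstart.
resourceGroupName="${resourceGroupName:-myResourceGroup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only; 0 = actually call az

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

delete_resource_group() {
    # --yes skips the confirmation prompt; --no-wait returns without
    # waiting for the deletion to finish.
    run az group delete --name "$resourceGroupName" --yes --no-wait
}

delete_resource_group
```

Deleting the resource group removes both the cluster and the default storage account, so any data kept in that storage account is lost as well.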
