articles/hdinsight/spark/apache-spark-create-cluster-cli.md
Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@ az group delete \
136
136
137
137
## Next steps
138
138
139
-
In this quickstart, you learned how to create an Apache Spark cluster in Azure HDInsight using Azure CLI. Advance to the next tutorial to learn how to use an HDInsight Spark cluster to run interactive queries on sample data.
139
+
In this quickstart, you learned how to create an Apache Spark cluster in Azure HDInsight using Azure CLI. Advance to the next tutorial to learn how to use an HDInsight cluster to run interactive queries on sample data.
140
140
141
141
> [!div class="nextstepaction"]
142
142
> [Run interactive queries on Apache Spark](./apache-spark-load-data-run-query.md)
articles/hdinsight/spark/apache-spark-jupyter-spark-sql-use-portal.md
Lines changed: 16 additions & 16 deletions
@@ -1,29 +1,29 @@
1
1
---
2
2
title: 'Quickstart: Create Spark cluster in HDInsight using Azure portal'
3
-
description: This quickstart shows how to use the Azure portal to create an Apache Spark cluster in Azure HDInsight, and run a Spark SQL.
3
+
description: This quickstart shows how to use the Azure portal to create an Apache Spark cluster in Azure HDInsight, and run a Spark SQL query.
4
4
author: hrasheed-msft
5
5
ms.author: hrasheed
6
6
ms.reviewer: jasonh
7
7
ms.service: hdinsight
8
8
ms.topic: quickstart
9
9
ms.date: 09/27/2019
10
10
ms.custom: mvc
11
-
#Customer intent: As a developer new to Apache Spark on Azure, I need to see how to create a spark cluster and query some data.
11
+
#Customer intent: As a developer new to Apache Spark on Azure, I need to see how to create a Spark cluster and query some data.
12
12
---
13
13
14
14
# Quickstart: Create Apache Spark cluster in Azure HDInsight using Azure portal
15
15
16
-
In this quickstart, you create an HDInsight Spark cluster in the Azure portal. You then create a Jupyter notebook, and use it to run Spark SQL queries against Apache Hive tables. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight enables fast data analytics and cluster computing using in-memory processing. Jupyter notebook lets you interact with your data, combine code with markdown text, and do simple visualizations.
16
+
In this quickstart, you use the Azure portal to create an Apache Spark cluster in Azure HDInsight. You then create a Jupyter notebook, and use it to run Spark SQL queries against Apache Hive tables. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. The Apache Spark framework for HDInsight enables fast data analytics and cluster computing using in-memory processing. Jupyter notebook lets you interact with your data, combine code with markdown text, and do simple visualizations.
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?ref=microsoft.com&utm_source=microsoft.com&utm_medium=docs&utm_campaign=visualstudio).
23
23
24
-
## Create an HDInsight Spark cluster
24
+
## Create an Apache Spark cluster in HDInsight
25
25
26
-
You use the Azure portal to create an HDInsight Spark cluster that uses Azure Storage Blobs as the cluster storage. For more information on using Data Lake Storage Gen2, see [Quickstart: Set up clusters in HDInsight](../../storage/data-lake-storage/quickstart-create-connect-hdi-cluster.md).
26
+
You use the Azure portal to create an HDInsight cluster that uses Azure Storage Blobs as the cluster storage. For more information on using Data Lake Storage Gen2, see [Quickstart: Set up clusters in HDInsight](../../storage/data-lake-storage/quickstart-create-connect-hdi-cluster.md).
27
27
28
28
> [!IMPORTANT]
29
29
> Billing for HDInsight clusters is prorated per minute, whether you are using them or not. Be sure to delete your cluster after you have finished using it. For more information, see the [Clean up resources](#clean-up-resources) section of this article.
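
To make the per-minute billing note concrete, here is a minimal Python sketch of the arithmetic. The hourly rate of 3.00 is a made-up figure for illustration, not an actual HDInsight price, and `prorated_cost` is a hypothetical helper name.

```python
def prorated_cost(hourly_rate: float, minutes_running: int) -> float:
    """Return the charge for a cluster billed per minute.

    hourly_rate is a hypothetical price per hour; minutes_running counts
    every minute the cluster exists, whether or not it ran any queries.
    """
    if minutes_running < 0:
        raise ValueError("minutes_running must be non-negative")
    return hourly_rate * minutes_running / 60


# Assumed scenario: 30 minutes of querying, then 60 minutes left idle.
busy_minutes = 30
idle_minutes = 60
total = prorated_cost(3.00, busy_minutes + idle_minutes)
print(f"Charge for {busy_minutes + idle_minutes} minutes: {total:.2f}")
```

Under these assumed numbers, the idle hour accounts for 3.00 of the 4.50 total, which is why the note recommends deleting the cluster as soon as you finish.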
@@ -42,15 +42,15 @@ You use the Azure portal to create an HDInsight Spark cluster that uses Azure St
42
42
|---------|---------|
43
43
|Subscription | From the drop-down, select an Azure subscription used for this cluster. The subscription used for this quickstart is **Azure**. |
44
44
|Resource group | Specify whether you want to create a new resource group or use an existing one. A resource group is a container that holds related resources for an Azure solution. The resource group name used for this quickstart is **myResourceGroup**. |
45
-
|Cluster name | Give a name to your HDInsight Spark cluster. The cluster name used for this quickstart is **myspark2019**.|
45
+
|Cluster name | Give a name to your HDInsight cluster. The cluster name used for this quickstart is **myspark2019**.|
46
46
|Location | Select a location for the resource group. The template uses this location for creating the cluster as well as for the default cluster storage. The location used for this quickstart is **East US**. |
47
47
|Cluster type| Select **Spark** as the cluster type.|
48
48
|Cluster version|This field will auto-populate with the default version once the cluster type has been selected.|
49
49
|Cluster login username| Enter the cluster login username. The default name is *admin*. You use this account to log in to the Jupyter notebook later in the quickstart. |
50
50
|Cluster login password| Enter the cluster login password. |
51
51
|Secure Shell (SSH) username| Enter the SSH username. The SSH username used for this quickstart is **sshuser**. By default, this account shares the same password as the *Cluster Login username* account. |
52
52
53
-

53
+

54
54
55
55
Select **Next: Storage >>** to continue to the **Storage** page.
56
56
@@ -63,13 +63,13 @@ You use the Azure portal to create an HDInsight Spark cluster that uses Azure St
63
63
|Primary storage account|Use the auto-populated value.|
64
64
|Container|Use the auto-populated value.|
65
65
66
-

66
+

67
67
68
68
Select **Review + create** to continue.
69
69
70
70
1. Under **Review + create**, select **Create**. It takes about 20 minutes to create the cluster. The cluster must be created before you can proceed to the next section.
71
71
72
-
If you run into an issue with creating HDInsight clusters, it could be that you do not have the right permissions to do so. For more information, see [Access control requirements](../hdinsight-hadoop-create-linux-clusters-portal.md).
72
+
If you run into an issue with creating HDInsight clusters, it could be that you don't have the right permissions to do so. For more information, see [Access control requirements](../hdinsight-hadoop-create-linux-clusters-portal.md).
73
73
74
74
## Create a Jupyter notebook
75
75
@@ -91,13 +91,13 @@ Jupyter Notebook is an interactive notebook environment that supports various pr
91
91
92
92
A new notebook is created and opened with the name Untitled (Untitled.ipynb).
93
93
94
-
## Run Spark SQL statements
94
+
## Run Apache Spark SQL statements
95
95
96
96
SQL (Structured Query Language) is the most common and widely used language for querying and defining data. Spark SQL functions as an extension to Apache Spark for processing structured data, using the familiar SQL syntax.
97
97
98
98
1. Verify the kernel is ready. The kernel is ready when you see a hollow circle next to the kernel name in the notebook. A solid circle denotes that the kernel is busy.
99
99
100
-

100
+

101
101
102
102
When you start the notebook for the first time, the kernel performs some tasks in the background. Wait for the kernel to be ready.
103
103
@@ -108,9 +108,9 @@ SQL (Structured Query Language) is the most common and widely used language for
108
108
SHOW TABLES
109
109
```
110
110
111
-
When you use a Jupyter Notebook with your HDInsight Spark cluster, you get a preset `sqlContext` that you can use to run Hive queries using Spark SQL. `%%sql` tells Jupyter Notebook to use the preset `sqlContext` to run the Hive query. The query retrieves the top 10 rows from a Hive table (**hivesampletable**) that comes with all HDInsight clusters by default. It takes about 30 seconds to get the results. The output looks like:
111
+
When you use a Jupyter Notebook with your HDInsight cluster, you get a preset `sqlContext` that you can use to run Hive queries using Spark SQL. `%%sql` tells Jupyter Notebook to use the preset `sqlContext` to run the Hive query. The query retrieves the top 10 rows from a Hive table (**hivesampletable**) that comes with all HDInsight clusters by default. It takes about 30 seconds to get the results. The output looks like:
112
112
113
-

113
+

114
114
115
115
Every time you run a query in Jupyter, your web browser window title shows a **(Busy)** status along with the notebook title. You also see a solid circle next to the **PySpark** text in the top-right corner.
116
116
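
The `%%sql` magic described above hands a query to the preset `sqlContext`. A rough Python equivalent, assuming the notebook runs a PySpark kernel where `sqlContext` is already defined, might look like the sketch below. `top_rows` is a hypothetical helper, and the table name and row limit mirror the description in the text.

```python
def top_rows(sql_context, table="hivesampletable", limit=10):
    """Run a LIMIT query through the given context and return its result.

    sql_context is expected to expose a sql(query) method, as the preset
    sqlContext in an HDInsight PySpark notebook is assumed to do.
    """
    query = f"SELECT * FROM {table} LIMIT {int(limit)}"
    return sql_context.sql(query)
```

In a notebook cell this would be called as `top_rows(sqlContext).show()`, which plays the same role as a `%%sql` cell containing the query.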
@@ -123,7 +123,7 @@ SQL (Structured Query Language) is the most common and widely used language for
123
123
124
124
The screen refreshes to show the query output.
125
125
126
-

126
+

127
127
128
128
1. From the **File** menu on the notebook, select **Close and Halt**. Shutting down the notebook releases the cluster resources.
129
129
@@ -135,11 +135,11 @@ Switch back to the Azure portal, and select **Delete**.
135
135
136
136

137
137
138
-
You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight Spark cluster, and the default storage account.
138
+
You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight cluster and the default storage account.
139
139
140
140
## Next steps
141
141
142
-
In this quickstart, you learned how to create an HDInsight Spark cluster and run a basic Spark SQL query. Advance to the next tutorial to learn how to use an HDInsight Spark cluster to run interactive queries on sample data.
142
+
In this quickstart, you learned how to create an Apache Spark cluster in HDInsight and run a basic Spark SQL query. Advance to the next tutorial to learn how to use an HDInsight cluster to run interactive queries on sample data.
143
143
144
144
> [!div class="nextstepaction"]
145
145
> [Run interactive queries on Apache Spark](./apache-spark-load-data-run-query.md)