articles/synapse-analytics/quickstart-apache-spark-notebook.md
21 additions & 13 deletions
@@ -1,5 +1,5 @@
 ---
-title: 'Quickstart: Create an Apache Spark pool (preview) in Azure Synapse Analytics'
+title: 'Quickstart: Create an Apache Spark notebook'
 description: This quickstart shows how to use the web tools to create an Apache Spark pool (preview) in Azure Synapse Analytics, and run a Spark SQL query.
 services: synapse-analytics
 author: euangMS
@@ -11,25 +11,33 @@ ms.topic: quickstart
 ms.date: 04/15/2020
 ---

-# Quickstart: Create an Apache Spark pool (preview) in Synapse Analytics using web tools
+# Quickstart: Create an Apache Spark pool (preview) in Azure Synapse Analytics using web tools

-In this quickstart, you learn how to create an Apache Spark pool (preview) in Azure Synapse Analytics using web tools. You then learn to connect to the Apache Spark pool and run Spark SQL queries against files and tables. Apache Spark enables fast data analytics and cluster computing using in-memory processing. For information on Spark on Synapse Analytics, see [Overview: Apache Spark on Azure Synapse Analytics](apache-spark-overview.md).
+In this quickstart, you learn how to create an Apache Spark pool (preview) in Azure Synapse using web tools. You then learn to connect to the Apache Spark pool and run Spark SQL queries against files and tables. Apache Spark enables fast data analytics and cluster computing using in-memory processing. For information on Spark in Azure Synapse, see [Overview: Apache Spark on Azure Synapse](apache-spark-overview.md).

 > [!IMPORTANT]
 > Billing for Spark instances is prorated per minute, whether you are using them or not. Be sure to shut down your Spark instance after you have finished using it, or set a short timeout. For more information, see the **Clean up resources** section of this article.

-If you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free/) before you begin.
+If you don't have an Azure subscription, [create a free account before you begin](https://azure.microsoft.com/free/).
+
+## Prerequisites
+
+- Azure subscription - [create one for free](https://azure.microsoft.com/free/)
This article shows you how to create a new Apache Spark pool using web tools.
+Sign in to the [Azure portal](https://portal.azure.com/).
+
+If you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free/) before you begin.

 ## Create a notebook

 A notebook is an interactive environment that supports various programming languages. The notebook allows you to interact with your data, combine code with markdown, text and perform simple visualizations.

-1. From the Azure portal view for the Synapse Analytics workspace you want to use, select **Launch Synapse Studio**.
-2. Once Synapse Analytics Studio has launched, select **Develop**. Then, hover over the **Notebooks** entry. Select the ellipsis (**...**).
+1. From the Azure portal view for the Azure Synapse workspace you want to use, select **Launch Synapse Studio**.
+2. Once Synapse Studio has launched, select **Develop**. Then, hover over the **Notebooks** entry. Select the ellipsis (**...**).
 3. From there, select **New notebook**. A new notebook is created and opened with an automatically generated name.
@@ -85,7 +93,7 @@ SQL (Structured Query Language) is the most common and widely used language for
 SHOW TABLES
 ```

-When you use a Notebook with your Synapse Analytics Apache Spark pool, you get a preset `sqlContext` that you can use to run queries using Spark SQL. `%%sql` tells the notebook to use the preset `sqlContext` to run the query. The query retrieves the top 10 rows from a system table that comes with all Synapse Analytics Apache Spark pools by default.
+When you use a Notebook with your Azure Synapse Apache Spark pool, you get a preset `sqlContext` that you can use to run queries using Spark SQL. `%%sql` tells the notebook to use the preset `sqlContext` to run the query. The query retrieves the top 10 rows from a system table that comes with all Azure Synapse Apache Spark pools by default.

 2. Run another query to see the data in `demo_df`.

@@ -98,7 +106,7 @@ SQL (Structured Query Language) is the most common and widely used language for

 By default the results view shows a grid, but there is a view switcher underneath the grid that allows the view to switch between grid and graph views.

101
-

10. It is possible to get the same experience of running SQL but without having to switch languages. You can do this by replacing the SQL cell above with this PySpark cell; the output experience is the same because the **display** command is used:
114
122
@@ -120,13 +128,13 @@ SQL (Structured Query Language) is the most common and widely used language for

 ## Clean up resources

-Synapse Analytics saves your data in Azure Data Lake Storage. You can safely let a Spark instance shut down when it is not in use. You are charged for a Synapse Analytics Apache Spark pool as long as it is running, even when it is not in use. Since the charges for the pool are many times more than the charges for storage, it makes economic sense to let Spark instances shut down when they are not in use.
+Azure Synapse saves your data in Azure Data Lake Storage. You can safely let a Spark instance shut down when it is not in use. You are charged for an Azure Synapse Apache Spark pool as long as it is running, even when it is not in use. Since the charges for the pool are many times more than the charges for storage, it makes economic sense to let Spark instances shut down when they are not in use.

 To ensure the Spark instance is shut down, end any connected sessions (notebooks). The pool shuts down when the **idle time** specified in the Apache Spark pool is reached. You can also select **end session** from the status bar at the bottom of the notebook.

 ## Next steps

-In this quickstart, you learned how to create a Synapse Analytics Apache Spark pool and run a basic Spark SQL query.
+In this quickstart, you learned how to create an Azure Synapse Apache Spark pool and run a basic Spark SQL query.