**File:** articles/synapse-analytics/quickstart-apache-spark-notebook.md (10 additions, 10 deletions)
## Sign in to the Azure portal
Sign in to the [Azure portal](https://portal.azure.com/).
If you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free/) before you begin.
## Create a notebook
A notebook is an interactive environment that supports various programming languages. A notebook lets you interact with your data, combine code with markdown text, and perform simple visualizations.
1. From the Azure portal view for the Azure Synapse workspace you want to use, select **Launch Synapse Studio**.
2. Once Synapse Studio has launched, select **Develop**. Then, hover over the **Notebooks** entry. Select the ellipsis (**...**).
4. In the **Properties** window, provide a name for the notebook.
5. On the toolbar, click **Publish**.
6. If there is only one Apache Spark pool in your workspace, it's selected by default. If none is selected, use the drop-down list to select the correct Apache Spark pool.
7. Click **Add code**. The default language is `PySpark`. You'll use a mix of PySpark and Spark SQL, so the default choice is fine.
48
48
8. Next, you create a simple Spark DataFrame object to manipulate. In this case, you create it from code. There are three rows and three columns:

10. If the Apache Spark pool instance isn't already running, it starts automatically. You can see the status of the Apache Spark pool instance below the cell you're running and on the status panel at the bottom of the notebook. Depending on the size of the pool, starting should take 2-5 minutes. Once the code finishes running, information below the cell shows how long it took to run. In the output cell, you see the output.

If you use the storage explorer, it's possible to see the impact of the two different ways of writing a file used above. When no file system is specified, the default is used, in this case `default>user>trusted-service-user>demo_df`. The data is saved to the location of the specified file system.
Notice that in both the "csv" and "parquet" write operations, a directory is created with many partitioned files.
## Run Spark SQL statements
Structured Query Language (SQL) is the most common and widely used language for querying and defining data. Spark SQL functions as an extension to Apache Spark for processing structured data, using the familiar SQL syntax.
1. Paste the following code in an empty cell, and then run the code. The command lists the tables on the pool.
The code produces two output cells: one that contains data results, and one that shows the job view.
By default, the results view shows a grid. A view switcher underneath the grid lets you switch between grid and graph views.

3. In the **View** switcher, select **Chart**.
4. Select the **View options** icon from the far right-hand side.
5. In the **Chart type** field, select "bar chart".
6. In the X-axis column field, select "state".
## Clean up resources
Azure Synapse saves your data in Azure Data Lake Storage. You can safely let a Spark instance shut down when it's not in use. You're charged for an Azure Synapse Apache Spark pool as long as it's running, even when it's not in use. Because the pool charges are many times more than the storage charges, it makes economic sense to let Spark instances shut down when they're not in use.
To ensure the Spark instance is shut down, end any connected sessions (notebooks). The pool shuts down when the **idle time** specified in the Apache Spark pool is reached. You can also select **End session** from the status bar at the bottom of the notebook.
## Next steps
In this quickstart, you learned how to create an Azure Synapse Apache Spark pool and run a basic Spark SQL query.
**File:** articles/synapse-analytics/quickstart-sql-on-demand.md (16 additions, 20 deletions)
---
title: Using SQL on-demand (preview)
description: In this quickstart, you'll learn how easy it is to query various types of files using SQL on-demand (preview).
services: synapse-analytics
author: azaricstefan
ms.service: synapse-analytics
# Quickstart: Using SQL on-demand
Synapse SQL on-demand (preview) is a serverless query service that enables you to run SQL queries on files placed in Azure Storage. In this quickstart, you'll learn how to query various types of files using SQL on-demand.
The following file types are supported: JSON, CSV, and Apache Parquet.
## Prerequisites
Choose a SQL client to issue queries:
- [Azure Synapse Studio](quickstart-synapse-studio.md) is a web tool that you can use to browse files in storage and create SQL queries.
- [Azure Data Studio](sql/get-started-azure-data-studio.md) is a client tool that enables you to run SQL queries and notebooks on your on-demand database.
- [SQL Server Management Studio](sql/get-started-ssms.md) is a client tool that enables you to run SQL queries on your on-demand database.
## First-time setup
Before using the samples:
- Create a database for your views (if you want to use views)
- Create credentials to be used by SQL on-demand to access files in storage
### Create database
Create your own database for demo purposes. You'll use this database to create your views and for the sample queries in this article.
> [!NOTE]
> The databases are used only for view metadata, not for actual data.
>
> Write down the database name you use; you'll need it later in this quickstart.
Use the following query, changing `mydbname` to a name of your choice:
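The query itself is elided from this diff, but the surrounding context shows it is simply:

```sql
CREATE DATABASE mydbname
```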
To run queries using SQL on-demand, create credentials for SQL on-demand to use to access files in storage.
> [!NOTE]
> To successfully run the samples in this section, you have to use a SAS token.
>
> To start using SAS tokens, you have to drop the UserIdentity, as explained in [this article](sql/develop-storage-files-storage-access-control.md#disable-forcing-azure-ad-pass-through).
>
> By default, SQL on-demand always uses AAD pass-through.
For more information on how to manage storage access control, see [Control storage account access for SQL on-demand](sql/develop-storage-files-storage-access-control.md).
Execute the following code snippet to create the credentials used in the samples in this section:
```sql
-- create credentials for containers in our demo storage account
```

The following image is a preview of the file to be queried:

The following query shows how to read a CSV file that doesn't contain a header row, with Windows-style new lines and comma-delimited columns:
```sql
SELECT TOP 10 *
```

For more examples, see how to [query CSV file](sql/query-single-csv-file.md).
The following sample shows the automatic schema inference capabilities for querying Parquet files. It returns the number of rows in September 2017 without specifying a schema.
> [!NOTE]
> You don't have to specify columns in the `OPENROWSET WITH` clause when reading Parquet files. In that case, SQL on-demand utilizes metadata in the Parquet file and binds columns by name.
```sql
SELECT COUNT_BIG(*)
FROM OPENROWSET
) AS nyc
```
Find more information about [querying parquet files](sql/query-parquet-files.md).
## Querying JSON files
Files are stored in the *json* container, folder *books*, and contain single book entries.
### Querying JSON files
The following query shows how to use [JSON_VALUE](/sql/t-sql/functions/json-value-transact-sql?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json&view=azure-sqldw-latest) to retrieve scalar values (title, publisher) from a book with the title *Probabilistic and Statistical Methods in Cryptology, An Introduction by Selected articles*:
```sql
SELECT
```
> [!IMPORTANT]
> We're reading the entire JSON file as a single row/column, so FIELDTERMINATOR, FIELDQUOTE, and ROWTERMINATOR are set to 0x0b because we don't expect to find that character in the file.
## Next steps
You're now ready to continue with the following articles:
- [Query single CSV file](sql/query-single-csv-file.md)
- [Query folders and multiple CSV files](sql/query-folders-multiple-csv-files.md)
- [Creating and using views](sql/create-use-views.md)
- [Creating and using external tables](sql/create-use-external-tables.md)
- [Persist query result to Azure storage](sql/create-external-table-as-select.md)
- [Query single CSV file](sql/query-single-csv-file.md)
**File:** articles/synapse-analytics/quickstart-synapse-studio.md (4 additions, 4 deletions)

3. Run the generated query or notebook to see the content of the file.

4. You can change the query to filter and sort results. Find language features that are available in SQL on-demand in [SQL features overview](sql/overview-features.md).
## Next steps
- Enable Azure AD users to query files by assigning [**Storage Blob Data Reader** or **Storage Blob Data Contributor** RBAC permissions on Azure Storage](../storage/common/storage-auth-aad-rbac-portal.md?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json#assign-a-built-in-rbac-role)
- [Query files on Azure Storage using SQL on-demand](sql/on-demand-workspace-overview.md)
- [Create Apache Spark pool using the Azure portal](quickstart-create-apache-spark-pool.md)
- [Create Power BI report on files stored on Azure Storage](sql/tutorial-connect-power-bi-desktop.md)