
Commit 0663fd2

Merge pull request #113531 from julieMSFT/20200430_current-to-master

20200430 current to master

2 parents 103e831 + 371c35b, commit 0663fd2

12 files changed: +118 −51 lines

articles/synapse-analytics/quickstart-apache-spark-notebook.md

Lines changed: 10 additions & 10 deletions
@@ -28,13 +28,13 @@ If you don't have an Azure subscription, [create a free account before you begin
 
 ## Sign in to the Azure portal
 
-Sign in to the [Azure portal](https://portal.azure.com/)
+Sign in to the [Azure portal](https://portal.azure.com/).
 
 If you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free/) before you begin.
 
 ## Create a notebook
 
-A notebook is an interactive environment that supports various programming languages. The notebook allows you to interact with your data, combine code with markdown, text and perform simple visualizations.
+A notebook is an interactive environment that supports various programming languages. The notebook allows you to interact with your data, combine code with markdown and text, and perform simple visualizations.
 
 1. From the Azure portal view for the Azure Synapse workspace you want to use, select **Launch Synapse Studio**.
 2. Once Synapse Studio has launched, select **Develop**. Then, hover over the **Notebooks** entry. Select the ellipsis (**...**).
@@ -43,7 +43,7 @@ A notebook is an interactive environment that supports various programming langu
 
 4. In the **Properties** window, provide a name for the notebook.
 5. On the toolbar, click **Publish**.
-6. If there is only one Apache Spark pool in your workspace, then it is selected by default. Use the drop-down to select the correct Apache Spark pool if none is selected.
+6. If there is only one Apache Spark pool in your workspace, then it's selected by default. Use the drop-down to select the correct Apache Spark pool if none is selected.
 7. Click **Add code**. The default language is `Pyspark`. You are going to use a mix of Pyspark and Spark SQL, so the default choice is fine.
 8. Next you create a simple Spark DataFrame object to manipulate. In this case, you create it from code. There are three rows and three columns:
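The cell contents for step 8 fall outside this hunk. A minimal sketch of a three-row, three-column DataFrame cell, reusing the `demo_df` name from the write calls later in this diff and a `state` column to match the chart step further down; the other column names and values are assumptions:

```python
# Build a small demo DataFrame (3 rows x 3 columns) from inline data.
# The `spark` session object is predefined in a Synapse notebook.
new_rows = [('CA', 22, 45000), ('WA', 35, 65000), ('WA', 50, 85000)]
demo_df = spark.createDataFrame(new_rows, ['state', 'age', 'salary'])
demo_df.show()
```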

@@ -61,7 +61,7 @@ A notebook is an interactive environment that supports various programming langu
 
 ![Create data frame object](./media/quickstart-apache-spark-notebook/spark-get-started-create-data-frame-object.png "Output from the Spark job")
 
-10. If the Apache Spark pool instance is not already running, it is automatically started. You can see the status of the Apache Spark pool instance below the cell you are running and also on the status panel at the bottom of the notebook. Depending on the size of pool, starting should take 2-5 minutes. Once the code has finished running, information below the cell displays showing how long it took to run and its execution. In the output cell, you see the output.
+10. If the Apache Spark pool instance isn't already running, it is automatically started. You can see the Apache Spark pool instance status below the cell you are running and also on the status panel at the bottom of the notebook. Depending on the size of the pool, starting should take 2-5 minutes. Once the code has finished running, information below the cell shows how long the run took. In the output cell, you see the output.
 
 ![Output from executing a cell](./media/quickstart-apache-spark-notebook/run-cell-with-output.png "Output from the Spark job")

@@ -74,7 +74,7 @@ A notebook is an interactive environment that supports various programming langu
 demo_df.write.parquet('abfss://<<TheNameOfAStorageAccountFileSystem>>@<<TheNameOfAStorageAccount>>.dfs.core.windows.net/demodata/demo_df', mode='overwrite')
 ```
 
-If you use the storage explorer, it is possible to see the impact of the two different ways of writing a file used above. When no file system is specified then the default is used, in this case `default>user>trusted-service-user>demo_df`. The data is saved to the location of the specified file system.
+If you use the storage explorer, it's possible to see the impact of the two different ways of writing a file used above. When no file system is specified, the default is used, in this case `default>user>trusted-service-user>demo_df`. Otherwise, the data is saved to the location of the specified file system.
 
 Notice that in both the "csv" and "parquet" write operations, a directory is created with many partitioned files.
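A sketch contrasting the two write styles the paragraph above describes; only the parquet call is visible in this hunk, so the CSV call is an assumption:

```python
# No file system specified: output lands in the workspace's default storage,
# under default>user>trusted-service-user>demo_df (per the paragraph above).
demo_df.write.csv('demo_df', mode='overwrite')

# File system specified explicitly: output is saved to that
# Data Lake Storage Gen2 location.
demo_df.write.parquet(
    'abfss://<<TheNameOfAStorageAccountFileSystem>>@<<TheNameOfAStorageAccount>>.dfs.core.windows.net/demodata/demo_df',
    mode='overwrite')
```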

@@ -84,7 +84,7 @@ A notebook is an interactive environment that supports various programming langu
 
 ## Run Spark SQL statements
 
-SQL (Structured Query Language) is the most common and widely used language for querying and defining data. Spark SQL functions as an extension to Apache Spark for processing structured data, using the familiar SQL syntax.
+Structured Query Language (SQL) is the most common and widely used language for querying and defining data. Spark SQL functions as an extension to Apache Spark for processing structured data, using the familiar SQL syntax.
 
 1. Paste the following code in an empty cell, and then run the code. The command lists the tables on the pool.
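The cell itself is elided between hunks; a minimal sketch of a table-listing cell, using the predefined `spark` session (an assumption, not the article's exact snippet):

```python
# List the tables available on the attached Apache Spark pool.
spark.sql("SHOW TABLES").show()
```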

@@ -104,11 +104,11 @@ SQL (Structured Query Language) is the most common and widely used language for
 
 The code produces two output cells: one that contains data results, the other showing the job view.
 
-By default the results view shows a grid, but there is a view switcher underneath the grid that allows the view to switch between grid and graph views.
+By default the results view shows a grid. But there is a view switcher underneath the grid that allows the view to switch between grid and graph views.
 
 ![Query output in Azure Synapse Spark](./media/quickstart-apache-spark-notebook/spark-get-started-query.png "Query output in Azure Synapse Spark")
 
-3. In the **View** switcher, select **Chart**
+3. In the **View** switcher, select **Chart**.
 4. Select the **View options** icon from the far right-hand side.
 5. In the **Chart type** field, select "bar chart".
 6. In the X-axis column field, select "state".
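The query cell that produced the output described above is elided between hunks. One plausible shape, assuming the earlier `demo_df` rows were registered as a view (a sketch, not the article's exact cell):

```python
# Register the DataFrame as a temporary view so Spark SQL can query it.
demo_df.createOrReplaceTempView("demo_df")

# Group by state so the bar chart in steps 3-6 has something to plot.
spark.sql("SELECT state, COUNT(*) AS count FROM demo_df GROUP BY state").show()
```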
@@ -128,13 +128,13 @@ SQL (Structured Query Language) is the most common and widely used language for
 
 ## Clean up resources
 
-Azure Synapse saves your data in Azure Data Lake Storage. You can safely let a Spark instance shut down when it is not in use. You are charged for an Azure Synapse Apache Spark pool as long as it is running, even when it is not in use. Since the charges for the pool are many times more than the charges for storage, it makes economic sense to let Spark instances shut down when they are not in use.
+Azure Synapse saves your data in Azure Data Lake Storage. You can safely allow a Spark instance to shut down when it's not in use. You are charged for an Azure Synapse Apache Spark pool as long as it's running, even when it's not in use. The charges for the pool are many times more than the charges for storage. As such, it makes economic sense to let Spark instances shut down when they are not in use.
 
 To ensure the Spark instance is shut down, end any connected sessions (notebooks). The pool shuts down when the **idle time** specified in the Apache Spark pool is reached. You can also select **end session** from the status bar at the bottom of the notebook.
 
 ## Next steps
 
-In this quickstart, you learned how to create a Azure Synapse Apache Spark pool and run a basic Spark SQL query.
+In this quickstart, you learned how to create an Azure Synapse Apache Spark pool and run a basic Spark SQL query.
 
 - [Azure Synapse Analytics](overview-what-is.md)
 - [.NET for Apache Spark documentation](/dotnet/spark?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json)

articles/synapse-analytics/quickstart-sql-on-demand.md

Lines changed: 16 additions & 20 deletions
@@ -1,6 +1,6 @@
 ---
 title: Using SQL on-demand (preview)
-description: In this quickstart, you will see and learn how easy is to query various types of files using SQL on-demand (preview).
+description: In this quickstart, you'll see and learn how easy it is to query various types of files using SQL on-demand (preview).
 services: synapse-analytics
 author: azaricstefan
 ms.service: synapse-analytics
@@ -13,15 +13,15 @@ ms.reviewer: jrasnick
 
 # Quickstart: Using SQL on-demand
 
-Synapse SQL on-demand (preview) is a serverless query service that enables you to run the SQL queries on your files placed in Azure Storage. In this quickstart, you will learn how to query various types of files using SQL on-demand.
+Synapse SQL on-demand (preview) is a serverless query service that enables you to run SQL queries on files placed in Azure Storage. In this quickstart, you'll learn how to query various types of files using SQL on-demand.
 
 The following file types are supported: JSON, CSV, Apache Parquet.
 
 ## Prerequisites
 
 Choose a SQL client to issue queries:
 
-- [Azure Synapse Studio](quickstart-synapse-studio.md) is a web tool that you can use to browse files in storage and create SQL query.
+- [Azure Synapse Studio](quickstart-synapse-studio.md) is a web tool that you can use to browse files in storage and create SQL queries.
 - [Azure Data Studio](sql/get-started-azure-data-studio.md) is a client tool that enables you to run SQL queries and notebooks on your On-demand database.
 - [SQL Server Management Studio](sql/get-started-ssms.md) is a client tool that enables you to run SQL queries on your On-demand database.

@@ -36,19 +36,18 @@ Parameters for quickstart:
 
 ## First-time setup
 
-Prior to using samples:
+Before using the samples:
 
 - Create a database for your views (in case you want to use views)
 - Create credentials to be used by SQL on-demand to access files in storage
 
 ### Create database
 
-Create your own database for demo purposes. This is the database in which you create your views. Use this database in the sample queries in this article.
+Create your own database for demo purposes. You'll use this database to create your views and for the sample queries in this article.
 
 > [!NOTE]
 > The databases are used only for view metadata, not for actual data.
->
-> Write down database name you use for use later in the Quickstart.
+> Write down the database name you use; you'll need it later in the quickstart.
 
 Use the following query, changing `mydbname` to a name of your choice:
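The snippet itself is elided at the hunk boundary; given the paragraph above and the next hunk's context line, it is presumably just:

```sql
-- Create the database that will hold the demo views (rename as you like).
CREATE DATABASE mydbname
```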

@@ -61,15 +60,15 @@ CREATE DATABASE mydbname
 To run queries using SQL on-demand, create credentials for SQL on-demand to use to access files in storage.
 
 > [!NOTE]
-> In order to successfully run samples in this section you have to use SAS token.
+> To successfully run the samples in this section, you have to use a SAS token.
 >
 > To start using SAS tokens, you have to drop the UserIdentity, which is explained in the following [article](sql/develop-storage-files-storage-access-control.md#disable-forcing-azure-ad-pass-through).
 >
 > SQL on-demand by default always uses AAD pass-through.
 
-For more information on how to manage storage access control, check this [link](sql/develop-storage-files-storage-access-control.md).
+For more information on how to manage storage access control, see the [Control storage account access for SQL on-demand](sql/develop-storage-files-storage-access-control.md) article.
 
-Execute following code snippet to create credential used in samples in this section:
+Execute the following code snippet to create the credentials used in the samples in this section:
 
 ```sql
 -- create credentials for containers in our demo storage account
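-- (sketch: the CREATE CREDENTIAL statements themselves are elided by the hunk
-- boundary; one per container, shaped roughly like the following. The account,
-- container, and SAS value are placeholders, not the article's exact values.)
CREATE CREDENTIAL [https://<storage-account>.blob.core.windows.net/<container>]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<your-SAS-token>';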
@@ -91,7 +90,7 @@ The following image is a preview of the file to be queried:
 
 ![First 10 rows of the CSV file without header, Windows style new line.](./sql/media/query-single-csv-file/population.png)
 
-The following query shows how to read a CSV file that does not contain a header row, with Windows-style new line, and comma-delimited columns:
+The following query shows how to read a CSV file that doesn't contain a header row, with Windows-style new line, and comma-delimited columns:
 
 ```sql
 SELECT TOP 10 *
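-- (sketch: the rest of the query is elided at the hunk boundary; a full
-- OPENROWSET call of this shape reads a headerless, comma-delimited CSV.
-- The storage path and column list are assumptions, not the article's values.)
FROM OPENROWSET(
        BULK 'https://<storage-account>.blob.core.windows.net/csv/population/population.csv',
        FORMAT = 'CSV',
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n'
    )
WITH (
    [country_code] VARCHAR(5),
    [country_name] VARCHAR(100),
    [year] SMALLINT,
    [population] BIGINT
) AS [r]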
@@ -119,7 +118,7 @@ For more examples, see how to [query CSV file](sql/query-single-csv-file.md).
 The following sample shows the automatic schema inference capabilities for querying Parquet files. It returns the number of rows in September of 2017 without specifying a schema.
 
 > [!NOTE]
-> You do not have to specify columns in `OPENROWSET WITH` clause when reading Parquet files. In that case, SQL on-demand utilizes metadata in the Parquet file and bind columns by name.
+> You do not have to specify columns in the `OPENROWSET WITH` clause when reading Parquet files. In that case, SQL on-demand utilizes metadata in the Parquet file and binds columns by name.
 
 ```sql
 SELECT COUNT_BIG(*)
@@ -130,7 +129,7 @@ FROM OPENROWSET
 ) AS nyc
 ```
 
-Find more information about [querying parquet files](sql/query-parquet-files.md)].
+Find more information about [querying parquet files](sql/query-parquet-files.md).
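The middle of the query is elided between the two hunks above; its likely full shape, with the storage path as a placeholder (the dataset layout is an assumption):

```sql
SELECT COUNT_BIG(*)
FROM OPENROWSET
    (
        BULK 'https://<storage-account>.blob.core.windows.net/parquet/taxi/year=2017/month=9/*.parquet',
        FORMAT = 'PARQUET'
    ) AS nyc
```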

 ## Querying JSON files

@@ -156,7 +155,7 @@ Files are stored in *json* container, folder *books*, and contain single book en
 
 ### Querying JSON files
 
-Following query shows how to use [JSON_VALUE](/sql/t-sql/functions/json-value-transact-sql?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json&view=azure-sqldw-latest) to retrieve scalar values (title, publisher) from a book with the title *Probabilistic and Statistical Methods in Cryptology, An Introduction by Selected articles*:
+The following query shows how to use [JSON_VALUE](/sql/t-sql/functions/json-value-transact-sql?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json&view=azure-sqldw-latest) to retrieve scalar values (title, publisher) from a book with the title *Probabilistic and Statistical Methods in Cryptology, An Introduction by Selected articles*:
 
 ```sql
 SELECT
@@ -178,11 +177,11 @@ WHERE
 ```
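The body of the query is elided between the two hunks above; a sketch of its likely shape, consistent with the [!IMPORTANT] note below (the storage path and column alias are assumptions):

```sql
SELECT
    JSON_VALUE(jsonContent, '$.title') AS title,
    JSON_VALUE(jsonContent, '$.publisher') AS publisher
FROM OPENROWSET(
        BULK 'https://<storage-account>.blob.core.windows.net/json/books/*.json',
        FORMAT = 'CSV',
        FIELDTERMINATOR = '0x0b',
        FIELDQUOTE = '0x0b',
        ROWTERMINATOR = '0x0b'
    )
WITH (jsonContent VARCHAR(8000)) AS [rows]
WHERE JSON_VALUE(jsonContent, '$.title') = 'Probabilistic and Statistical Methods in Cryptology, An Introduction by Selected articles'
```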

 > [!IMPORTANT]
-> We are reading the entire JSON file as single row/column so FIELDTERMINATOR, FIELDQUOTE, and ROWTERMINATOR are set to 0x0b because we do not expect to find it in the file.
+> We are reading the entire JSON file as a single row/column. So, FIELDTERMINATOR, FIELDQUOTE, and ROWTERMINATOR are set to 0x0b because we do not expect to find that character in the file.
 
 ## Next steps
 
-Now you are ready to start with following Quickstart articles:
+You're now ready to continue on with the following articles:
 
 - [Query single CSV file](sql/query-single-csv-file.md)
 - [Query folders and multiple CSV files](sql/query-folders-multiple-csv-files.md)
@@ -193,7 +192,4 @@ Now you are ready to start with following Quickstart articles:
 - [Creating and using views](sql/create-use-views.md)
 - [Creating and using external tables](sql/create-use-external-tables.md)
 - [Persist query result to Azure storage](sql/create-external-table-as-select.md)
-
-Advance to the next article to learn how to query single CSV file.
-> [!div class="nextstepaction"]
-> [Query single CSV file](sql/query-single-csv-file.md)
+- [Query single CSV file](sql/query-single-csv-file.md)

articles/synapse-analytics/quickstart-synapse-studio.md

Lines changed: 4 additions & 4 deletions
@@ -51,15 +51,15 @@ You can create new folders and upload files using the links in toolbar to organi
 
 ![Query files on storage](./media/quickstart-synapse-studio/query-files-on-storage.png)
 
-3. Run the generated query or notebook to see the content of the file:
+3. Run the generated query or notebook to see the content of the file.
 
 ![See the content of file](./media/quickstart-synapse-studio/query-files-on-storage-result.png)
 
 4. You can change the query to filter and sort results. Find language features that are available in SQL on-demand in [SQL features overview](sql/overview-features.md).
 
 ## Next steps
 
-- Enable Azure AD users to query files [by assigning **Storage Blob Data Reader** or **Storage Blob Data Contributor** RBAC permissions on Azure Storage](../storage/common/storage-auth-aad-rbac-portal.md?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json#assign-a-built-in-rbac-role)
-- [Query files on Azure Storage using SQL On-Demand](sql/on-demand-workspace-overview.md)
-- [Create Apache Spark pool](quickstart-create-apache-spark-pool.md)
+- Enable Azure AD users to query files by assigning [**Storage Blob Data Reader** or **Storage Blob Data Contributor** RBAC permissions on Azure Storage](../storage/common/storage-auth-aad-rbac-portal.md?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json#assign-a-built-in-rbac-role)
+- [Query files on Azure Storage using SQL on-demand](sql/on-demand-workspace-overview.md)
+- [Create Apache Spark pool using the Azure portal](quickstart-create-apache-spark-pool.md)
 - [Create Power BI report on files stored on Azure Storage](sql/tutorial-connect-power-bi-desktop.md)
