articles/synapse-analytics/get-started.md
Lines changed: 69 additions & 61 deletions
@@ -35,7 +35,7 @@ This tutorial will guide you through all the basic steps needed to setup and use
35
35
|**Data Lake Storage Gen2**|`Enabled`| Azure Synapse only works with storage accounts where this setting is enabled.|
36
36
||||
37
37
38
-
* Once the storage account is created, make these role assignments or ensure they are already assigned. While in the storage account, select **Access control (IAM)** from the left navigation.
38
+
* Once the storage account is created, select **Access control (IAM)** from the left navigation. Then assign the following roles or ensure they are already assigned.
39
39
* Assign yourself to the **Owner** role on the storage account
40
40
* Assign yourself to the **Storage Blob Data Owner** role on the Storage Account
41
41
* From the left navigation, select **Containers** and create a container. You can give it any name. Accept the default **Public access level**. In this document, we will call the container `users`. Select **Create**.
@@ -56,7 +56,9 @@ This tutorial will guide you through all the basic steps needed to setup and use
56
56
* Under **Select Data Lake Storage Gen 2** select the account and container you previously created
57
57
58
58
> [!NOTE]
59
-
> The storage account chosen here will be referred to as the "primary" storage account of the Synapse workspace
59
+
> We refer to the storage account chosen here as the "primary" storage account of the Synapse workspace. This account
60
+
> is used for storing data in Apache Spark tables and for logs created when Spark pools are created or Spark applications
61
+
> run.
60
62
61
63
* Select **Review + create**. Select **Create**. Your workspace will be ready in a few minutes.
62
64
@@ -65,9 +67,9 @@ This tutorial will guide you through all the basic steps needed to setup and use
65
67
This may have already been done for you. In any case, you should verify.
66
68
67
69
* Open the [Azure portal](https://portal.azure.com) and open the primary storage account chosen for your workspace.
68
-
*Ensure that the following assignment exists or create it if it doesn't
69
-
* Storage Blob Data Contributor role on the storage account to your workspace.
70
-
* To assign this role to the workspace select the Storage Blob Data Contributor role, leave the default **Assign access to** and in the **Select** box type the name of your workspace. Select **Save**.
70
+
* Select **Access control (IAM)** from the left navigation. Then assign the following roles or ensure they are already assigned.
71
+
* Assign the workspace identity to the **Storage Blob Data Contributor** role on the storage account. The workspace identity has the same name as the workspace. In this document, the workspace name is `myworkspace`, so the workspace identity is `myworkspace`.
72
+
* Select **Save**.
71
73
72
74
## Launch Synapse Studio
73
75
@@ -86,15 +88,15 @@ Once your Synapse workspace is created, you have two ways to open Synapse Studio
86
88
|**SQL pool name**|`SQLDB1`|
87
89
|**Performance level**|`DW100C`|
88
90
* Select **Review + create** and then select **Create**.
89
-
* Your pool will be ready in a few minutes.
91
+
* Your SQL pool will be ready in a few minutes.
90
92
91
93
> [!NOTE]
92
94
> A Synapse SQL pool corresponds to what used to be called an "Azure SQL Data Warehouse"
93
95
94
96
* A SQL pool consumes billable resources as long as it's running. So, you can pause the pool when needed to reduce costs.
95
97
* When your SQL pool is created, it will be associated with a SQL pool database also called **SQLDB1**.
96
98
97
-
## Create an Apache Spark pool for Azure Synapse Analytics
99
+
## Create an Apache Spark pool
98
100
99
101
* In Synapse Studio, on the left side select **Manage > Apache Spark pools**
100
102
* Select **+New** and enter these settings:
@@ -117,12 +119,9 @@ Once your Synapse workspace is created, you have two ways to open Synapse Studio
117
119
> [!NOTE]
118
120
> Spark databases are created independently of Spark pools. A workspace always has a Spark database called **default**, and you can create additional Spark databases.
119
121
120
-
## SQL on-demand pools
122
+
## The SQL on-demand pool
121
123
122
-
SQL on-demand is a special kind of SQL pool that is always available with a Synapse workspace. It allows you to work with SQL without having to create or think about managing a Synapse SQL pool.
123
-
124
-
> [!NOTE]
125
-
> Unlike the other kinds of pools, billing for SQL on-demand is based on the amount of data scanned to run the query - and not the number of resources used to execute the query.
124
+
Every workspace comes with a pre-built pool called **SQL on-demand** that cannot be deleted. The SQL on-demand pool allows you to work with SQL without having to create or manage a Synapse SQL pool. Unlike the other kinds of pools, billing for SQL on-demand is based on the amount of data scanned to run the query, not on the number of resources used to execute the query.
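
To make the billing model concrete, here is a minimal Python sketch of a charge that depends only on the data scanned. The function name and the per-terabyte rate are hypothetical placeholders, not published prices; consult the Azure pricing page for actual rates.

```python
def estimate_scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Estimate a charge that depends only on the amount of data scanned.

    price_per_tb is a hypothetical rate supplied by the caller; it is not
    an actual Azure price.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    terabytes = bytes_scanned / 1024**4  # assumption: 1 TB = 1024^4 bytes
    return terabytes * price_per_tb


# Under this model, two queries that scan the same amount of data cost the
# same, regardless of how many compute resources each one used.
print(estimate_scan_cost(512 * 1024**3, price_per_tb=4.0))
```

The rate is deliberately a required parameter so that the sketch does not imply any particular price.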
126
125
127
126
* SQL on-demand also has its own kind of SQL on-demand databases that exist independently from any SQL on-demand pool.
128
127
* Currently a workspace always has exactly one SQL on-demand pool named **SQL on-demand**.
@@ -157,11 +156,11 @@ SQL on-demand is a special kind of SQL pool that is always available with a Syna
157
156
* This query shows how the total trip distances and average trip distance relate to the number of passengers
158
157
* In the SQL script result window, change the **View** to **Chart** to see a visualization of the results as a line chart
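
The kind of aggregation described above can be sketched in plain Python. The sample rows and the function name below are made up for illustration; they are not the tutorial's data or its query.

```python
from collections import defaultdict


def trip_stats_by_passenger_count(trips):
    """Group (passenger_count, trip_distance) pairs and return, for each
    passenger count, the total and the average trip distance."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for passenger_count, distance in trips:
        totals[passenger_count] += distance
        counts[passenger_count] += 1
    return {
        pc: {
            "total_distance": totals[pc],
            "avg_distance": totals[pc] / counts[pc],
        }
        for pc in sorted(totals)
    }


# Hypothetical sample rows: (passenger_count, trip_distance)
sample = [(1, 2.0), (1, 4.0), (2, 3.0)]
for pc, stats in trip_stats_by_passenger_count(sample).items():
    print(pc, stats)
```

Each key in the result corresponds to one point on the chart: the passenger count on one axis and the aggregated distances on the other.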
159
158
160
-
## Create a Spark database and load the NYC taxi data into it
159
+
## Load the NYC Taxi Sample data into the Spark nyctaxi database
161
160
162
-
We have data available in a SQL pool database. Now we load it into a Spark database.
161
+
We have data available in a table in `SQLDB1`. Now we load it into a Spark database named `nyctaxi`.
163
162
164
-
*In Synapse Studio, navigate to the **Develop hub"
163
+
* In Synapse Studio, navigate to the **Develop** hub
165
164
* Select **+** and select **Notebook**
166
165
* At the top of the notebook, set the **Attach to** value to `Spark1`
167
166
* Select **Add code** to add a notebook code cell and paste the text below:
@@ -173,23 +172,24 @@ We have data available in a SQL pool database. Now we load it into a Spark datab
## Analyze NYC taxi data in Spark databases using SQL-on demand
245
+
## Analyze NYC taxi data in Spark databases using SQL on-demand
244
246
245
-
* Tables in Spark databases are automatically visible and queryable by SQL on-demand
246
-
* In Synapse Studio navigate to the Develop hub and create a new SQL script
247
+
* Tables in Spark databases are automatically visible and queryable by SQL on-demand.
248
+
* In Synapse Studio, navigate to the **Develop** hub and create a new SQL script
247
249
* Set **Connect to** to **SQL on-demand**
248
-
* Paste the following text into the script:
250
+
* Paste the following text into the script and run the script.
249
251
250
252
```sql
251
253
SELECT *
252
254
FROM nyctaxi.dbo.passengercountstats
253
255
```
254
-
255
-
* Select **Run**
256
-
* NOTE: THe first time you run this it will take about 10 seconds for SQL on-demand to gather SQL resources needed to run your queries. Subsequent queries will not require this time.
256
+
* NOTE: The first time you run a query that uses SQL on-demand, it takes about 10 seconds for SQL on-demand to gather the SQL resources needed to run your queries. Subsequent queries don't incur this delay and are much faster.
257
257
258
-
## Use pipelines to orchestrate activities
258
+
## Orchestrate activities with pipelines
259
259
260
260
You can orchestrate a wide variety of tasks in Azure Synapse. In this section, you'll see how easy it is.
261
261
262
-
* In Synapse Studio, navigate to the Orchestrate hub.
262
+
* In Synapse Studio, navigate to the **Orchestrate** hub.
263
263
* Select **+** then select **Pipeline**. A new pipeline will be created.
264
-
* Navigate to the Develop hub and find any of the notebooks you previously created.
264
+
* Navigate to the Develop hub and find the notebook you previously created.
265
265
* Drag that notebook into the pipeline.
266
266
* In the pipeline, select **Add trigger > New/edit**.
267
267
* In **Choose trigger**, select **New**, and then in recurrence set the trigger to run every 1 hour.
@@ -271,14 +271,14 @@ You can orchestrate a wide variety of tasks in Azure Synapse. In this section, y
271
271
272
272
## Working with data in a storage account
273
273
274
-
So far, we've covered scenarios were data resided in databases. Now we'll show how Azure Synapse can analyze simple files in a storage account. In this scenario we'll use the storage account and container that we linked the workspace to.
274
+
So far, we've covered scenarios where data resided in databases in the workspace. Now we'll show how to work with files in storage accounts. In this scenario, we'll use the primary storage account of the workspace and the container we specified when creating the workspace.
275
275
276
-
The name of the storage account: contosolake
277
-
The name of the container in the storage account: users
276
+
* The name of the storage account: `contosolake`
277
+
* The name of the container in the storage account: `users`
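
When a notebook reads or writes files in this account, it addresses them by URI. As a minimal sketch, assuming the standard `abfss://` scheme for Data Lake Storage Gen2, the helper below (a hypothetical name, not part of any SDK) builds such a URI. The file name `example.csv` is only an illustration.

```python
def adls_gen2_uri(account: str, container: str, path: str = "") -> str:
    """Build an abfss:// URI for a path in a Data Lake Storage Gen2 container."""
    # Leading slashes are stripped so the result never contains "//" after the host.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Uses the account and container names from this document.
print(adls_gen2_uri("contosolake", "users", "example.csv"))
```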
278
278
279
279
### Creating CSV and Parquet files in your Storage account
280
280
281
-
Run the the following code in a notebook. It creates a CSV and parquet data in the storage account
281
+
Run the following code in a notebook. It creates a CSV file and a Parquet file in the storage account.
* The script will be attached to **SQL on-demand** run the script. Notice that it infers the schema from the parquet file.
318
+
* In the script the **Attach to** field will be set to **SQL on-demand**.
319
+
* Run the script.
319
320
320
321
## Visualize data with Power BI
321
322
322
-
Your data can now be easily analyzed and visualized in Power BI. Synapse offers a unique integration which allows you to link a Power BI workspace to you Synapse workspace. Before starting, first follow the steps in this [quickstart](quickstart-power-bi.md) to link your Power BI workspace.
323
+
From the NYC taxi data, we created aggregated datasets in two tables:
324
+
* `nyctaxi.passengercountstats`
325
+
* `SQLDB1.dbo.PassengerCountStats`
326
+
327
+
You can link a Power BI workspace to your Synapse workspace. This allows you to easily get data into your Power BI workspace and to edit your Power BI reports directly in your Synapse workspace.
323
328
324
-
### Create a Power BI Workspace and link it to your Synapse Workspace
329
+
### Create a Power BI Workspace
325
330
326
331
* Log into [powerbi.microsoft.com](https://powerbi.microsoft.com/).
327
332
* Create a new Power BI workspace called `NYCTaxiWorkspace1`.
333
+
334
+
### Link your Synapse Workspace to your new Power BI workspace
335
+
328
336
* In Synapse Studio, navigate to **Manage > Linked Services**.
329
337
* Select **+ New** and select **Connect to Power BI** and set these fields:
330
338
@@ -340,8 +348,9 @@ Your data can now be easily analyzed and visualized in Power BI. Synapse offers
340
348
341
349
* In Synapse Studio, navigate to **Develop > Power BI**.
342
350
* Navigate to **NYCTaxiWorkspace1 > Power BI datasets** and select **New Power BI dataset**.
343
-
* Hover over the SQLDB1 database and select **Download .pbids file**.
344
-
* Open the downloaded `.pbids` file. This will launch Power BI desktop and automatically connect it to SQLDB1 in your synapse workspace.
351
+
* Hover over the `SQLDB1` database and select **Download .pbids file**.
352
+
* Open the downloaded `.pbids` file.
353
+
* This will launch Power BI Desktop and automatically connect it to `SQLDB1` in your Synapse workspace.
345
354
* If you see a dialog appear called **SQL server database**:
346
355
* Select **Microsoft account**.
347
356
* Select **Sign in** and log in.
@@ -361,22 +370,21 @@ Your data can now be easily analyzed and visualized in Power BI. Synapse offers
361
370
### Configure authentication for your dataset
362
371
363
372
* Open [powerbi.microsoft.com](https://powerbi.microsoft.com/) and **Sign in**
364
-
* At the left, under **Workspaces** select the the `NYCTaxiWorkspace1` workspace that you published to.
373
+
* At the left, under **Workspaces**, select the `NYCTaxiWorkspace1` workspace.
365
374
* Inside that workspace you should see a dataset called `Passenger Analysis` and a report called `Passenger Analysis`.
366
375
* Hover over the `PassengerAnalysis` dataset and select the icon with the three dots and select **Settings**.
367
-
* In **Data source credentials** set the Authentication method to **OAuth2** and select **Sign in**.
376
+
* In **Data source credentials** set the **Authentication method** to **OAuth2** and select **Sign in**.
368
377
369
378
### Edit a report in Synapse Studio
370
379
371
-
* Go back to Synapse Studio and select **Close and refresh** now you should see:
372
-
* Under **Power BI datasets**, a new dataset called **PassengerAnalysis**.
373
-
* Under **Power BI datasets**, a new report called **PassengerAnalysis**.
374
-
* CLick on the **PassengerAnalysis** report.
375
-
* It won't show anything because you still need to configure authentication for the dataset.
376
-
*In SynapseStudio, navigate to **Develop > Power BI > Your workspace name > Power BI reports**.
377
-
* Close any windows showing the Power BI report.
378
-
* Refresh the **Power BI reports** node.
379
-
*Select the report and now you can edit the report directly within Synapse Studio.
380
+
* Go back to Synapse Studio and select **Close and refresh**
381
+
* Navigate to the **Develop** hub
382
+
* Hover over **Power BI**, click on the three dots, and refresh the **Power BI reports** node.
383
+
* Now under **Power BI** you should see:
384
+
* Under **NYCTaxiWorkspace1 > Power BI datasets**, a new dataset called **PassengerAnalysis**.
385
+
* Under **NYCTaxiWorkspace1 > Power BI reports**, a new report called **PassengerAnalysis**.
386
+
* Click on the **PassengerAnalysis** report.
387
+
* The report will open, and now you can edit the report directly within Synapse Studio.