Commit 553a805

Merge branch 'patch-3' of https://github.com/saveenr/azure-docs-pr into 20200521_getstarted

2 parents 672f937 + 400bada

File tree: 5 files changed (+116, −116 lines)


articles/synapse-analytics/get-started.md

Lines changed: 69 additions & 61 deletions
@@ -35,7 +35,7 @@ This tutorial will guide you through all the basic steps needed to setup and use
 |**Data Lake Storage Gen2**|`Enabled`| Azure Synapse only works with storage accounts where this setting is enabled.|
 ||||
 
-* Once the storage account is created, make these role assignments or ensure they are already assigned. While in the storage account, select **Access control (IAM)** from the left navigation.
+* Once the storage account is created, select **Access control (IAM)** from the left navigation. Then assign the following roles or ensure they are already assigned.
   * Assign yourself to the **Owner** role on the storage account
   * Assign yourself to the **Storage Blob Data Owner** role on the storage account
 * From the left navigation, select **Containers** and create a container. You can give it any name. Accept the default **Public access level**. In this document, we will call the container `users`. Select **Create**.
@@ -56,7 +56,9 @@ This tutorial will guide you through all the basic steps needed to setup and use
 * Under **Select Data Lake Storage Gen 2** select the account and container you previously created
 
 > [!NOTE]
-> The storage account chosen here will be referred to as the "primary" storage account of the Synapse workspace
+> We refer to the storage account chosen here as the "primary" storage account of the Synapse workspace. This account
+> is used for storing data in Apache Spark tables and for logs created when Spark pools are created or Spark applications
+> run.
 
 * Select **Review + create**. Select **Create**. Your workspace will be ready in a few minutes.
 
@@ -65,9 +67,9 @@ This tutorial will guide you through all the basic steps needed to setup and use
 This may have already been done for you. In any case, you should verify.
 
 * Open the [Azure portal](https://portal.azure.com) and open the primary storage account chosen for your workspace.
-* Ensure that the following assignment exists or create it if it doesn't
-  * Storage Blob Data Contributor role on the storage account to your workspace.
-  * To assign this role to the workspace select the Storage Blob Data Contributor role, leave the default **Assign access to** and in the **Select** box type the name of your workspace. Select **Save**.
+* Select **Access control (IAM)** from the left navigation. Then assign the following roles or ensure they are already assigned.
+* Assign the workspace identity to the **Storage Blob Data Contributor** role on the storage account. The workspace identity has the same name as the workspace. In this document, the workspace name is `myworkspace`, so the workspace identity is also `myworkspace`.
+* Select **Save**.
 
 ## Launch Synapse Studio
 
@@ -86,15 +88,15 @@ Once your Synapse workspace is created, you have two ways to open Synapse Studio
 |**SQL pool name**| `SQLDB1`|
 |**Performance level**|`DW100C`|
 * Select **Review+create** and then select **Create**.
-* Your pool will be ready in a few minutes.
+* Your SQL pool will be ready in a few minutes.
 
 > [!NOTE]
 > A Synapse SQL pool corresponds to what used to be called an "Azure SQL Data Warehouse"
 
 * A SQL pool consumes billable resources as long as it's running. So, you can pause the pool when needed to reduce costs.
 * When your SQL pool is created, it will be associated with a SQL pool database also called **SQLDB1**.
 
-## Create an Apache Spark pool for Azure Synapse Analytics
+## Create an Apache Spark pool
 
 * In Synapse Studio, on the left side select **Manage > Apache Spark pools**
 * Select **+New** and enter these settings:
@@ -117,12 +119,9 @@ Once your Synapse workspace is created, you have two ways to open Synapse Studio
 > [!NOTE]
 > Spark databases are independently created from Spark pools. A workspace always has a Spark DB called **default** and you can create additional Spark databases.
 
-## SQL on-demand pools
+## The SQL on-demand pool
 
-SQL on-demand is a special kind of SQL pool that is always available with a Synapse workspace. It allows you to work with SQL without having to create or think about managing a Synapse SQL pool.
-
-> [!NOTE]
-> Unlike the other kinds of pools, billing for SQL on-demand is based on the amount of data scanned to run the query - and not the number of resources used to execute the query.
+Every workspace comes with a built-in pool called **SQL on-demand** that cannot be deleted. The SQL on-demand pool allows you to work with SQL without having to create or manage a Synapse SQL pool. Unlike the other kinds of pools, billing for SQL on-demand is based on the amount of data scanned to run the query, not the number of resources used to execute the query.
 
 * SQL on-demand also has its own kind of SQL on-demand databases that exist independently from any SQL on-demand pool.
 * Currently a workspace always has exactly one SQL on-demand pool named **SQL on-demand**.
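Because SQL on-demand bills per data scanned rather than per provisioned resource, you can estimate a query's cost directly from the bytes it reads. A minimal sketch of that billing model; the per-terabyte price below is a placeholder, not the actual Azure rate:

```py
# Hypothetical illustration of scanned-data billing.
# PRICE_PER_TB is a made-up placeholder, not the real Azure rate.
PRICE_PER_TB = 5.0

def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB) -> float:
    """Estimate query cost in dollars from bytes scanned (decimal terabytes)."""
    tb_scanned = bytes_scanned / 10**12
    return tb_scanned * price_per_tb

# A query scanning 2 TB costs 2 * PRICE_PER_TB under this model
print(estimate_query_cost(2 * 10**12))  # 10.0
```

The same query run twice against the same data is billed twice under this model, which is why pruning the data a query touches (partitioning, column selection) matters more here than with a provisioned pool.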
@@ -157,11 +156,11 @@ SQL on-demand is a special kind of SQL pool that is always available with a Syna
 * This query shows how the total trip distances and average trip distance relate to the number of passengers
 * In the SQL script result window change the **View** to **Chart** to see a visualization of the results as a line chart
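The aggregation that query performs can be illustrated with a toy version in plain pandas. The trip rows and column names below are made up for illustration, not the actual NYC Taxi schema:

```py
import pandas as pd

# Made-up toy trips: (passenger count, trip distance in miles)
trips = pd.DataFrame({
    "PassengerCount": [1, 1, 2, 2, 3],
    "TripDistance":   [2.0, 4.0, 3.0, 5.0, 6.0],
})

# Total and average trip distance per passenger count,
# mirroring the shape of the SQL aggregation
stats = (
    trips.groupby("PassengerCount")["TripDistance"]
    .agg(SumTripDistance="sum", AvgTripDistance="mean")
    .reset_index()
)
print(stats)
```

Plotting `AvgTripDistance` against `PassengerCount` gives the same kind of line chart the **Chart** view renders.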

-## Create a Spark database and load the NYC taxi data into it
+## Load the NYC Taxi Sample data into the Spark nyctaxi database
 
-We have data available in a SQL pool database. Now we load it into a Spark database.
+We have data available in a table in `SQLDB1`. Now we load it into a Spark database named `nyctaxi`.
 
-* In Synapse Studio, navigate to the **Develop hub"
+* In Synapse Studio, navigate to the **Develop** hub
 * Select **+** and select **Notebook**
 * At the top of the notebook, set the **Attach to** value to `Spark1`
 * Select **Add code** to add a notebook code cell and paste the text below:
@@ -173,23 +172,24 @@ We have data available in a SQL pool database. Now we load it into a Spark datab
 df.write.mode("overwrite").saveAsTable("nyctaxi.trip")
 ```
 
-* Navigate to the Data hub, right-click on databases and select **Refresh**
+* Navigate to the **Data** hub, right-click on **Databases** and select **Refresh**
 * Now you should see these databases:
   * SQLDB (SQL pool)
   * nyctaxi (Spark)
 
 ## Analyze the NYC Taxi data using Spark and notebooks
 
 * Return to your notebook
-* Create a new code cell, enter the text below, and run the cell
+* Create a new code cell, enter the text below, and run the cell to examine the NYC taxi data we loaded into the `nyctaxi` Spark DB.
 
 ```py
 %%pyspark
 df = spark.sql("SELECT * FROM nyctaxi.trip")
 display(df)
 ```
 
-* Run this code to perform the same analysis we did earlier with the SQL pool
+* Run the following code to perform the same analysis we did earlier with the SQL pool `SQLDB1`. This code also saves the results of the analysis into a table called `nyctaxi.passengercountstats` and visualizes the results.
 
 ```py
 %%pyspark
@@ -210,9 +210,9 @@ We have data available in a SQL pool database. Now we load it into a Spark datab
 
 ## Customize data visualization with Spark and notebooks
 
-With spark notebooks you can control exactly how render charts. The following
-code shows a simple example using the popular libraries matplotlib and sea-born. It will
-render the same chart you saw when running the SQL queries earlier.
+With notebooks you can control how charts are rendered. The following
+code shows a simple example using the popular libraries `matplotlib` and `seaborn`. It will
+render the same kind of line chart you saw when running the SQL queries earlier.
 
 ```py
 %%pyspark
@@ -229,39 +229,39 @@ matplotlib.pyplot.show()
 
 ## Load data from a Spark table into a SQL pool table
 
-Earlier we copied data from a SQL pool database into a Spark DB. Using
-Spark, we aggregated the data into the nyctaxi.passengercountstats.
-Now run the cell below in a notebook and it will copy the aggregated table back into
-the SQL pool database.
+Earlier we copied data from a SQL pool table `SQLDB1.dbo.Trip` into a Spark table `nyctaxi.trip`. Then, using
+Spark, we aggregated the data into the Spark table `nyctaxi.passengercountstats`. Now we will copy the data
+from `nyctaxi.passengercountstats` into a SQL pool table called `SQLDB1.dbo.PassengerCountStats`.
+
+Run the cell below in your notebook. It will copy the aggregated Spark table back into
+the SQL pool table.
 
 ```scala
 %%spark
 val df = spark.sql("SELECT * FROM nyctaxi.passengercountstats")
 df.write.sqlanalytics("SQLDB1.dbo.PassengerCountStats", Constants.INTERNAL)
 ```
 
-## Analyze NYC taxi data in Spark databases using SQL-on demand
+## Analyze NYC taxi data in Spark databases using SQL on-demand
 
-* Tables in Spark databases are automatically visible and queryable by SQL on-demand
-* In Synapse Studio navigate to the Develop hub and create a new SQL script
+* Tables in Spark databases are automatically visible and queryable by SQL on-demand.
+* In Synapse Studio, navigate to the **Develop** hub and create a new SQL script
 * Set **Connect to** to **SQL on-demand**
-* Paste the following text into the script:
+* Paste the following text into the script and run the script.
 
 ```sql
 SELECT *
 FROM nyctaxi.dbo.passengercountstats
 ```
-
-* Select **Run**
-* NOTE: THe first time you run this it will take about 10 seconds for SQL on-demand to gather SQL resources needed to run your queries. Subsequent queries will not require this time.
+* NOTE: The first time you run a query that uses SQL on-demand, it will take about 10 seconds for SQL on-demand to gather the SQL resources needed to run your queries. Subsequent queries will not require this time and will be much faster.
 
-## Use pipelines to orchestrate activities
+## Orchestrate activities with pipelines
 
 You can orchestrate a wide variety of tasks in Azure Synapse. In this section, you'll see how easy it is.
 
-* In Synapse Studio, navigate to the Orchestrate hub.
+* In Synapse Studio, navigate to the **Orchestrate** hub.
 * Select **+** then select **Pipeline**. A new pipeline will be created.
-* Navigate to the Develop hub and find any of the notebooks you previously created.
+* Navigate to the **Develop** hub and find the notebook you previously created.
 * Drag that notebook into the pipeline.
 * In the pipeline select **Add trigger > New/edit**.
 * In **Choose trigger** select **New**, and then in recurrence set the trigger to run every 1 hour.
@@ -271,14 +271,14 @@ You can orchestrate a wide variety of tasks in Azure Synapse. In this section, y
 
 ## Working with data in a storage account
 
-So far, we've covered scenarios were data resided in databases. Now we'll show how Azure Synapse can analyze simple files in a storage account. In this scenario we'll use the storage account and container that we linked the workspace to.
+So far, we've covered scenarios where data resided in databases in the workspace. Now we'll show how to work with files in storage accounts. In this scenario, we'll use the primary storage account of the workspace and the container we specified when creating the workspace.
 
-The name of the storage account: contosolake
-The name of the container in the storage account: users
+* The name of the storage account: `contosolake`
+* The name of the container in the storage account: `users`
 
 ### Creating CSV and Parquet files in your Storage account
 
-Run the the following code in a notebook. It creates a CSV and parquet data in the storage account
+Run the following code in a notebook. It creates a CSV file and a parquet file in the storage account.
 
 ```py
 %%pyspark
@@ -292,39 +292,47 @@ df.write.mode("overwrite").parquet("/NYCTaxi/PassengerCountStats.parquet")
 
 * In Synapse Studio, navigate to the **Data** hub
 * Select **Linked**
-* Navigate to **Storage accounts > workspacename (Primary - contosolake)**
+* Navigate to **Storage accounts > myworkspace (Primary - contosolake)**
 * Select **users (Primary)**
-* You should see a folder called `NYCTaxi`. Inside you should see two folders `PassengerCountStats.csv` and `PassengerCountStats.parquet`.
+* You should see a folder called `NYCTaxi`. Inside it you should see two folders: `PassengerCountStats.csv` and `PassengerCountStats.parquet`.
 * Navigate into the `PassengerCountStats.parquet` folder.
-* Right-click on the parquet file inside, and select new notebook, it will create a notebook with a cell like this:
+* Right-click on the parquet file inside, and select **New notebook**. It will create a notebook with a cell like this:
 
 ```py
 %%pyspark
 data_path = spark.read.load('abfss://users@contosolake.dfs.core.windows.net/NYCTaxi/PassengerCountStats.parquet/part-00000-1f251a58-d8ac-4972-9215-8d528d490690-c000.snappy.parquet', format='parquet')
 data_path.show(100)
 ```
 
-* Run the cell to analyze the parquet file with spark.
-* Right-click on the parquet file inside, and select New **SQL script > SELECT TOP 100 rows**, it will create a notebook with a cell like this:
+* Run the cell.
+* Right-click on the parquet file inside, and select **New SQL script > SELECT TOP 100 rows**. It will create a SQL script like this:
 
-```py
+```sql
 SELECT TOP 100 *
 FROM OPENROWSET(
     BULK 'https://contosolake.dfs.core.windows.net/users/NYCTaxi/PassengerCountStats.parquet/part-00000-1f251a58-d8ac-4972-9215-8d528d490690-c000.snappy.parquet',
     FORMAT='PARQUET'
 ) AS [r];
 ```
 
-* The script will be attached to **SQL on-demand** run the script. Notice that it infers the schema from the parquet file.
+* In the script the **Attach to** field will be set to **SQL on-demand**.
+* Run the script. Notice that SQL on-demand infers the schema from the parquet file.
 
 ## Visualize data with Power BI
 
-Your data can now be easily analyzed and visualized in Power BI. Synapse offers a unique integration which allows you to link a Power BI workspace to you Synapse workspace. Before starting, first follow the steps in this [quickstart](quickstart-power-bi.md) to link your Power BI workspace.
+From the NYC taxi data, we created aggregated datasets in two tables:
+* `nyctaxi.passengercountstats`
+* `SQLDB1.dbo.PassengerCountStats`
+
+You can link a Power BI workspace to your Synapse workspace. This allows you to easily get data into your Power BI workspace, and you can edit your Power BI reports directly in your Synapse workspace.
 
-### Create a Power BI Workspace and link it to your Synapse Workspace
+### Create a Power BI Workspace
 
 * Log into [powerbi.microsoft.com](https://powerbi.microsoft.com/).
 * Create a new Power BI workspace called `NYCTaxiWorkspace1`.
+
+### Link your Synapse Workspace to your new Power BI workspace
+
 * In Synapse Studio, navigate to **Manage > Linked Services**.
 * Select **+ New**, select **Connect to Power BI**, and set these fields:
 
@@ -340,8 +348,9 @@ Your data can now be easily analyzed and visualized in Power BI. Synapse offers
 
 * In Synapse Studio, navigate to **Develop > Power BI**.
 * Navigate to **NYCTaxiWorkspace1 > Power BI datasets** and select **New Power BI dataset**.
-* Hover over the SQLDB1 database and select **Download .pbids file**.
-* Open the downloaded `.pbids` file. This will launch Power BI desktop and automatically connect it to SQLDB1 in your synapse workspace.
+* Hover over the `SQLDB1` database and select **Download .pbids file**.
+* Open the downloaded `.pbids` file.
+* This will launch Power BI Desktop and automatically connect it to `SQLDB1` in your Synapse workspace.
 * If you see a dialog appear called **SQL server database**:
   * Select **Microsoft account**.
   * Select **Sign in** and log in.
@@ -361,22 +370,21 @@ Your data can now be easily analyzed and visualized in Power BI. Synapse offers
 ### Configure authentication for your dataset
 
 * Open [powerbi.microsoft.com](https://powerbi.microsoft.com/) and **Sign in**
-* At the left, under **Workspaces** select the the `NYCTaxiWorkspace1` workspace that you published to.
+* At the left, under **Workspaces**, select the `NYCTaxiWorkspace1` workspace.
 * Inside that workspace you should see a dataset called `Passenger Analysis` and a report called `Passenger Analysis`.
 * Hover over the `PassengerAnalysis` dataset, select the icon with the three dots, and select **Settings**.
-* In **Data source credentials** set the Authentication method to **OAuth2** and select **Sign in**.
+* In **Data source credentials** set the **Authentication method** to **OAuth2** and select **Sign in**.
 
 ### Edit a report in Synapse Studio
 
-* Go back to Synapse Studio and select **Close and refresh** now you should see:
-  * Under **Power BI datasets**, a new dataset called **PassengerAnalysis**.
-  * Under **Power BI datasets**, a new report called **PassengerAnalysis**.
-* Click on the **PassengerAnalysis** report.
-* It won't show anything because you still need to configure authentication for the dataset.
-* In Synapse Studio, navigate to **Develop > Power BI > Your workspace name > Power BI reports**.
-* Close any windows showing the Power BI report.
-* Refresh the **Power BI reports** node.
-* Select the report and now you can edit the report directly within Synapse Studio.
+* Go back to Synapse Studio and select **Close and refresh**
+* Navigate to the **Develop** hub
+* Hover over **Power BI**, select the three dots, and refresh the **Power BI reports** node.
+* Now under **Power BI** you should see:
+  * Under **NYCTaxiWorkspace1 > Power BI datasets**, a new dataset called **PassengerAnalysis**.
+  * Under **NYCTaxiWorkspace1 > Power BI reports**, a new report called **PassengerAnalysis**.
+* Click on the **PassengerAnalysis** report.
+* The report will open and now you can edit the report directly within Synapse Studio.
 
 ## Monitor activities
 
0 commit comments
