
Commit dd69aed

Merge pull request #18744 from changeworld/fix/typo2
Fix typo
2 parents 619bd64 + 3047fb3 · commit dd69aed

1 file changed: articles/azure-databricks/databricks-extract-load-sql-data-warehouse.md (15 additions, 15 deletions)
@@ -13,15 +13,15 @@ ms.date: 07/26/2018
---

# Tutorial: Extract, transform, and load data using Azure Databricks

In this tutorial, you perform an ETL (extract, transform, and load) operation using Azure Databricks. You extract data from Azure Data Lake Store into Azure Databricks, run transformations on the data in Azure Databricks, and then load the transformed data into Azure SQL Data Warehouse.

The steps in this tutorial use the SQL Data Warehouse connector for Azure Databricks to transfer data to Azure SQL Data Warehouse. This connector, in turn, uses Azure Blob Storage as temporary storage for the data being transferred between an Azure Databricks cluster and Azure SQL Data Warehouse.

The following illustration shows the application flow:

![Azure Databricks with Data Lake Store and SQL Data Warehouse](./media/databricks-extract-load-sql-data-warehouse/databricks-extract-transform-load-sql-datawarehouse.png "Azure Databricks with Data Lake Store and SQL Data Warehouse")

This tutorial covers the following tasks:

> [!div class="checklist"]
> * Create an Azure Databricks workspace
@@ -58,7 +58,7 @@ In this section, you create an Azure Databricks workspace using the Azure portal
![Create an Azure Databricks workspace](./media/databricks-extract-load-sql-data-warehouse/create-databricks-workspace.png "Create an Azure Databricks workspace")

Provide the following values:

|Property |Description |
|---------|---------|
@@ -89,14 +89,14 @@ In this section, you create an Azure Databricks workspace using the Azure portal
Accept all other defaults except the following values:

* Enter a name for the cluster.
* For this article, create a cluster with the **4.0** runtime.
* Make sure you select the **Terminate after \_\_ minutes of inactivity** checkbox. Provide a duration (in minutes) after which the cluster is terminated if it is not being used.

Select **Create cluster**. Once the cluster is running, you can attach notebooks to the cluster and run Spark jobs.

## Create an Azure Data Lake Store account

In this section, you create an Azure Data Lake Store account and associate an Azure Active Directory service principal with it. Later in this tutorial, you use this service principal in Azure Databricks to access Azure Data Lake Store.

1. From the [Azure portal](https://portal.azure.com), select **Create a resource** > **Storage** > **Data Lake Store**.
2. In the **New Data Lake Store** blade, provide the values as shown in the following screenshot:
@@ -183,7 +183,7 @@ When programmatically logging in, you need to pass the tenant ID with your authe
1. Copy the **Directory ID**. This value is your tenant ID.

![tenant ID](./media/databricks-extract-load-sql-data-warehouse/copy-directory-id.png)

## Upload data to Data Lake Store
@@ -300,7 +300,7 @@ You have now extracted the data from Azure Data Lake Store into Azure Databricks
## Transform data in Azure Databricks

The raw sample data **small_radio_json.json** captures the audience for a radio station and has a variety of columns. In this section, you transform the data to retrieve only specific columns from the dataset.

1. Start by retrieving only the columns *firstName*, *lastName*, *gender*, *location*, and *level* from the dataframe you already created, as sketched below.

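A minimal sketch of this step, assuming the dataframe produced by the extraction step is named `df` (the name is illustrative; use whatever variable you assigned when you read the data):

```scala
// Keep only the columns needed for the analysis; the result is the
// specificColumnsDf dataframe used in the next step.
val specificColumnsDf = df.select("firstName", "lastName", "gender", "location", "level")
specificColumnsDf.show()
```
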
@@ -334,7 +334,7 @@ The raw sample data **small_radio_json.json** captures the audience for a radio
| Margaux| Smith| F|Atlanta-Sandy Spr...| free|
+---------+----------+------+--------------------+-----+

2. You can further transform this data to rename the column **level** to **subscription_type**.

    val renamedColumnsDf = specificColumnsDf.withColumnRenamed("level", "subscription_type")
    renamedColumnsDf.show()
@@ -376,7 +376,7 @@ As mentioned earlier, the SQL Data Warehouse connector uses Azure Blob Storage a
    val blobStorage = "<STORAGE ACCOUNT NAME>.blob.core.windows.net"
    val blobContainer = "<CONTAINER NAME>"
    val blobAccessKey = "<ACCESS KEY>"

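For the connector to stage data in the container, the storage account key must also be made available to Spark. That snippet falls outside this diff; a minimal sketch, assuming the standard Databricks session configuration (the configuration key is built from the `blobStorage` value above):

```scala
// Register the storage account key with the Spark session so Spark and the
// SQL Data Warehouse connector can access the Blob Storage container.
spark.conf.set("fs.azure.account.key." + blobStorage, blobAccessKey)
```
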
2. Specify a temporary folder that will be used while moving data between Azure Databricks and Azure SQL Data Warehouse, as sketched below.

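A minimal sketch of such a folder definition, assuming the `wasbs://` URI scheme; the folder name `tempDirs` is illustrative:

```scala
// Temporary staging folder in the Blob Storage container. The tempDir value
// is passed to the connector's "tempdir" option when writing the dataframe.
val tempDir = "wasbs://" + blobContainer + "@" + blobStorage + "/tempDirs"
```
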
@@ -391,23 +391,23 @@ As mentioned earlier, the SQL Data Warehouse connector uses Azure Blob Storage a
    // SQL Data Warehouse related settings
    val dwDatabase = "<DATABASE NAME>"
    val dwServer = "<DATABASE SERVER NAME>"
    val dwUser = "<USER NAME>"
    val dwPass = "<PASSWORD>"
    val dwJdbcPort = "1433"
    val dwJdbcExtraOptions = "encrypt=true;trustServerCertificate=true;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
    val sqlDwUrl = "jdbc:sqlserver://" + dwServer + ".database.windows.net:" + dwJdbcPort + ";database=" + dwDatabase + ";user=" + dwUser + ";password=" + dwPass + ";" + dwJdbcExtraOptions
    val sqlDwUrlSmall = "jdbc:sqlserver://" + dwServer + ".database.windows.net:" + dwJdbcPort + ";database=" + dwDatabase + ";user=" + dwUser + ";password=" + dwPass

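Optionally, you can verify these settings with a plain JDBC connection before loading any data. This check is not part of the tutorial; it is an illustrative sketch that assumes the Microsoft SQL Server JDBC driver available on Databricks clusters:

```scala
import java.sql.DriverManager

// Open and immediately close a connection to confirm that the server name,
// database, and credentials defined above are correct.
val connection = DriverManager.getConnection(sqlDwUrl)
connection.close()
```
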
5. Run the following snippet to load the transformed dataframe, **renamedColumnsDf**, as a table in SQL Data Warehouse. This snippet creates a table called **SampleTable** in the SQL database. Note that Azure SQL Data Warehouse requires a database master key; you can create one by running the `CREATE MASTER KEY;` command in SQL Server Management Studio.
    spark.conf.set(
        "spark.sql.parquet.writeLegacyFormat",
        "true")

    renamedColumnsDf.write
        .format("com.databricks.spark.sqldw")
        .option("url", sqlDwUrlSmall)
        .option("dbtable", "SampleTable")
        .option("forward_spark_azure_storage_credentials", "True")
        .option("tempdir", tempDir)

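Once the write completes, you can optionally read the table back through the same connector to confirm the load. This verification is not part of the tutorial; the sketch below reuses the settings and staging folder defined earlier:

```scala
// Read SampleTable back from SQL Data Warehouse, staging through the same
// temporary folder in Blob Storage, and display a few rows.
val loadedDf = spark.read
  .format("com.databricks.spark.sqldw")
  .option("url", sqlDwUrlSmall)
  .option("tempdir", tempDir)
  .option("forward_spark_azure_storage_credentials", "True")
  .option("dbtable", "SampleTable")
  .load()

loadedDf.show()
```
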
@@ -428,9 +428,9 @@ After you have finished running the tutorial, you can terminate the cluster. To
![Stop a Databricks cluster](./media/databricks-extract-load-sql-data-warehouse/terminate-databricks-cluster.png "Stop a Databricks cluster")

- If you do not manually terminate the cluster it will automatically stop, provided you selected the **Terminate after __ minutes of inactivity** checkbox while creating the cluster. In such a case, the cluster automatically stops if it has been inactive for the specified time.
+ If you do not manually terminate the cluster, it will automatically stop, provided you selected the **Terminate after \_\_ minutes of inactivity** checkbox while creating the cluster. In such a case, the cluster automatically stops once it has been inactive for the specified time.

## Next steps

In this tutorial, you learned how to:

> [!div class="checklist"]
