Commit 30c0679

Merge pull request #106655 from dagiro/freshness21
freshness21
2 parents c3908e9 + 16751f0

File tree

1 file changed: +28 −28 lines changed


articles/hdinsight/spark/apache-spark-connect-to-sql-database.md

Lines changed: 28 additions & 28 deletions
@@ -1,28 +1,28 @@
 ---
-title: Use Apache Spark to read and write data to Azure SQL database
-description: Learn how to set up a connection between HDInsight Spark cluster and an Azure SQL database to read data, write data, and stream data into a SQL database
+title: Use Apache Spark to read and write data to Azure SQL Database
+description: Learn how to set up a connection between HDInsight Spark cluster and an Azure SQL Database to read data, write data, and stream data into a SQL database
 author: hrasheed-msft
 ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 10/03/2019
+ms.custom: hdinsightactive
+ms.date: 03/05/2020
 ---
 
-# Use HDInsight Spark cluster to read and write data to Azure SQL database
+# Use HDInsight Spark cluster to read and write data to Azure SQL Database
 
-Learn how to connect an Apache Spark cluster in Azure HDInsight with an Azure SQL database and then read, write, and stream data into the SQL database. The instructions in this article use a [Jupyter Notebook](https://jupyter.org/) to run the Scala code snippets. However, you can create a standalone application in Scala or Python and perform the same tasks.
+Learn how to connect an Apache Spark cluster in Azure HDInsight with an Azure SQL Database and then read, write, and stream data into the SQL database. The instructions in this article use a [Jupyter Notebook](https://jupyter.org/) to run the Scala code snippets. However, you can create a standalone application in Scala or Python and perform the same tasks.
 
 ## Prerequisites
 
-* Azure HDInsight Spark cluster*. Follow the instructions at [Create an Apache Spark cluster in HDInsight](apache-spark-jupyter-spark-sql.md).
+* Azure HDInsight Spark cluster. Follow the instructions at [Create an Apache Spark cluster in HDInsight](apache-spark-jupyter-spark-sql.md).
 
-* Azure SQL database. Follow the instructions at [Create an Azure SQL database](../../sql-database/sql-database-get-started-portal.md). Make sure you create a database with the sample **AdventureWorksLT** schema and data. Also, make sure you create a server-level firewall rule to allow your client's IP address to access the SQL database on the server. The instructions to add the firewall rule is available in the same article. Once you've created your Azure SQL database, make sure you keep the following values handy. You need them to connect to the database from a Spark cluster.
+* Azure SQL Database. Follow the instructions at [Create an Azure SQL Database](../../sql-database/sql-database-get-started-portal.md). Make sure you create a database with the sample **AdventureWorksLT** schema and data. Also, make sure you create a server-level firewall rule to allow your client's IP address to access the SQL database on the server. The instructions to add the firewall rule are available in the same article. Once you've created your Azure SQL Database, make sure you keep the following values handy. You need them to connect to the database from a Spark cluster.
 
-    * Server name hosting the Azure SQL database.
-    * Azure SQL database name.
-    * Azure SQL database admin user name / password.
+    * Server name hosting the Azure SQL Database.
+    * Azure SQL Database name.
+    * Azure SQL Database admin user name / password.
 
 * SQL Server Management Studio (SSMS). Follow the instructions at [Use SSMS to connect and query data](../../sql-database/sql-database-connect-query-ssms.md).
 
@@ -55,11 +55,11 @@ Start by creating a [Jupyter Notebook](https://jupyter.org/) associated with the
 
 You can now start creating your application.
 
-## Read data from Azure SQL database
+## Read data from Azure SQL Database
 
 In this section, you read data from a table (for example, **SalesLT.Address**) that exists in the AdventureWorks database.
 
-1. In a new Jupyter notebook, in a code cell, paste the following snippet and replace the placeholder values with the values for your Azure SQL database.
+1. In a new Jupyter notebook, in a code cell, paste the following snippet and replace the placeholder values with the values for your Azure SQL Database.
 
         // Declare the values for your Azure SQL database
 
@@ -80,7 +80,7 @@ In this section, you read data from a table (for example, **SalesLT.Address**) t
         connectionProperties.put("user", s"${jdbcUsername}")
         connectionProperties.put("password", s"${jdbcPassword}")
 
-1. Use the snippet below to create a dataframe with the data from a table in your Azure SQL database. In this snippet, we use a **SalesLT.Address** table that is available as part of the **AdventureWorksLT** database. Paste the snippet in a code cell and press **SHIFT + ENTER** to run.
+1. Use the snippet below to create a dataframe with the data from a table in your Azure SQL Database. In this snippet, we use a `SalesLT.Address` table that is available as part of the **AdventureWorksLT** database. Paste the snippet in a code cell and press **SHIFT + ENTER** to run.
 
         val sqlTableDF = spark.read.jdbc(jdbc_url, "SalesLT.Address", connectionProperties)
 
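For context, the read-path fragments touched by these hunks assemble into roughly the following flow. This is a sketch reconstructed from the snippets visible in the diff; the `<...>` values are placeholders for your own server and credentials, and the exact JDBC URL options are an assumption (a typical Azure SQL Database connection string), since that line falls outside the hunks shown here.

```scala
// Sketch: declare connection values, build the JDBC URL, then load a
// table into a DataFrame. All <...> values are placeholders.
val jdbcHostname = "<your-server>.database.windows.net"
val jdbcPort = 1433
val jdbcDatabase = "<your-database>"
val jdbcUsername = "<admin-user>"
val jdbcPassword = "<password>"

// Assumed connection-string shape for Azure SQL Database; not copied from this file.
val jdbc_url = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase};encrypt=true;loginTimeout=30;"

val connectionProperties = new java.util.Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")

// Load SalesLT.Address into a DataFrame and inspect a few columns,
// as the surrounding hunks do.
val sqlTableDF = spark.read.jdbc(jdbc_url, "SalesLT.Address", connectionProperties)
sqlTableDF.printSchema()
sqlTableDF.select("AddressLine1", "City").show(10)
```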
@@ -100,11 +100,11 @@ In this section, you read data from a table (for example, **SalesLT.Address**) t
 
         sqlTableDF.select("AddressLine1", "City").show(10)
 
-## Write data into Azure SQL database
+## Write data into Azure SQL Database
 
-In this section, we use a sample CSV file available on the cluster to create a table in Azure SQL database and populate it with data. The sample CSV file (**HVAC.csv**) is available on all HDInsight clusters at `HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv`.
+In this section, we use a sample CSV file available on the cluster to create a table in Azure SQL Database and populate it with data. The sample CSV file (**HVAC.csv**) is available on all HDInsight clusters at `HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv`.
 
-1. In a new Jupyter notebook, in a code cell, paste the following snippet and replace the placeholder values with the values for your Azure SQL database.
+1. In a new Jupyter notebook, in a code cell, paste the following snippet and replace the placeholder values with the values for your Azure SQL Database.
 
         // Declare the values for your Azure SQL database
 
@@ -135,17 +135,17 @@ In this section, we use a sample CSV file available on the cluster to create a t
         readDf.createOrReplaceTempView("temphvactable")
         spark.sql("create table hvactable_hive as select * from temphvactable")
 
-1. Finally, use the hive table to create a table in Azure SQL database. The following snippet creates `hvactable` in Azure SQL database.
+1. Finally, use the hive table to create a table in Azure SQL Database. The following snippet creates `hvactable` in Azure SQL Database.
 
         spark.table("hvactable_hive").write.jdbc(jdbc_url, "hvactable", connectionProperties)
 
-1. Connect to the Azure SQL database using SSMS and verify that you see a `dbo.hvactable` there.
+1. Connect to the Azure SQL Database using SSMS and verify that you see a `dbo.hvactable` there.
 
-   a. Start SSMS and connect to the Azure SQL database by providing connection details as shown in the screenshot below.
+   a. Start SSMS and connect to the Azure SQL Database by providing connection details as shown in the screenshot below.
 
       ![Connect to SQL database using SSMS1](./media/apache-spark-connect-to-sql-database/connect-to-sql-db-ssms.png "Connect to SQL database using SSMS1")
 
-   b. From **Object Explorer**, expand the Azure SQL database and the Table node to see the **dbo.hvactable** created.
+   b. From **Object Explorer**, expand the Azure SQL Database and the Table node to see the **dbo.hvactable** created.
 
      ![Connect to SQL database using SSMS2](./media/apache-spark-connect-to-sql-database/connect-to-sql-db-ssms-locate-table.png "Connect to SQL database using SSMS2")
 
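The write section shown in this hunk follows a read-CSV → temp view → Hive table → JDBC write pipeline. A minimal sketch of that pipeline, assuming `jdbc_url` and `connectionProperties` were declared as in the read section; the CSV-read options are not visible in this diff and are illustrative:

```scala
// Sketch: load the sample CSV into a DataFrame. The header/inferSchema
// options are assumed, not copied from this file.
val readDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")

// Register a temp view, then materialize it as a Hive table,
// as the context lines above show.
readDf.createOrReplaceTempView("temphvactable")
spark.sql("create table hvactable_hive as select * from temphvactable")

// Push the Hive table's contents to Azure SQL Database over JDBC,
// creating dbo.hvactable on the server.
spark.table("hvactable_hive").write.jdbc(jdbc_url, "hvactable", connectionProperties)
```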
@@ -155,11 +155,11 @@ In this section, we use a sample CSV file available on the cluster to create a t
         SELECT * from hvactable
         ```
 
-## Stream data into Azure SQL database
+## Stream data into Azure SQL Database
 
-In this section, we stream data into the **hvactable** that you already created in Azure SQL database in the previous section.
+In this section, we stream data into the `hvactable` that you already created in Azure SQL Database in the previous section.
 
-1. As a first step, make sure there are no records in the **hvactable**. Using SSMS, run the following query on the table.
+1. As a first step, make sure there are no records in the `hvactable`. Using SSMS, run the following query on the table.
 
         ```sql
         TRUNCATE TABLE [dbo].[hvactable]
@@ -173,17 +173,17 @@ In this section, we stream data into the **hvactable** that you already created
         import org.apache.spark.sql.streaming._
         import java.sql.{Connection,DriverManager,ResultSet}
 
-1. We stream data from the **HVAC.csv** into the hvactable. HVAC.csv file is available on the cluster at `/HdiSamples/HdiSamples/SensorSampleData/HVAC/`. In the following snippet, we first get the schema of the data to be streamed. Then, we create a streaming dataframe using that schema. Paste the snippet in a code cell and press **SHIFT + ENTER** to run.
+1. We stream data from the **HVAC.csv** into the `hvactable`. The HVAC.csv file is available on the cluster at `/HdiSamples/HdiSamples/SensorSampleData/HVAC/`. In the following snippet, we first get the schema of the data to be streamed. Then, we create a streaming dataframe using that schema. Paste the snippet in a code cell and press **SHIFT + ENTER** to run.
 
         val userSchema = spark.read.option("header", "true").csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv").schema
         val readStreamDf = spark.readStream.schema(userSchema).csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/")
         readStreamDf.printSchema
 
-1. The output shows the schema of **HVAC.csv**. The **hvactable** has the same schema as well. The output lists the columns in the table.
+1. The output shows the schema of **HVAC.csv**. The `hvactable` has the same schema as well. The output lists the columns in the table.
 
        ![hdinsight Apache Spark schema table](./media/apache-spark-connect-to-sql-database/hdinsight-schema-table.png "Schema of table")
 
-1. Finally, use the following snippet to read data from the HVAC.csv and stream it into the **hvactable** in Azure SQL database. Paste the snippet in a code cell, replace the placeholder values with the values for your Azure SQL database, and then press **SHIFT + ENTER** to run.
+1. Finally, use the following snippet to read data from the HVAC.csv and stream it into the `hvactable` in Azure SQL Database. Paste the snippet in a code cell, replace the placeholder values with the values for your Azure SQL Database, and then press **SHIFT + ENTER** to run.
 
         val WriteToSQLQuery = readStreamDf.writeStream.foreach(new ForeachWriter[Row] {
           var connection:java.sql.Connection = _
@@ -224,7 +224,7 @@ In this section, we stream data into the **hvactable** that you already created
 
         var streamingQuery = WriteToSQLQuery.start()
 
-1. Verify that the data is being streamed into the **hvactable** by running the following query in SQL Server Management Studio (SSMS). Every time you run the query, it shows the number of rows in the table increasing.
+1. Verify that the data is being streamed into the `hvactable` by running the following query in SQL Server Management Studio (SSMS). Every time you run the query, it shows the number of rows in the table increasing.
 
         ```sql
         SELECT COUNT(*) FROM hvactable
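The streaming section writes each streamed row to SQL over JDBC via a `ForeachWriter`, whose body is mostly elided between the last two hunks. A minimal sketch of that pattern, assuming the `jdbc_url`, `jdbcUsername`, and `jdbcPassword` values declared earlier; the `INSERT` construction is illustrative and not taken from this file (real code should parameterize the statement):

```scala
import org.apache.spark.sql.{ForeachWriter, Row}
import java.sql.DriverManager

// Sketch of the ForeachWriter pattern: open a JDBC connection per
// partition, insert each streamed row, then close the connection.
val writer = new ForeachWriter[Row] {
  var connection: java.sql.Connection = _
  var statement: java.sql.Statement = _

  def open(partitionId: Long, epochId: Long): Boolean = {
    // jdbc_url, jdbcUsername, jdbcPassword come from the declarations
    // earlier in the notebook.
    connection = DriverManager.getConnection(jdbc_url, jdbcUsername, jdbcPassword)
    statement = connection.createStatement()
    true // returning true means this partition will be processed
  }

  def process(row: Row): Unit = {
    // Illustrative only: build an INSERT from the row's values.
    statement.executeUpdate(s"INSERT INTO hvactable VALUES (${row.mkString(",")})")
  }

  def close(errorOrNull: Throwable): Unit = {
    connection.close()
  }
}

// Attach the writer to the streaming DataFrame and start the query.
val streamingQuery = readStreamDf.writeStream.foreach(writer).start()
```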
