---
title: Use Apache Spark to read and write data to Azure SQL Database
description: Learn how to set up a connection between an HDInsight Spark cluster and an Azure SQL Database to read data, write data, and stream data into a SQL database.
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.custom: hdinsightactive
ms.date: 04/20/2020
---
# Use HDInsight Spark cluster to read and write data to Azure SQL Database
Learn how to connect an Apache Spark cluster in Azure HDInsight with an Azure SQL Database, and then read, write, and stream data into the SQL database. The instructions in this article use a [Jupyter Notebook](https://jupyter.org/) to run the Scala code snippets. However, you can create a standalone application in Scala or Python and do the same tasks.
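To give a sense of the standalone route, here is a minimal Scala sketch (an illustration, not part of the original article; the object name and placeholder values are hypothetical). It assumes the Microsoft SQL Server JDBC driver is available to Spark on the cluster:

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

// Minimal standalone equivalent of the notebook steps in this article.
object SqlDbReadExample {
  def main(args: Array[String]): Unit = {
    // Picks up the cluster configuration when launched with spark-submit
    val spark = SparkSession.builder().appName("SqlDbReadExample").getOrCreate()

    val jdbcUrl = "jdbc:sqlserver://<SQL SERVER NAME>.database.windows.net:1433;database=<AZURE SQL DB NAME>;encrypt=true;loginTimeout=60;"
    val connectionProperties = new Properties()
    connectionProperties.put("user", "<SQL DB ADMIN USER>")
    connectionProperties.put("password", "<SQL DB ADMIN PWD>")

    // Read a table into a dataframe, as the notebook steps below do
    val df = spark.read.jdbc(jdbcUrl, "SalesLT.Address", connectionProperties)
    df.show(10)

    spark.stop()
  }
}
```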
## Prerequisites
## Create a Jupyter Notebook
Start by creating a Jupyter Notebook associated with the Spark cluster. You use this notebook to run the code snippets used in this article.
1. From the [Azure portal](https://portal.azure.com/), open your cluster.
1. Select **Jupyter notebook** underneath **Cluster dashboards** on the right side. If you don't see **Cluster dashboards**, select **Overview** from the left menu. If prompted, enter the admin credentials for the cluster.
> [!NOTE]
> In this article, we use a Spark (Scala) kernel because streaming data from Spark into a SQL database is currently supported only in Scala and Java. Even though reading from and writing to a SQL database can be done using Python, for consistency we use Scala for all three operations in this article.
1. A new notebook opens with a default name, **Untitled**. Click the notebook name and enter a name of your choice.

## Read data from Azure SQL Database

In this section, you read data from a table (for example, **SalesLT.Address**) that exists in the **AdventureWorksLT** database.
1. In a new Jupyter notebook, in a code cell, paste the following snippet and replace the placeholder values with the values for your Azure SQL Database.
    ```scala
    // Declare the values for your Azure SQL database
    val jdbcUsername = "<SQL DB ADMIN USER>"
    val jdbcPassword = "<SQL DB ADMIN PWD>"
    val jdbcHostname = "<SQL SERVER NAME HOSTING SQL DB>" // typically, this is in the form of servername.database.windows.net
    val jdbcPort = 1433
    val jdbcDatabase = "<AZURE SQL DB NAME>"
    ```
    Press **SHIFT+ENTER** to run the code cell.
1. Use the snippet below to build a JDBC URL that you can pass to the Spark dataframe APIs. The code creates a `Properties` object to hold the parameters. Paste the snippet in a code cell and press **SHIFT+ENTER** to run.
    ```scala
    import java.util.Properties

    val jdbc_url = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=60;"

    // Hold the connection parameters in a Properties object
    val connectionProperties = new Properties()
    connectionProperties.put("user", s"${jdbcUsername}")
    connectionProperties.put("password", s"${jdbcPassword}")
    ```

1. Use the snippet below to create a dataframe with the data from a table in your Azure SQL Database. In this snippet, we use a `SalesLT.Address` table that is available as part of the **AdventureWorksLT** database. Paste the snippet in a code cell and press **SHIFT+ENTER** to run.

    ```scala
    val sqlTableDF = spark.read.jdbc(jdbc_url, "SalesLT.Address", connectionProperties)
    ```
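    The table's contents are now available as an ordinary Spark dataframe. As a quick check (an aside, not one of the original steps), you can print the schema and look at a few rows:

    ```scala
    // Inspect what was read from SalesLT.Address
    sqlTableDF.printSchema()
    sqlTableDF.show(10)
    ```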
## Write data into Azure SQL Database

In this section, we use a sample CSV file available on the cluster to create a table in Azure SQL Database and populate it with data. The sample CSV file (**HVAC.csv**) is available on all HDInsight clusters at `HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv`.
1. In a new Jupyter notebook, in a code cell, paste the following snippet and replace the placeholder values with the values for your Azure SQL Database.
    ```scala
    // Declare the values for your Azure SQL database
    val jdbcUsername = "<SQL DB ADMIN USER>"
    val jdbcPassword = "<SQL DB ADMIN PWD>"
    val jdbcHostname = "<SQL SERVER NAME HOSTING SQL DB>" // typically, this is in the form of servername.database.windows.net
    val jdbcPort = 1433
    val jdbcDatabase = "<AZURE SQL DB NAME>"
    ```
    Press **SHIFT+ENTER** to run the code cell.
1. The following snippet builds a JDBC URL that you can pass to the Spark dataframe APIs. The code creates a `Properties` object to hold the parameters. Paste the snippet in a code cell and press **SHIFT+ENTER** to run.
    ```scala
    import java.util.Properties

    val jdbc_url = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=60;"

    // Hold the connection parameters in a Properties object
    val connectionProperties = new Properties()
    connectionProperties.put("user", s"${jdbcUsername}")
    connectionProperties.put("password", s"${jdbcPassword}")
    ```
1. Use the following snippet to extract the schema of the data in HVAC.csv and use the schema to load the data from the CSV into a dataframe, `readDf`. Paste the snippet in a code cell and press **SHIFT+ENTER** to run.
    ```scala
    val userSchema = spark.read.option("header", "true").csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv").schema
    val readDf = spark.read.format("csv").schema(userSchema).load("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")
    ```
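This extract omits the steps that create `hvactable` and save `readDf` into it. As a rough sketch of the write (an assumption, not the article's exact code), using the connection values declared above and letting Spark derive the table layout from the dataframe's schema:

```scala
// Write the dataframe to Azure SQL Database over JDBC.
// By default, Spark creates dbo.hvactable if it doesn't exist and errors if it does.
readDf.write.jdbc(jdbc_url, "dbo.hvactable", connectionProperties)
```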
## Stream data into Azure SQL Database

In this section, we stream data into the `hvactable` that you already created in the previous section.

1. We stream data from the **HVAC.csv** file into the `hvactable`. The HVAC.csv file is available on the cluster at `/HdiSamples/HdiSamples/SensorSampleData/hvac/`. In the following snippet, we first get the schema of the data to be streamed. Then, we create a streaming dataframe using that schema. Paste the snippet in a code cell and press **SHIFT+ENTER** to run.

    ```scala
    val userSchema = spark.read.option("header", "true").csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv").schema
    val readStreamDf = spark.readStream.schema(userSchema).csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/")
    ```
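    As a quick sanity check (an aside, not one of the original steps), a streaming dataframe reports itself as such:

    ```scala
    // Returns true: readStreamDf is a streaming source, not a static dataframe
    readStreamDf.isStreaming
    ```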
1. The output shows the schema of **HVAC.csv**. The `hvactable` has the same schema as well. The output lists the columns in the table.

    
1. Finally, use the following snippet to read data from the HVAC.csv and stream it into the `hvactable` in Azure SQL Database. Paste the snippet in a code cell, replace the placeholder values with the values for your Azure SQL Database, and then press **SHIFT+ENTER** to run. The bodies of `open` and `process` were elided in this extract; the versions below are a reconstruction of the same pattern.

    ```scala
    import org.apache.spark.sql._ // for ForeachWriter and Row

    val WriteToSQLQuery = readStreamDf.writeStream.foreach(new ForeachWriter[Row] {
        var connection: java.sql.Connection = _
        var statement: java.sql.Statement = _

        val jdbcUsername = "<SQL DB ADMIN USER>"
        val jdbcPassword = "<SQL DB ADMIN PWD>"
        val jdbcHostname = "<SQL SERVER NAME HOSTING SQL DB>" // typically, this is in the form of servername.database.windows.net
        val jdbcPort = 1433
        val jdbcDatabase = "<AZURE SQL DB NAME>"
        val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        val jdbc_url = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

        // Open a JDBC connection for each partition of the stream
        def open(partitionId: Long, version: Long): Boolean = {
            Class.forName(driver)
            connection = java.sql.DriverManager.getConnection(jdbc_url, jdbcUsername, jdbcPassword)
            statement = connection.createStatement
            true
        }

        // Insert each streamed row into the table. The original builds valueStr
        // column by column; quoting every value works for this sample data.
        def process(value: Row): Unit = {
            val valueStr = value.toSeq.map(v => "'" + v + "'").mkString(",")
            statement.execute("INSERT INTO " + "dbo.hvactable" + " VALUES (" + valueStr + ")")
        }

        def close(errorOrNull: Throwable): Unit = {
            connection.close
        }
    })

    var streamingQuery = WriteToSQLQuery.start()
    ```
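    The streaming query keeps running until it's stopped. In a notebook you can leave it running while you verify the results; in a standalone application you would typically block on it (a usage note, not one of the original steps):

    ```scala
    // Block until the stream terminates; alternatively, call streamingQuery.stop()
    streamingQuery.awaitTermination()
    ```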
1. Verify that the data is being streamed into the `hvactable` by running the following query in SQL Server Management Studio (SSMS). Every time you run the query, it shows the number of rows in the table increasing.
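    The query itself is not included in this extract; a simple row count works, assuming the `hvactable` name used earlier:

    ```sql
    SELECT COUNT(*) FROM hvactable
    ```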
## Next steps
* [Use HDInsight Spark cluster to analyze data in Data Lake Storage](apache-spark-use-with-data-lake-store.md)
* [Load data and run queries on an Apache Spark cluster in Azure HDInsight](apache-spark-load-data-run-query.md)
* [Use Apache Spark Structured Streaming with Apache Kafka on HDInsight](../hdinsight-apache-kafka-spark-structured-streaming.md)