Learn how to connect an Apache Spark cluster in Azure HDInsight with Azure SQL Database.

## Prerequisites
* Azure HDInsight Spark cluster. Follow the instructions at [Create an Apache Spark cluster in HDInsight](apache-spark-jupyter-spark-sql.md).
* Azure SQL Database. Follow the instructions at [Create a database in Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart). Make sure you create a database with the sample **AdventureWorksLT** schema and data. Also, make sure you create a server-level firewall rule to allow your client's IP address to access the SQL database. The instructions to add the firewall rule are available in the same article. Once you've created your SQL database, make sure you keep the following values handy. You need them to connect to the database from a Spark cluster.
    * Server name.
    * Database name.
    * Azure SQL Database admin user name / password.
* SQL Server Management Studio (SSMS). Follow the instructions at [Use SSMS to connect and query data](/azure/azure-sql/database/connect-query-ssms).

## Create a Jupyter Notebook

Start by creating a Jupyter Notebook associated with the Spark cluster. You use this notebook to run the code snippets used in this article.

1. From the [Azure portal](https://portal.azure.com/), open your cluster.
1. Select **Jupyter Notebook** underneath **Cluster dashboards** on the right side. If you don't see **Cluster dashboards**, select **Overview** from the left menu. If prompted, enter the admin credentials for the cluster.

    :::image type="content" source="./media/apache-spark-connect-to-sql-database/new-hdinsight-spark-cluster-dashboard-jupyter-notebook.png" alt-text="Jupyter Notebook on Apache Spark" border="true":::

    > [!NOTE]
    > You can also access the Jupyter Notebook on Spark cluster by opening the following URL in your browser. Replace **CLUSTERNAME** with the name of your cluster:
    >
    > `https://CLUSTERNAME.azurehdinsight.net/jupyter`

1. In the Jupyter Notebook, from the top-right corner, click **New**, and then click **Spark** to create a Scala notebook. Jupyter Notebooks on an HDInsight Spark cluster also provide the **PySpark** kernel for Python2 applications and the **PySpark3** kernel for Python3 applications. For this article, we create a Scala notebook.

    :::image type="content" source="./media/apache-spark-connect-to-sql-database/new-kernel-jupyter-notebook-on-spark.png" alt-text="Kernels for Jupyter Notebook on Spark" border="true":::

    For more information about the kernels, see [Use Jupyter Notebook kernels with Apache Spark clusters in HDInsight](apache-spark-jupyter-notebook-kernels.md).

    > [!NOTE]
    > In this article, we use a Spark (Scala) kernel because streaming data from Spark into SQL Database is currently supported only in Scala and Java. Even though reading from and writing into SQL can be done using Python, for consistency we use Scala for all three operations in this article.

1. A new notebook opens with a default name, **Untitled**. Click the notebook name and enter a name of your choice.

    :::image type="content" source="./media/apache-spark-connect-to-sql-database/new-hdinsight-spark-jupyter-notebook-name.png" alt-text="Provide a name for the notebook" border="true":::

    You can now start creating your application.

## Read data from Azure SQL Database
In this section, you read data from a table (for example, **SalesLT.Address**) that exists in the AdventureWorks database.
1. In a new Jupyter Notebook, in a code cell, paste the following snippet and replace the placeholder values with the values for your database.

    ```scala
    // Declare the values for your database
    val jdbcUsername = "<SQL DB ADMIN USER>"
    val jdbcPassword = "<SQL DB ADMIN PWD>"
    val jdbcHostname = "<SQL SERVER NAME HOSTING SQL DB>" // typically, this is in the form servername.database.windows.net
    val jdbcPort = 1433
    val jdbcDatabase = "<AZURE SQL DB NAME>"
    ```

    Press **SHIFT + ENTER** to run the code cell.

1. Use the snippet below to build a JDBC URL that you can pass to the Spark dataframe APIs. The code creates a `Properties` object to hold the parameters. Paste the snippet in a code cell and press **SHIFT + ENTER** to run.
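
    A minimal sketch of such a snippet, assuming the variable names declared in the previous step:

    ```scala
    // Build the JDBC URL and a Properties object for the connection
    // (a sketch; variable names assumed from the declaration step above)
    import java.util.Properties

    val jdbc_url = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase};encrypt=true;trustServerCertificate=false;loginTimeout=60;"

    val connectionProperties = new Properties()
    connectionProperties.put("user", jdbcUsername)
    connectionProperties.put("password", jdbcPassword)
    ```
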
1. Use the snippet below to create a dataframe with the data from a table in your database. In this snippet, we use a `SalesLT.Address` table that is available as part of the **AdventureWorksLT** database. Paste the snippet in a code cell and press **SHIFT + ENTER** to run.
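
    A sketch, assuming the `jdbc_url` and `connectionProperties` built in the previous step:

    ```scala
    // Read the SalesLT.Address table into a dataframe over JDBC
    // (a sketch, not the article's exact snippet)
    val sqlTableDF = spark.read.jdbc(jdbc_url, "SalesLT.Address", connectionProperties)

    // Preview a few rows to confirm the read worked
    sqlTableDF.show(10)
    ```
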
## Write data into Azure SQL Database

In this section, we use a sample CSV file available on the cluster to create a table in your database and populate it with data. The sample CSV file (**HVAC.csv**) is available on all HDInsight clusters at `HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv`.
1. In a new Jupyter Notebook, in a code cell, paste the following snippet and replace the placeholder values with the values for your database.

    ```scala
    // Declare the values for your database
    val jdbcUsername = "<SQL DB ADMIN USER>"
    val jdbcPassword = "<SQL DB ADMIN PWD>"
    val jdbcHostname = "<SQL SERVER NAME HOSTING SQL DB>" // typically, this is in the form servername.database.windows.net
    val jdbcPort = 1433
    val jdbcDatabase = "<AZURE SQL DB NAME>"
    ```

    Press **SHIFT + ENTER** to run the code cell.

1. The following snippet builds a JDBC URL that you can pass to the Spark dataframe APIs. The code creates a `Properties` object to hold the parameters. Paste the snippet in a code cell and press **SHIFT + ENTER** to run.
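
    This follows the same pattern as in the read section; a sketch under the same assumed names:

    ```scala
    // Same JDBC URL and connection properties as before (a sketch; names assumed)
    import java.util.Properties

    val jdbc_url = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase};encrypt=true;trustServerCertificate=false;loginTimeout=60;"
    val connectionProperties = new Properties()
    connectionProperties.put("user", jdbcUsername)
    connectionProperties.put("password", jdbcPassword)
    ```
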
1. Use the following snippet to extract the schema of the data in HVAC.csv and use the schema to load the data from the CSV in a dataframe, `readDf`. Paste the snippet in a code cell and press **SHIFT + ENTER** to run.
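
    A sketch, assuming the sample sits in the cluster's default storage at the path mentioned above:

    ```scala
    // Infer the column names from the CSV header, then reuse that schema
    // to load the file into the dataframe readDf (a sketch; paths assumed)
    val userSchema = spark.read.option("header", "true").csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv").schema
    val readDf = spark.read.format("csv").schema(userSchema).load("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")
    ```
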
## Stream data into Azure SQL Database

1. We stream data from the **HVAC.csv** into the `hvactable`. The HVAC.csv file is available on the cluster at `/HdiSamples/HdiSamples/SensorSampleData/HVAC/`. In the following snippet, we first get the schema of the data to be streamed. Then, we create a streaming dataframe using that schema. Paste the snippet in a code cell and press **SHIFT + ENTER** to run.
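
    A sketch of the streaming variant, using the same schema-extraction pattern as above but with `readStream`:

    ```scala
    // Get the schema from a one-off static read, then open a streaming
    // dataframe with that schema (a sketch; paths assumed)
    val userSchema = spark.read.option("header", "true").csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv").schema
    val readDf = spark.readStream.schema(userSchema).csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")
    ```
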
1. Finally, use the following snippet to read data from the HVAC.csv and stream it into the `hvactable` in your database. Paste the snippet in a code cell, replace the placeholder values with the values for your database, and then press **SHIFT + ENTER** to run.

    ```scala
    // ... (the beginning of the snippet is truncated in this excerpt; it defines
    // WriteToSQLQuery as readDf.writeStream wrapped around a ForeachWriter that
    // opens the JDBC connection and builds valueStr for each row)
            statement.execute("INSERT INTO " + "dbo.hvactable" + " VALUES (" + valueStr + ")")
        }

        def close(errorOrNull: Throwable): Unit = {
            connection.close()
        }
    })

    var streamingQuery = WriteToSQLQuery.start()
    ```

1. Verify that the data is being streamed into the `hvactable` by running the following query in SQL Server Management Studio (SSMS). Every time you run the query, it shows the number of rows in the table increasing.
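
    For example, a simple row count such as `SELECT COUNT(*) FROM dbo.hvactable` (assuming the table name used in the snippets above) shows the count growing while the streaming query runs.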