Commit 80599d7

Author: Rahul Ajmera
Commit message: Update Twitter sample to include steps to create Data Source and External Table
1 parent dc0b9e1 commit 80599d7

File tree: 1 file changed, +71 −12 lines


samples/features/sql-big-data-cluster/spark/data-loading/spark-twitter-streaming-sample.ipynb

Lines changed: 71 additions & 12 deletions
@@ -16,18 +16,23 @@
 {
 "cell_type": "markdown",
 "source": [
-"![Microsoft](https://raw.githubusercontent.com/microsoft/azuredatastudio/master/src/sql/media/microsoft-small-logo.png)\r\n",
+"<p align=\"center\">\r\n",
+"<img src =\"https://raw.githubusercontent.com/microsoft/azuredatastudio/master/src/sql/media/microsoft_logo_gray.svg?sanitize=true\" width=\"250\" align=\"center\">\r\n",
+"</p>\r\n",
+"\r\n",
 "# **Twitter Streaming with SQL Server & Spark**\r\n",
 "\r\n",
 "In this notebook, we will go through the process of using Spark to stream tweets from the Twitter API, and then stream the resulting data into the SQL Server data pool. Once the data is in the data pool, we will perform queries on it using T-SQL or the Spark-SQL connector. \r\n",
 "\r\n",
 "## **Steps**\r\n",
 "1. [Create a Twitter Developer Account](https://developer.twitter.com/en/apply-for-access.html).\r\n",
 "2. Setup\r\n",
-" 1. Create 'TwitterData' database and retrieve server hostname.\r\n",
-" 2. Change kernel from \"SQL\" to \"Spark | Scala\".\r\n",
-" 3. Import packages.\r\n",
-" 4. Enter required parameters.\r\n",
+" 1. Create 'TwitterData' database.\r\n",
+" 2. Create an External Data Source 'TweetsDataSource'.\r\n",
+" 3. Create an External Table 'Tweets'.\r\n",
+" 4. Change kernel from \"SQL\" to \"Spark | Scala\".\r\n",
+" 5. Import packages.\r\n",
+" 6. Enter required parameters.\r\n",
 "3. Define and create a TwitterStream object.\r\n",
 "4. Start the TwitterStream.\r\n",
 "5. Validate streaming data.\r\n",
@@ -53,10 +58,12 @@
 "cell_type": "markdown",
 "source": [
 "## **2. Setup**\n",
-"1. Create a database in the SQL Server master instance called 'TwitterData', and retrieve server hostname. \n",
-"2. Change the Kernel from \"SQL\" to \"Spark | Scala\".\n",
-"3. Import Java packages.\n",
-"4. Specify setup parameters"
+"1. Create a database in the SQL Server master instance named 'TwitterData'.\n",
+"2. Create an External Data Source to the Data Pool named 'TweetsDataSource'.\n",
+"3. Create an External Table in the Data Pool named 'Tweets'.\n",
+"4. Change the Kernel from \"SQL\" to \"Spark | Scala\".\n",
+"5. Import Java packages.\n",
+"6. Specify setup parameters"
 ],
 "metadata": {
 "azdata_cell_guid": "514963d4-c9eb-42a7-bd81-c6735f79d647"
@@ -112,7 +119,59 @@
 {
 "cell_type": "markdown",
 "source": [
-"### **2.2 Change the kernel from \"SQL\" to \"Spark | Scala\"**\n",
+"### **2.2 Create External Data Source 'TweetsDataSource'**"
+],
+"metadata": {
+"azdata_cell_guid": "03542af4-1e39-4049-a982-a44fce4cebd4"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"USE TwitterData\n",
+"GO\n",
+"\n",
+"IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'TweetsDataSource')\n",
+" CREATE EXTERNAL DATA SOURCE TweetsDataSource\n",
+" WITH (LOCATION = 'sqldatapool://controller-svc/default');"
+],
+"metadata": {
+"azdata_cell_guid": "b01e9faf-d701-4a5e-95a3-7afb66b1249b"
+},
+"outputs": [],
+"execution_count": 0
+},
+{
+"cell_type": "markdown",
+"source": [
+"### **2.3 Create External Table 'Tweets'**"
+],
+"metadata": {
+"azdata_cell_guid": "a2576ce9-bd62-4138-937c-f5ccdfe0834e"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"IF NOT EXISTS(SELECT * FROM sys.external_tables WHERE name = 'Tweets')\n",
+" CREATE EXTERNAL TABLE [Tweets]\n",
+" (\"screen_name\" NVARCHAR(MAX), \"createdAt\" DATETIME , \"num_followers\" BIGINT, \"text\" NVARCHAR(MAX))\n",
+" WITH\n",
+" (\n",
+" DATA_SOURCE = TweetsDataSource,\n",
+" DISTRIBUTION = ROUND_ROBIN\n",
+" );"
+],
+"metadata": {
+"azdata_cell_guid": "e80447c6-92a8-459f-aa17-517b89bd5fed"
+},
+"outputs": [],
+"execution_count": 0
+},
+{
+"cell_type": "markdown",
+"source": [
+"### **2.4 Change the kernel from \"SQL\" to \"Spark | Scala\"**\n",
 "At the top of the editor, click the Kernel dropdown menu and change the kernel from \"SQL\" to \"Spark | Scala\". This will update the notebook language, and allow you to proceed with the next steps."
 ],
 "metadata": {
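
The external table added in cell 2.3 fixes the row shape the Spark streaming job must produce. As a hypothetical illustration (not part of the notebook), a plain-Java mirror of that schema makes the column contract explicit — the class name `Tweet` and its field names are assumptions; only the column names and SQL types are taken from the DDL above:

```java
// Hypothetical record mirroring the 'Tweets' external table columns:
// screen_name NVARCHAR(MAX), createdAt DATETIME, num_followers BIGINT, text NVARCHAR(MAX).
public class Tweet {
    final String screenName;
    final java.sql.Timestamp createdAt;
    final long numFollowers;
    final String text;

    Tweet(String screenName, java.sql.Timestamp createdAt, long numFollowers, String text) {
        this.screenName = screenName;
        this.createdAt = createdAt;
        this.numFollowers = numFollowers;
        this.text = text;
    }

    // Column names in the order the external table declares them.
    static final String[] COLUMNS = {"screen_name", "createdAt", "num_followers", "text"};
}
```

Keeping the streamed DataFrame's columns in this order and with these names avoids a schema mismatch when writing into the data pool table.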
@@ -122,7 +181,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"### **2.3 Import packages**"
+"### **2.5 Import packages**"
 ],
 "metadata": {
 "azdata_cell_guid": "04406211-4b11-4be8-b0da-e8ade7e6bdfc"
@@ -157,7 +216,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"### **2.4 Parameters**\r\n",
+"### **2.6 Parameters**\r\n",
 "Enter the required parameters for the Spark streaming job to connect to SQL Server.\r\n",
 "\r\n",
 "In this example, the connection is made from Spark to the SQL Server master instance using the internal DNS name (Ex: master-0.master-svc) and port (1433). Alternatively, and especially if you are using a highly available Always On Availability Group, you can connect to the Kubernetes service that exposes the primary node of the Always On Availability Group.\r\n",
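
The parameters cell described above boils down to composing a JDBC connection string for the master instance. A minimal sketch of that composition — the hostname (`master-0.master-svc`) and port (1433) are quoted from the text, the `TwitterData` database from step 2.1, while the URL format follows the standard Microsoft SQL Server JDBC convention and is not taken verbatim from this diff:

```java
// Sketch: how the notebook's setup parameters combine into a JDBC URL
// for the SQL Server master instance inside the cluster.
public class ConnectionParams {
    static final String HOSTNAME = "master-0.master-svc"; // internal DNS name of the master instance
    static final int PORT = 1433;                         // default SQL Server port
    static final String DATABASE = "TwitterData";         // database created in step 2.1

    static String jdbcUrl() {
        return "jdbc:sqlserver://" + HOSTNAME + ":" + PORT + ";database=" + DATABASE;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl());
    }
}
```

For the Availability Group variant mentioned above, only `HOSTNAME` would change, to the Kubernetes service that fronts the primary node.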
