
Commit 4e53e23

Merge pull request #92272 from dagiro/freshness24 (2 parents: 9d559fc + 659e140)

File tree

1 file changed: +58 −59 lines changed

articles/hdinsight/spark/apache-spark-eventhub-streaming.md

Lines changed: 58 additions & 59 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.custom: hdinsightactive,mvc
 ms.topic: tutorial
-ms.date: 05/24/2019
+ms.date: 10/17/2019
 
 #customer intent: As a developer new to Apache Spark and to Apache Spark in Azure HDInsight, I want to learn how to use Apache Spark in Azure HDInsight to process streaming data from Azure Event Hubs.
 ---
@@ -29,53 +29,53 @@ If you don't have an Azure subscription, [create a free account](https://azure.m
 
 * Familiarity with using Jupyter Notebooks with Spark on HDInsight. For more information, see [Load data and run queries with Apache Spark on HDInsight](./apache-spark-load-data-run-query.md).
 
-* A [Twitter account](https://twitter.com/i/flow/signup).
+* A [Twitter account](https://twitter.com/i/flow/signup) and familiarity with Twitter.
 
 ## Create a Twitter application
 
 To receive a stream of tweets, you create an application in Twitter. Follow the instructions to create a Twitter application and write down the values that you need to complete this tutorial.
 
 1. Browse to [Twitter Application Management](https://apps.twitter.com/).
 
-1. Select **Create New App**.
+1. Select **Create an app**.
 
-1. Provide the following values:
+1. Provide the following required values:
 
     |Property |Value |
     |---|---|
-    |Name|Provide the application name. The value used for this tutorial is **HDISparkStreamApp0423**. This name has to be a unique name.|
-    |Description|Provide a short description of the application. The value used for this tutorial is **A simple HDInsight Spark streaming application**.|
-    |Website|Provide the application's website. It doesn't have to be a valid website. The value used for this tutorial is `http://www.contoso.com`.|
-    |Callback URL|You can leave it blank.|
+    |App name|Provide the application name. The value used for this tutorial is **HDISparkStreamApp0423**. This name has to be a unique name.|
+    |Application description|Provide a short description of the application. The value used for this tutorial is **A simple HDInsight Spark streaming application**.|
+    |Website URL|Provide the application's website. It doesn't have to be a valid website. The value used for this tutorial is `http://www.contoso.com`.|
+    |Tell us how this app will be used|Testing purposes only. Creating an Apache Spark streaming application to send tweets to an Azure event hub.|
 
-1. Select **Yes, I have read and agree to the Twitter Developer Agreement**, and then Select **Create your Twitter application**.
+1. Select **Create**.
 
-1. Select the **Keys and Access Tokens** tab.
+1. From the **Review our Developer Terms** pop-up, select **Create**.
 
-1. Select **Create my access token** at the end of the page.
+1. Select the **Keys and tokens** tab.
 
-1. Write down the following values from the page. You need these values later in the tutorial:
+1. Under **Access token & access token secret**, select **Create**.
 
-    - **Consumer Key (API Key)**
-    - **Consumer Secret (API Secret)**
-    - **Access Token**
-    - **Access Token Secret**
+1. Write down the following four values that now appear on the page for later use:
+
+    - **Consumer key (API key)**
+    - **Consumer secret (API secret key)**
+    - **Access token**
+    - **Access token secret**
 
 ## Create an Azure Event Hubs namespace
 
 You use this event hub to store tweets.
 
-1. Sign in to the [Azure portal](https://portal.azure.com).
-
-2. From the left menu, select **All services**.
+1. Sign in to the [Azure portal](https://portal.azure.com).
 
-3. Under **INTERNET OF THINGS**, select **Event Hubs**.
+1. From the left menu, navigate to **All services** > **Internet of things** > **Event Hubs**.
 
 ![Create event hub for Spark streaming example](./media/apache-spark-eventhub-streaming/hdinsight-create-event-hub-for-spark-streaming.png "Create event hub for Spark streaming example")
 
-4. Select **+ Add**.
+1. Select **+ Add**.
 
-5. Enter the following values for the new Event Hubs namespace:
+1. Enter the following values for the new Event Hubs namespace:
 
 |Property |Value |
 |---|---|
@@ -89,44 +89,43 @@ You use this event hub to store tweets.
 
 ![Provide an event hub name for Spark streaming example](./media/apache-spark-eventhub-streaming/hdinsight-provide-event-hub-name-for-spark-streaming.png "Provide an event hub name for Spark streaming example")
 
-6. Select **Create** to create the namespace. The deployment will complete in a few minutes.
+1. Select **Create** to create the namespace. The deployment will complete in a few minutes.
 
 ## Create an Azure event hub
-Create an event hub after the Event Hubs namespace has been deployed. From the portal:
 
-1. From the left menu, select **All services**.
+Create an event hub after the Event Hubs namespace has been deployed. From the portal:
 
-1. Under **INTERNET OF THINGS**, select **Event Hubs**.
+1. From the left menu, navigate to **All services** > **Internet of things** > **Event Hubs**.
 
-1. Select your Event Hubs namespace from the list.
+1. Select your Event Hubs namespace from the list.
 
 1. From the **Event Hubs Namespace** page, select **+ Event Hub**.
+
 1. Enter the following values in the **Create Event Hub** page:
 
-    - **Name**: Give a name for your Event Hub.
-
+    - **Name**: Give a name for your Event Hub.
+
     - **Partition count**: 10.
 
-    - **Message retention**: 1.
-
+    - **Message retention**: 1.
+
 ![Provide event hub details for Spark streaming example](./media/apache-spark-eventhub-streaming/hdinsight-provide-event-hub-details-for-spark-streaming-example.png "Provide event hub details for Spark streaming example")
 
-1. Select **Create**. The deployment should complete in a few seconds and you will be returned to the Event Hubs Namespace page.
+1. Select **Create**. The deployment should complete in a few seconds and you'll be returned to the Event Hubs Namespace page.
 
 1. Under **Settings**, select **Shared access policies**.
 
 1. Select **RootManageSharedAccessKey**.
-
+
 ![Set Event Hub policies for the Spark streaming example](./media/apache-spark-eventhub-streaming/hdinsight-set-event-hub-policies-for-spark-streaming-example.png "Set Event Hub policies for the Spark streaming example")
 
-1. Save the values of **Primary key** and **Connection string-primary key** to use later in the tutorial.
+1. Save the values of **Primary key** and **Connection string-primary key** for use later in the tutorial.
 
 ![View Event Hub policy keys for the Spark streaming example](./media/apache-spark-eventhub-streaming/hdinsight-view-event-hub-policy-keys.png "View Event Hub policy keys for the Spark streaming example")
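The saved **Connection string-primary key** follows a fixed `key=value;` shape (`Endpoint`, `SharedAccessKeyName`, `SharedAccessKey`). As an illustration only, with a made-up namespace and key, a few lines of Python can split it into its parts:

```python
# Illustrative sketch only: the namespace and key below are invented.
# An Event Hubs connection string is a ";"-separated list of key=value
# pairs; the key material may itself end in "=" padding, so split each
# segment on its first "=" only.

def parse_conn_str(conn_str):
    parts = {}
    for segment in conn_str.rstrip(";").split(";"):
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts

conn = ("Endpoint=sb://mynamespace.servicebus.windows.net/;"
        "SharedAccessKeyName=RootManageSharedAccessKey;"
        "SharedAccessKey=abc123=")

info = parse_conn_str(conn)
print(info["SharedAccessKeyName"])  # RootManageSharedAccessKey
```

The Scala snippets later in the tutorial pass the whole string to `ConnectionStringBuilder` instead; the breakdown above only shows which pieces the **RootManageSharedAccessKey** policy supplies.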
 
-
 ## Send tweets to the event hub
 
-Create a Jupyter notebook, and name it **SendTweetsToEventHub**.
+1. Navigate to `https://CLUSTERNAME.azurehdinsight.net/jupyter` where `CLUSTERNAME` is the name of your Apache Spark cluster. Create a Jupyter notebook, and name it **SendTweetsToEventHub**.
 
 1. Run the following code to add the external Apache Maven libraries:
 
@@ -135,53 +134,53 @@ Create a Jupyter notebook, and name it **SendTweetsToEventHub**.
 {"conf":{"spark.jars.packages":"com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.13,org.twitter4j:twitter4j-core:4.0.6"}}
 ```
 
-2. Edit the code below by replacing `<Event hub name>`, `<Event hub namespace connection string>`, `<CONSUMER KEY>`, `<CONSUMER SECRET>`, `<ACCESS TOKEN>`, and `<TOKEN SECRET>` with the appropriate values. Run the edited code to send tweets to your event hub:
+1. Edit the code below by replacing `<Event hub name>`, `<Event hub namespace connection string>`, `<CONSUMER KEY>`, `<CONSUMER SECRET>`, `<ACCESS TOKEN>`, and `<TOKEN SECRET>` with the appropriate values. Run the edited code to send tweets to your event hub:
 
 ```scala
 import java.util._
 import scala.collection.JavaConverters._
 import java.util.concurrent._
-
+
 import org.apache.spark._
 import org.apache.spark.streaming._
 import org.apache.spark.eventhubs.ConnectionStringBuilder
 
 // Event hub configurations
-// Replace values below with yours
+// Replace values below with yours
 val eventHubName = "<Event hub name>"
 val eventHubNSConnStr = "<Event hub namespace connection string>"
-val connStr = ConnectionStringBuilder(eventHubNSConnStr).setEventHubName(eventHubName).build
-
+val connStr = ConnectionStringBuilder(eventHubNSConnStr).setEventHubName(eventHubName).build
+
 import com.microsoft.azure.eventhubs._
 val pool = Executors.newFixedThreadPool(1)
 val eventHubClient = EventHubClient.create(connStr.toString(), pool)
-
+
 def sendEvent(message: String) = {
   val messageData = EventData.create(message.getBytes("UTF-8"))
   eventHubClient.get().send(messageData)
   println("Sent event: " + message + "\n")
 }
-
+
 import twitter4j._
 import twitter4j.TwitterFactory
 import twitter4j.Twitter
 import twitter4j.conf.ConfigurationBuilder
 
 // Twitter application configurations
-// Replace values below with yours
+// Replace values below with yours
 val twitterConsumerKey = "<CONSUMER KEY>"
 val twitterConsumerSecret = "<CONSUMER SECRET>"
 val twitterOauthAccessToken = "<ACCESS TOKEN>"
 val twitterOauthTokenSecret = "<TOKEN SECRET>"
-
+
 val cb = new ConfigurationBuilder()
 cb.setDebugEnabled(true).setOAuthConsumerKey(twitterConsumerKey).setOAuthConsumerSecret(twitterConsumerSecret).setOAuthAccessToken(twitterOauthAccessToken).setOAuthAccessTokenSecret(twitterOauthTokenSecret)
-
+
 val twitterFactory = new TwitterFactory(cb.build())
 val twitter = twitterFactory.getInstance()
 
 // Getting tweets with keyword "Azure" and sending them to the Event Hub in realtime!
-
+
 val query = new Query(" #Azure ")
 query.setCount(100)
 query.lang("en")
@@ -199,16 +198,16 @@ Create a Jupyter notebook, and name it **SendTweetsToEventHub**.
 }
 query.setMaxId(lowestStatusId - 1)
 }
-
+
 // Closing connection to the Event Hub
 eventHubClient.get().close()
 ```
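The tail of the Scala snippet above pages backwards through search results: tweet IDs decrease as tweets get older, so after each page the code asks for IDs strictly below the lowest one seen via `query.setMaxId(lowestStatusId - 1)`. A minimal Python sketch of that paging loop, with an invented in-memory search function standing in for the Twitter API:

```python
# Sketch of max_id paging, assuming only that tweet ids are unique and
# returned newest-first; fetch_all and fake_search are invented names.

def fetch_all(search, page_size=100):
    """search(max_id, count) -> list of {'id': ...} dicts, newest first."""
    tweets = []
    max_id = None  # no upper bound on the first request
    while True:
        page = search(max_id, page_size)
        if not page:
            break
        tweets.extend(page)
        lowest = min(t["id"] for t in page)
        max_id = lowest - 1  # next page: only ids below the lowest seen
    return tweets

# Fake backend for illustration: 250 tweets with ids 250 down to 1.
def fake_search(max_id, count):
    top = 250 if max_id is None else max_id
    return [{"id": i} for i in range(top, max(top - count, 0), -1)]

print(len(fetch_all(fake_search)))  # 250
```

The loop terminates when a request returns an empty page, exactly as the Scala version stops when no more statuses come back.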
 
-3. Open the event hub in the Azure portal. On **Overview**, you shall see some charts showing the messages sent to the event hub.
+1. Open the event hub in the Azure portal. On **Overview**, you'll see some charts showing the messages sent to the event hub.
 
 ## Read tweets from the event hub
 
-Create another Jupyter notebook, and name it **ReadTweetsFromEventHub**.
+Create another Jupyter notebook, and name it **ReadTweetsFromEventHub**.
 
 1. Run the following code to add an external Apache Maven library:
 
@@ -222,34 +221,34 @@ Create another Jupyter notebook, and name it **ReadTweetsFromEventHub**.
 ```scala
 import org.apache.spark.eventhubs._
 // Event hub configurations
-// Replace values below with yours
+// Replace values below with yours
 val eventHubName = "<Event hub name>"
 val eventHubNSConnStr = "<Event hub namespace connection string>"
 val connStr = ConnectionStringBuilder(eventHubNSConnStr).setEventHubName(eventHubName).build
-
+
 val customEventhubParameters = EventHubsConf(connStr).setMaxEventsPerTrigger(5)
 val incomingStream = spark.readStream.format("eventhubs").options(customEventhubParameters.toMap).load()
-//incomingStream.printSchema
-
+//incomingStream.printSchema
+
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.functions._
-
+
 // Event Hub message format is JSON and contains "body" field
 // Body is binary, so you cast it to string to see the actual content of the message
 val messages = incomingStream.withColumn("Offset", $"offset".cast(LongType)).withColumn("Time (readable)", $"enqueuedTime".cast(TimestampType)).withColumn("Timestamp", $"enqueuedTime".cast(LongType)).withColumn("Body", $"body".cast(StringType)).select("Offset", "Time (readable)", "Timestamp", "Body")
-
+
 messages.printSchema
-
+
 messages.writeStream.outputMode("append").format("console").option("truncate", false).start().awaitTermination()
 ```
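The chained `withColumn` casts in the snippet above are dense. This plain-Python sketch (the sample event and `to_row` helper are invented for illustration) mirrors what each cast does to a single event: the binary `body` becomes a readable string, and `enqueuedTime` is kept both human-readable and as a Unix timestamp:

```python
from datetime import datetime, timezone

def to_row(event):
    t = event["enqueuedTime"]
    return {
        "Offset": int(event["offset"]),          # offset cast to long
        "Time (readable)": t.isoformat(),        # enqueuedTime as timestamp
        "Timestamp": int(t.timestamp()),         # enqueuedTime as long
        "Body": event["body"].decode("utf-8"),   # binary body cast to string
    }

sample = {
    "offset": "4096",
    "enqueuedTime": datetime(2019, 10, 17, 12, 0, tzinfo=timezone.utc),
    "body": b'{"text": "Something about #Azure"}',
}

row = to_row(sample)
print(row["Body"])  # {"text": "Something about #Azure"}
```

In the real stream, Spark applies this shape to every micro-batch and the console sink prints the resulting rows.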
 
 ## Clean up resources
 
-With HDInsight, your data is stored in Azure Storage or Azure Data Lake Storage, so you can safely delete a cluster when it is not in use. You are also charged for an HDInsight cluster, even when it is not in use. If you plan to work on the next tutorial immediately, you might want to keep the cluster, otherwise go ahead, and delete the cluster.
+With HDInsight, your data is stored in Azure Storage or Azure Data Lake Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. If you plan to work on the next tutorial immediately, you might want to keep the cluster; otherwise, go ahead and delete it.
 
 Open the cluster in the Azure portal, and select **Delete**.
 
-![HDInsight Azure Portal delete cluster](./media/apache-spark-load-data-run-query/hdinsight-azure-portal-delete-cluster.png "Delete HDInsight cluster")
+![HDInsight Azure portal delete cluster](./media/apache-spark-load-data-run-query/hdinsight-azure-portal-delete-cluster.png "Delete HDInsight cluster")
 
 You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight Spark cluster, and the default storage account.
0 commit comments
