articles/hdinsight/spark/apache-spark-eventhub-streaming.md
Lines changed: 58 additions & 59 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
7
7
ms.service: hdinsight
8
8
ms.custom: hdinsightactive,mvc
9
9
ms.topic: tutorial
10
-
ms.date: 05/24/2019
10
+
ms.date: 10/17/2019
11
11
12
12
#customer intent: As a developer new to Apache Spark and to Apache Spark in Azure HDInsight, I want to learn how to use Apache Spark in Azure HDInsight to process streaming data from Azure Event Hubs.
13
13
---
@@ -29,53 +29,53 @@ If you don't have an Azure subscription, [create a free account](https://azure.m
29
29
30
30
* Familiarity with using Jupyter Notebooks with Spark on HDInsight. For more information, see [Load data and run queries with Apache Spark on HDInsight](./apache-spark-load-data-run-query.md).
31
31
32
-
* A [Twitter account](https://twitter.com/i/flow/signup).
32
+
* A [Twitter account](https://twitter.com/i/flow/signup) and familiarity with Twitter.
33
33
34
34
## Create a Twitter application
35
35
36
36
To receive a stream of tweets, you create an application in Twitter. Follow the instructions to create a Twitter application, and write down the values that you need to complete this tutorial.
37
37
38
38
1. Browse to [Twitter Application Management](https://apps.twitter.com/).
39
39
40
-
1. Select **Create New App**.
40
+
1. Select **Create an app**.
41
41
42
-
1. Provide the following values:
42
+
1. Provide the following required values:
43
43
44
44
|Property |Value |
45
45
|---|---|
46
-
|Name|Provide the application name. The value used for this tutorial is **HDISparkStreamApp0423**. This name has to be a unique name.|
47
-
|Description|Provide a short description of the application. The value used for this tutorial is **A simple HDInsight Spark streaming application**.|
48
-
|Website|Provide the application's website. It doesn't have to be a valid website. The value used for this tutorial is `http://www.contoso.com`.|
49
-
|Callback URL|You can leave it blank.|
46
+
|App name|Provide the application name. The value used for this tutorial is **HDISparkStreamApp0423**. This name has to be a unique name.|
47
+
|Application description|Provide a short description of the application. The value used for this tutorial is **A simple HDInsight Spark streaming application**.|
48
+
|Website URL|Provide the application's website. It doesn't have to be a valid website. The value used for this tutorial is `http://www.contoso.com`.|
49
+
|Tell us how this app will be used|Testing purposes only. Creating an Apache Spark streaming application to send tweets to an Azure event hub.|
50
50
51
-
1. Select **Yes, I have read and agree to the Twitter Developer Agreement**, and then select **Create your Twitter application**.
51
+
1. Select **Create**.
52
52
53
-
1. Select the **Keys and Access Tokens** tab.
53
+
1. From the **Review our Developer Terms** pop-up, select **Create**.
54
54
55
-
1. Select **Create my access token** at the end of the page.
55
+
1. Select the **Keys and tokens** tab.
56
56
57
-
1. Write down the following values from the page. You need these values later in the tutorial:
1. Write down the following four values that now appear on the page for later use:
60
+
61
+
- **Consumer key (API key)**
62
+
- **Consumer secret (API secret key)**
63
+
- **Access token**
64
+
- **Access token secret**
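
The four values above are secrets, so one option is to keep them out of the notebook source. The following is a minimal Python sketch, not part of the tutorial itself, that reads them from environment variables and fails early if any is missing. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the `load_credentials` helper are hypothetical and chosen only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterCredentials:
    """The four values copied from the Keys and tokens tab."""
    consumer_key: str
    consumer_secret: str
    access_token: str
    access_token_secret: str


# Hypothetical variable names; pick whatever fits your environment.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Read all four values, raising KeyError if any is missing or empty."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("missing Twitter settings: " + ", ".join(missing))
    return TwitterCredentials(
        **{field: env[name] for field, name in ENV_NAMES.items()}
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching the real environment.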
63
65
64
66
## Create an Azure Event Hubs namespace
65
67
66
68
You use this event hub to store tweets.
67
69
68
-
1. Sign in to the [Azure portal](https://portal.azure.com).
69
-
70
-
2. From the left menu, select **All services**.
70
+
1. Sign in to the [Azure portal](https://portal.azure.com).
71
71
72
-
3. Under **INTERNET OF THINGS**, select **Event Hubs**.
72
+
1. From the left menu, navigate to **All services** > **Internet of things** > **Event Hubs**.
73
73
74
74

75
75
76
-
4. Select **+ Add**.
76
+
1. Select **+ Add**.
77
77
78
-
5. Enter the following values for the new Event Hubs namespace:
78
+
1. Enter the following values for the new Event Hubs namespace:
79
79
80
80
|Property |Value |
81
81
|---|---|
@@ -89,44 +89,43 @@ You use this event hub to store tweets.
89
89
90
90

91
91
92
-
6. Select **Create** to create the namespace. The deployment will complete in a few minutes.
92
+
1. Select **Create** to create the namespace. The deployment will complete in a few minutes.
93
93
94
94
## Create an Azure event hub
95
-
Create an event hub after the Event Hubs namespace has been deployed. From the portal:
96
95
97
-
1. From the left menu, select **All services**.
96
+
Create an event hub after the Event Hubs namespace has been deployed. From the portal:
98
97
99
-
1. Under **INTERNET OF THINGS**, select **Event Hubs**.
98
+
1. From the left menu, navigate to **All services** > **Internet of things** > **Event Hubs**.
100
99
101
-
1. Select your Event Hubs namespace from the list.
100
+
1. Select your Event Hubs namespace from the list.
102
101
103
102
1. From the **Event Hubs Namespace** page, select **+ Event Hub**.
103
+
104
104
1. Enter the following values in the **Create Event Hub** page:
105
105
106
-
- **Name**: Give a name for your Event Hub.
107
-
106
+
- **Name**: Give a name for your Event Hub.
107
+
108
108
- **Partition count**: 10.
109
109
110
-
- **Message retention**: 1.
111
-
110
+
- **Message retention**: 1.
111
+
112
112

113
113
114
-
1. Select **Create**. The deployment should complete in a few seconds and you will be returned to the Event Hubs Namespace page.
114
+
1. Select **Create**. The deployment should complete in a few seconds and you'll be returned to the Event Hubs Namespace page.
115
115
116
116
1. Under **Settings**, select **Shared access policies**.
117
117
118
118
1. Select **RootManageSharedAccessKey**.
119
-
119
+
120
120

121
121
122
-
1. Save the values of **Primary key** and **Connection string-primary key** to use later in the tutorial.
122
+
1. Save the values of **Primary key** and **Connection string-primary key** for use later in the tutorial.
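
An Event Hubs connection string is a semicolon-separated list of `Key=Value` pairs (`Endpoint`, `SharedAccessKeyName`, `SharedAccessKey`). As a quick sanity check on the saved value, the following minimal Python sketch splits it into its parts. The `parse_connection_string` helper is hypothetical, and the namespace and key in the example are placeholders, not real credentials.

```python
def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' pairs into a dict."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: base64 keys often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key] = value
    return parts


# Placeholder values for illustration only.
example = (
    "Endpoint=sb://mynamespace.servicebus.windows.net/;"
    "SharedAccessKeyName=RootManageSharedAccessKey;"
    "SharedAccessKey=abc123="
)
parsed = parse_connection_string(example)
print(parsed["SharedAccessKeyName"])
```

If the policy name printed is not **RootManageSharedAccessKey**, the string was probably copied from a different shared access policy.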
123
123
124
124

125
125
126
-
127
126
## Send tweets to the event hub
128
127
129
-
Create a Jupyter notebook, and name it **SendTweetsToEventHub**.
128
+
1. Navigate to `https://CLUSTERNAME.azurehdinsight.net/jupyter` where `CLUSTERNAME` is the name of your Apache Spark cluster. Create a Jupyter notebook, and name it **SendTweetsToEventHub**.
130
129
131
130
1. Run the following code to add the external Apache Maven libraries:
132
131
@@ -135,53 +134,53 @@ Create a Jupyter notebook, and name it **SendTweetsToEventHub**.
2. Edit the code below by replacing `<Event hub name>`, `<Event hub namespace connection string>`, `<CONSUMER KEY>`, `<CONSUMER SECRET>`, `<ACCESS TOKEN>`, and `<TOKEN SECRET>` with the appropriate values. Run the edited code to send tweets to your event hub:
137
+
1. Edit the code below by replacing `<Event hub name>`, `<Event hub namespace connection string>`, `<CONSUMER KEY>`, `<CONSUMER SECRET>`, `<ACCESS TOKEN>`, and `<TOKEN SECRET>` with the appropriate values. Run the edited code to send tweets to your event hub:
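
Before running the edited code, it can help to confirm that no placeholder was left unreplaced. The following is a minimal Python sketch, separate from the tutorial's notebook code (which this diff does not show), that substitutes the six placeholders named in the step above into a text template. The `fill_placeholders` helper and the template string in the test usage are hypothetical stand-ins.

```python
# The six placeholders named in the tutorial step.
PLACEHOLDERS = (
    "<Event hub name>",
    "<Event hub namespace connection string>",
    "<CONSUMER KEY>",
    "<CONSUMER SECRET>",
    "<ACCESS TOKEN>",
    "<TOKEN SECRET>",
)


def fill_placeholders(template, values):
    """Replace each placeholder found in template with its value.

    Raises ValueError if the template uses a placeholder for which
    no non-empty value was supplied.
    """
    missing = [p for p in PLACEHOLDERS if p in template and not values.get(p)]
    if missing:
        raise ValueError("no value supplied for: " + ", ".join(missing))
    for placeholder in PLACEHOLDERS:
        if placeholder in template:
            template = template.replace(placeholder, values[placeholder])
    return template
```

Raising on a missing value, rather than leaving the placeholder in place, surfaces the mistake before the code is sent to the cluster.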
With HDInsight, your data is stored in Azure Storage or Azure Data Lake Storage, so you can safely delete a cluster when it is not in use. You are also charged for an HDInsight cluster, even when it is not in use. If you plan to work on the next tutorial immediately, you might want to keep the cluster, otherwise go ahead, and delete the cluster.
247
+
With HDInsight, your data is stored in Azure Storage or Azure Data Lake Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. If you plan to work on the next tutorial immediately, you might want to keep the cluster; otherwise, go ahead and delete it.
249
248
250
249
Open the cluster in the Azure portal, and select **Delete**.
You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight Spark cluster and the default storage account.