|
2 | 2 | title: Apache Storm with Python components - Azure HDInsight
|
3 | 3 | description: Learn how to create an Apache Storm topology that uses Python components in Azure HDInsight
|
4 | 4 | author: hrasheed-msft
|
| 5 | +ms.author: hrasheed |
5 | 6 | ms.reviewer: jasonh
|
6 |
| -keywords: apache storm python |
7 |
| - |
8 | 7 | ms.service: hdinsight
|
9 |
| -ms.custom: hdinsightactive,hdiseo17may2017 |
10 | 8 | ms.topic: conceptual
|
11 |
| -ms.date: 04/30/2018 |
12 |
| -ms.author: hrasheed |
13 |
| - |
| 9 | +ms.custom: hdinsightactive,hdiseo17may2017 |
| 10 | +ms.date: 12/16/2019 |
14 | 11 | ---
|
| 12 | + |
15 | 13 | # Develop Apache Storm topologies using Python on HDInsight
|
16 | 14 |
|
17 | 15 | Learn how to create an [Apache Storm](https://storm.apache.org/) topology that uses Python components. Apache Storm supports multiple languages, even allowing you to combine components from several languages in one topology. The [Flux](https://storm.apache.org/releases/current/flux.html) framework (introduced with Storm 0.10.0) allows you to easily create solutions that use Python components.
|
18 | 16 |
|
19 | 17 | > [!IMPORTANT]
|
20 |
| -> The information in this document was tested using Storm on HDInsight 3.6. |
21 |
| -
|
22 |
| -The code for this project is available at [https://github.com/Azure-Samples/hdinsight-python-storm-wordcount](https://github.com/Azure-Samples/hdinsight-python-storm-wordcount). |
| 18 | +> The information in this document was tested using Storm on HDInsight 3.6. |
23 | 19 |
|
24 | 20 | ## Prerequisites
|
25 | 21 |
|
26 |
| -* Python 2.7 or higher |
| 22 | +* An Apache Storm cluster on HDInsight. See [Create Apache Hadoop clusters using the Azure portal](../hdinsight-hadoop-create-linux-clusters-portal.md) and select **Storm** for **Cluster type**. |
| 23 | + |
| 24 | +* A local Storm development environment (Optional). A local Storm environment is only needed if you want to run the topology locally. For more information, see [Setting up a development environment](http://storm.apache.org/releases/current/Setting-up-development-environment.html). |
27 | 25 |
|
28 |
| -* Java JDK 1.8 or higher |
| 26 | +* [Python 2.7 or higher](https://www.python.org/downloads/). |
29 | 27 |
|
30 |
| -* [Apache Maven 3](https://maven.apache.org/download.cgi) |
| 28 | +* [Java Development Kit (JDK) version 8](https://aka.ms/azure-jdks). |
31 | 29 |
|
32 |
| -* (Optional) A local Storm development environment. A local Storm environment is only needed if you want to run the topology locally. For more information, see [Setting up a development environment](http://storm.apache.org/releases/current/Setting-up-development-environment.html). |
| 30 | +* [Apache Maven](https://maven.apache.org/download.cgi), [installed](https://maven.apache.org/install.html) as described in the Apache documentation. Maven is a project build system for Java projects. |
33 | 31 |
|
34 | 32 | ## Storm multi-language support
|
35 | 33 |
|
@@ -67,80 +65,79 @@ Flux expects the Python scripts to be in the `/resources` directory inside the j
|
67 | 65 | </resource>
|
68 | 66 | ```
|
69 | 67 |
|
70 |
| -As mentioned earlier, there is a `storm.py` file that implements the Thrift definition for Storm. The Flux framework includes `storm.py` automatically when the project is built, so you don't have to worry about including it. |
| 68 | +As mentioned earlier, there's a `storm.py` file that implements the Thrift definition for Storm. The Flux framework includes `storm.py` automatically when the project is built, so you don't have to worry about including it. |
71 | 69 |
|
72 | 70 | ## Build the project
|
73 | 71 |
|
74 |
| -From the root of the project, use the following command: |
75 |
| - |
76 |
| -```bash |
77 |
| -mvn clean compile package |
78 |
| -``` |
79 |
| - |
80 |
| -This command creates a `target/WordCount-1.0-SNAPSHOT.jar` file that contains the compiled topology. |
81 |
| - |
82 |
| -## Run the topology locally |
| 72 | +1. Download the project from [https://github.com/Azure-Samples/hdinsight-python-storm-wordcount](https://github.com/Azure-Samples/hdinsight-python-storm-wordcount). |
83 | 73 |
|
84 |
| -To run the topology locally, use the following command: |
| 74 | +1. Open a command prompt and navigate to the project root: `hdinsight-python-storm-wordcount-master`. Enter the following command: |
85 | 75 |
|
86 |
| -```bash |
87 |
| -storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -l -R /topology.yaml |
88 |
| -``` |
89 |
| - |
90 |
| -> [!NOTE] |
91 |
| -> This command requires a local Storm development environment. For more information, see [Setting up a development environment](https://storm.apache.org/releases/current/Setting-up-development-environment.html) |
| 76 | + ```cmd |
| 77 | + mvn clean compile package |
| 78 | + ``` |
92 | 79 |
|
93 |
| -Once the topology starts, it emits information to the local console similar to the following text: |
| 80 | + This command creates a `target/WordCount-1.0-SNAPSHOT.jar` file that contains the compiled topology. |
94 | 81 |
|
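Because Flux expects the Python scripts to be in the `/resources` directory inside the jar, it can be useful to check the built jar before uploading it. The following is a minimal sketch, not part of the sample project: the helper name `list_flux_resources` is hypothetical, and it assumes only that a jar file is a regular zip archive.

```python
import zipfile


def list_flux_resources(jar_path):
    """Return the sorted names of .py and .yaml entries packaged in the jar.

    Hypothetical helper for illustration; it only reads the archive index.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.endswith((".py", ".yaml"))
        )
```

Calling `list_flux_resources("target/WordCount-1.0-SNAPSHOT.jar")` after the build should list the Python components under `resources/` along with the topology definition, if the jar was packaged as described above.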
| 82 | +## Run the Storm topology on HDInsight |
95 | 83 |
|
96 |
| - 24302 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon |
97 |
| - 24302 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting the |
98 |
| - 24302 [Thread-28] INFO o.a.s.t.ShellBolt - ShellLog pid:2437, name:counter-bolt Emitting years:160 |
99 |
| - 24302 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=the, count=599} |
100 |
| - 24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=seven, count=302} |
101 |
| - 24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=dwarfs, count=143} |
102 |
| - 24303 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon |
103 |
| - 24303 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting cow |
104 |
| - 24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=four, count=160} |
| 84 | +1. Use the `scp` command to copy the `WordCount-1.0-SNAPSHOT.jar` file to your Storm on HDInsight cluster. For more information, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md). Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command: |
105 | 85 |
|
| 86 | + ```cmd |
| 87 | + scp target/WordCount-1.0-SNAPSHOT.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net: |
| 88 | + ``` |
106 | 89 |
|
107 |
| -To stop the topology, use __Ctrl + C__. |
| 90 | +1. Once the file has been uploaded, connect to the cluster using SSH: |
108 | 91 |
|
109 |
| -## Run the Storm topology on HDInsight |
| 92 | + ```cmd |
| 93 | + ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net |
| 94 | + ``` |
110 | 95 |
|
111 |
| -1. Use the following command to copy the `WordCount-1.0-SNAPSHOT.jar` file to your Storm on HDInsight cluster: |
| 96 | +1. From the SSH session, use the following command to start the topology on the cluster: |
112 | 97 |
|
113 | 98 | ```bash
|
114 |
| - scp target\WordCount-1.0-SNAPSHOT.jar [email protected] |
| 99 | + storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -r -R /topology.yaml |
115 | 100 | ```
|
116 | 101 |
|
117 |
| - Replace `sshuser` with the SSH user for your cluster. Replace `mycluster` with the cluster name. You may be prompted to enter the password for the SSH user. |
| 102 | + Once started, a Storm topology runs until stopped. |
118 | 103 |
|
119 |
| - For more information on using SSH and SCP, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md). |
| 104 | +1. Use the Storm UI to view the topology on the cluster. The Storm UI is located at `https://CLUSTERNAME.azurehdinsight.net/stormui`. Replace `CLUSTERNAME` with your cluster name. |
120 | 105 |
|
121 |
| -2. Once the file has been uploaded, connect to the cluster using SSH: |
| 106 | +1. Stop the Storm topology. Use the following command to stop the topology on the cluster: |
122 | 107 |
|
123 | 108 | ```bash
|
124 |
| - |
| 109 | + storm kill wordcount |
125 | 110 | ```
|
126 | 111 |
|
127 |
| -3. From the SSH session, use the following command to start the topology on the cluster: |
| 112 | + Alternatively, you can use the Storm UI. Under **Topology actions** for the topology, select **Kill**. |
128 | 113 |
|
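The `storm jar` invocation for Flux differs only in one flag between a cluster run (`-r`) and a local run (`-l`). The sketch below assembles that command line as an argument list. The function name `flux_command` and its parameters are hypothetical; the flags and class name are the ones shown in this article.

```python
def flux_command(jar, topology_yaml="/topology.yaml", local=False):
    """Build the argument list for running a Flux topology.

    -r submits to the cluster (remote), -l runs in local mode, and
    -R tells Flux to load the YAML definition as a resource from the jar.
    """
    mode = "-l" if local else "-r"
    return ["storm", "jar", jar, "org.apache.storm.flux.Flux",
            mode, "-R", topology_yaml]
```

The returned list can be passed to `subprocess.run` on a machine where the `storm` command is available, for example from the SSH session on the cluster.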
129 |
| - ```bash |
130 |
| - storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -r -R /topology.yaml |
131 |
| - ``` |
| 114 | +## Run the topology locally |
132 | 115 |
|
133 |
| -3. You can use the Storm UI to view the topology on the cluster. The Storm UI is located at https://mycluster.azurehdinsight.net/stormui. Replace `mycluster` with your cluster name. |
| 116 | +To run the topology locally, use the following command: |
| 117 | + |
| 118 | +```bash |
| 119 | +storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -l -R /topology.yaml |
| 120 | +``` |
134 | 121 |
|
135 | 122 | > [!NOTE]
|
136 |
| -> Once started, a Storm topology runs until stopped. To stop the topology, use one of the following methods: |
137 |
| -> |
138 |
| -> * The `storm kill TOPOLOGYNAME` command from the command line |
139 |
| -> * The **Kill** button in the Storm UI. |
| 123 | +> This command requires a local Storm development environment. For more information, see [Setting up a development environment](https://storm.apache.org/releases/current/Setting-up-development-environment.html). |
140 | 124 |
|
| 125 | +Once the topology starts, it emits information to the local console similar to the following text: |
141 | 126 |
|
142 |
| -## Next steps |
| 127 | +```output |
| 128 | +24302 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon |
| 129 | +24302 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting the |
| 130 | +24302 [Thread-28] INFO o.a.s.t.ShellBolt - ShellLog pid:2437, name:counter-bolt Emitting years:160 |
| 131 | +24302 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=the, count=599} |
| 132 | +24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=seven, count=302} |
| 133 | +24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=dwarfs, count=143} |
| 134 | +24303 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon |
| 135 | +24303 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting cow |
| 136 | +24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=four, count=160} |
| 137 | +``` |
143 | 138 |
|
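The `LogInfoBolt` lines in this output carry the running word counts. As a rough sketch, assuming the log format matches the sample above, the following Python extracts the most recent count per word from captured console lines. The names `WORD_COUNT` and `latest_counts` are hypothetical.

```python
import re

# Matches the payload logged by LogInfoBolt, e.g. "{word=the, count=599}".
WORD_COUNT = re.compile(
    r"LogInfoBolt - \{word=(?P<word>[^,]+), count=(?P<count>\d+)\}"
)


def latest_counts(lines):
    """Return a dict mapping each word to the last count seen for it."""
    counts = {}
    for line in lines:
        match = WORD_COUNT.search(line)
        if match:
            counts[match.group("word")] = int(match.group("count"))
    return counts
```

Lines from the spout and the other bolts do not match the pattern and are skipped, so the function can be fed the full console output.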
144 |
| -See the following documents for other ways to use Python with HDInsight: |
| 139 | +To stop the topology, use __Ctrl + C__. |
| 140 | + |
| 141 | +## Next steps |
145 | 142 |
|
146 |
| -* [How to use Python User Defined Functions (UDF) in Apache Pig and Apache Hive](../hadoop/python-udf-hdinsight.md) |
| 143 | +For other ways to use Python with HDInsight, see [How to use Python User Defined Functions (UDF) in Apache Pig and Apache Hive](../hadoop/python-udf-hdinsight.md). |