Commit 243dadd: freshness118
1 parent f3f0fcf commit 243dadd

File tree: 1 file changed (+57, -60 lines)


articles/hdinsight/storm/apache-storm-develop-python-topology.md

Lines changed: 57 additions & 60 deletions
````diff
@@ -2,34 +2,32 @@
 title: Apache Storm with Python components - Azure HDInsight
 description: Learn how to create an Apache Storm topology that uses Python components in Azure HDInsight
 author: hrasheed-msft
+ms.author: hrasheed
 ms.reviewer: jasonh
-keywords: apache storm python
-
 ms.service: hdinsight
-ms.custom: hdinsightactive,hdiseo17may2017
 ms.topic: conceptual
-ms.date: 04/30/2018
-ms.author: hrasheed
-
+ms.custom: hdinsightactive,hdiseo17may2017
+ms.date: 12/16/2019
 ---
+
 # Develop Apache Storm topologies using Python on HDInsight
 
 Learn how to create an [Apache Storm](https://storm.apache.org/) topology that uses Python components. Apache Storm supports multiple languages, even allowing you to combine components from several languages in one topology. The [Flux](https://storm.apache.org/releases/current/flux.html) framework (introduced with Storm 0.10.0) allows you to easily create solutions that use Python components.
 
 > [!IMPORTANT]
-> The information in this document was tested using Storm on HDInsight 3.6.
-
-The code for this project is available at [https://github.com/Azure-Samples/hdinsight-python-storm-wordcount](https://github.com/Azure-Samples/hdinsight-python-storm-wordcount).
+> The information in this document was tested using Storm on HDInsight 3.6.
 
 ## Prerequisites
 
-* Python 2.7 or higher
+* An Apache Storm cluster on HDInsight. See [Create Apache Hadoop clusters using the Azure portal](../hdinsight-hadoop-create-linux-clusters-portal.md) and select **Storm** for **Cluster type**.
+
+* A local Storm development environment (Optional). A local Storm environment is only needed if you want to run the topology locally. For more information, see [Setting up a development environment](http://storm.apache.org/releases/current/Setting-up-development-environment.html).
 
-* Java JDK 1.8 or higher
+* [Python 2.7 or higher](https://www.python.org/downloads/).
 
-* [Apache Maven 3](https://maven.apache.org/download.cgi)
+* [Java Developer Kit (JDK) version 8](https://aka.ms/azure-jdks).
 
-* (Optional) A local Storm development environment. A local Storm environment is only needed if you want to run the topology locally. For more information, see [Setting up a development environment](http://storm.apache.org/releases/current/Setting-up-development-environment.html).
+* [Apache Maven](https://maven.apache.org/download.cgi) properly [installed](https://maven.apache.org/install.html) according to Apache. Maven is a project build system for Java projects.
 
 ## Storm multi-language support
````

````diff
@@ -67,80 +65,79 @@ Flux expects the Python scripts to be in the `/resources` directory inside the j
 </resource>
 ```
 
-As mentioned earlier, there is a `storm.py` file that implements the Thrift definition for Storm. The Flux framework includes `storm.py` automatically when the project is built, so you don't have to worry about including it.
+As mentioned earlier, there's a `storm.py` file that implements the Thrift definition for Storm. The Flux framework includes `storm.py` automatically when the project is built, so you don't have to worry about including it.
 
 ## Build the project
 
-From the root of the project, use the following command:
-
-```bash
-mvn clean compile package
-```
-
-This command creates a `target/WordCount-1.0-SNAPSHOT.jar` file that contains the compiled topology.
-
-## Run the topology locally
+1. Download the project from [https://github.com/Azure-Samples/hdinsight-python-storm-wordcount](https://github.com/Azure-Samples/hdinsight-python-storm-wordcount).
 
-To run the topology locally, use the following command:
+1. Open a command prompt and navigate to the project root: `hdinsight-python-storm-wordcount-master`. Enter the following command:
 
-```bash
-storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -l -R /topology.yaml
-```
-
-> [!NOTE]
-> This command requires a local Storm development environment. For more information, see [Setting up a development environment](https://storm.apache.org/releases/current/Setting-up-development-environment.html)
+```cmd
+mvn clean compile package
+```
 
-Once the topology starts, it emits information to the local console similar to the following text:
+This command creates a `target/WordCount-1.0-SNAPSHOT.jar` file that contains the compiled topology.
 
+## Run the Storm topology on HDInsight
 
-24302 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon
-24302 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting the
-24302 [Thread-28] INFO o.a.s.t.ShellBolt - ShellLog pid:2437, name:counter-bolt Emitting years:160
-24302 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=the, count=599}
-24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=seven, count=302}
-24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=dwarfs, count=143}
-24303 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon
-24303 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting cow
-24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=four, count=160}
+1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to copy the `WordCount-1.0-SNAPSHOT.jar` file to your Storm on HDInsight cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
 
+```cmd
+scp target/WordCount-1.0-SNAPSHOT.jar sshuser@CLUSTERNAME-ssh.azurehdinsight.net:
+```
 
-To stop the topology, use __Ctrl + C__.
+1. Once the file has been uploaded, connect to the cluster using SSH:
 
-## Run the Storm topology on HDInsight
+```cmd
+ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
+```
 
-1. Use the following command to copy the `WordCount-1.0-SNAPSHOT.jar` file to your Storm on HDInsight cluster:
+1. From the SSH session, use the following command to start the topology on the cluster:
 
 ```bash
-scp target\WordCount-1.0-SNAPSHOT.jar sshuser@mycluster-ssh.azurehdinsight.net:
+storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -r -R /topology.yaml
 ```
 
-Replace `sshuser` with the SSH user for your cluster. Replace `mycluster` with the cluster name. You may be prompted to enter the password for the SSH user.
+Once started, a Storm topology runs until stopped.
 
-For more information on using SSH and SCP, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md).
+1. Use the Storm UI to view the topology on the cluster. The Storm UI is located at `https://CLUSTERNAME.azurehdinsight.net/stormui`. Replace `CLUSTERNAME` with your cluster name.
 
-2. Once the file has been uploaded, connect to the cluster using SSH:
+1. Stop the Storm topology. Use the following command to stop the topology on the cluster:
 
 ```bash
-ssh sshuser@mycluster-ssh.azurehdinsight.net
+storm kill wordcount
 ```
 
-3. From the SSH session, use the following command to start the topology on the cluster:
+Alternatively, you can use the Storm UI. Under **Topology actions** for the topology, select **Kill**.
 
-```bash
-storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -r -R /topology.yaml
-```
+## Run the topology locally
 
-3. You can use the Storm UI to view the topology on the cluster. The Storm UI is located at https://mycluster.azurehdinsight.net/stormui. Replace `mycluster` with your cluster name.
+To run the topology locally, use the following command:
+
+```bash
+storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -l -R /topology.yaml
+```
 
 > [!NOTE]
-> Once started, a Storm topology runs until stopped. To stop the topology, use one of the following methods:
->
-> * The `storm kill TOPOLOGYNAME` command from the command line
-> * The **Kill** button in the Storm UI.
+> This command requires a local Storm development environment. For more information, see [Setting up a development environment](https://storm.apache.org/releases/current/Setting-up-development-environment.html).
 
+Once the topology starts, it emits information to the local console similar to the following text:
 
-## Next steps
+```output
+24302 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon
+24302 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting the
+24302 [Thread-28] INFO o.a.s.t.ShellBolt - ShellLog pid:2437, name:counter-bolt Emitting years:160
+24302 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=the, count=599}
+24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=seven, count=302}
+24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=dwarfs, count=143}
+24303 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon
+24303 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting cow
+24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=four, count=160}
+```
 
-See the following documents for other ways to use Python with HDInsight:
+To stop the topology, use __Ctrl + C__.
+
+## Next steps
 
-* [How to use Python User Defined Functions (UDF) in Apache Pig and Apache Hive](../hadoop/python-udf-hdinsight.md)
+See the following documents for other ways to use Python with HDInsight: [How to use Python User Defined Functions (UDF) in Apache Pig and Apache Hive](../hadoop/python-udf-hdinsight.md).
````
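Context for the change above: the article's Python components talk to Storm through the multi-lang (shell component) protocol that the bundled `storm.py` implements — each component runs as a subprocess and exchanges JSON messages with Storm over stdin/stdout, each message followed by a line reading `end`. As a rough, hypothetical sketch (these helper names are illustrative, not the sample repo's actual code), the word-count logic of the splitter and counter bolts and the message framing look like:

```python
import json


def split_sentence(sentence):
    """Splitter-bolt logic: one single-field output tuple per word."""
    return [[word] for word in sentence.split()]


class WordCounter:
    """Counter-bolt logic: keep a running count per word."""

    def __init__(self):
        self.counts = {}

    def process(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1
        return {"word": word, "count": self.counts[word]}


def frame_emit(tup):
    """Frame an emit message the way the multi-lang protocol does:
    a JSON object followed by a line containing only 'end'."""
    return json.dumps({"command": "emit", "tuple": tup}) + "\nend\n"
```

In the real sample, `storm.py` provides base classes that handle this framing for you, so a component only implements its processing method.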
