articles/hdinsight/storm/apache-storm-overview.md
Lines changed: 11 additions & 19 deletions
@@ -7,20 +7,20 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: overview
 ms.custom: hdinsightactive,hdiseo17may2017
-ms.date: 03/02/2020
+ms.date: 04/20/2020
 
 #Customer intent: As a developer, I want to understand how Storm on HDInsight is different from Storm on other platforms.
 ---
 
 # What is Apache Storm on Azure HDInsight?
 
-[Apache Storm](https://storm.apache.org/) is a distributed, fault-tolerant, open-source computation system. You can use Storm to process streams of data in real time with [Apache Hadoop](https://hadoop.apache.org/). Storm solutions can also provide guaranteed processing of data, with the ability to replay data that wasn't successfully processed the first time.
+[Apache Storm](https://storm.apache.org/) is a distributed, fault-tolerant, open-source computation system. You can use Storm to process streams of data in real time with [Apache Hadoop](../hadoop/apache-hadoop-introduction.md). Storm solutions can also provide guaranteed processing of data, with the ability to replay data that wasn't successfully processed the first time.
 
 ## Why use Apache Storm on HDInsight?
 
 Storm on HDInsight provides the following features:
 
-* __99% Service Level Agreement (SLA) on Storm uptime__: For more information, see the [SLA information for HDInsight](https://azure.microsoft.com/support/legal/sla/hdinsight/v1_0/) document.
+* __99% Service Level Agreement (SLA) on Storm uptime__: Storm on HDInsight comes with full continuous support. Storm on HDInsight also has an SLA of 99.9 percent. That means Microsoft guarantees that a Storm cluster has external connectivity at least 99.9 percent of the time. For more information, see [Azure support](https://azure.microsoft.com/support/options/). See also, [SLA information for HDInsight](https://azure.microsoft.com/support/legal/sla/hdinsight/v1_0/) document.
 
 * Supports easy customization by running scripts against a Storm cluster during or after creation. For more information, see [Customize HDInsight clusters using script action](../hdinsight-hadoop-customize-cluster-linux.md).
 
@@ -30,9 +30,9 @@ Storm on HDInsight provides the following features:
 
 * Supports the Trident Java interface. You can create Storm topologies that support exactly once processing of messages, transactional datastore persistence, and a set of common stream analytics operations.
 
-* **Dynamic scaling**: You can add or remove worker nodes with no impact to running Storm topologies. You must deactivate and reactivate running topologies to take advantage of new nodes added through scaling operations.
+* **Dynamic scaling**: You can add or remove worker nodes with no impact to running Storm topologies. Deactivate and reactivate running topologies to take advantage of new nodes added through scaling operations.
 
-* **Create streaming pipelines using multiple Azure services**: Storm on HDInsight integrates with other Azure services such as Event Hubs, SQL Database, Azure Storage, and Azure Data Lake Storage. For an example solution that integrates with Azure services, see [Process events from Event Hubs with Apache Storm on HDInsight](https://github.com/Azure-Samples/hdinsight-java-storm-eventhub).
+* **Create streaming pipelines using multiple Azure services**: Storm on HDInsight integrates with other Azure services. Such as Event Hubs, SQL Database, Azure Storage, and Azure Data Lake Storage. For an example solution that integrates with Azure services, see [Process events from Event Hubs with Apache Storm on HDInsight](https://github.com/Azure-Samples/hdinsight-java-storm-eventhub).
 
 For a list of companies that are using Apache Storm for their real-time analytics solutions, see [Companies using Apache Storm](https://storm.apache.org/Powered-By.html).
 
@@ -52,16 +52,12 @@ Storm runs topologies instead of the [Apache Hadoop MapReduce](https://hadoop.ap
 
 Apache Storm guarantees that each incoming message is always fully processed, even when the data analysis is spread over hundreds of nodes.
 
-The Nimbus node provides functionality similar to the Apache Hadoop JobTracker, and it assigns tasks to other nodes in a cluster through [Apache ZooKeeper](https://zookeeper.apache.org/). Zookeeper nodes provide coordination for a cluster and facilitate communication between Nimbus and the Supervisor process on the worker nodes. If one processing node goes down, the Nimbus node is informed, and it assigns the task and associated data to another node.
+The Nimbus node provides functionality similar to the Apache Hadoop JobTracker. Nimbus assigns tasks to other nodes in a cluster through Apache ZooKeeper. Zookeeper nodes provide coordination for a cluster and assist communication between Nimbus and the Supervisor process on the worker nodes. If one processing node goes down, the Nimbus node is informed, and it assigns the task and associated data to another node.
 
 The default configuration for Apache Storm clusters is to have only one Nimbus node. Storm on HDInsight provides two Nimbus nodes. If the primary node fails, the Storm cluster switches to the secondary node while the primary node is recovered. The following diagram illustrates the task flow configuration for Storm on HDInsight:
 
 
 
-## Ease of creation
-
-You can create a new Storm cluster on HDInsight in minutes. For more information on creating a Storm cluster, see [Create Apache Hadoop clusters using the Azure portal](../hdinsight-hadoop-create-linux-clusters-portal.md).
-
 ## Ease of use
 
 |Use |Description |
@@ -73,7 +69,7 @@ You can create a new Storm cluster on HDInsight in minutes. For more information
 
 ## Integration with other Azure services
 
-* __Azure Data Lake Storage__: For an example of using Data Lake Storage with a Storm cluster, see [Use Azure Data Lake Storage with Apache Storm on HDInsight](apache-storm-write-data-lake-store.md).
+* __Azure Data Lake Storage__: See [Use Azure Data Lake Storage with Apache Storm on HDInsight](apache-storm-write-data-lake-store.md).
 
 * __Event Hubs__: For an example of using Event Hubs with a Storm cluster, see the following examples:
 
@@ -83,10 +79,6 @@ You can create a new Storm cluster on HDInsight in minutes. For more information
 
 * __SQL Database__, __Cosmos DB__, __Event Hubs__, and __HBase__: Template examples are included in the Data Lake Tools for Visual Studio. For more information, see [Develop a C# topology for Apache Storm on HDInsight](apache-storm-develop-csharp-visual-studio-topology.md).
 
-## Support
-
-Storm on HDInsight comes with full enterprise-level continuous support. Storm on HDInsight also has an SLA of 99.9 percent. That means Microsoft guarantees that a Storm cluster has external connectivity at least 99.9 percent of the time. For more information, see [Azure support](https://azure.microsoft.com/support/options/).
-
 ## Apache Storm use cases
 
 The following are some common scenarios for which you might use Storm on HDInsight:
@@ -113,15 +105,15 @@ Python can also be used to develop Storm components. For more information, see [
 
 ### Guaranteed message processing
 
-Apache Storm can provide different levels of guaranteed message processing. For example, a basic Storm application can guarantee at-least-once processing, and [Trident](https://storm.apache.org/releases/current/Trident-API-Overview.html) can guarantee exactly once processing. For more information, see [Guarantees on data processing](https://storm.apache.org/about/guarantees-data-processing.html) at apache.org.
+Apache Storm can provide different levels of guaranteed message processing. For example, a basic Storm application guarantees at-least-once processing, and Trident can guarantee exactly once processing. See [Guarantees on data processing](https://storm.apache.org/about/guarantees-data-processing.html) at apache.org.
 
 ### IBasicBolt
 
-The pattern of reading an input tuple, emitting zero or more tuples, and then acknowledging the input tuple immediately at the end of the execute method is common. Storm provides the [IBasicBolt](https://storm.apache.org/releases/current/javadocs/org/apache/storm/topology/IBasicBolt.html) interface to automate this pattern.
+The pattern of reading an input tuple, emitting zero or more tuples, and then confirming the input tuple immediately at the end of the execute method is common. Storm provides the [IBasicBolt](https://storm.apache.org/releases/current/javadocs/org/apache/storm/topology/IBasicBolt.html) interface to automate this pattern.
 
 ### Joins
 
-How data streams are joined varies between applications. For example, you can join each tuple from multiple streams into one new stream, or you can join only batches of tuples for a specific window. Either way, joining can be accomplished by using [fieldsGrouping](https://storm.apache.org/releases/current/javadocs/org/apache/storm/topology/InputDeclarer.html#fieldsGrouping-java.lang.String-org.apache.storm.tuple.Fields-). Field grouping is a way of defining how tuples are routed to bolts.
+How data streams are joined varies between applications. For example, you can join each tuple from multiple streams into one new stream, or join only batches of tuples for a specific window. Either way, joining can be accomplished by using [fieldsGrouping](https://storm.apache.org/releases/current/javadocs/org/apache/storm/topology/InputDeclarer.html#fieldsGrouping-java.lang.String-org.apache.storm.tuple.Fields-). Field grouping is a way of defining how tuples are routed to bolts.
 
 In the following Java example, fieldsGrouping is used to route tuples that originate from components "1", "2", and "3" to the MyJoiner bolt:
 
@@ -147,7 +139,7 @@ For an example of calculating a top N value, see the [RollingTopWords](https://g
 
 ## Logging
 
-Storm uses [Apache Log4j 2](https://logging.apache.org/log4j/2.x/) to log information. By default, a large amount of data is logged, and it can be difficult to sort through the information. You can include a logging configuration file as part of your Storm topology to control logging behavior.
+Storm uses Apache Log4j 2 to log information. By default, a large amount of data is logged, and it can be difficult to sort through the information. You can include a logging configuration file as part of your Storm topology to control logging behavior.
 
 For an example topology that demonstrates how to configure logging, see [Java-based WordCount](apache-storm-develop-java-topology.md) example for Storm on HDInsight.