Skip to content

Commit e39bbe2

Browse files
authored
Merge pull request #180297 from whhender/update-hdinsight-overview
Adding links, updating formatting
2 parents d20c00a + cf1e266 commit e39bbe2

File tree

1 file changed

+24
-26
lines changed

1 file changed

+24
-26
lines changed

articles/hdinsight/hdinsight-overview.md

Lines changed: 24 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -4,41 +4,52 @@ description: An introduction to HDInsight, and the Apache Hadoop and Apache Spar
44
ms.service: hdinsight
55
ms.topic: overview
66
ms.custom: contperf-fy21q1
7-
ms.date: 08/21/2020
7+
ms.date: 11/17/2021
88
#Customer intent: As a data analyst, I want understand what is Hadoop and how it is offered in Azure HDInsight so that I can decide on using HDInsight instead of on premises clusters.
99
---
1010

1111
# What is Azure HDInsight?
1212

13-
Azure HDInsight is a managed, full-spectrum, open-source analytics service in the cloud for enterprises. You can use open-source frameworks such as Hadoop, Apache Spark, Apache Hive, LLAP, Apache Kafka, Apache Storm, R, and more.
13+
Azure HDInsight is a managed, full-spectrum, open-source analytics service in the cloud for enterprises. With HDInsight, you can use open-source frameworks such as Hadoop, Apache Spark, Apache Hive, LLAP, Apache Kafka, Apache Storm, R, and more, in your Azure environment.
1414

1515
## What is HDInsight and the Hadoop technology stack?
1616

17-
Azure HDInsight is a cloud distribution of Hadoop components. Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more. With these frameworks, you can enable a broad range of scenarios such as extract, transform, and load (ETL), data warehousing, machine learning, and IoT.
17+
Azure HDInsight is a cloud distribution of Hadoop components. Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data in a customizable environment. You can use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more. With these frameworks, you can enable a broad range of scenarios such as extract, transform, and load (ETL), data warehousing, machine learning, and IoT.
1818

1919
To see available Hadoop technology stack components on HDInsight, see [Components and versions available with HDInsight](./hdinsight-component-versioning.md). To read more about Hadoop in HDInsight, see the [Azure features page for HDInsight](https://azure.microsoft.com/services/hdinsight/).
2020

21-
## What is big data?
22-
23-
Big data is collected in escalating volumes, at higher velocities, and in a greater variety of formats than ever before. It can be historical (meaning stored) or real time (meaning streamed from the source). See [Scenarios for using HDInsight](#scenarios-for-using-hdinsight) to learn about the most common use cases for big data.
24-
2521
## Why should I use Azure HDInsight?
2622

27-
This section lists the capabilities of Azure HDInsight.
28-
2923
|Capability |Description |
3024
|---------|---------|
3125
|Cloud native | Azure HDInsight enables you to create optimized clusters for Hadoop, Spark, [Interactive query (LLAP)](./interactive-query/apache-interactive-query-get-started.md), Kafka, Storm, HBase on Azure. HDInsight also provides an end-to-end SLA on all your production workloads. |
3226
|Low-cost and scalable | HDInsight enables you to scale workloads up or down. You can reduce costs by creating clusters on demand and paying only for what you use. You can also build data pipelines to operationalize your jobs. Decoupled compute and storage provide better performance and flexibility. |
3327
|Secure and compliant | HDInsight enables you to protect your enterprise data assets with Azure Virtual Network, encryption, and integration with Azure Active Directory. HDInsight also meets the most popular industry and government compliance standards. |
3428
|Monitoring | Azure HDInsight integrates with Azure Monitor logs to provide a single interface with which you can monitor all your clusters. |
35-
|Global availability | HDInsight is available in more regions than any other big data analytics offering. Azure HDInsight is also available in Azure Government, China, and Germany, which allows you to meet your enterprise needs in key sovereign areas. |
29+
|Global availability | HDInsight is available in more regions than any other [big data](#what-is-big-data) analytics offering. Azure HDInsight is also available in Azure Government, China, and Germany, which allows you to meet your enterprise needs in key sovereign areas. |
3630
|Productivity | Azure HDInsight enables you to use rich productive tools for Hadoop and Spark with your preferred development environments. These development environments include Visual Studio, VSCode, Eclipse, and IntelliJ for Scala, Python, R, Java, and .NET support. Data scientists can also collaborate using popular notebooks such as Jupyter and Zeppelin. |
37-
|Extensibility | You can extend the HDInsight clusters with installed components (Hue, Presto, and so on) by using script actions, by adding edge nodes, or by integrating with other big data certified applications. HDInsight enables seamless integration with the most popular big data solutions with a one-click deployment.|
31+
|Extensibility | You can extend the HDInsight clusters with installed components (Hue, Presto, and so on) by using script actions, by adding edge nodes, or by integrating with other [big data](#what-is-big-data) certified applications. HDInsight enables seamless integration with the most popular [big data](#what-is-big-data) solutions with a one-click deployment.|
32+
33+
### What is big data?
34+
35+
Big data is collected in escalating volumes, at higher velocities, and in a greater variety of formats than ever before. It can be historical (meaning stored) or real time (meaning streamed from the source). See [Scenarios for using HDInsight](#scenarios-for-using-hdinsight) to learn about the most common use cases for big data.
36+
37+
## Cluster types in HDInsight
38+
39+
HDInsight includes specific cluster types and cluster customization capabilities, such as the capability to add components, utilities, and languages. HDInsight offers the following cluster types:
40+
41+
|Cluster Type | Description | Get Started |
42+
|---|---|---|
43+
|[Apache Hadoop](./hadoop/apache-hadoop-introduction.md)|A framework that uses HDFS, YARN resource management, and a simple MapReduce programming model to process and analyze batch data in parallel.| [Create an Apache Hadoop cluster](hadoop/apache-hadoop-linux-create-cluster-get-started-portal.md)
44+
|[Apache Spark](./spark/apache-spark-overview.md)|An open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications. See [What is Apache Spark in HDInsight?](./spark/apache-spark-overview.md).|[Create an Apache Spark cluster](spark/apache-spark-jupyter-spark-sql-use-portal.md)
45+
|[Apache HBase](./hbase/apache-hbase-overview.md)|A NoSQL database built on Hadoop that provides random access and strong consistency for large amounts of unstructured and semi-structured data--potentially billions of rows times millions of columns. See [What is HBase on HDInsight?](./hbase/apache-hbase-overview.md)|[Create an Apache HBase cluster](hbase/quickstart-resource-manager-template.md)
46+
|[Apache Storm](./storm/apache-storm-overview.md)|A distributed, real-time computation system for processing large streams of data fast. Storm is offered as a managed cluster in HDInsight. See [Analyze real-time sensor data using Storm and Hadoop](./storm/apache-storm-overview.md).|[Create an Apache Storm topology](storm/apache-storm-quickstart.md)
47+
|[Apache Interactive Query](./interactive-query/apache-interactive-query-get-started.md)|In-memory caching for interactive and faster Hive queries. See [Use Interactive Query in HDInsight](./interactive-query/apache-interactive-query-get-started.md).|[Create an Interactive Query cluster](interactive-query/quickstart-resource-manager-template.md)
48+
|[Apache Kafka](./kafka/apache-kafka-introduction.md)|An open-source platform that's used for building streaming data pipelines and applications. Kafka also provides message-queue functionality that allows you to publish and subscribe to data streams. See [Introduction to Apache Kafka on HDInsight](./kafka/apache-kafka-introduction.md).|[Create an Apache Kafka cluster](kafka/apache-kafka-get-started.md)
3849

3950
## Scenarios for using HDInsight
4051

41-
Azure HDInsight can be used for a variety of scenarios in big data processing. It can be historical data (data that's already collected and stored) or real-time data (data that's directly streamed from the source). The scenarios for processing such data can be summarized in the following categories:
52+
Azure HDInsight can be used for a variety of scenarios in [big data](#what-is-big-data) processing. It can be historical data (data that's already collected and stored) or real-time data (data that's directly streamed from the source). The scenarios for processing such data can be summarized in the following categories:
4253

4354
### Batch processing (ETL)
4455

@@ -64,23 +75,10 @@ You can use HDInsight to build applications that extract critical insights from
6475

6576
### Hybrid
6677

67-
You can use HDInsight to extend your existing on-premises big data infrastructure to Azure to leverage the advanced analytics capabilities of the cloud.
78+
You can use HDInsight to extend your existing on-premises [big data](#what-is-big-data) infrastructure to Azure to leverage the advanced analytics capabilities of the cloud.
6879

6980
:::image type="content" source="./hadoop/media/apache-hadoop-introduction/hdinsight-architecture-hybrid.png" alt-text="HDInsight architecture: Hybrid":::
7081

71-
## Cluster types in HDInsight
72-
73-
HDInsight includes specific cluster types and cluster customization capabilities, such as the capability to add components, utilities, and languages. HDInsight offers the following cluster types:
74-
75-
|Cluster Type | Description |
76-
|---|---|
77-
|[Apache Hadoop](./hadoop/apache-hadoop-introduction.md)|A framework that uses HDFS, YARN resource management, and a simple MapReduce programming model to process and analyze batch data in parallel.|
78-
|[Apache Spark](./spark/apache-spark-overview.md)|An open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications. See [What is Apache Spark in HDInsight?](./spark/apache-spark-overview.md).|
79-
|[Apache HBase](./hbase/apache-hbase-overview.md)|A NoSQL database built on Hadoop that provides random access and strong consistency for large amounts of unstructured and semi-structured data--potentially billions of rows times millions of columns. See [What is HBase on HDInsight?](./hbase/apache-hbase-overview.md)|
80-
|[Apache Storm](./storm/apache-storm-overview.md)|A distributed, real-time computation system for processing large streams of data fast. Storm is offered as a managed cluster in HDInsight. See [Analyze real-time sensor data using Storm and Hadoop](./storm/apache-storm-overview.md).|
81-
|[Apache Interactive Query](./interactive-query/apache-interactive-query-get-started.md)|In-memory caching for interactive and faster Hive queries. See [Use Interactive Query in HDInsight](./interactive-query/apache-interactive-query-get-started.md).|
82-
|[Apache Kafka](./kafka/apache-kafka-introduction.md)|An open-source platform that's used for building streaming data pipelines and applications. Kafka also provides message-queue functionality that allows you to publish and subscribe to data streams. See [Introduction to Apache Kafka on HDInsight](./kafka/apache-kafka-introduction.md).|
83-
8482
## Open-source components in HDInsight
8583

8684
Azure HDInsight enables you to create clusters with open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, HBase, and R. These clusters, by default, come with other open-source components that are included on the cluster such as Apache Ambari5, Avro5, Apache Hive3, HCatalog2, Apache Mahout2, Apache Hadoop MapReduce3, Apache Hadoop YARN2, Apache Phoenix3, Apache Pig3, Apache Sqoop3, Apache Tez3, Apache Oozie2, and Apache ZooKeeper5.

0 commit comments

Comments
 (0)