Commit 5c7223f

Merge pull request #112044 from dagiro/freshness_c15
freshness_c15
2 parents 1eb0580 + 7fdc69c commit 5c7223f

1 file changed: +7 -7 lines changed

articles/hdinsight/hbase/apache-hbase-overview.md

Lines changed: 7 additions & 7 deletions
@@ -7,22 +7,22 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: overview
 ms.custom: hdinsightactive,hdiseo17may2017
-ms.date: 03/03/2020
+ms.date: 04/20/2020

 #Customer intent: As a developer new to Apache HBase and Apache HBase in Azure HDInsight, I want to have a basic understanding of Microsoft's implementation of Apache HBase in Azure HDInsight so I can decide if I want to use it rather than build my own cluster.
 ---

 # What is Apache HBase in Azure HDInsight

-[Apache HBase](https://hbase.apache.org/) is an open-source, NoSQL database that is built on [Apache Hadoop](https://hadoop.apache.org/) and modeled after [Google BigTable](https://cloud.google.com/bigtable/). HBase provides random access and strong consistency for large amounts of unstructured and semistructured data in a schemaless database organized by column families.
+[Apache HBase](https://hbase.apache.org/) is an open-source, NoSQL database that is built on Apache Hadoop and modeled after [Google BigTable](https://cloud.google.com/bigtable/). HBase provides random access and strong consistency for large amounts of data in a schemaless database. The database is organized by column families.

-From user perspective, HBase is similar to a database. Data is stored in the rows and columns of a table, and data within a row is grouped by column family. HBase is a schemaless database in the sense that neither the columns nor the type of data stored in them need to be defined before using them. The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop ecosystem.
+From user perspective, HBase is similar to a database. Data is stored in the rows and columns of a table, and data within a row is grouped by column family. HBase is a schemaless database. The columns and data types can be undefined before using them. The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop environment.

 ## How is Apache HBase implemented in Azure HDInsight?

-HDInsight HBase is offered as a managed cluster that is integrated into the Azure environment. The clusters are configured to store data directly in [Azure Storage](./../hdinsight-hadoop-use-blob-storage.md), which provides low latency and increased elasticity in performance and cost choices. This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of end points, and to analyze this data with Hadoop jobs. HBase and Hadoop are good starting points for big data project in Azure; in particular, they can enable real-time applications to work with large datasets.
+HDInsight HBase is offered as a managed cluster that is integrated into the Azure environment. The clusters are configured to store data directly in [Azure Storage](./../hdinsight-hadoop-use-blob-storage.md), which provides low latency and increased elasticity in performance and cost choices. This property enables customers to build interactive websites that work with large datasets. To build services that store sensor and telemetry data from millions of end points. And to analyze this data with Hadoop jobs. HBase and Hadoop are good starting points for big data project in Azure. The services can enable real-time applications to work with large datasets.

-The HDInsight implementation leverages the scale-out architecture of HBase to provide automatic sharding of tables, strong consistency for reads and writes, and automatic failover. Performance is enhanced by in-memory caching for reads and high-throughput streaming for writes. HBase cluster can be created inside virtual network. For details, see [Create HDInsight clusters on Azure Virtual Network](./apache-hbase-provision-vnet.md).
+The HDInsight implementation uses the scale-out architecture of HBase to provide automatic sharding of tables. And strong consistency for reads and writes, and automatic failover. Performance is enhanced by in-memory caching for reads and high-throughput streaming for writes. HBase cluster can be created inside virtual network. For details, see [Create HDInsight clusters on Azure Virtual Network](./apache-hbase-provision-vnet.md).

 ## How is data managed in HDInsight HBase?

@@ -38,9 +38,9 @@ The canonical use case for which BigTable (and by extension, HBase) was created
 |Scenario |Description |
 |---|---|
 |Key-value store|HBase can be used as a key-value store, and it's suitable for managing message systems. Facebook uses HBase for their messaging system, and it's ideal for storing and managing Internet communications. WebTable uses HBase to search for and manage tables that are extracted from webpages.|
-|Sensor data|HBase is useful for capturing data that is collected incrementally from various sources. This includes social analytics, time series, keeping interactive dashboards up to date with trends and counters, and managing audit log systems. Examples include Bloomberg trader terminal and the Open Time Series Database (OpenTSDB), which stores and provides access to metrics collected about the health of server systems.|
+|Sensor data|HBase is useful for capturing data that is collected incrementally from various sources. This data includes social analytics, and time series. And keeping interactive dashboards up to date with trends and counters, and managing audit log systems. Examples include Bloomberg trader terminal and the Open Time Series Database (OpenTSDB). OpenTSDB stores and provides access to metrics collected about the health of server systems.|
 |Real-time query|[Apache Phoenix](https://phoenix.apache.org/) is a SQL query engine for Apache HBase. It's accessed as a JDBC driver, and it enables querying and managing HBase tables by using SQL.|
-|HBase as a platform|Applications can run on top of HBase by using it as a datastore. Examples include Phoenix, [OpenTSDB](http://opentsdb.net/), Kiji, and Titan. Applications can also integrate with HBase. Examples include [Apache Hive](https://hive.apache.org/), [Apache Pig](https://pig.apache.org/), [Solr](https://lucene.apache.org/solr/), [Apache Storm](https://storm.apache.org/), [Apache Flume](https://flume.apache.org/), [Apache Impala](https://impala.apache.org/), [Apache Spark](https://spark.apache.org/) , [Ganglia](http://ganglia.info/), and [Apache Drill](https://drill.apache.org/).|
+|HBase as a platform|Applications can run on top of HBase by using it as a datastore. Examples include Phoenix, OpenTSDB, `Kiji`, and Titan. Applications can also integrate with HBase. Examples include: [Apache Hive](https://hive.apache.org/), Apache Pig, [Solr](https://lucene.apache.org/solr/), Apache Storm, Apache Flume, [Apache Impala](https://impala.apache.org/), Apache Spark, `Ganglia`, and Apache Drill.|

 ## Next steps
