Commit ce29cd7

edit pass: apache-hadoop-etl-at-scale
1 parent fabff66 commit ce29cd7

1 file changed: +3 -3 lines changed

articles/hdinsight/hadoop/apache-hadoop-etl-at-scale.md

Lines changed: 3 additions & 3 deletions
@@ -53,7 +53,7 @@ Source data files are typically loaded into a location on Azure Storage or Azure

 ### Azure Storage

-Azure Storage has specific adaptability targets. See [Scalability and performance targets for Blob storage](../../storage/blobs/scalability-targets.md) for more information. For most analytic nodes, Azure Storage scales best when dealing with many smaller files. As long as you're within your account limits, Azure Storage guarantees the same performance, no matter how large the files are. You can store terabytes of data and still get consistent performance. This is true whether you're using a subset or all of the data.
+Azure Storage has specific adaptability targets. See [Scalability and performance targets for Blob storage](../../storage/blobs/scalability-targets.md) for more information. For most analytic nodes, Azure Storage scales best when dealing with many smaller files. As long as you're within your account limits, Azure Storage guarantees the same performance, no matter how large the files are. You can store terabytes of data and still get consistent performance. This statement is true whether you're using a subset or all of the data.

 Azure Storage has several different types of blobs. An *append blob* is a great option for storing web logs or sensor data.

@@ -83,15 +83,15 @@ For uploading datasets in the terabyte range, network latency can be a major pro

 Azure SQL Data Warehouse is a great choice to store prepared results. Azure HDInsight can be used to perform those services for SQL Data Warehouse.

-Azure SQL Data Warehouse is a relational database store optimized for analytic workloads. It scales based on partitioned tables. Tables can be partitioned across multiple nodes. The nodes are selected at the time of creation. They can scale after the fact, but that's an active process which might require data movement. For more information, see [SQL Data Warehouse - Manage Compute](../../synapse-analytics/sql-data-warehouse/sql-data-warehouse-manage-compute-overview.md).
+Azure SQL Data Warehouse is a relational database store optimized for analytic workloads. It scales based on partitioned tables. Tables can be partitioned across multiple nodes. The nodes are selected at the time of creation. They can scale after the fact, but that's an active process that might require data movement. For more information, see [SQL Data Warehouse - Manage Compute](../../synapse-analytics/sql-data-warehouse/sql-data-warehouse-manage-compute-overview.md).

 ### Apache HBase

 Apache HBase is a key-value store available in Azure HDInsight. It's an open-source, NoSQL database that's built on Hadoop and modeled after Google BigTable. HBase provides performant random access and strong consistency for large amounts of unstructured and semi-structured data.

 Because HBase is a schemaless database, columns and data types don't need to be defined before using them. Data is stored in the rows of a table, and is grouped by column family.

-The open-source code scales linearly to handle petabytes of data on thousands of nodes. HBase can rely on data redundancy, batch processing, and other features which are provided by distributed applications in the Hadoop environment.
+The open-source code scales linearly to handle petabytes of data on thousands of nodes. HBase can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop environment.

 HBase is an excellent destination for sensor and log data for future analysis.
