
Commit f28b64b

Author: Sreekanth Iyer (Ushta Te Consultancy Services)

Improved Correctness Score

1 parent 400fb4b commit f28b64b

9 files changed: 16 additions & 16 deletions

articles/hdinsight/hbase/apache-hbase-backup-replication.md

Lines changed: 1 addition & 1 deletion
@@ -238,7 +238,7 @@ The general steps to set up replication are:
 5. Copy existing data from the source tables to the destination tables.
 6. Replication automatically copies new data modifications to the source tables into the destination tables.

-To enable replication on HDInsight, apply a Script Action to your running source HDInsight cluster. For a walkthrough of enabling replication in your cluster, or to experiment with replication on sample clusters created in virtual networks using Azure Resource Manager templates, see [Configure Apache HBase replication](apache-hbase-replication.md). That article also includes instructions for enabling replication of Phoenix metadata.
+To enable replication on HDInsight, apply a Script Action to the running source HDInsight cluster. For a walkthrough of enabling replication in your cluster, or to experiment with replication on sample clusters created in virtual networks using Azure Resource Manager templates, see [Configure Apache HBase replication](apache-hbase-replication.md). That article also includes instructions for enabling replication of Phoenix metadata.

 ## Next steps
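
At the HBase level, the Script Action essentially wires up standard HBase replication. A minimal, hypothetical sketch of the equivalent HBase shell steps (peer ID, ZooKeeper hosts, and table name are placeholders, not from this commit):

```
# Register the destination cluster as replication peer '1'.
# CLUSTER_KEY is "<zk quorum>:<zk client port>:<znode parent>"; hosts are placeholders.
add_peer '1', CLUSTER_KEY => "zk0-dest.example.com,zk1-dest.example.com:2181:/hbase"

# Turn on replication for every column family of an existing table.
enable_table_replication 'Contacts'
```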

articles/hdinsight/hbase/apache-hbase-migrate-new-version-new-storage-account.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: Migrate an HBase cluster to a new version and Storage account - Azure HDInsight
+title: Migrate a HBase cluster to a new version and Storage account - Azure HDInsight
 description: Learn how to migrate an Apache HBase cluster in Azure HDInsight to a newer version with a different Azure Storage account.
 ms.service: azure-hdinsight
 ms.topic: how-to

articles/hdinsight/hbase/apache-hbase-phoenix-performance.md

Lines changed: 4 additions & 4 deletions
@@ -13,13 +13,13 @@ The most important aspect of [Apache Phoenix](https://phoenix.apache.org/) perfo

 ## Table schema design

-When you create a table in Phoenix, that table is stored in an HBase table. The HBase table contains groups of columns (column families) that are accessed together. A row in the Phoenix table is a row in the HBase table, where each row consists of versioned cells associated with one or more columns. Logically, a single HBase row is a collection of key-value pairs, each having the same rowkey value. That is, each key-value pair has a rowkey attribute, and the value of that rowkey attribute is the same for a particular row.
+When you create a table in Phoenix, that table is stored in a HBase table. The HBase table contains groups of columns (column families) that are accessed together. A row in the Phoenix table is a row in the HBase table, where each row consists of versioned cells associated with one or more columns. Logically, a single HBase row is a collection of key-value pairs, each having the same rowkey value. That is, each key-value pair has a rowkey attribute, and the value of that rowkey attribute is the same for a particular row.

 The schema design of a Phoenix table includes the primary key design, column family design, individual column design, and how the data is partitioned.

 ### Primary key design

-The primary key defined on a table in Phoenix determines how data is stored within the rowkey of the underlying HBase table. In HBase, the only way to access a particular row is with the rowkey. In addition, data stored in an HBase table is sorted by the rowkey. Phoenix builds the rowkey value by concatenating the values of each of the columns in the row, in the order they're defined in the primary key.
+The primary key defined on a table in Phoenix determines how data is stored within the rowkey of the underlying HBase table. In HBase, the only way to access a particular row is with the rowkey. In addition, data stored in a HBase table is sorted by the rowkey. Phoenix builds the rowkey value by concatenating the values of each of the columns in the row, in the order they're defined in the primary key.

 For example, a table for contacts has the first name, last name, phone number, and address, all in the same column family. You could define a primary key based on an increasing sequence number:
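
Spelled out as Phoenix DDL, the contacts table described above might look like the following sketch (column names are illustrative, not taken from the article):

```sql
CREATE TABLE CONTACTS (
    ID BIGINT NOT NULL,   -- increasing sequence number; becomes the HBase rowkey
    FIRST_NAME VARCHAR,
    LAST_NAME VARCHAR,
    PHONE VARCHAR,
    ADDRESS VARCHAR,
    CONSTRAINT PK PRIMARY KEY (ID)
);
```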

@@ -64,7 +64,7 @@ Also, if certain columns tend to be accessed together, put those columns in the

 ### Column design

-* Keep VARCHAR columns under about 1 MB because of the I/O costs of large columns. When processing queries, HBase materializes cells in full before sending them over to the client, and the client receives them in full before handing them off to the application code.
+* Keep VARCHAR columns under about 1 MB because of the I/O costs of large columns. When you process queries, HBase materializes cells in full before sending them over to the client, and the client receives them in full before handing them off to the application code.
 * Store column values using a compact format such as protobuf, Avro, msgpack, or BSON. JSON isn't recommended, as it's larger.
 * Consider compressing data before storage to cut latency and I/O costs.

@@ -88,7 +88,7 @@ CREATE TABLE CONTACTS (...) SPLIT ON ('CS','EU','NA')

 ## Index design

-A Phoenix index is an HBase table that stores a copy of some or all of the data from the indexed table. An index improves performance for specific types of queries.
+A Phoenix index is a HBase table that stores a copy of some or all of the data from the indexed table. An index improves performance for specific types of queries.

 When you have multiple indexes defined and then query a table, Phoenix automatically selects the best index for the query. The primary index is created automatically based on the primary keys you select.
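
As an illustrative sketch (index and column names assumed, not from the article), a covered secondary index on the contacts table could be declared as:

```sql
-- Queries filtering on LAST_NAME that read only the included columns
-- can be answered entirely from the index table.
CREATE INDEX IDX_CONTACTS_LAST_NAME ON CONTACTS (LAST_NAME) INCLUDE (FIRST_NAME, PHONE);
```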

articles/hdinsight/hbase/query-hbase-with-hbase-shell.md

Lines changed: 3 additions & 3 deletions
@@ -17,7 +17,7 @@ If you don't have an Azure subscription, create a [free account](https://azure.m

 ## Prerequisites

-* An Apache HBase cluster. See [Create cluster](../hadoop/apache-hadoop-linux-tutorial-get-started.md) to create an HDInsight cluster. Ensure you choose the **HBase** cluster type.
+* An Apache HBase cluster. See [Create cluster](../hadoop/apache-hadoop-linux-tutorial-get-started.md) to create a HDInsight cluster. Ensure you choose the **HBase** cluster type.

 * An SSH client. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).

@@ -108,9 +108,9 @@ For more information about the HBase table schema, see [Introduction to Apache H

 ## Clean up resources

-After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are also charged for an HDInsight cluster, even when it is not in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use.
+After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are also charged for a HDInsight cluster, even when it is not in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use.

-To delete a cluster, see [Delete an HDInsight cluster using your browser, PowerShell, or the Azure CLI](../hdinsight-delete-cluster.md).
+To delete a cluster, see [Delete a HDInsight cluster using your browser, PowerShell, or the Azure CLI](../hdinsight-delete-cluster.md).

 ## Next steps
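
For orientation, the kind of shell session this quickstart walks through — a sketch assuming a `Contacts` table with `Personal` and `Office` column families:

```
hbase shell
create 'Contacts', 'Personal', 'Office'
put 'Contacts', '1000', 'Personal:Name', 'John Dole'
put 'Contacts', '1000', 'Office:Phone', '425-555-0123'
scan 'Contacts'
```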

articles/hdinsight/hdinsight-apps-install-custom-applications.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ ms.date: 01/02/2025

 In this article, you'll learn how to install an [Apache Hadoop](https://hadoop.apache.org/) application on Azure HDInsight, which hasn't been published to the Azure portal. The application you'll install in this article is [Hue](https://gethue.com/).

-An HDInsight application is an application that users can install on an HDInsight cluster. These applications can be developed by Microsoft, independent software vendors (ISV) or by yourself.
+An HDInsight application is an application that users can install on a HDInsight cluster. These applications can be developed by Microsoft, independent software vendors (ISV) or by yourself.

 ## Prerequisites

articles/hdinsight/hdinsight-changing-configs-via-ambari.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ Log in to Ambari at `https://CLUSTERNAME.azurehdidnsight.net` with your cluster

 :::image type="content" source="./media/hdinsight-changing-configs-via-ambari/apache-ambari-dashboard.png" alt-text="Apache Ambari user dashboard displayed.":::

-The Ambari web UI is used to manage hosts, services, alerts, configurations, and views. Ambari can't be used to create an HDInsight cluster, or upgrade services. Also can't manage stacks and versions, decommission or recommission hosts, or add services to the cluster.
+The Ambari web UI is used to manage hosts, services, alerts, configurations, and views. Ambari can't be used to create a HDInsight cluster, or upgrade services. Also can't manage stacks and versions, decommission or recommission hosts, or add services to the cluster.

 ## Manage your cluster's configuration
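
The same configuration data that the web UI manages is also reachable through Ambari's REST API; a minimal sketch (cluster name and credentials are placeholders):

```bash
# List the core-site configuration versions recorded for the cluster.
curl -u admin:PASSWORD -H "X-Requested-By: ambari" \
  "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/configurations?type=core-site"
```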

articles/hdinsight/hdinsight-create-non-interactive-authentication-dotnet-applications.md

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ From your non-interactive .NET application, you need:

 ## Prerequisites

-An HDInsight cluster. See the [getting started tutorial](hadoop/apache-hadoop-linux-tutorial-get-started.md).
+A HDInsight cluster. See the [getting started tutorial](hadoop/apache-hadoop-linux-tutorial-get-started.md).

 <a name='assign-a-role-to-the-azure-ad-application'></a>

@@ -36,7 +36,7 @@ Assign your Microsoft Entra application a [role](../role-based-access-control/bu
 1. At the top of the page, select **+ Add**.
 1. Follow the instructions to add the Owner role to your Microsoft Entra application. After you successfully add the role, the application is listed under the Owner role.

-## Develop an HDInsight client application
+## Develop a HDInsight client application

 1. Create a C# console application.
 2. Add the following [NuGet](https://www.nuget.org/) packages:
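
The package list is truncated by the diff. As a rough, hypothetical sketch of non-interactive (client credentials) token acquisition in such a console app — shown here with MSAL.NET, which is an assumption, not necessarily the library the article uses:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Identity.Client; // NuGet: Microsoft.Identity.Client (assumed)

class Program
{
    static async Task Main()
    {
        // Placeholders for the Microsoft Entra application and tenant.
        string clientId = "<application-id>";
        string clientSecret = "<client-secret>";
        string tenantId = "<tenant-id>";

        IConfidentialClientApplication app = ConfidentialClientApplicationBuilder
            .Create(clientId)
            .WithClientSecret(clientSecret)
            .WithAuthority(new Uri($"https://login.microsoftonline.com/{tenantId}"))
            .Build();

        // App-only token for Azure Resource Manager, which management calls use.
        AuthenticationResult result = await app
            .AcquireTokenForClient(new[] { "https://management.azure.com/.default" })
            .ExecuteAsync();

        Console.WriteLine($"Token acquired; expires {result.ExpiresOn}.");
    }
}
```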

articles/hdinsight/hdinsight-log-management.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ ms.date: 01/02/2025

 # Manage logs for a HDInsight cluster

-a HDInsight cluster produces various log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.
+HDInsight cluster produces various log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.

 Managing HDInsight cluster logs includes retaining information about all aspects of the cluster environment. This information includes all associated Azure Service logs, cluster configuration, job execution information, any error states, and other data as needed.

articles/hdinsight/hdinsight-operationalize-data-pipeline.md

Lines changed: 2 additions & 2 deletions
@@ -159,7 +159,7 @@ The sample data is now available. However, the pipeline requires two Hive tables

 5. Select **Execute** to create the table.

-    :::image type="content" source="./media/hdinsight-operationalize-data-pipeline/hdi-ambari-services-hive-query.png" alt-text="hdi ambari services hive query.":::
+    :::image type="content" source="./media/hdinsight-operationalize-data-pipeline/hdi-ambari-services-hive-query.png" alt-text="HDInsight Ambari services hive query.":::

 6. To create the `flights` table, replace the text in the query text area with the following statements. The `flights` table is a Hive-managed table that partitions data loaded into it by year, month, and day of month. This table will contain all historical flight data, with the lowest granularity present in the source data of one row per flight.
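
The statements themselves are truncated by the diff. As a hypothetical sketch only (the column list is invented for illustration), a Hive-managed table partitioned by year, month, and day of month has this shape:

```sql
CREATE TABLE flights (
    FL_DATE STRING,
    CARRIER STRING,
    FL_NUM STRING,
    DEP_DELAY FLOAT,
    ARR_DELAY FLOAT
)
PARTITIONED BY (YEAR INT, MONTH INT, DAY_OF_MONTH INT)
STORED AS ORC;
```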

@@ -499,7 +499,7 @@ As you can see, the majority of the coordinator is just passing configuration in
 <coordinator-app ... start="2017-01-01T00:00Z" end="2017-01-05T00:00Z" frequency="${coord:days(1)}" ...>
 ```

-A coordinator is responsible for scheduling actions within the `start` and `end` date range, according to the interval specified by the `frequency` attribute. Each scheduled action in turn runs the workflow as configured. In the coordinator definition above, the coordinator is configured to run actions from January 1, 2017 to January 5, 2017. The frequency is set to one day by the [Oozie Expression Language](https://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a4.4._Frequency_and_Time-Period_Representation) frequency expression `${coord:days(1)}`. This results in the coordinator scheduling an action (and hence the workflow) once per day. For date ranges that are in the past, as in this example, the action will be scheduled to run without delay. The start of the date from which an action is scheduled to run is called the *nominal time*. For example, to process the data for January 1, 2017 the coordinator will schedule action with a nominal time of 2017-01-01T00:00:00 GMT.
+A coordinator is responsible for scheduling actions within the `start` and `end` date range, according to the interval specified by the `frequency` attribute. Each scheduled action in turn runs the workflow as configured. In the coordinator definition above, the coordinator is configured to run actions from January 1, 2017 to January 5, 2017. The frequency is set to one day by the [Oozie Expression Language](https://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a4.4._Frequency_and_Time-Period_Representation) frequency expression `${coord:days(1)}`. This results in the coordinator scheduling an action (and hence the workflow) once per day. For date ranges that are in the past, as in this example, the action will be scheduled to run without delay. The start of the date from which an action is scheduled to run is call the *nominal time*. For example, to process the data for January 1, 2017 the coordinator will schedule action with a nominal time of 2017-01-01T00:00:00 GMT.

 * Point 2: Within the date range of the workflow, the `dataset` element specifies where to look in HDFS for the data for a particular date range, and configures how Oozie determines whether the data is available yet for processing.
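
For orientation on Point 2, a `dataset` element of the kind described might look like this sketch (dataset name and paths are placeholders, not taken from the article):

```xml
<datasets>
  <!-- One directory of input data per day; Oozie substitutes YEAR/MONTH/DAY
       from each action's nominal time. An empty done-flag means the existence
       of the directory itself signals that the data is available. -->
  <dataset name="flightData" frequency="${coord:days(1)}"
           initial-instance="2017-01-01T00:00Z" timezone="UTC">
    <uri-template>${nameNode}/example/data/flights/${YEAR}/${MONTH}/${DAY}</uri-template>
    <done-flag></done-flag>
  </dataset>
</datasets>
```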
