articles/hdinsight/hbase/apache-hbase-backup-replication.md (1 addition, 1 deletion)
@@ -238,7 +238,7 @@ The general steps to set up replication are:
5. Copy existing data from the source tables to the destination tables.
6. Replication automatically copies new data modifications to the source tables into the destination tables.
-To enable replication on HDInsight, apply a Script Action to your running source HDInsight cluster. For a walkthrough of enabling replication in your cluster, or to experiment with replication on sample clusters created in virtual networks using Azure Resource Manager templates, see [Configure Apache HBase replication](apache-hbase-replication.md). That article also includes instructions for enabling replication of Phoenix metadata.
+To enable replication on HDInsight, apply a Script Action to the running source HDInsight cluster. For a walkthrough of enabling replication in your cluster, or to experiment with replication on sample clusters created in virtual networks using Azure Resource Manager templates, see [Configure Apache HBase replication](apache-hbase-replication.md). That article also includes instructions for enabling replication of Phoenix metadata.

articles/hdinsight/hbase/apache-hbase-phoenix-performance.md (4 additions, 4 deletions)
@@ -13,13 +13,13 @@ The most important aspect of [Apache Phoenix](https://phoenix.apache.org/) perfo
## Table schema design
-When you create a table in Phoenix, that table is stored in an HBase table. The HBase table contains groups of columns (column families) that are accessed together. A row in the Phoenix table is a row in the HBase table, where each row consists of versioned cells associated with one or more columns. Logically, a single HBase row is a collection of key-value pairs, each having the same rowkey value. That is, each key-value pair has a rowkey attribute, and the value of that rowkey attribute is the same for a particular row.
+When you create a table in Phoenix, that table is stored in a HBase table. The HBase table contains groups of columns (column families) that are accessed together. A row in the Phoenix table is a row in the HBase table, where each row consists of versioned cells associated with one or more columns. Logically, a single HBase row is a collection of key-value pairs, each having the same rowkey value. That is, each key-value pair has a rowkey attribute, and the value of that rowkey attribute is the same for a particular row.
The schema design of a Phoenix table includes the primary key design, column family design, individual column design, and how the data is partitioned.
### Primary key design
-The primary key defined on a table in Phoenix determines how data is stored within the rowkey of the underlying HBase table. In HBase, the only way to access a particular row is with the rowkey. In addition, data stored in an HBase table is sorted by the rowkey. Phoenix builds the rowkey value by concatenating the values of each of the columns in the row, in the order they're defined in the primary key.
+The primary key defined on a table in Phoenix determines how data is stored within the rowkey of the underlying HBase table. In HBase, the only way to access a particular row is with the rowkey. In addition, data stored in a HBase table is sorted by the rowkey. Phoenix builds the rowkey value by concatenating the values of each of the columns in the row, in the order they're defined in the primary key.
For example, a table for contacts has the first name, last name, phone number, and address, all in the same column family. You could define a primary key based on an increasing sequence number:
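(The snippet itself is elided from the diff. As an illustration only — all names hypothetical, not the article's elided code — a sequence-keyed Phoenix table along the lines this paragraph describes might look like the following; the primary key column becomes the HBase rowkey, so rows sort by the sequence number.)

```sql
-- Hypothetical sketch, not the article's elided snippet.
-- The primary key column becomes the HBase rowkey, so rows sort by ID_NUM.
CREATE TABLE CONTACTS (
    ID_NUM     BIGINT NOT NULL,   -- increasing sequence number
    FIRST_NAME VARCHAR,
    LAST_NAME  VARCHAR,
    PHONE      VARCHAR,
    ADDRESS    VARCHAR
    CONSTRAINT PK PRIMARY KEY (ID_NUM)
);
```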
@@ -64,7 +64,7 @@ Also, if certain columns tend to be accessed together, put those columns in the
### Column design
-* Keep VARCHAR columns under about 1 MB because of the I/O costs of large columns. When processing queries, HBase materializes cells in full before sending them over to the client, and the client receives them in full before handing them off to the application code.
+* Keep VARCHAR columns under about 1 MB because of the I/O costs of large columns. When you process queries, HBase materializes cells in full before sending them over to the client, and the client receives them in full before handing them off to the application code.
* Store column values using a compact format such as protobuf, Avro, msgpack, or BSON. JSON isn't recommended, as it's larger.
* Consider compressing data before storage to cut latency and I/O costs.
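(A minimal sketch of the compact-format advice above, under hypothetical names: encode the record with protobuf or Avro on the client and store the bytes in a single VARBINARY column rather than as a large JSON VARCHAR.)

```sql
-- Hypothetical sketch: keep bulky values as compact client-encoded bytes.
CREATE TABLE EVENTS (
    EVENT_ID   BIGINT NOT NULL,
    EVENT_TIME TIMESTAMP,
    PAYLOAD    VARBINARY           -- protobuf/Avro-encoded, optionally compressed
    CONSTRAINT PK PRIMARY KEY (EVENT_ID)
);
```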

-A Phoenix index is an HBase table that stores a copy of some or all of the data from the indexed table. An index improves performance for specific types of queries.
+A Phoenix index is a HBase table that stores a copy of some or all of the data from the indexed table. An index improves performance for specific types of queries.
When you have multiple indexes defined and then query a table, Phoenix automatically selects the best index for the query. The primary index is created automatically based on the primary keys you select.
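(As a hedged sketch of such a secondary index, reusing the hypothetical contacts table from earlier: a covered index lets Phoenix answer a phone-number lookup from the index table alone.)

```sql
-- Hypothetical sketch: secondary index on PHONE, covering the name columns.
CREATE INDEX IDX_CONTACTS_PHONE ON CONTACTS (PHONE)
    INCLUDE (FIRST_NAME, LAST_NAME);

-- Phoenix can serve this query entirely from the index table:
SELECT FIRST_NAME, LAST_NAME FROM CONTACTS WHERE PHONE = '555-0100';
```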

articles/hdinsight/hbase/query-hbase-with-hbase-shell.md (3 additions, 3 deletions)
@@ -17,7 +17,7 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
## Prerequisites
-* An Apache HBase cluster. See [Create cluster](../hadoop/apache-hadoop-linux-tutorial-get-started.md) to create an HDInsight cluster. Ensure you choose the **HBase** cluster type.
+* An Apache HBase cluster. See [Create cluster](../hadoop/apache-hadoop-linux-tutorial-get-started.md) to create a HDInsight cluster. Ensure you choose the **HBase** cluster type.
* An SSH client. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
@@ -108,9 +108,9 @@ For more information about the HBase table schema, see [Introduction to Apache H
## Clean up resources
-After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are also charged for an HDInsight cluster, even when it is not in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use.
+After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are also charged for a HDInsight cluster, even when it is not in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use.
-To delete a cluster, see [Delete an HDInsight cluster using your browser, PowerShell, or the Azure CLI](../hdinsight-delete-cluster.md).
+To delete a cluster, see [Delete a HDInsight cluster using your browser, PowerShell, or the Azure CLI](../hdinsight-delete-cluster.md).

articles/hdinsight/hdinsight-apps-install-custom-applications.md (1 addition, 1 deletion)
@@ -11,7 +11,7 @@ ms.date: 01/02/2025
In this article, you'll learn how to install an [Apache Hadoop](https://hadoop.apache.org/) application on Azure HDInsight, which hasn't been published to the Azure portal. The application you'll install in this article is [Hue](https://gethue.com/).
-An HDInsight application is an application that users can install on an HDInsight cluster. These applications can be developed by Microsoft, independent software vendors (ISV) or by yourself.
+An HDInsight application is an application that users can install on a HDInsight cluster. These applications can be developed by Microsoft, independent software vendors (ISV) or by yourself.

articles/hdinsight/hdinsight-changing-configs-via-ambari.md (1 addition, 1 deletion)
@@ -17,7 +17,7 @@ Log in to Ambari at `https://CLUSTERNAME.azurehdidnsight.net` with your cluster
:::image type="content" source="./media/hdinsight-changing-configs-via-ambari/apache-ambari-dashboard.png" alt-text="Apache Ambari user dashboard displayed.":::
-The Ambari web UI is used to manage hosts, services, alerts, configurations, and views. Ambari can't be used to create an HDInsight cluster, or upgrade services. Also can't manage stacks and versions, decommission or recommission hosts, or add services to the cluster.
+The Ambari web UI is used to manage hosts, services, alerts, configurations, and views. Ambari can't be used to create a HDInsight cluster, or upgrade services. Also can't manage stacks and versions, decommission or recommission hosts, or add services to the cluster.
@@ -36,7 +36,7 @@ Assign your Microsoft Entra application a [role](../role-based-access-control/bu
1. At the top of the page, select **+ Add**.
1. Follow the instructions to add the Owner role to your Microsoft Entra application. After you successfully add the role, the application is listed under the Owner role.
-## Develop an HDInsight client application
+## Develop a HDInsight client application
1. Create a C# console application.
2. Add the following [NuGet](https://www.nuget.org/) packages:

articles/hdinsight/hdinsight-log-management.md (1 addition, 1 deletion)
@@ -9,7 +9,7 @@ ms.date: 01/02/2025
# Manage logs for a HDInsight cluster
-a HDInsight cluster produces various log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.
+HDInsight cluster produces various log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.
Managing HDInsight cluster logs includes retaining information about all aspects of the cluster environment. This information includes all associated Azure Service logs, cluster configuration, job execution information, any error states, and other data as needed.

6. To create the `flights` table, replace the text in the query text area with the following statements. The `flights` table is a Hive-managed table that partitions data loaded into it by year, month, and day of month. This table will contain all historical flight data, with the lowest granularity present in the source data of one row per flight.
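(The statements themselves are elided from the diff. As a hedged illustration — column names hypothetical, not the tutorial's elided code — a Hive-managed table partitioned by year, month, and day of month might be declared like this:)

```sql
-- Hypothetical HiveQL sketch of a managed, date-partitioned flights table.
CREATE TABLE flights (
    flight_num STRING,
    carrier    STRING,
    origin     STRING,
    dest       STRING,
    dep_delay  INT,
    arr_delay  INT
)
PARTITIONED BY (year INT, month INT, day_of_month INT)
STORED AS ORC;
```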
@@ -499,7 +499,7 @@ As you can see, the majority of the coordinator is just passing configuration in
-A coordinator is responsible for scheduling actions within the `start` and `end` date range, according to the interval specified by the `frequency` attribute. Each scheduled action in turn runs the workflow as configured. In the coordinator definition above, the coordinator is configured to run actions from January 1, 2017 to January 5, 2017. The frequency is set to one day by the [Oozie Expression Language](https://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a4.4._Frequency_and_Time-Period_Representation) frequency expression `${coord:days(1)}`. This results in the coordinator scheduling an action (and hence the workflow) once per day. For date ranges that are in the past, as in this example, the action will be scheduled to run without delay. The start of the date from which an action is scheduled to run is called the *nominal time*. For example, to process the data for January 1, 2017 the coordinator will schedule action with a nominal time of 2017-01-01T00:00:00 GMT.
+A coordinator is responsible for scheduling actions within the `start` and `end` date range, according to the interval specified by the `frequency` attribute. Each scheduled action in turn runs the workflow as configured. In the coordinator definition above, the coordinator is configured to run actions from January 1, 2017 to January 5, 2017. The frequency is set to one day by the [Oozie Expression Language](https://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a4.4._Frequency_and_Time-Period_Representation) frequency expression `${coord:days(1)}`. This results in the coordinator scheduling an action (and hence the workflow) once per day. For date ranges that are in the past, as in this example, the action will be scheduled to run without delay. The start of the date from which an action is scheduled to run is call the *nominal time*. For example, to process the data for January 1, 2017 the coordinator will schedule action with a nominal time of 2017-01-01T00:00:00 GMT.
* Point 2: Within the date range of the workflow, the `dataset` element specifies where to look in HDFS for the data for a particular date range, and configures how Oozie determines whether the data is available yet for processing.