Skip to content

Commit 647198e

Browse files
authored
Merge pull request #110154 from dagiro/freshness33
freshness33
2 parents 66dcd6b + 3a0fa88 commit 647198e

File tree

2 files changed

+17
-21
lines changed

2 files changed

+17
-21
lines changed

articles/hdinsight/hdinsight-faq.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -116,7 +116,7 @@ It depends on the type of metastore that your cluster is configured to use.
116116

117117
For a default metastore: The default metastore is part of the cluster lifecycle. When you delete a cluster, the corresponding metastore and metadata are also deleted.
118118

119-
For a custom metastore: The lifecycle of the metastore is not tied to a clusters lifecycle. Therefore, you can create and delete clusters without losing metadata. Metadata such as your Hive schemas persists even after you delete and re-create the HDInsight cluster.
119+
For a custom metastore: The lifecycle of the metastore is not tied to a cluster's lifecycle. Therefore, you can create and delete clusters without losing metadata. Metadata such as your Hive schemas persists even after you delete and re-create the HDInsight cluster.
120120

121121
For more information, see [Use external metadata stores in Azure HDInsight](hdinsight-use-external-metadata-stores.md).
122122

@@ -130,7 +130,7 @@ Yes, you can migrate a Hive metastore from an ESP to a non-ESP cluster.
130130

131131
### How can I estimate the size of a Hive metastore database?
132132

133-
A Hive metastore is used to store the metadata for data sources that are used by the Hive server.The size requirements depend partly on the number and complexity of your Hive data sources, and can't be estimated up front. As outlined in [Hive metastore best practices](hdinsight-use-external-metadata-stores.md#hive-metastore-best-practices), you can start with a S2 tier, which provides 50 DTU and 250 GB of storage, and if you see a bottleneck, you can scale up the database.
133+
A Hive metastore is used to store the metadata for data sources that are used by the Hive server.The size requirements depend partly on the number and complexity of your Hive data sources, and can't be estimated up front. As outlined in [Hive metastore guidelines](hdinsight-use-external-metadata-stores.md#hive-metastore-guidelines), you can start with a S2 tier, which provides 50 DTU and 250 GB of storage, and if you see a bottleneck, you can scale up the database.
134134

135135
### Do you support any other database other than Azure SQL Database as an external metastore?
136136

articles/hdinsight/hdinsight-use-external-metadata-stores.md

Lines changed: 15 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,20 @@
11
---
22
title: Use external metadata stores - Azure HDInsight
3-
description: Use external metadata stores with Azure HDInsight clusters, and best practices.
3+
description: Use external metadata stores with Azure HDInsight clusters.
44
author: hrasheed-msft
55
ms.author: hrasheed
66
ms.reviewer: jasonh
77
ms.service: hdinsight
8-
ms.custom: hdinsightactive
98
ms.topic: conceptual
10-
ms.date: 03/02/2020
9+
ms.custom: hdinsightactive
10+
ms.date: 04/03/2020
1111
---
1212

1313
# Use external metadata stores in Azure HDInsight
1414

15-
HDInsight allows you to take control of your data and metadata by deploying key metadata solutions and management databases to external data stores. This feature is currently available for [Apache Hive metastore](#custom-metastore), [Apache Oozie metastore](#apache-oozie-metastore) and [Apache Ambari database](#custom-ambari-db).
15+
HDInsight allows you to take control of your data and metadata with external data stores. This feature is available for [Apache Hive metastore](#custom-metastore), [Apache Oozie metastore](#apache-oozie-metastore), and [Apache Ambari database](#custom-ambari-db).
1616

17-
The Apache Hive metastore in HDInsight is an essential part of the Apache Hadoop architecture. A metastore is the central schema repository that can be used by other big data access tools such as Apache Spark, Interactive Query (LLAP), Presto, or Apache Pig. HDInsight uses an Azure SQL Database as the Hive metastore.
17+
The Apache Hive metastore in HDInsight is an essential part of the Apache Hadoop architecture. A metastore is the central schema repository. The metastore is used by other big data access tools such as Apache Spark, Interactive Query (LLAP), Presto, or Apache Pig. HDInsight uses an Azure SQL Database as the Hive metastore.
1818

1919
![HDInsight Hive Metadata Store Architecture](./media/hdinsight-use-external-metadata-stores/metadata-store-architecture.png)
2020

@@ -34,7 +34,7 @@ By default, HDInsight creates a metastore with every cluster type. You can inste
3434
* You can't share the default metastore with other clusters.
3535

3636
* The default metastore uses the basic Azure SQL DB, which has a five DTU (database transaction unit) limit.
37-
This default metastore is typically used for relatively simple workloads that don't require multiple clusters and dont need metadata preserved beyond the cluster's lifecycle.
37+
This default metastore is typically used for relatively simple workloads. Workloads that don't require multiple clusters and don't need metadata preserved beyond the cluster's lifecycle.
3838

3939
## Custom metastore
4040

@@ -56,25 +56,21 @@ HDInsight also supports custom metastores, which are recommended for production
5656

5757
### Create and config Azure SQL Database for the custom metastore
5858

59-
You need to create or have an existing Azure SQL Database before setting up a custom Hive metastore for a HDInsight cluster. For more information, see [Quickstart: Create a single database in Azure SQL DB](https://docs.microsoft.com/azure/sql-database/sql-database-single-database-get-started?tabs=azure-portal).
59+
Create or have an existing Azure SQL Database before setting up a custom Hive metastore for a HDInsight cluster. For more information, see [Quickstart: Create a single database in Azure SQL DB](https://docs.microsoft.com/azure/sql-database/sql-database-single-database-get-started?tabs=azure-portal).
6060

61-
To make sure that your HDInsight cluster can access the connected Azure SQL Database, configure Azure SQL Database firewall rules to allow Azure services and resources to access the server.
62-
63-
You can enable this option in the Azure portal by clicking **Set server firewall**, and clicking **ON** underneath **Allow Azure services and resources to access this server** for the Azure SQL Database server or database. For more information, see [Create and manage IP firewall rules](https://docs.microsoft.com/azure/sql-database/sql-database-firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules)
61+
Configure Azure SQL Database firewall rules to allow Azure services and resources to access the server. Enable this option in the Azure portal by selecting **Set server firewall**. Then select **ON** underneath **Allow Azure services and resources to access this server** for the Azure SQL Database server or database. For more information, see [Create and manage IP firewall rules](https://docs.microsoft.com/azure/sql-database/sql-database-firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules)
6462

6563
![set server firewall button](./media/hdinsight-use-external-metadata-stores/configure-azure-sql-database-firewall1.png)
6664

6765
![allow azure services access](./media/hdinsight-use-external-metadata-stores/configure-azure-sql-database-firewall2.png)
6866

6967
### Select a custom metastore during cluster creation
7068

71-
You can point your cluster to a previously created Azure SQL Database during cluster creation, or you can configure the SQL Database after the cluster is created. This option is specified with the **Storage > Metastore settings** while creating a new Hadoop, Spark, or interactive Hive cluster from Azure portal.
69+
You can point your cluster to a previously created Azure SQL Database at any time. For cluster creation through the portal, the option is specified from the **Storage > Metastore settings**.
7270

7371
![HDInsight Hive Metadata Store Azure portal](./media/hdinsight-use-external-metadata-stores/azure-portal-cluster-storage-metastore.png)
7472

75-
## Hive metastore best practices
76-
77-
Here are some general HDInsight Hive metastore best practices:
73+
## Hive metastore guidelines
7874

7975
* Use a custom metastore whenever possible, to help separate compute resources (your running cluster) and metadata (stored in the metastore).
8076

@@ -84,19 +80,19 @@ Here are some general HDInsight Hive metastore best practices:
8480

8581
* Back up your custom metastore periodically. Azure SQL Database generates backups automatically, but the backup retention timeframe varies. For more information, see [Learn about automatic SQL Database backups](../sql-database/sql-database-automated-backups.md).
8682

87-
* Locate your metastore and HDInsight cluster in the same region, for highest performance and lowest network egress charges.
83+
* Locate your metastore and HDInsight cluster in the same region. This configuration will provide the highest performance and lowest network egress charges.
8884

89-
* Monitor your metastore for performance and availability using Azure SQL Database Monitoring tools, such as the Azure portal or Azure Monitor logs.
85+
* Monitor your metastore for performance and availability using Azure SQL Database Monitoring tools, or Azure Monitor logs.
9086

91-
* When a new, higher version of Azure HDInsight is created against an existing custom metastore database, the system upgrades the schema of the metastore, which is irreversible without restoring the database from backup.
87+
* When a new, higher version of Azure HDInsight is created against an existing custom metastore database, the system upgrades the schema of the metastore. The upgrade is irreversible without restoring the database from backup.
9288

9389
* If you share a metastore across multiple clusters, ensure all the clusters are the same HDInsight version. Different Hive versions use different metastore database schemas. For example, you can't share a metastore across Hive 2.1 and Hive 3.1 versioned clusters.
9490

95-
* In HDInsight 4.0, Spark and Hive use independent catalogs for accessing SparkSQL or Hive tables. A table created by Spark resides in the Spark catalog. A table created by Hive resides in the Hive catalog. This is different than HDInsight 3.6 where Hive and Spark shared common catalog. Hive and Spark Integration in HDInsight 4.0 relies on Hive Warehouse Connector (HWC). HWC works as a bridge between Spark and Hive. [Learn about Hive Warehouse Connector](../hdinsight/interactive-query/apache-hive-warehouse-connector.md).
91+
* In HDInsight 4.0, Spark and Hive use independent catalogs for accessing SparkSQL or Hive tables. A table created by Spark lives in the Spark catalog. A table created by Hive lives in the Hive catalog. This behavior is different than HDInsight 3.6 where Hive and Spark shared common catalog. Hive and Spark Integration in HDInsight 4.0 relies on Hive Warehouse Connector (HWC). HWC works as a bridge between Spark and Hive. [Learn about Hive Warehouse Connector](../hdinsight/interactive-query/apache-hive-warehouse-connector.md).
9692

9793
## Apache Oozie metastore
9894

99-
Apache Oozie is a workflow coordination system that manages Hadoop jobs. Oozie supports Hadoop jobs for Apache MapReduce, Pig, Hive, and others. Oozie uses a metastore to store details about current and completed workflows. To increase performance when using Oozie, you can use Azure SQL Database as a custom metastore. The metastore can also provide access to Oozie job data after you delete your cluster.
95+
Apache Oozie is a workflow coordination system that manages Hadoop jobs. Oozie supports Hadoop jobs for Apache MapReduce, Pig, Hive, and others. Oozie uses a metastore to store details about workflows. To increase performance when using Oozie, you can use Azure SQL Database as a custom metastore. The metastore provides access to Oozie job data after you delete your cluster.
10096

10197
For instructions on creating an Oozie metastore with Azure SQL Database, see [Use Apache Oozie for workflows](hdinsight-use-oozie-linux-mac.md).
10298

0 commit comments

Comments
 (0)