articles/hdinsight/hdinsight-faq.md
Lines changed: 2 additions & 2 deletions
@@ -116,7 +116,7 @@ It depends on the type of metastore that your cluster is configured to use.
 
 For a default metastore: The default metastore is part of the cluster lifecycle. When you delete a cluster, the corresponding metastore and metadata are also deleted.
 
-For a custom metastore: The lifecycle of the metastore is not tied to a cluster’s lifecycle. Therefore, you can create and delete clusters without losing metadata. Metadata such as your Hive schemas persists even after you delete and re-create the HDInsight cluster.
+For a custom metastore: The lifecycle of the metastore is not tied to a cluster's lifecycle. Therefore, you can create and delete clusters without losing metadata. Metadata such as your Hive schemas persists even after you delete and re-create the HDInsight cluster.
 
 For more information, see [Use external metadata stores in Azure HDInsight](hdinsight-use-external-metadata-stores.md).
 
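The lifecycle difference described above can be modeled in a few lines of Python. This is a toy model for illustration, not an HDInsight or Azure SDK API: the `Metastore` and `Cluster` classes and the `delete_cluster` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Toy stand-in for a Hive metastore database (hypothetical, not an Azure API)."""
    name: str
    schemas: dict = field(default_factory=dict)  # table name -> column list
    deleted: bool = False


@dataclass
class Cluster:
    name: str
    metastore: Metastore
    uses_default_metastore: bool


def delete_cluster(cluster: Cluster) -> None:
    # A default metastore shares the cluster's lifecycle, so it is deleted too.
    # A custom metastore is independent and keeps its metadata.
    if cluster.uses_default_metastore:
        cluster.metastore.schemas.clear()
        cluster.metastore.deleted = True


# Default metastore: metadata is lost along with the cluster.
default_ms = Metastore("default-ms")
c1 = Cluster("cluster-a", default_ms, uses_default_metastore=True)
default_ms.schemas["sales"] = ["id", "amount"]
delete_cluster(c1)

# Custom metastore: a re-created cluster sees the same Hive schemas.
custom_ms = Metastore("custom-ms")
c2 = Cluster("cluster-b", custom_ms, uses_default_metastore=False)
custom_ms.schemas["sales"] = ["id", "amount"]
delete_cluster(c2)
c3 = Cluster("cluster-b-recreated", custom_ms, uses_default_metastore=False)
print(default_ms.deleted, c3.metastore.schemas)
```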
@@ -130,7 +130,7 @@ Yes, you can migrate a Hive metastore from an ESP to a non-ESP cluster.
 
 ### How can I estimate the size of a Hive metastore database?
 
-A Hive metastore is used to store the metadata for data sources that are used by the Hive server.The size requirements depend partly on the number and complexity of your Hive data sources, and can't be estimated up front. As outlined in [Hive metastore best practices](hdinsight-use-external-metadata-stores.md#hive-metastore-best-practices), you can start with a S2 tier, which provides 50 DTU and 250 GB of storage, and if you see a bottleneck, you can scale up the database.
+A Hive metastore is used to store the metadata for data sources that are used by the Hive server. The size requirements depend partly on the number and complexity of your Hive data sources, and can't be estimated up front. As outlined in [Hive metastore guidelines](hdinsight-use-external-metadata-stores.md#hive-metastore-guidelines), you can start with an S2 tier, which provides 50 DTU and 250 GB of storage. If you see a bottleneck, scale up the database.
 
 ### Do you support any other database other than Azure SQL Database as an external metastore?
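The sizing advice above (start at S2 with 50 DTU and 250 GB, then scale up when you see a bottleneck) can be expressed as a simple check. This is a minimal sketch: the 80% threshold and the function name `should_scale_up` are assumptions for illustration, and the utilization figures would come from your own monitoring, not from any API shown here.

```python
# S2 limits as stated in the FAQ text.
S2_DTU = 50
S2_MAX_STORAGE_GB = 250

# Hypothetical headroom threshold; tune it for your own workload.
THRESHOLD = 0.8


def should_scale_up(avg_dtu_used: float, storage_used_gb: float,
                    max_dtu: int = S2_DTU,
                    max_storage_gb: int = S2_MAX_STORAGE_GB,
                    threshold: float = THRESHOLD) -> bool:
    """Return True when DTU or storage use is close to the tier's limit."""
    dtu_ratio = avg_dtu_used / max_dtu
    storage_ratio = storage_used_gb / max_storage_gb
    return dtu_ratio >= threshold or storage_ratio >= threshold


print(should_scale_up(avg_dtu_used=20, storage_used_gb=30))  # light load
print(should_scale_up(avg_dtu_used=45, storage_used_gb=30))  # DTU bottleneck
```

Checking both DTU and storage matters because either one can become the bottleneck independently of the other.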
articles/hdinsight/hdinsight-use-external-metadata-stores.md
Lines changed: 15 additions & 19 deletions
@@ -1,20 +1,20 @@
 ---
 title: Use external metadata stores - Azure HDInsight
-description: Use external metadata stores with Azure HDInsight clusters, and best practices.
+description: Use external metadata stores with Azure HDInsight clusters.
 author: hrasheed-msft
 ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 03/02/2020
+ms.custom: hdinsightactive
+ms.date: 04/03/2020
 ---
 
 # Use external metadata stores in Azure HDInsight
 
-HDInsight allows you to take control of your data and metadata by deploying key metadata solutions and management databases to external data stores. This feature is currently available for [Apache Hive metastore](#custom-metastore), [Apache Oozie metastore](#apache-oozie-metastore) and [Apache Ambari database](#custom-ambari-db).
+HDInsight allows you to take control of your data and metadata with external data stores. This feature is available for [Apache Hive metastore](#custom-metastore), [Apache Oozie metastore](#apache-oozie-metastore), and [Apache Ambari database](#custom-ambari-db).
 
-The Apache Hive metastore in HDInsight is an essential part of the Apache Hadoop architecture. A metastore is the central schema repository that can be used by other big data access tools such as Apache Spark, Interactive Query (LLAP), Presto, or Apache Pig. HDInsight uses an Azure SQL Database as the Hive metastore.
+The Apache Hive metastore in HDInsight is an essential part of the Apache Hadoop architecture. A metastore is the central schema repository. The metastore is used by other big data access tools such as Apache Spark, Interactive Query (LLAP), Presto, or Apache Pig. HDInsight uses an Azure SQL Database as the Hive metastore.
 
 
 
@@ -34,7 +34,7 @@ By default, HDInsight creates a metastore with every cluster type. You can inste
 * You can't share the default metastore with other clusters.
 
 * The default metastore uses the basic Azure SQL DB, which has a five DTU (database transaction unit) limit.
-This default metastore is typically used for relatively simple workloadsthat don't require multiple clusters and don’t need metadata preserved beyond the cluster's lifecycle.
+This default metastore is typically used for relatively simple workloads. These workloads don't require multiple clusters and don't need metadata preserved beyond the cluster's lifecycle.
 
 ## Custom metastore
 
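The limits listed above (the default metastore can't be shared, runs on the basic tier with a five DTU limit, and is deleted with the cluster) suggest a simple decision rule. The sketch below is illustrative only: `MetastoreNeeds` and `choose_metastore` are hypothetical names, and the rule encodes only the criteria stated in the text.

```python
from dataclasses import dataclass

# DTU limit of the basic Azure SQL DB behind the default metastore (from the text).
DEFAULT_METASTORE_DTU_LIMIT = 5


@dataclass
class MetastoreNeeds:
    shared_across_clusters: bool     # Will more than one cluster use the metadata?
    keep_after_cluster_delete: bool  # Must Hive schemas outlive the cluster?
    expected_peak_dtu: float         # Your own estimate of metastore load.


def choose_metastore(needs: MetastoreNeeds) -> str:
    """Return "default" only for simple, single-cluster, short-lived workloads."""
    if needs.shared_across_clusters:
        return "custom"  # The default metastore can't be shared.
    if needs.keep_after_cluster_delete:
        return "custom"  # The default metastore is deleted with the cluster.
    if needs.expected_peak_dtu > DEFAULT_METASTORE_DTU_LIMIT:
        return "custom"  # The basic tier is limited to five DTU.
    return "default"


print(choose_metastore(MetastoreNeeds(False, False, 2)))
print(choose_metastore(MetastoreNeeds(True, False, 2)))
```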
@@ -56,25 +56,21 @@ HDInsight also supports custom metastores, which are recommended for production
 
 ### Create and config Azure SQL Database for the custom metastore
 
-You need to create or have an existing Azure SQL Database before setting up a custom Hive metastore for a HDInsight cluster. For more information, see [Quickstart: Create a single database in Azure SQL DB](https://docs.microsoft.com/azure/sql-database/sql-database-single-database-get-started?tabs=azure-portal).
+Create an Azure SQL Database, or use an existing one, before you set up a custom Hive metastore for an HDInsight cluster. For more information, see [Quickstart: Create a single database in Azure SQL DB](https://docs.microsoft.com/azure/sql-database/sql-database-single-database-get-started?tabs=azure-portal).
 
-To make sure that your HDInsight cluster can access the connected Azure SQL Database, configure Azure SQL Database firewall rules to allow Azure services and resources to access the server.
-
-You can enable this option in the Azure portal by clicking **Set server firewall**, and clicking **ON** underneath **Allow Azure services and resources to access this server** for the Azure SQL Database server or database. For more information, see [Create and manage IP firewall rules](https://docs.microsoft.com/azure/sql-database/sql-database-firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules)
+Configure Azure SQL Database firewall rules to allow Azure services and resources to access the server. Enable this option in the Azure portal by selecting **Set server firewall**. Then select **ON** under **Allow Azure services and resources to access this server** for the Azure SQL Database server or database. For more information, see [Create and manage IP firewall rules](https://docs.microsoft.com/azure/sql-database/sql-database-firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules).
 
 
 ### Select a custom metastore during cluster creation
 
-You can point your cluster to a previously created Azure SQL Database during cluster creation, or you can configure the SQL Database after the cluster is created. This option is specified with the **Storage > Metastore settings** while creating a new Hadoop, Spark, or interactive Hive cluster from Azure portal.
+You can point your cluster to a previously created Azure SQL Database at any time. For cluster creation through the portal, the option is specified from the **Storage > Metastore settings**.
 
 
 
-## Hive metastore best practices
-
-Here are some general HDInsight Hive metastore best practices:
+## Hive metastore guidelines
 
 * Use a custom metastore whenever possible, to help separate compute resources (your running cluster) and metadata (stored in the metastore).
 
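Turning on **Allow Azure services and resources to access this server** corresponds to a server-level firewall rule whose start and end addresses are both `0.0.0.0`. The sketch below builds such a rule as a plain dictionary shaped like the Azure Resource Manager firewall-rule body (`properties.startIpAddress` and `properties.endIpAddress`). The helper names and the rule name `AllowAllWindowsAzureIps` are assumptions for illustration; verify them against the Azure SQL firewall documentation before relying on them.

```python
import ipaddress

# Special range that Azure SQL treats as "allow Azure services".
AZURE_SERVICES_START = "0.0.0.0"
AZURE_SERVICES_END = "0.0.0.0"


def firewall_rule_body(start_ip: str, end_ip: str) -> dict:
    """Build an ARM-style firewall rule body after validating the IPv4 range."""
    start = ipaddress.IPv4Address(start_ip)
    end = ipaddress.IPv4Address(end_ip)
    if start > end:
        raise ValueError("start_ip must not be greater than end_ip")
    return {"properties": {"startIpAddress": str(start), "endIpAddress": str(end)}}


def allows_azure_services(rule: dict) -> bool:
    """True if the rule is the special 0.0.0.0 to 0.0.0.0 range."""
    props = rule["properties"]
    return (props["startIpAddress"] == AZURE_SERVICES_START
            and props["endIpAddress"] == AZURE_SERVICES_END)


# Hypothetical rule name; the portal setting may use a different one.
rule_name = "AllowAllWindowsAzureIps"
rule = firewall_rule_body(AZURE_SERVICES_START, AZURE_SERVICES_END)
print(rule_name, rule, allows_azure_services(rule))
```

Validating with `ipaddress` catches malformed or reversed ranges before anything would be sent to Azure.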
@@ -84,19 +80,19 @@ Here are some general HDInsight Hive metastore best practices:
 
 * Back up your custom metastore periodically. Azure SQL Database generates backups automatically, but the backup retention timeframe varies. For more information, see [Learn about automatic SQL Database backups](../sql-database/sql-database-automated-backups.md).
 
-* Locate your metastore and HDInsight cluster in the same region, for highest performance and lowest network egress charges.
+* Locate your metastore and HDInsight cluster in the same region. This configuration provides the highest performance and lowest network egress charges.
 
-* Monitor your metastore for performance and availability using Azure SQL Database Monitoring tools, such as the Azure portal or Azure Monitor logs.
+* Monitor your metastore for performance and availability using Azure SQL Database monitoring tools or Azure Monitor logs.
 
-* When a new, higher version of Azure HDInsight is created against an existing custom metastore database, the system upgrades the schema of the metastore, which is irreversible without restoring the database from backup.
+* When a new, higher version of Azure HDInsight is created against an existing custom metastore database, the system upgrades the schema of the metastore. The upgrade is irreversible without restoring the database from backup.
 
 * If you share a metastore across multiple clusters, ensure all the clusters are the same HDInsight version. Different Hive versions use different metastore database schemas. For example, you can't share a metastore across Hive 2.1 and Hive 3.1 versioned clusters.
 
-* In HDInsight 4.0, Spark and Hive use independent catalogs for accessing SparkSQL or Hive tables. A table created by Spark resides in the Spark catalog. A table created by Hive resides in the Hive catalog. This is different than HDInsight 3.6 where Hive and Spark shared common catalog. Hive and Spark Integration in HDInsight 4.0 relies on Hive Warehouse Connector (HWC). HWC works as a bridge between Spark and Hive. [Learn about Hive Warehouse Connector](../hdinsight/interactive-query/apache-hive-warehouse-connector.md).
+* In HDInsight 4.0, Spark and Hive use independent catalogs for accessing SparkSQL or Hive tables. A table created by Spark lives in the Spark catalog. A table created by Hive lives in the Hive catalog. This behavior differs from HDInsight 3.6, where Hive and Spark shared a common catalog. Hive and Spark integration in HDInsight 4.0 relies on Hive Warehouse Connector (HWC). HWC works as a bridge between Spark and Hive. [Learn about Hive Warehouse Connector](../hdinsight/interactive-query/apache-hive-warehouse-connector.md).
 
 ## Apache Oozie metastore
 
-Apache Oozie is a workflow coordination system that manages Hadoop jobs. Oozie supports Hadoop jobs for Apache MapReduce, Pig, Hive, and others. Oozie uses a metastore to store details about current and completed workflows. To increase performance when using Oozie, you can use Azure SQL Database as a custom metastore. The metastore can also provide access to Oozie job data after you delete your cluster.
+Apache Oozie is a workflow coordination system that manages Hadoop jobs. Oozie supports Hadoop jobs for Apache MapReduce, Pig, Hive, and others. Oozie uses a metastore to store details about workflows. To increase performance when using Oozie, you can use Azure SQL Database as a custom metastore. The metastore provides access to Oozie job data after you delete your cluster.
 
 For instructions on creating an Oozie metastore with Azure SQL Database, see [Use Apache Oozie for workflows](hdinsight-use-oozie-linux-mac.md).
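Two of the guidelines above can be checked before you attach clusters to a shared custom metastore: every cluster should run the same HDInsight version, and each should be in the metastore's region. The sketch below is a hypothetical pre-flight check. `ClusterInfo`, `check_shared_metastore`, and the region and version strings are illustrative assumptions, not values read from Azure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterInfo:
    name: str
    hdinsight_version: str  # for example "3.6" or "4.0"
    region: str


def check_shared_metastore(metastore_region: str, clusters: List[ClusterInfo]) -> List[str]:
    """Return a list of problems; an empty list means no issues were found."""
    problems = []
    versions = {c.hdinsight_version for c in clusters}
    if len(versions) > 1:
        # Different Hive versions use different metastore database schemas.
        problems.append("mixed HDInsight versions: " + ", ".join(sorted(versions)))
    for c in clusters:
        if c.region != metastore_region:
            # Cross-region access costs performance and network egress charges.
            problems.append(f"{c.name} is in {c.region}, metastore is in {metastore_region}")
    return problems


clusters = [ClusterInfo("etl", "4.0", "westus2"), ClusterInfo("adhoc", "3.6", "eastus")]
for problem in check_shared_metastore("westus2", clusters):
    print(problem)
```

Running a check like this before creating a newer cluster also helps, because attaching a higher HDInsight version upgrades the metastore schema irreversibly.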