articles/hdinsight/hdinsight-faq.yml
Lines changed: 1 addition & 1 deletion
@@ -120,7 +120,7 @@ sections:
120
120
- question: |
121
121
How can I estimate the size of a Hive metastore database?
122
122
answer: |
123
-
A Hive metastore is used to store the metadata for data sources that are used by the Hive server. The size requirements depend partly on the number and complexity of your Hive data sources. These items can't be estimated up front. As outlined in [Hive metastore guidelines](hdinsight-use-external-metadata-stores.md#hive-metastore-guidelines), you can start with a S2 tier. The tier provides 50 DTU and 250 GB of storage, and if you see a bottleneck, scale up the database.
123
+
A Hive metastore is used to store the metadata for data sources that are used by the Hive server. The size requirements depend partly on the number and complexity of your Hive data sources. These items can't be estimated up front. As outlined in [Hive metastore guidelines](hdinsight-use-external-metadata-stores.md#apache-hive-metastore-guidelines), you can start with a S2 tier. The tier provides 50 DTU and 250 GB of storage, and if you see a bottleneck, scale up the database.
124
124
125
125
- question: |
126
126
Do you support any other database other than Azure SQL Database as an external metastore?
articles/hdinsight/hdinsight-use-external-metadata-stores.md
Lines changed: 9 additions & 9 deletions
@@ -4,7 +4,7 @@ description: Use external metadata stores with Azure HDInsight clusters.
4
4
ms.service: hdinsight
5
5
ms.topic: how-to
6
6
ms.custom: hdinsightactive
7
-
ms.date: 04/01/2022
7
+
ms.date: 05/05/2022
8
8
---
9
9
10
10
# Use external metadata stores in Azure HDInsight
@@ -22,6 +22,9 @@ There are two ways you can set up a metastore for your HDInsight clusters:
22
22
23
23
## Default metastore
24
24
25
+
> [!IMPORTANT]
26
+
> The default metastore provides a basic tier Azure SQL Database with only **5 DTU and 2 GB data max size (NOT UPGRADEABLE)**! Use this for QA and testing purposes only. **For production or large workloads, we recommend migrating to an external metastore!**
27
+
25
28
By default, HDInsight creates a metastore with every cluster type. You can instead specify a custom metastore. The default metastore includes the following considerations:
26
29
27
30
* No additional cost. HDInsight creates a metastore with every cluster type without any additional cost to you.
@@ -32,14 +35,11 @@ By default, HDInsight creates a metastore with every cluster type. You can inste
32
35
33
36
* Default metastore is recommended only for simple workloads. Workloads that don't require multiple clusters and don't need metadata preserved beyond the cluster's lifecycle.
34
37
35
-
> [!IMPORTANT]
36
-
> The default metastore provides an Azure SQL Database with a **basic tier 5 DTU limit (not upgradeable)**! Suitable for basic testing purposes. For large or production workloads, we recommend migrating to an external metastore.
37
-
38
38
## Custom metastore
39
39
40
40
HDInsight also supports custom metastores, which are recommended for production clusters:
41
41
42
-
* You specify your own Azure SQL Database as the metastore.
42
+
* You specify your own **Azure SQL Database** as the metastore.
43
43
44
44
* The lifecycle of the metastore isn't tied to a clusters lifecycle, so you can create and delete clusters without losing metadata. Metadata such as your Hive schemas will persist even after you delete and re-create the HDInsight cluster.
45
45
@@ -59,7 +59,7 @@ Create or have an existing Azure SQL Database before setting up a custom Hive me
59
59
60
60
While creating the cluster, HDInsight service needs to connect to the external metastore and verify your credentials. Configure Azure SQL Database firewall rules to allow Azure services and resources to access the server. Enable this option in the Azure portal by selecting **Set server firewall**. Then select **No** underneath **Deny public network access**, and **Yes** underneath **Allow Azure services and resources to access this server** for Azure SQL Database. For more information, see [Create and manage IP firewall rules](/azure/azure-sql/database/firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules)
61
61
62
-
Private endpoints for SQL stores is only supported on the clusters created with `outbound` ResourceProviderConnection. To learn more, see this [documentationa](./hdinsight-private-link.md).
62
+
Private endpoints for SQL stores is only supported on the clusters created with `outbound` ResourceProviderConnection. To learn more, see this [documentation](./hdinsight-private-link.md).
63
63
64
64
:::image type="content" source="./media/hdinsight-use-external-metadata-stores/configure-azure-sql-database-firewall1.png" alt-text="set server firewall button":::
65
65
@@ -71,7 +71,7 @@ You can point your cluster to a previously created Azure SQL Database at any tim
71
71
72
72
:::image type="content" source="./media/hdinsight-use-external-metadata-stores/azure-portal-cluster-storage-metastore.png" alt-text="HDInsight Hive Metadata Store Azure portal":::
73
73
74
-
## Hive metastore guidelines
74
+
## Apache Hive metastore guidelines
75
75
76
76
> [!NOTE]
77
77
> Use a custom metastore whenever possible, to help separate compute resources (your running cluster) and metadata (stored in the metastore). Start with the S2 tier, which provides 50 DTU and 250 GB of storage. If you see a bottleneck, you can scale the database up.
@@ -92,7 +92,7 @@ You can point your cluster to a previously created Azure SQL Database at any tim
92
92
93
93
* In HDInsight 4.0 if you would like to Share the metastore between Hive and Spark, you can do so by changing the property metastore.catalog.default to hive in your Spark cluster. You can find this property in Ambari Advanced spark2-hive-site-override. It’s important to understand that sharing of metastore only works for external hive tables, this will not work if you have internal/managed hive tables or ACID tables.
94
94
95
-
###Updating the custom Hive metastore password
95
+
## Updating the custom Hive metastore password
96
96
When using a custom Hive metastore database, you have the ability to change the SQL DB password. If you change the password for the custom metastore, the Hive services will not work until you update the password in the HDInsight cluster.
97
97
98
98
To update the Hive metastore password:
@@ -109,7 +109,7 @@ Apache Oozie is a workflow coordination system that manages Hadoop jobs. Oozie s
109
109
110
110
For instructions on creating an Oozie metastore with Azure SQL Database, see [Use Apache Oozie for workflows](hdinsight-use-oozie-linux-mac.md).
111
111
112
-
###Updating the custom Oozie metastore password
112
+
## Updating the custom Oozie metastore password
113
113
When using a custom Oozie metastore database, you have the ability to change the SQL DB password. If you change the password for the custom metastore, the Oozie services will not work until you update the password in the HDInsight cluster.