Commit 19a963a

Merge pull request #197323 from kcheeeung/exmeta
Move notice up for HDI default metastore and update anchor
2 parents 8e57b35 + 7ef961d commit 19a963a

2 files changed: +10 -10 lines changed

articles/hdinsight/hdinsight-faq.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -120,7 +120,7 @@ sections:
120120
- question: |
121121
How can I estimate the size of a Hive metastore database?
122122
answer: |
123-
A Hive metastore is used to store the metadata for data sources that are used by the Hive server. The size requirements depend partly on the number and complexity of your Hive data sources. These items can't be estimated up front. As outlined in [Hive metastore guidelines](hdinsight-use-external-metadata-stores.md#hive-metastore-guidelines), you can start with a S2 tier. The tier provides 50 DTU and 250 GB of storage, and if you see a bottleneck, scale up the database.
123+
A Hive metastore is used to store the metadata for data sources that are used by the Hive server. The size requirements depend partly on the number and complexity of your Hive data sources. These items can't be estimated up front. As outlined in [Hive metastore guidelines](hdinsight-use-external-metadata-stores.md#apache-hive-metastore-guidelines), you can start with a S2 tier. The tier provides 50 DTU and 250 GB of storage, and if you see a bottleneck, scale up the database.
124124
125125
- question: |
126126
Do you support any other database other than Azure SQL Database as an external metastore?

articles/hdinsight/hdinsight-use-external-metadata-stores.md

Lines changed: 9 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@ description: Use external metadata stores with Azure HDInsight clusters.
44
ms.service: hdinsight
55
ms.topic: how-to
66
ms.custom: hdinsightactive
7-
ms.date: 04/01/2022
7+
ms.date: 05/05/2022
88
---
99

1010
# Use external metadata stores in Azure HDInsight
@@ -22,6 +22,9 @@ There are two ways you can set up a metastore for your HDInsight clusters:
2222

2323
## Default metastore
2424

25+
> [!IMPORTANT]
26+
> The default metastore provides a basic tier Azure SQL Database with only **5 DTU and 2 GB data max size (NOT UPGRADEABLE)**! Use this for QA and testing purposes only. **For production or large workloads, we recommend migrating to an external metastore!**
27+
2528
By default, HDInsight creates a metastore with every cluster type. You can instead specify a custom metastore. The default metastore includes the following considerations:
2629

2730
* No additional cost. HDInsight creates a metastore with every cluster type without any additional cost to you.
@@ -32,14 +35,11 @@ By default, HDInsight creates a metastore with every cluster type. You can inste
3235

3336
* Default metastore is recommended only for simple workloads. Workloads that don't require multiple clusters and don't need metadata preserved beyond the cluster's lifecycle.
3437

35-
> [!IMPORTANT]
36-
> The default metastore provides an Azure SQL Database with a **basic tier 5 DTU limit (not upgradeable)**! Suitable for basic testing purposes. For large or production workloads, we recommend migrating to an external metastore.
37-
3838
## Custom metastore
3939

4040
HDInsight also supports custom metastores, which are recommended for production clusters:
4141

42-
* You specify your own Azure SQL Database as the metastore.
42+
* You specify your own **Azure SQL Database** as the metastore.
4343

4444
* The lifecycle of the metastore isn't tied to a clusters lifecycle, so you can create and delete clusters without losing metadata. Metadata such as your Hive schemas will persist even after you delete and re-create the HDInsight cluster.
4545

@@ -59,7 +59,7 @@ Create or have an existing Azure SQL Database before setting up a custom Hive me
5959

6060
While creating the cluster, HDInsight service needs to connect to the external metastore and verify your credentials. Configure Azure SQL Database firewall rules to allow Azure services and resources to access the server. Enable this option in the Azure portal by selecting **Set server firewall**. Then select **No** underneath **Deny public network access**, and **Yes** underneath **Allow Azure services and resources to access this server** for Azure SQL Database. For more information, see [Create and manage IP firewall rules](/azure/azure-sql/database/firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules)
6161

62-
Private endpoints for SQL stores is only supported on the clusters created with `outbound` ResourceProviderConnection. To learn more, see this [documentationa](./hdinsight-private-link.md).
62+
Private endpoints for SQL stores is only supported on the clusters created with `outbound` ResourceProviderConnection. To learn more, see this [documentation](./hdinsight-private-link.md).
6363

6464
:::image type="content" source="./media/hdinsight-use-external-metadata-stores/configure-azure-sql-database-firewall1.png" alt-text="set server firewall button":::
6565

@@ -71,7 +71,7 @@ You can point your cluster to a previously created Azure SQL Database at any tim
7171

7272
:::image type="content" source="./media/hdinsight-use-external-metadata-stores/azure-portal-cluster-storage-metastore.png" alt-text="HDInsight Hive Metadata Store Azure portal":::
7373

74-
## Hive metastore guidelines
74+
## Apache Hive metastore guidelines
7575

7676
> [!NOTE]
7777
> Use a custom metastore whenever possible, to help separate compute resources (your running cluster) and metadata (stored in the metastore). Start with the S2 tier, which provides 50 DTU and 250 GB of storage. If you see a bottleneck, you can scale the database up.
@@ -92,7 +92,7 @@ You can point your cluster to a previously created Azure SQL Database at any tim
9292

9393
* In HDInsight 4.0 if you would like to Share the metastore between Hive and Spark, you can do so by changing the property metastore.catalog.default to hive in your Spark cluster. You can find this property in Ambari Advanced spark2-hive-site-override. It’s important to understand that sharing of metastore only works for external hive tables, this will not work if you have internal/managed hive tables or ACID tables.
9494

95-
### Updating the custom Hive metastore password
95+
## Updating the custom Hive metastore password
9696
When using a custom Hive metastore database, you have the ability to change the SQL DB password. If you change the password for the custom metastore, the Hive services will not work until you update the password in the HDInsight cluster.
9797

9898
To update the Hive metastore password:
@@ -109,7 +109,7 @@ Apache Oozie is a workflow coordination system that manages Hadoop jobs. Oozie s
109109

110110
For instructions on creating an Oozie metastore with Azure SQL Database, see [Use Apache Oozie for workflows](hdinsight-use-oozie-linux-mac.md).
111111

112-
### Updating the custom Oozie metastore password
112+
## Updating the custom Oozie metastore password
113113
When using a custom Oozie metastore database, you have the ability to change the SQL DB password. If you change the password for the custom metastore, the Oozie services will not work until you update the password in the HDInsight cluster.
114114

115115
To update the Oozie metastore password:
