Commit acb6460

Merge pull request #293214 from MicrosoftDocs/repo_sync_working_branch
Confirm merge from repo_sync_working_branch to main to sync with https://github.com/MicrosoftDocs/azure-docs (branch main)
2 parents 103b8f0 + dbe8566 commit acb6460

2 files changed: +52 -26 lines changed

articles/app-service/tutorial-connect-msi-azure-database.md

Lines changed: 1 addition & 1 deletion
@@ -172,7 +172,7 @@ The following Azure CLI command uses a `--client-type` parameter.

 1. Grant permission to pre-created tables

-[!INCLUDE [Postgresql grant permission](../service-connector/includes/postgres-grant-permission.md)]
+[!INCLUDE [PostgreSQL grant permission](../service-connector/includes/postgres-grant-permission.md)]

 -----


articles/synapse-analytics/spark/apache-spark-external-metastore.md

Lines changed: 51 additions & 25 deletions
@@ -5,49 +5,73 @@ keywords: external Hive Metastore,share,Synapse
 ms.service: azure-synapse-analytics
 ms.topic: conceptual
 ms.subservice: spark
-author: juluczni
-ms.author: juluczni
-ms.date: 02/15/2022
+author: jejiang
+ms.author: jejiang
+ms.date: 11/15/2024
 ---

 # Use external Hive Metastore for Synapse Spark Pool

 > [!NOTE]
-> External Hive metastores will no longer be supported in [Azure Synapse Runtime for Apache Spark 3.4](./apache-spark-34-runtime.md) and subsequent versions in Synapse.
+> External Hive metastores will no longer be supported in subsequent versions after [Azure Synapse Runtime for Apache Spark 3.4](./apache-spark-34-runtime.md) in Synapse.

 Azure Synapse Analytics allows Apache Spark pools in the same workspace to share a managed HMS (Hive Metastore) compatible metastore as their catalog. When customers want to persist the Hive catalog metadata outside of the workspace, and share catalog objects with other computational engines outside of the workspace, such as HDInsight and Azure Databricks, they can connect to an external Hive Metastore. In this article, you can learn how to connect Synapse Spark to an external Apache Hive Metastore.

 ## Supported Hive Metastore versions

-The feature works with Spark 3.1. The following table shows the supported Hive Metastore versions for each Spark version.
-
-|Spark Version|HMS 2.3.x|HMS 3.1.X|
-|--|--|--|
-|3.3|Yes|Yes|
+The feature works with Spark 3.3. The following table shows the supported Hive Metastore versions for each Spark version.

+| Spark Version | HMS 2.3.x | HMS 3.1.X |
+|---------------|-----------|-----------|
+| 3.3 | Yes | Yes |
+| 3.4 | Yes | Yes |

 ## Set up linked service to Hive Metastore

 > [!NOTE]
-> Only Azure SQL Database and Azure Database for MySQL are supported as an external Hive Metastore. And currently we only support User-Password authentication. If the provided database is blank, please provision it via [Hive Schema Tool](https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool) to create database schema.
+> Only **Azure SQL Database** and **Azure Database for MySQL** are supported as an external Hive Metastore. SQL(username-password) authentication is supported for both kinds of databases. Additionally, managed identity(including system-assigned and user-assigned) authentication is supported only for Azure SQL Database and Spark 3.4. If the provided database is blank, please provision it via [Hive Schema Tool](https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool) to create database schema.

 Follow below steps to set up a linked service to the external Hive Metastore in Synapse workspace.

-1. Open Synapse Studio, go to **Manage > Linked services** at left, select **New** to create a new linked service.
+
+# [Azure SQL Database](#tab/azure-sql-database)
+
+1. Open Synapse Studio, go to **Manage > Linked services** at left, click **New** to create a new linked service.
+
+:::image type="content" source="./media/use-external-metastore/set-up-hive-metastore-linked-service.png" alt-text="Screenshot of set up Hive Metastore linked service." border="true":::
+
+2. Choose **Azure SQL Database**, click **Continue**.
+
+3. Provide **Name** of the linked service. Record the name of the linked service, this info will be used to configure Spark shortly.
+
+4. Choose **Legacy** version and select **Connection String**.
+
+5. Either select **Azure SQL Database** for the external Hive Metastore from Azure subscription list, or enter the info manually.
+
+6. Set **Authentication type** as one of `SQL Authentication`, `System-assigned managed identity` or `User-assigned managed identity`. For `SQL Authentication`, provide **User name** and **Password** to set up the connection. For `System-assigned managed identity`, the page will automatically populate the managed identity associated with the current workspace. For `User-assigned managed identity`, pick or create a credential bound with your user-assigned managed identity.
+
+7. **Test connection** to verify the authentication.
+
+8. Click **Create** to create the linked service.
+
+# [Azure Database for MySQL](#tab/azure-database-for-mysql)
+
+1. Open Synapse Studio, go to **Manage > Linked services** at left, click **New** to create a new linked service.

 :::image type="content" source="./media/use-external-metastore/set-up-hive-metastore-linked-service.png" alt-text="Set up Hive Metastore linked service" border="true":::

-2. Choose **Azure SQL Database** or **Azure Database for MySQL** based on your database type, select **Continue**.
+2. Choose **Azure Database for MySQL**, click **Continue**.

 3. Provide **Name** of the linked service. Record the name of the linked service, this info will be used to configure Spark shortly.

-4. You can either select **Azure SQL Database**/**Azure Database for MySQL** for the external Hive Metastore from Azure subscription list, or enter the info manually.
+4. Either select **Azure Database for MySQL** for the external Hive Metastore from Azure subscription list, or enter the info manually.

 5. Provide **User name** and **Password** to set up the connection.

 6. **Test connection** to verify the username and password.

-7. Select **Create** to create the linked service.
+7. Click **Create** to create the linked service.
+

 ### Test connection and get the metastore version in notebook

@@ -101,26 +125,28 @@ Here are the configurations and descriptions:
 > [!NOTE]
 > Synapse aims to work smoothly with computes from HDI. However HMS 3.1 in HDI 4.0 is not fully compatible with the OSS HMS 3.1. For OSS HMS 3.1, please check [here](#hms-schema-change-for-oss-hms-31).

-|Spark config|Description|
-|--|--|
-|`spark.sql.hive.metastore.version`|Supported versions: <ul><li>`2.3`</li><li>`3.1`</li></ul> Make sure you use the first two parts without the third part|
-|`spark.sql.hive.metastore.jars`|<ul><li>Version 2.3: `/opt/hive-metastore/lib-2.3/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-client/*` </li><li>Version 3.1: `/opt/hive-metastore/lib-3.1/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-client/*`</li></ul>|
-|`spark.hadoop.hive.synapse.externalmetastore.linkedservice.name`|Name of your linked service|
-|`spark.sql.hive.metastore.sharedPrefixes`|`com.mysql.jdbc,com.microsoft.sqlserver,com.microsoft.vegas`|
+
+| Spark config | Description |
+|------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `spark.sql.hive.metastore.version` | Supported versions: <ul><li>`2.3`</li><li>`3.1`</li></ul> Make sure you use the first 2 parts without the 3rd part |
+| `spark.sql.hive.metastore.jars` | <ul><li>Version 2.3: `/opt/hive-metastore/lib-2.3/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-client/*` </li><li>Version 3.1: `/opt/hive-metastore/lib-3.1/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-client/*`</li></ul> |
+| `spark.hadoop.hive.synapse.externalmetastore.linkedservice.name` | Name of your linked service |
+| `spark.sql.hive.metastore.sharedPrefixes` | `com.mysql.jdbc,com.microsoft.vegas` |
+


 ### Configure at Spark pool level
 When creating the Spark pool, under **Additional Settings** tab, put below configurations in a text file and upload it in **Apache Spark configuration** section. You can also use the context menu for an existing Spark pool, choose Apache Spark configuration to add these configurations.

-:::image type="content" source="./media/use-external-metastore/config-spark-pool.png" alt-text="Configure the Spark pool":::
+:::image type="content" source="./media/use-external-metastore/config-spark-pool.png" alt-text="Screenshot of Configure the Spark pool.":::

 Update metastore version and linked service name, and save below configs in a text file for Spark pool configuration:

 ```properties
 spark.sql.hive.metastore.version <your hms version, Make sure you use the first 2 parts without the 3rd part>
 spark.hadoop.hive.synapse.externalmetastore.linkedservice.name <your linked service name>
 spark.sql.hive.metastore.jars /opt/hive-metastore/lib-<your hms version, 2 parts>/*:/usr/hdp/current/hadoop-client/lib/*
-spark.sql.hive.metastore.sharedPrefixes com.mysql.jdbc,com.microsoft.sqlserver,com.microsoft.vegas
+spark.sql.hive.metastore.sharedPrefixes com.mysql.jdbc,com.microsoft.vegas
 ```

 Here's an example for metastore version 2.3 with linked service named as HiveCatalog21:
@@ -129,7 +155,7 @@ Here's an example for metastore version 2.3 with linked service named as HiveCat
 spark.sql.hive.metastore.version 2.3
 spark.hadoop.hive.synapse.externalmetastore.linkedservice.name HiveCatalog21
 spark.sql.hive.metastore.jars /opt/hive-metastore/lib-2.3/*:/usr/hdp/current/hadoop-client/lib/*
-spark.sql.hive.metastore.sharedPrefixes com.mysql.jdbc,com.microsoft.sqlserver,com.microsoft.vegas
+spark.sql.hive.metastore.sharedPrefixes com.mysql.jdbc,com.microsoft.vegas
 ```

 ### Configure at Spark session level
@@ -142,7 +168,7 @@ For notebook session, you can also configure the Spark session in notebook using
 "spark.sql.hive.metastore.version":"<your hms version, 2 parts>",
 "spark.hadoop.hive.synapse.externalmetastore.linkedservice.name":"<your linked service name>",
 "spark.sql.hive.metastore.jars":"/opt/hive-metastore/lib-<your hms version, 2 parts>/*:/usr/hdp/current/hadoop-client/lib/*",
-"spark.sql.hive.metastore.sharedPrefixes":"com.mysql.jdbc,com.microsoft.sqlserver,com.microsoft.vegas"
+"spark.sql.hive.metastore.sharedPrefixes":"com.mysql.jdbc,com.microsoft.vegas"
 }
 }
 ```
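
Reviewer note (not part of the commit): the pool-level and session-level hunks carry the same four settings, so they can drift apart when edited by hand. The following is a minimal Python sketch of generating both forms from one place, assuming the version list, jar path template, and shared prefixes shown in the changed lines. The function names are hypothetical, and the `"conf"` wrapper key is an assumption inferred from the two closing braces in the session-level hunk.

```python
import json

# Values taken from the updated article text in this diff.
SUPPORTED_HMS_VERSIONS = ("2.3", "3.1")
SHARED_PREFIXES = "com.mysql.jdbc,com.microsoft.vegas"


def build_metastore_conf(hms_version: str, linked_service_name: str) -> dict:
    """Return the four Spark settings for an external Hive Metastore.

    hms_version must be the two-part version ("2.3" or "3.1"), without
    the third part, as the article requires.
    """
    if hms_version not in SUPPORTED_HMS_VERSIONS:
        raise ValueError(
            f"unsupported HMS version {hms_version!r}; "
            f"expected one of {SUPPORTED_HMS_VERSIONS}"
        )
    if not linked_service_name:
        raise ValueError("linked service name must not be empty")
    return {
        "spark.sql.hive.metastore.version": hms_version,
        "spark.hadoop.hive.synapse.externalmetastore.linkedservice.name": linked_service_name,
        # Jar path template as written in the properties example of the diff.
        "spark.sql.hive.metastore.jars": (
            f"/opt/hive-metastore/lib-{hms_version}/*"
            ":/usr/hdp/current/hadoop-client/lib/*"
        ),
        "spark.sql.hive.metastore.sharedPrefixes": SHARED_PREFIXES,
    }


def to_pool_properties(conf: dict) -> str:
    """Render settings as 'key value' lines for the Spark pool text file."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


def to_session_json(conf: dict) -> str:
    """Render settings as a JSON body; the "conf" wrapper is an assumption."""
    return json.dumps({"conf": conf}, indent=4)


if __name__ == "__main__":
    settings = build_metastore_conf("2.3", "HiveCatalog21")
    print(to_pool_properties(settings))
    print(to_session_json(settings))
```

Keeping a single dictionary as the source of both outputs means a change such as the `sharedPrefixes` update in this commit only has to be made once.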
@@ -170,7 +196,7 @@ If the underlying data of your Hive tables are stored in Azure Blob storage acco

 1. Open Synapse Studio, go to **Data > Linked tab > Add** button > **Connect to external data**.

-:::image type="content" source="./media/use-external-metastore/connect-to-storage-account.png" alt-text="Connect to storage account" border="true":::
+:::image type="content" source="./media/use-external-metastore/connect-to-storage-account.png" alt-text="Screenshot of Connect to storage account." border="true":::

 2. Choose **Azure Blob Storage** and select **Continue**.
 3. Provide **Name** of the linked service. Record the name of the linked service, this info will be used in Spark configuration shortly.
