# Use external Hive Metastore for Synapse Spark Pool
> [!NOTE]
> External Hive metastores will no longer be supported in versions after [Azure Synapse Runtime for Apache Spark 3.4](./apache-spark-34-runtime.md) in Synapse.

Azure Synapse Analytics allows Apache Spark pools in the same workspace to share a managed HMS (Hive Metastore)-compatible metastore as their catalog. When customers want to persist the Hive catalog metadata outside of the workspace and share catalog objects with other compute engines such as HDInsight and Azure Databricks, they can connect to an external Hive Metastore. This article shows you how to connect Synapse Spark to an external Apache Hive Metastore.
## Supported Hive Metastore versions
The feature works with Spark 3.3 and Spark 3.4. The following table shows the supported Hive Metastore versions for each Spark version.

| Spark Version | HMS 2.3.x | HMS 3.1.x |
|---------------|-----------|-----------|
| 3.3 | Yes | Yes |
| 3.4 | Yes | Yes |
## Set up linked service to Hive Metastore
> [!NOTE]
> Only **Azure SQL Database** and **Azure Database for MySQL** are supported as an external Hive Metastore. SQL authentication (username-password) is supported for both kinds of databases. In addition, managed identity authentication (both system-assigned and user-assigned) is supported only for Azure SQL Database on Spark 3.4. If the provided database is blank, provision it via the [Hive Schema Tool](https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool) to create the database schema.

Follow the steps below to set up a linked service to the external Hive Metastore in the Synapse workspace.
# [Azure SQL Database](#tab/azure-sql-database)
1. Open Synapse Studio, go to **Manage > Linked services** on the left, and select **New** to create a new linked service.
:::image type="content" source="./media/use-external-metastore/set-up-hive-metastore-linked-service.png" alt-text="Screenshot of set up Hive Metastore linked service." border="true":::
2. Choose **Azure SQL Database** and select **Continue**.
3. Provide the **Name** of the linked service. Record the name of the linked service; this info will be used to configure Spark shortly.
4. Choose **Legacy** version and select **Connection String**.
5. Either select **Azure SQL Database** for the external Hive Metastore from the Azure subscription list, or enter the info manually.
6. Set **Authentication type** to one of `SQL Authentication`, `System-assigned managed identity`, or `User-assigned managed identity`. For `SQL Authentication`, provide the **User name** and **Password** to set up the connection. For `System-assigned managed identity`, the page automatically populates the managed identity associated with the current workspace. For `User-assigned managed identity`, pick or create a credential bound to your user-assigned managed identity.
7. Select **Test connection** to verify the authentication.
8. Select **Create** to create the linked service.
# [Azure Database for MySQL](#tab/azure-database-for-mysql)
1. Open Synapse Studio, go to **Manage > Linked services** on the left, and select **New** to create a new linked service.
:::image type="content" source="./media/use-external-metastore/set-up-hive-metastore-linked-service.png" alt-text="Screenshot of set up Hive Metastore linked service." border="true":::
2. Choose **Azure Database for MySQL** and select **Continue**.
3. Provide the **Name** of the linked service. Record the name of the linked service; this info will be used to configure Spark shortly.
4. Either select **Azure Database for MySQL** for the external Hive Metastore from the Azure subscription list, or enter the info manually.
5. Provide **User name** and **Password** to set up the connection.
6. Select **Test connection** to verify the username and password.
7. Select **Create** to create the linked service.
### Test connection and get the metastore version in notebook
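Before configuring a Spark pool, you can verify from a notebook that the metastore database is reachable and check which HMS schema version it contains. The snippet below is a minimal sketch, assuming an Azure SQL Database metastore with SQL authentication, a Synapse notebook where `spark` is the preconfigured session, and illustrative placeholder values for the server, database, and credentials.

```python
# Minimal sketch (assumptions: Azure SQL Database metastore, SQL authentication,
# SQL Server JDBC driver available on the pool; all <placeholders> are illustrative).
jdbc_url = "jdbc:sqlserver://<your-server>.database.windows.net:1433;database=<your-metastore-db>"

# The Hive Metastore schema stores its version in the VERSION table (SCHEMA_VERSION column).
version_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("dbtable", "VERSION")
    .option("user", "<your-user>")
    .option("password", "<your-password>")
    .load()
)

# A successful load confirms connectivity; SCHEMA_VERSION shows the metastore version, for example 2.3.0 or 3.1.0.
version_df.select("SCHEMA_VERSION").show()
```

If the load fails with a login or firewall error, resolve the database connectivity before moving on to the Spark pool configuration.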
Here are the configurations and descriptions for connecting Spark to the external Hive Metastore:
> [!NOTE]
> Synapse aims to work smoothly with compute engines from HDInsight. However, HMS 3.1 in HDInsight 4.0 isn't fully compatible with OSS HMS 3.1. For OSS HMS 3.1, see [HMS schema change for OSS HMS 3.1](#hms-schema-change-for-oss-hms-31).

| Spark config | Description |
|--|--|
| `spark.sql.hive.metastore.version` | Supported versions: <ul><li>`2.3`</li><li>`3.1`</li></ul> Make sure you use the first two parts without the third part. |

When creating the Spark pool, under the **Additional Settings** tab, put the following configurations in a text file and upload it in the **Apache Spark configuration** section. You can also use the context menu for an existing Spark pool and choose **Apache Spark configuration** to add these configurations.
:::image type="content" source="./media/use-external-metastore/config-spark-pool.png" alt-text="Screenshot of Configure the Spark pool.":::
Update the metastore version and linked service name, and save the following configs in a text file for the Spark pool configuration:
```properties
spark.sql.hive.metastore.version <your hms version, Make sure you use the first 2 parts without the 3rd part>
spark.hadoop.hive.synapse.externalmetastore.linkedservice.name <your linked service name>
spark.sql.hive.metastore.jars /opt/hive-metastore/lib-<your hms version, 2 parts>/*:/usr/hdp/current/hadoop-client/lib/*
```
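Once a pool starts with this configuration, a quick sanity check from a notebook is to read back the effective metastore version and list the catalog objects, which should now come from the external Hive Metastore. This is a minimal sketch under the same assumptions as above (a Synapse notebook with the preconfigured `spark` session):

```python
# Minimal sketch: confirm the pool picked up the external Hive Metastore configuration.
# Assumes a Synapse notebook where `spark` is the preconfigured SparkSession.
print(spark.conf.get("spark.sql.hive.metastore.version"))  # expected: 2.3 or 3.1

# Databases and tables listed here are resolved against the external metastore catalog.
spark.sql("SHOW DATABASES").show()
```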
If the underlying data of your Hive tables are stored in an Azure Blob storage account, set up the connection by following the steps below:
1. Open Synapse Studio, go to **Data > Linked tab > Add** button > **Connect to external data**.
:::image type="content" source="./media/use-external-metastore/connect-to-storage-account.png" alt-text="Screenshot of Connect to storage account." border="true":::
2. Choose **Azure Blob Storage** and select **Continue**.
3. Provide the **Name** of the linked service. Record the name of the linked service; this info will be used in the Spark configuration shortly.