
Commit abd7dd3

Added warning for security risk
1 parent 17d5ebd commit abd7dd3

File tree: 1 file changed (+21 −14 lines)


articles/synapse-analytics/spark/apache-spark-external-metastore.md

Lines changed: 21 additions & 14 deletions
@@ -33,11 +33,11 @@ The feature works with Spark 3.1. The following table shows the supported Hive M
 
 Follow below steps to set up a linked service to the external Hive Metastore in Synapse workspace.
 
-1. Open Synapse Studio, go to **Manage > Linked services** at left, click **New** to create a new linked service.
+1. Open Synapse Studio, go to **Manage > Linked services** at left, select **New** to create a new linked service.
 
 :::image type="content" source="./media/use-external-metastore/set-up-hive-metastore-linked-service.png" alt-text="Set up Hive Metastore linked service" border="true":::
 
-2. Choose **Azure SQL Database** or **Azure Database for MySQL** based on your database type, click **Continue**.
+2. Choose **Azure SQL Database** or **Azure Database for MySQL** based on your database type, select **Continue**.
 
 3. Provide **Name** of the linked service. Record the name of the linked service, this info will be used to configure Spark shortly.
 
@@ -47,14 +47,19 @@ Follow below steps to set up a linked service to the external Hive Metastore in
 
 6. **Test connection** to verify the username and password.
 
-7. Click **Create** to create the linked service.
+7. Select **Create** to create the linked service.
 
 ### Test connection and get the metastore version in notebook
-Some network security rule settings may block access from Spark pool to the external Hive Metastore DB. Before you configure the Spark pool, run below code in any Spark pool notebook to test connection to the external Hive Metastore DB.
+
+Some network security rule settings could block access from Spark pool to the external Hive Metastore DB. Before you configure the Spark pool, run below code in any Spark pool notebook to test connection to the external Hive Metastore DB.
 
 You can also get your Hive Metastore version from the output results. The Hive Metastore version will be used in the Spark configuration.
 
+>[!WARNING]
+>Don't publish the test scripts in your notebook with your password hardcoded as this could cause a potential security risk for your Hive Metastore.
+
 #### Connection testing code for Azure SQL
+
 ```scala
 %%spark
 import java.sql.DriverManager
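
The hunk above cuts off the Azure SQL test snippet right after the import. A minimal sketch of what such a connection test can look like — it queries the metastore's `VERSION` table over JDBC; the server, database, and credential values are placeholders, not values from this commit:

```scala
%%spark
import java.sql.DriverManager

// JDBC URL as shown in Azure portal > SQL database > Connection strings;
// every <angle-bracket> value below is a placeholder.
val url = "jdbc:sqlserver://<server-name>.database.windows.net:1433;" +
  "database=<database-name>;user=<user-name>;password=<password>;" +
  "encrypt=true;loginTimeout=30;"

try {
  val connection = DriverManager.getConnection(url)
  // The Hive Metastore schema records its own version in the VERSION table.
  val result = connection.createStatement().executeQuery(
    "SELECT SCHEMA_VERSION FROM VERSION")
  result.next()
  println(s"Connected. Hive Metastore schema version: ${result.getString(1)}")
  connection.close()
} catch {
  case ex: Throwable => println(s"Failed to connect to the external Hive Metastore:\n$ex")
}
```

Per the warning this commit adds, clear the hardcoded password before publishing the notebook.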
@@ -71,6 +76,7 @@ try {
 ```
 
 #### Connection testing code for Azure Database for MySQL
+
 ```scala
 %%spark
 import java.sql.DriverManager
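
The MySQL test snippet is truncated the same way; it presumably differs from the Azure SQL version mainly in the JDBC URL. A hedged sketch of the URL shape, with placeholder host, database, and credential values:

```scala
// Azure Database for MySQL endpoint; <angle-bracket> values are placeholders.
val url = "jdbc:mysql://<server-name>.mysql.database.azure.com:3306/<database-name>" +
  "?user=<user-name>&password=<password>&useSSL=true"
```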
@@ -87,7 +93,8 @@ try {
 ```
 
 ## Configure Spark to use the external Hive Metastore
-After creating the linked service to the external Hive Metastore successfully, you need to set up a few Spark configurations to use the external Hive Metastore. You can both set up the configuration at Spark pool level, or at Spark session level.
+
+After creating the linked service to the external Hive Metastore successfully, you need to set up a few Spark configurations to use the external Hive Metastore. You can both set up the configuration at Spark pool level, or at Spark session level.
 
 Here are the configurations and descriptions:
 
@@ -96,7 +103,7 @@ Here are the configurations and descriptions:
 
 |Spark config|Description|
 |--|--|
-|`spark.sql.hive.metastore.version`|Supported versions: <ul><li>`2.3`</li><li>`3.1`</li></ul> Make sure you use the first 2 parts without the 3rd part|
+|`spark.sql.hive.metastore.version`|Supported versions: <ul><li>`2.3`</li><li>`3.1`</li></ul> Make sure you use the first two parts without the third part|
 |`spark.sql.hive.metastore.jars`|<ul><li>Version 2.3: `/opt/hive-metastore/lib-2.3/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-client/*` </li><li>Version 3.1: `/opt/hive-metastore/lib-3.1/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-client/*`</li></ul>|
 |`spark.hadoop.hive.synapse.externalmetastore.linkedservice.name`|Name of your linked service|
 |`spark.sql.hive.metastore.sharedPrefixes`|`com.mysql.jdbc,com.microsoft.sqlserver,com.microsoft.vegas`|
@@ -116,7 +123,7 @@ spark.sql.hive.metastore.jars /opt/hive-metastore/lib-<your hms version, 2 parts
 spark.sql.hive.metastore.sharedPrefixes com.mysql.jdbc,com.microsoft.sqlserver,com.microsoft.vegas
 ```
 
-Here is an example for metastore version 2.3 with linked service named as HiveCatalog21:
+Here's an example for metastore version 2.3 with linked service named as HiveCatalog21:
 
 ```properties
 spark.sql.hive.metastore.version 2.3
@@ -126,7 +133,7 @@ spark.sql.hive.metastore.sharedPrefixes com.mysql.jdbc,com.microsoft.sqlserver,c
 ```
 
 ### Configure at Spark session level
-For notebook session, you can also configure the Spark session in notebook using `%%configure` magic command. Here is the code.
+For notebook session, you can also configure the Spark session in notebook using `%%configure` magic command. Here's the code.
 
 ```json
 %%configure -f
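
The `%%configure` body is cut off at the hunk boundary. Assembling the four settings from the configuration table earlier in this diff, a session-level configuration could plausibly look like the following (the version and linked service name are placeholders):

```json
%%configure -f
{
    "conf": {
        "spark.sql.hive.metastore.version": "2.3",
        "spark.hadoop.hive.synapse.externalmetastore.linkedservice.name": "<your linked service name>",
        "spark.sql.hive.metastore.jars": "/opt/hive-metastore/lib-2.3/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-client/*",
        "spark.sql.hive.metastore.sharedPrefixes": "com.mysql.jdbc,com.microsoft.sqlserver,com.microsoft.vegas"
    }
}
```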
@@ -165,10 +172,10 @@ If the underlying data of your Hive tables are stored in Azure Blob storage acco
 
 :::image type="content" source="./media/use-external-metastore/connect-to-storage-account.png" alt-text="Connect to storage account" border="true":::
 
-2. Choose **Azure Blob Storage** and click **Continue**.
+2. Choose **Azure Blob Storage** and select **Continue**.
 3. Provide **Name** of the linked service. Record the name of the linked service, this info will be used in Spark configuration shortly.
 4. Select the Azure Blob Storage account. Make sure Authentication method is **Account key**. Currently Spark pool can only access Blob Storage account via account key.
-5. **Test connection** and click **Create**.
+5. **Test connection** and select **Create**.
 6. After creating the linked service to Blob Storage account, when you run Spark queries, make sure you run below Spark code in the notebook to get access to the Blob Storage account for the Spark session. Learn more about why you need to do this [here](./apache-spark-secure-credentials-with-tokenlibrary.md).
 
 ```python
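
The Python snippet itself falls outside the hunk. Given the linked TokenLibrary article, it presumably pulls the storage credential from the Blob Storage linked service and sets it on the Spark session. A sketch under that assumption — the JVM class path is assumed from the tokenlibrary doc, and all names are placeholders:

```python
%%pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Resolve the credential stored in the Blob Storage linked service through
# TokenLibrary (assumed JVM class path; see the linked tokenlibrary article).
token_library = spark.sparkContext._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
blob_sas_token = token_library.getConnectionString("<blob storage linked service name>")

# Hand the credential to this Spark session so the Hive tables' underlying
# blobs become readable.
spark.conf.set(
    "fs.azure.sas.<container name>.<storage account name>.blob.core.windows.net",
    blob_sas_token)
```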
@@ -190,7 +197,7 @@ After setting up storage connections, you can query the existing tables in the H
 - [SQL <-> Spark synchronization](../sql/develop-storage-files-spark-tables.md) doesn't work when using external HMS.
 - Only Azure SQL Database and Azure Database for MySQL are supported as external Hive Metastore database. Only SQL authorization is supported.
 - Currently Spark only works on external Hive tables and non-transactional/non-ACID managed Hive tables. It doesn't support Hive ACID/transactional tables.
-- Apache Ranger integration is not supported.
+- Apache Ranger integration isn't supported.
 
 ## Troubleshooting
 ### See below error when querying a Hive table with data stored in Blob Storage
@@ -237,16 +244,16 @@ spark.hadoop.hive.synapse.externalmetastore.schema.usedefault false
 If you need to migrate your HMS version, we recommend using [hive schema tool](https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool). And if the HMS has been used by HDInsight clusters, we suggest using [HDI provided version](../../hdinsight/interactive-query/apache-hive-migrate-workloads.md).
 
 ### HMS schema change for OSS HMS 3.1
-Synapse aims to work smoothly with computes from HDI. However HMS 3.1 in HDI 4.0 is not fully compatible with the OSS HMS 3.1. So please apply the following manually to your HMS 3.1 if it’s not provisioned by HDI.
+Synapse aims to work smoothly with computes from HDI. However HMS 3.1 in HDI 4.0 isn't fully compatible with the OSS HMS 3.1. Apply the following manually to your HMS 3.1 if it’s not provisioned by HDI.
 
 ```sql
 -- HIVE-19416
 ALTER TABLE TBLS ADD WRITE_ID bigint NOT NULL DEFAULT(0);
 ALTER TABLE PARTITIONS ADD WRITE_ID bigint NOT NULL DEFAULT(0);
 ```
 
-### When sharing the metastore with HDInsight 4.0 Spark cluster, I cannot see the tables
-If you want to share the Hive catalog with a spark cluster in HDInsight 4.0, please ensure your property `spark.hadoop.metastore.catalog.default` in Synapse spark aligns with the value in HDInsight spark. The default value for HDI spark is `spark` and the default value for Synapse spark is `hive`.
+### When sharing the metastore with HDInsight 4.0 Spark cluster, I can't see the tables
+If you want to share the Hive catalog with a spark cluster in HDInsight 4.0, ensure your property `spark.hadoop.metastore.catalog.default` in Synapse spark aligns with the value in HDInsight spark. The default value for HDI spark is `spark` and the default value for Synapse spark is `hive`.
 
 ### When sharing the Hive Metastore with HDInsight 4.0 Hive cluster, I can list the tables successfully, but only get empty result when I query the table
 As mentioned in the limitations, Synapse Spark pool only supports external hive tables and non-transactional/ACID managed tables, it doesn't support Hive ACID/transactional tables currently. In HDInsight 4.0 Hive clusters, all managed tables are created as ACID/transactional tables by default, that's why you get empty results when querying those tables.
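
For the table-visibility issue in the hunk above, aligning the catalog property at Spark pool level could be as small as one line — a sketch assuming you want Synapse Spark to match HDI Spark's default catalog:

```properties
spark.hadoop.metastore.catalog.default spark
```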
