Commit d7c6ba1

Update ranger-policies-for-spark.md
1 parent ab8e580 commit d7c6ba1

1 file changed: +25 -31 lines changed

articles/hdinsight/spark/ranger-policies-for-spark.md

Lines changed: 25 additions & 31 deletions
@@ -1,23 +1,23 @@
 ---
-title: Configure Apache Ranger policies for Spark SQL in HDInsight with ESP
+title: Configure Apache Ranger policies for Spark SQL in HDInsight with Enterprise security package.
 description: This article describes how to configure Ranger policies for Spark SQL with Enterprise security package.
 ms.service: hdinsight-aks
 ms.topic: how-to
 ms.date: 02/12/2024
 ---
 
-# Configure Apache Ranger policies for Spark SQL in HDInsight with ESP
+# Configure Apache Ranger policies for Spark SQL in HDInsight with Enterprise security package
 
 This article describes how to configure Ranger policies for Spark SQL with Enterprise security package in HDInsight.
 
-In this tutorial, you'll learn,
+In this tutorial, you'll learn how to,
 - Create Apache Ranger policies
 - Verify the applied Ranger policies
 - Guideline for setting Apache Ranger for Spark SQL
 
 ## Prerequisites
 
-An Apache Spark cluster in HDInsight version 5.1 with [Enterprise Security Package](../domain-joined/apache-domain-joined-configure-using-azure-adds.md).
+An Apache Spark cluster in HDInsight version 5.1 with [Enterprise security package](../domain-joined/apache-domain-joined-configure-using-azure-adds.md).
 
 ## Connect to Apache Ranger Admin UI
 
@@ -35,12 +35,12 @@ See [Create an HDInsight cluster with ESP](../domain-joined/apache-domain-joined
 
 ## Create Ranger policy
 
-In this section, you create two Ranger policies
+In this section, you create two Ranger policies;
 
 - [Access policy for accessing “hivesampletable” from spark-sql](./ranger-policies-for-spark.md#to-create-ranger-policies)
 - [Masking policy for obfuscating the columns in hivesampletable](./ranger-policies-for-spark.md#create-ranger-masking-policy)
 
-### To create Ranger policies
+### Create Ranger Access policies
 
 1. Open Ranger Admin UI.
 
@@ -74,18 +74,18 @@ In this section, you create two Ranger policies
 select * from hivesampletable limit 10;
 ```
 
-Result before policy was saved
+Result before policy was saved:
 
 :::image type="content" source="./media/ranger-policies-for-spark/result-before-access-policy.png" alt-text="Screenshot shows result before access policy." lightbox="./media/ranger-policies-for-spark/result-before-access-policy.png":::
 
-Result after policy is applied
+Result after policy is applied:
 
 :::image type="content" source="./media/ranger-policies-for-spark/result-after-access-policy.png" alt-text="Screenshot shows result after access policy." lightbox="./media/ranger-policies-for-spark/result-after-access-policy.png":::
 
 #### Create Ranger masking policy
 
 
-The following example explains how to create a policy to mask a column
+The following example explains how to create a policy to mask a column.
 
 1. Create another policy under **Masking** tab with the following properties using Ranger Admin UI
 
@@ -126,18 +126,16 @@ The following example explains how to create a policy to mask a column
 
 ### Known issues
 
-1. Apache Ranger Spark-sql integration not works if Ranger admin is down.
-
-1. Ranger DB could be overloaded if >20 spark sessions are launched concurrently because of continuous policy pulls.
-
-1. In Ranger Audit logs, “Resource” column, on hover, doesn’t show the entire query which got executed.
+- Apache Ranger Spark-sql integration not works if Ranger admin is down.
+- Ranger DB could be overloaded if >20 spark sessions are launched concurrently because of continuous policy pulls.
+- In Ranger Audit logs, “Resource” column, on hover, doesn’t show the entire query which got executed.
 
 
 
 
 ## Guideline for setting up Apache Ranger for Spark-sql
 
-**Scenario 1**: Using new Ranger database while creating HDInsight 5.1 Spark cluster
+**Scenario 1**: Using new Ranger database while creating HDInsight 5.1 Spark cluster.
 
 When the cluster is created, the relevant Ranger repo containing the Hive and Spark Ranger policies are created under the name <hive_and_spark> in the Hadoop SQL service on the Ranger DB.
 
@@ -150,10 +148,9 @@ You can edit the policies and these policies gets applied to both Hive and Spark
 Points to consider:
 
 1. In case you have two metastore databases with the same name used for both hive (for example, DB1) and spark (for example, DB1) catalogs.
-If spark uses spark catalog (metastore.catalog.default=spark), the policy applies to the DB1 of the spark catalog.
-If spark uses hive catalog (metastore.catalog.default=hive), the policies get applied to the DB1 of the hive catalog.
-
-
+- If spark uses spark catalog (metastore.catalog.default=spark), the policy applies to the DB1 of the spark catalog.
+- If spark uses hive catalog (metastore.catalog.default=hive), the policies get applied to the DB1 of the hive catalog.
+
 There is no way of differentiating between DB1 of hive and spark catalog from the perspective of Ranger.
 
 
@@ -164,15 +161,12 @@ Points to consider:
 
 Let’s say you create a table **table1** through Hive with current ‘xyz’ user. It creates an HDFS file called **table1.db** whose owner is ‘xyz’ user.
 
-Now consider, the user ‘abc’ is used while launching the Spark Sql session. In this session of user ‘abc’, if you try to write anything to **table1**, it is bound to fail since the table owner is ‘xyz’.
-In such case, it is recommended to use the same user in Hive and Spark SQL for updating the table and that user should have sufficient privileges to perform update operations.
-
-
-
+- Now consider, the user ‘abc’ is used while launching the Spark Sql session. In this session of user ‘abc’, if you try to write anything to **table1**, it is bound to fail since the table owner is ‘xyz’.
+- In such case, it is recommended to use the same user in Hive and Spark SQL for updating the table and that user should have sufficient privileges to perform update operations.
 
-**Scenario 2**: Using existing Ranger database (with existing policies) while creating HDInsight 5.1 Spark cluster
+**Scenario 2**: Using existing Ranger database (with existing policies) while creating HDInsight 5.1 Spark cluster.
 
-In this case when the HDI 5.1 cluster is created using existing Ranger database then, new Ranger repo gets created again on this database with the name of the new cluster in this format - <hive_and_spark>.
+- In this case when the HDI 5.1 cluster is created using existing Ranger database then, new Ranger repo gets created again on this database with the name of the new cluster in this format - <hive_and_spark>.
 
 
 :::image type="content" source="./media/ranger-policies-for-spark/new-repo-old-ranger-database.png" alt-text="Screenshot shows new repo old ranger database." lightbox="./media/ranger-policies-for-spark/new-repo-old-ranger-database.png":::
@@ -182,27 +176,27 @@ Points to consider:
 > [!NOTE]
 > Config updates can be performed by the user with Ambari admin privileges.
 
-1. Open Ambari UI from your new HDInsight 5.1 cluster
+1. Open Ambari UI from your new HDInsight 5.1 cluster.
 
-1. Go to Spark 3 service -> Configs
+1. Go to Spark 3 service -> Configs.
 
 1. Open “ranger-spark-security” security config.
 
 
 
-Or Open “ranger-spark-security” security config in /etc/spark3/conf using SSH
+Or Open “ranger-spark-security” security config in /etc/spark3/conf using SSH.
 
 :::image type="content" source="./media/ranger-policies-for-spark/ambari-config-ranger-security.png" alt-text="Screenshot shows Ambari config ranger security." lightbox="./media/ranger-policies-for-spark/ambari-config-ranger-security.png":::
 
 
 
 1. Edit two configurations “ranger.plugin.spark.service.name“ and “ranger.plugin.spark.policy.cache.dir “ to point to old policy repo “oldclustername_hive” and “Save” the configurations.
 
-Ambari
+Ambari:
 
 :::image type="content" source="./media/ranger-policies-for-spark/config-update-service-name-ambari.png" alt-text="Screenshot shows config update service name Ambari." lightbox="./media/ranger-policies-for-spark/config-update-service-name-ambari.png":::
 
-XML file
+XML file:
 
 :::image type="content" source="./media/ranger-policies-for-spark/config-update-xml.png" alt-text="Screenshot shows config update xml." lightbox="./media/ranger-policies-for-spark/config-update-xml.png":::
 
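
The final step in the diff above edits `ranger.plugin.spark.service.name` and `ranger.plugin.spark.policy.cache.dir` so that they point to the old policy repo. The following is a minimal Python sketch of that edit, not the article's own procedure. It assumes the `ranger-spark-security` file uses the Hadoop-style `<configuration>`/`<property>` XML layout. The helper `set_properties`, the sample value `newclustername_hive`, and the `/etc/ranger/.../policycache` paths are hypothetical illustrations; only the two property names and `oldclustername_hive` come from the article.

```python
import xml.etree.ElementTree as ET


def set_properties(xml_text: str, updates: dict[str, str]) -> str:
    """Return xml_text with each named <property> value replaced, or added if absent."""
    root = ET.fromstring(xml_text)
    remaining = dict(updates)
    # Update properties that already exist in the file.
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name in remaining:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = remaining.pop(name)
    # Append any properties that were not present.
    for name, text in remaining.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical file content; real values depend on the cluster.
SAMPLE = """<configuration>
  <property>
    <name>ranger.plugin.spark.service.name</name>
    <value>newclustername_hive</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.cache.dir</name>
    <value>/etc/ranger/newclustername_hive/policycache</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    updated = set_properties(
        SAMPLE,
        {
            "ranger.plugin.spark.service.name": "oldclustername_hive",
            # Assumed path layout, shown only to illustrate the edit.
            "ranger.plugin.spark.policy.cache.dir": "/etc/ranger/oldclustername_hive/policycache",
        },
    )
    print(updated)
```

`ET.tostring` drops the XML declaration and any comments in the input, so treat this sketch as a way to preview the change rather than a tool for rewriting the live file.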
