---
title: Configure Apache Ranger policies for Spark SQL in HDInsight with Enterprise Security Package
description: This article describes how to configure Ranger policies for Spark SQL with Enterprise Security Package.
ms.service: hdinsight-aks
ms.topic: how-to
ms.date: 02/12/2024
---

# Configure Apache Ranger policies for Spark SQL in HDInsight with Enterprise Security Package

This article describes how to configure Ranger policies for Spark SQL with Enterprise Security Package in HDInsight.

In this tutorial, you learn how to:
- Create Apache Ranger policies
- Verify the applied Ranger policies
- Follow the guidelines for setting up Apache Ranger for Spark SQL
## Prerequisites

An Apache Spark cluster in HDInsight version 5.1 with [Enterprise Security Package](../domain-joined/apache-domain-joined-configure-using-azure-adds.md).
## Connect to Apache Ranger Admin UI
## Create Ranger policies

In this section, you create two Ranger policies:

- [Access policy for accessing `hivesampletable` from Spark SQL](./ranger-policies-for-spark.md#create-ranger-access-policies)
- [Masking policy for obfuscating the columns in `hivesampletable`](./ranger-policies-for-spark.md#create-ranger-masking-policy)

### Create Ranger access policies
1. Open Ranger Admin UI.
```sql
select * from hivesampletable limit 10;
```

Result before policy was saved:
:::image type="content" source="./media/ranger-policies-for-spark/result-before-access-policy.png" alt-text="Screenshot shows result before access policy." lightbox="./media/ranger-policies-for-spark/result-before-access-policy.png":::

Result after policy is applied:
:::image type="content" source="./media/ranger-policies-for-spark/result-after-access-policy.png" alt-text="Screenshot shows result after access policy." lightbox="./media/ranger-policies-for-spark/result-after-access-policy.png":::
### Create Ranger masking policy

The following example explains how to create a policy to mask a column.
1. In the Ranger Admin UI, create another policy under the **Masking** tab with the following properties.
### Known issues

- Apache Ranger Spark SQL integration doesn't work if the Ranger admin service is down.
- The Ranger database can be overloaded if more than 20 Spark sessions are launched concurrently, because of continuous policy pulls.
- In Ranger audit logs, hovering over the **Resource** column doesn't show the entire query that was executed.
## Guidelines for setting up Apache Ranger for Spark SQL

**Scenario 1**: Using a new Ranger database while creating an HDInsight 5.1 Spark cluster.

When the cluster is created, the relevant Ranger repo containing the Hive and Spark Ranger policies is created under the name <hive_and_spark> in the Hadoop SQL service on the Ranger database.

You can edit the policies, and these policies get applied to both Hive and Spark.
Points to consider:

1. Suppose you have two metastore databases with the same name (for example, DB1) used for both the Hive and Spark catalogs.

   - If Spark uses the Spark catalog (`metastore.catalog.default=spark`), the policy applies to DB1 of the Spark catalog.
   - If Spark uses the Hive catalog (`metastore.catalog.default=hive`), the policies get applied to DB1 of the Hive catalog.

   There is no way of differentiating between DB1 of the Hive catalog and DB1 of the Spark catalog from the perspective of Ranger.
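
For illustration, the catalog selection described above is driven by the `metastore.catalog.default` setting. A minimal sketch follows; the exact configuration file and its location can vary by cluster version, so treat this as an assumption rather than the definitive setup:

```
# Spark metastore configuration (location varies by cluster; shown for illustration)
# Use the Spark catalog:
metastore.catalog.default=spark
# Or use the Hive catalog:
metastore.catalog.default=hive
```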
Let's say you create a table **table1** through Hive as the current user 'xyz'. It creates an HDFS file called **table1.db** whose owner is the 'xyz' user.

- Now consider that the user 'abc' is used to launch the Spark SQL session. In this session of user 'abc', if you try to write anything to **table1**, it fails because the table owner is 'xyz'.
- In such cases, we recommend using the same user in Hive and Spark SQL for updating the table, and that user should have sufficient privileges to perform update operations.
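
The ownership scenario above can be sketched in SQL. The table and user names are the illustrative ones from the text; the column definitions are assumptions added for the sketch:

```sql
-- Run through Hive as user 'xyz': the underlying HDFS data is owned by 'xyz'.
CREATE TABLE table1 (id INT, value STRING);

-- Run in a Spark SQL session launched as user 'abc': this write fails,
-- because 'abc' doesn't own the files backing table1.
INSERT INTO table1 VALUES (1, 'a');
```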

**Scenario 2**: Using an existing Ranger database (with existing policies) while creating an HDInsight 5.1 Spark cluster.

In this case, when the HDInsight 5.1 cluster is created by using an existing Ranger database, a new Ranger repo gets created again on this database with the name of the new cluster, in this format: <hive_and_spark>.
:::image type="content" source="./media/ranger-policies-for-spark/new-repo-old-ranger-database.png" alt-text="Screenshot shows new repo old ranger database." lightbox="./media/ranger-policies-for-spark/new-repo-old-ranger-database.png":::
> [!NOTE]
> Config updates can be performed by a user with Ambari admin privileges.

1. Open the Ambari UI from your new HDInsight 5.1 cluster.

1. Go to the **Spark 3** service > **Configs**.
1. Open the **ranger-spark-security** security config.

   Or open the **ranger-spark-security** config in /etc/spark3/conf by using SSH.
1. Edit the two configurations **ranger.plugin.spark.service.name** and **ranger.plugin.spark.policy.cache.dir** to point to the old policy repo **oldclustername_hive**, and then save the configurations.
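
After the edit, the two properties in **ranger-spark-security** might look like the following sketch. The XML layout and the cache-directory path are assumptions based on common Ranger plugin conventions, and `oldclustername` is a placeholder for your previous cluster's name:

```xml
<property>
    <name>ranger.plugin.spark.service.name</name>
    <value>oldclustername_hive</value>
</property>
<property>
    <name>ranger.plugin.spark.policy.cache.dir</name>
    <value>/etc/ranger/oldclustername_hive/policycache</value>
</property>
```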

Ambari:
:::image type="content" source="./media/ranger-policies-for-spark/config-update-service-name-ambari.png" alt-text="Screenshot shows config update service name Ambari." lightbox="./media/ranger-policies-for-spark/config-update-service-name-ambari.png":::