Commit cf2e91f

Merge pull request #104361 from dagiro/ts_esp5
ts_esp5
2 parents c151f8d + 8d07871 commit cf2e91f

2 files changed, +164 -0 lines changed


articles/hdinsight/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -852,6 +852,8 @@
href: ./domain-joined/hdinsight-use-oozie-domain-joined-clusters.md
- name: Concepts
items:
+ - name: Enterprise security general guidelines
+ href: ./domain-joined/general-guidelines.md
- name: Plan for ESP clusters
href: ./domain-joined/apache-domain-joined-architecture.md
- name: HDInsight virtual network architecture

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
---
title: Enterprise security general guidelines in Azure HDInsight
description: Some best practices that should make Enterprise Security Package deployment and management easier.
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.date: 02/13/2020
---

# Enterprise security general information and guidelines in Azure HDInsight

When you deploy a secure HDInsight cluster, following a few best practices makes deployment and cluster management easier. This article covers general information and guidelines.

## Use of secure cluster

### Recommended

* The cluster will be used by multiple users at the same time.
* Users have different levels of access to the same data.

### Not necessary

* You're going to run only automated jobs (for example, under a single user account). A standard cluster is good enough.
* You can do the data import using a standard cluster and use the same storage account on a different secure cluster where users can run analytics jobs.

## Use of local account

* If you use a shared user account or a local account, it's difficult to identify who used the account to change the configuration or service.
* Using local accounts is problematic when users are no longer part of the organization.

## Ranger

### Policies

* By default, Ranger uses **Deny** as the policy.

* When data access is made through a service where authorization is enabled:
    * The Ranger authorization plugin is invoked and given the context of the request.
    * Ranger applies the policies configured for the service. If the Ranger policies don't grant access, the access check is deferred to the file system. Some services, like MapReduce, only check whether the file or folder is owned by the same user who is submitting the request. Services like Hive check for either an ownership match or appropriate file system permissions (`rwx`).

* For Hive, in addition to having permissions to do Create / Update / Delete operations, the user should have `rwx` permissions on the directory in storage and all its subdirectories.

* Policies can be applied to groups (preferable) instead of individuals.

* The Ranger authorizer evaluates all Ranger policies for that service on each request. This evaluation can increase the time taken to accept the job or query.
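
The evaluation order described above can be modeled as a small decision procedure. The following is a minimal Python sketch under simplifying assumptions, not Ranger's actual implementation: `Policy`, `Request`, `FileInfo`, and the functions are hypothetical names, a policy is reduced to a path prefix plus allowed users and groups, and services other than Hive and MapReduce simply fall through to deny.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    # Hypothetical, heavily simplified policy: a path prefix plus the users
    # and groups it allows. Real Ranger policies carry far more detail.
    resource_prefix: str
    allowed_users: frozenset = frozenset()
    allowed_groups: frozenset = frozenset()

@dataclass(frozen=True)
class Request:
    service: str       # for example "hive" or "mapreduce"
    user: str
    groups: frozenset  # groups the requesting user belongs to
    path: str

@dataclass(frozen=True)
class FileInfo:
    owner: str
    requester_perms: str  # effective permissions for the requester, e.g. "rwx"

def ranger_allows(req: Request, policies: list) -> bool:
    # Every policy for the service is checked on every request, which is why
    # a large policy set can slow down job or query acceptance.
    matches = [
        p for p in policies
        if req.path.startswith(p.resource_prefix)
        and (req.user in p.allowed_users or bool(req.groups & p.allowed_groups))
    ]
    # No matching allow policy means the default Deny applies.
    return len(matches) > 0

def filesystem_allows(req: Request, info: FileInfo) -> bool:
    if req.service == "mapreduce":
        # MapReduce-style check: ownership only.
        return info.owner == req.user
    if req.service == "hive":
        # Hive-style check: ownership or full rwx permissions.
        return info.owner == req.user or info.requester_perms == "rwx"
    return False  # assumption: other services are denied in this sketch

def is_authorized(req: Request, policies: list, info: FileInfo) -> bool:
    if ranger_allows(req, policies):
        return True
    # Ranger didn't grant access, so the check is deferred to the file system.
    return filesystem_allows(req, info)
```

Granting access through `allowed_groups` rather than `allowed_users` mirrors the guidance to prefer group-based policies: a user gains or loses access when their group membership changes, without editing the policy itself.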

### Storage access

* If the storage type is WASB, no OAuth token is involved.
* If Ranger has performed the authorization, the storage access happens using the Managed Identity.
* If Ranger didn't perform any authorization, the storage access happens using the user's OAuth token.
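
The three storage-access rules above form a small decision table. The hypothetical Python helper below encodes them in the order they're listed; the function name, the enum, and the `"wasb"` / `"abfs"` type strings are illustrative only and aren't an HDInsight API.

```python
from enum import Enum

class StorageCredential(Enum):
    NONE_WASB = "no OAuth token (WASB)"
    MANAGED_IDENTITY = "cluster Managed Identity"
    USER_OAUTH_TOKEN = "user's OAuth token"

def storage_credential(storage_type: str, ranger_authorized: bool) -> StorageCredential:
    """Pick the credential used for a storage call (illustrative only)."""
    if storage_type.lower() == "wasb":
        # WASB access doesn't involve an OAuth token at all.
        return StorageCredential.NONE_WASB
    if ranger_authorized:
        # Ranger already authorized the request, so the Managed Identity is used.
        return StorageCredential.MANAGED_IDENTITY
    # No Ranger decision: storage access carries the user's own OAuth token.
    return StorageCredential.USER_OAUTH_TOKEN
```

Checking the storage type first reflects the reading that WASB never uses an OAuth token, regardless of whether Ranger was involved.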

### Hierarchical namespace

When hierarchical namespace isn't enabled:

* There are no inherited permissions.
* The only file system permission that works is the **Storage Data XXXX** RBAC role, which must be assigned to the user directly in the Azure portal.

### Default HDFS permissions

* By default, users don't have access to the **/** folder on HDFS (they need to be in the storage blob owner role for access to succeed).
* For the staging directory for MapReduce and other services, a user-specific directory is created and given `sticky _wx` permissions. Users can create files and folders underneath, but can't see other items.
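
One way to read `sticky _wx` is in POSIX-style mode bits: non-owners get write and execute but not read, and the sticky bit is set. The Python sketch below uses the standard `stat` module to show what such a mode looks like. The value `0o1733` is an assumption chosen for illustration; the exact mode HDInsight applies may differ.

```python
import stat

# Assumed mode for a staging directory: owner rwx, group and others -wx,
# plus the sticky bit. Illustrative only.
STAGING_DIR_MODE = stat.S_IFDIR | 0o1733

def describe_mode(mode: int) -> dict:
    """Summarize what non-owners can do in a directory with this mode."""
    return {
        "symbolic": stat.filemode(mode),
        # Without read permission, other users can't list the directory.
        "others_can_list": bool(mode & stat.S_IROTH),
        # Write plus execute lets other users create entries inside it.
        "others_can_create": bool(mode & stat.S_IWOTH) and bool(mode & stat.S_IXOTH),
        # Sticky bit: only an entry's owner (or the directory owner or
        # superuser) can delete or rename it.
        "sticky": bool(mode & stat.S_ISVTX),
    }

print(describe_mode(STAGING_DIR_MODE))
# {'symbolic': 'drwx-wx-wt', 'others_can_list': False, 'others_can_create': True, 'sticky': True}
```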

### URL auth

If URL auth is enabled:

* The configuration contains the prefixes that URL auth covers (like `adl://`).
* If the access is for a URL with one of these prefixes, Ranger checks whether the user is in the allow list.
* Ranger doesn't check any of the fine-grained policies.
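
A minimal Python sketch of the URL auth behavior described above. The function name, the tuple of covered prefixes, and the allow-list set are hypothetical stand-ins for the real configuration; only the `adl://` prefix comes from the text.

```python
from typing import Optional

def url_auth_decision(url: str, user: str,
                      covered_prefixes: tuple, allow_list: frozenset) -> Optional[bool]:
    """Return the URL auth decision, or None when URL auth doesn't apply."""
    if url.startswith(covered_prefixes):
        # Covered URL: only allow-list membership matters; fine-grained
        # Ranger policies are not consulted.
        return user in allow_list
    # Not a covered prefix: URL auth has no say (assumption: the request then
    # goes through normal policy evaluation, which isn't modeled here).
    return None

prefixes = ("adl://",)  # hypothetical configuration value
allow = frozenset({"etl-svc"})
print(url_auth_decision("adl://contosolake.azuredatalakestore.net/raw", "alice", prefixes, allow))
# False
```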

## Resource groups

Use a new resource group for each cluster so that you can distinguish between cluster resources.

## NSGs, firewalls, and internal gateway

* Use network security groups (NSGs) to lock down virtual networks.
* Use a firewall to handle outbound access policies.
* Use an internal gateway that isn't open to the public internet.

## Azure Active Directory

[Azure Active Directory](../../active-directory/fundamentals/active-directory-whatis.md) (Azure AD) is Microsoft's cloud-based identity and access management service.

### Policies

* Disable conditional access policy using the IP address-based policy. This requires service endpoints to be enabled on the virtual networks where the clusters are deployed. If you use an external service for MFA (something other than Azure AD), the IP address-based policy won't work.

* The `AllowCloudPasswordValidation` policy is required for federated users. Because HDInsight uses the username / password directly to get tokens from Azure AD, this policy has to be enabled for all federated users.

* Enable service endpoints if you require conditional access bypass using Trusted IPs.

### Groups

* Always deploy clusters with a group.
* Use Azure AD to manage group memberships (easier than trying to manage the individual services in the cluster).

### User accounts

* Use a unique user account for each scenario. For example, use one account for import and another for query or other processing jobs.
* Use group-based Ranger policies instead of individual policies.
* Have a plan for managing users who shouldn't have access to clusters anymore.

## Azure Active Directory Domain Services

[Azure Active Directory Domain Services](../../active-directory-domain-services/overview.md) (Azure AD DS) provides managed domain services such as domain join, group policy, lightweight directory access protocol (LDAP), and Kerberos / NTLM authentication that are fully compatible with Windows Server Active Directory.

Azure AD DS is required for secure clusters to join a domain.
HDInsight can't depend on on-premises domain controllers or custom domain controllers, because that introduces too many points of failure, credential sharing, DNS permission issues, and so on. For more information, see [Azure AD DS FAQs](../../active-directory-domain-services/faqs.md).

### Azure AD DS instance

* Create the instance with the `.onmicrosoft.com` domain. This way, there won't be multiple DNS servers serving the domain.
* Create a self-signed certificate for LDAPS and upload it to Azure AD DS.
* Use a peered virtual network for deploying clusters (helpful when a number of teams deploy HDInsight ESP clusters). This way, you don't need to open up ports (NSGs) on the virtual network that contains the domain controller.
* Configure the DNS for the virtual network properly (the Azure AD DS domain name should resolve without any hosts file entries).
* If you're restricting outbound traffic, make sure that you've read through the [firewall support in HDInsight](../hdinsight-restrict-outbound-traffic.md).
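
To check the DNS recommendation from a VM in the cluster's virtual network, you can confirm that the Azure AD DS domain name resolves and that the local hosts file isn't what makes it resolve. This Python sketch uses only the standard library; the domain `contoso.onmicrosoft.com` and the Linux `/etc/hosts` path are assumptions to adjust for your environment.

```python
import socket
from pathlib import Path

def hosts_file_entries(domain: str, hosts_path: str = "/etc/hosts") -> list:
    """Return non-comment hosts-file lines that map the given name."""
    try:
        lines = Path(hosts_path).read_text().splitlines()
    except OSError:
        return []  # no readable hosts file, so no entries
    hits = []
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        # fields[0] is the IP address; the rest are host names and aliases.
        if len(fields) > 1 and domain.lower() in (name.lower() for name in fields[1:]):
            hits.append(line.strip())
    return hits

def domain_resolves(domain: str) -> bool:
    """True if the system resolver returns at least one address."""
    try:
        return len(socket.getaddrinfo(domain, None)) > 0
    except (OSError, UnicodeError):
        return False
```

For example, `domain_resolves("contoso.onmicrosoft.com")` should return `True` and `hosts_file_entries("contoso.onmicrosoft.com")` should return an empty list. Because `getaddrinfo` also consults the hosts file, a successful lookup alone doesn't prove DNS is configured; the second check rules out the hosts-file case.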

### Properties synced from Azure AD to Azure AD DS

* Azure AD Connect syncs from on-premises to Azure AD.
* Azure AD DS syncs from Azure AD.

Azure AD DS syncs objects from Azure AD periodically. The Azure AD DS blade in the Azure portal displays the sync status. During each stage of sync, unique properties may conflict and get renamed. Pay attention to the property mapping from Azure AD to Azure AD DS.

For more information, see [Azure AD UserPrincipalName population](../../active-directory/hybrid/plan-connect-userprincipalname.md) and [How Azure AD DS synchronization works](../../active-directory-domain-services/synchronization.md).
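
Before relying on synced identities, you might scan exported user data for values that would collide where a property has to be unique. The Python sketch below groups users by a chosen property and reports collisions. The `id` key, the `alias` property in the example, and the case-insensitive comparison are all assumptions, since this article doesn't list which properties must be unique.

```python
from collections import defaultdict

def find_collisions(users: list, prop: str) -> dict:
    """Map each colliding (lowercased) value of `prop` to the ids that share it."""
    by_value = defaultdict(list)
    for user in users:
        value = user.get(prop)
        if value:  # skip users without the property
            by_value[value.lower()].append(user["id"])
    return {value: ids for value, ids in by_value.items() if len(ids) > 1}

users = [  # hypothetical export
    {"id": "u1", "alias": "jdoe"},
    {"id": "u2", "alias": "JDoe"},
    {"id": "u3", "alias": "asmith"},
]
print(find_collisions(users, "alias"))
# {'jdoe': ['u1', 'u2']}
```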

### Password hash sync

* Passwords are synced differently from other object types. Only non-reversible password hashes are synced to Azure AD and Azure AD DS.
* On-premises to Azure AD sync has to be enabled through Azure AD Connect.
* Azure AD to Azure AD DS sync is automatic (latencies are under 20 minutes).
* Password hashes are synced only when a password changes. When you enable password hash sync, existing passwords aren't synced automatically because they're stored irreversibly. When a password is changed, its hash gets synced.

### Computer objects location

Each cluster is associated with a single organizational unit (OU). An internal user is provisioned in the OU. All the nodes are domain-joined into the same OU.

### Active Directory administrative tools

For steps on how to install the Active Directory administrative tools on a Windows Server VM, see [Install management tools](../../active-directory-domain-services/tutorial-create-management-vm.md).

## Troubleshooting

### Cluster creation fails repeatedly

Most common reasons:

* DNS configuration isn't correct, so domain join of cluster nodes fails.
* NSGs are too restrictive, preventing domain join.
* Managed Identity doesn't have sufficient permissions.
* Cluster name isn't unique in the first six characters (either with another live cluster, or with a deleted cluster).
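
Because a clash can come from a deleted cluster as well as a live one, it can help to check a proposed name against every cluster name you've used before creating the cluster. This Python sketch assumes you maintain that list yourself (for example, from your own deployment records); the function name and the case-insensitive comparison are assumptions, not documented HDInsight behavior.

```python
def six_char_conflicts(new_name: str, previous_names: list, prefix_len: int = 6) -> list:
    """Return previously used cluster names that share the first six characters."""
    prefix = new_name[:prefix_len].lower()  # assumption: comparison is case-insensitive
    return [name for name in previous_names if name[:prefix_len].lower() == prefix]

existing = ["salesesp01", "finance-esp", "SALESE-archive"]  # live and deleted clusters
print(six_char_conflicts("salesesp02", existing))
# ['salesesp01', 'SALESE-archive']
```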

## Next steps

* [Enterprise Security Package configurations with Azure Active Directory Domain Services in HDInsight](./apache-domain-joined-configure-using-azure-adds.md)
* [Synchronize Azure Active Directory users to an HDInsight cluster](../hdinsight-sync-aad-users-to-cluster.md)
