Commit 80b2ac1

committed
Bringing even with upstream branch
2 parents 0a5b39b + ffa7e75 commit 80b2ac1

13 files changed

+144
-25
lines changed

articles/hdinsight/domain-joined/apache-domain-joined-configure-using-azure-adds.md

Lines changed: 22 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Configure a HDInsight cluster with Enterprise Security Package by using Azure AD DS
2+
title: Configure a HDInsight cluster with Enterprise Security Package by using Azure AD-DS
33
description: Learn how to set up and configure a HDInsight Enterprise Security Package cluster by using Azure Active Directory Domain Services
44
services: hdinsight
55
ms.service: hdinsight
@@ -13,16 +13,16 @@ ms.date: 09/24/2018
1313

1414
Enterprise Security Package (ESP) clusters provide multi-user access on Azure HDInsight clusters. HDInsight clusters with ESP are connected to a domain so that domain users can use their domain credentials to authenticate with the clusters and run big data jobs.
1515

16-
In this article, you learn how to configure a HDInsight cluster with ESP by using Azure Active Directory Domain Services (Azure AD DS).
16+
In this article, you learn how to configure a HDInsight cluster with ESP by using Azure Active Directory Domain Services (Azure AD-DS).
1717

18-
## Enable Azure AD DS
18+
## Enable Azure AD-DS
1919

20-
Enabling Azure AD DS is a prerequisite before you can create a HDInsight cluster with ESP. For more information, see [Enable Azure Active Directory Domain Services using the Azure portal](../../active-directory-domain-services/active-directory-ds-getting-started.md).
20+
Enabling Azure AD-DS is a prerequisite before you can create a HDInsight cluster with ESP. For more information, see [Enable Azure Active Directory Domain Services using the Azure portal](../../active-directory-domain-services/active-directory-ds-getting-started.md).
2121

2222
> [!NOTE]
23-
> Only tenant administrators have the privileges to create an Azure AD DS instance. If you use Azure Data Lake Storage Gen1 as the default storage for HDInsight, make sure that the default Azure AD tenant for Data Lake Storage Gen1 is same as the domain for the HDInsight cluster. Because Hadoop relies on Kerberos and basic authentication, multi-factor authentication needs to be disabled for users who will access the cluster.
23+
> Only tenant administrators have the privileges to create an Azure AD-DS instance. If you use Azure Data Lake Storage Gen1 as the default storage for HDInsight, make sure that the default Azure AD tenant for Data Lake Storage Gen1 is same as the domain for the HDInsight cluster. Because Hadoop relies on Kerberos and basic authentication, multi-factor authentication needs to be disabled for users who will access the cluster.
2424
25-
After you provision the Azure AD DS instance, create a service account in Azure Active Directory (Azure AD) with the right permissions. If this service account already exists, reset its password and wait until it syncs to Azure AD DS. This reset will result in the creation of the Kerberos password hash, and it might take up to 30 minutes to sync to Azure AD DS.
25+
After you provision the Azure AD-DS instance, create a service account in Azure Active Directory (Azure AD) with the right permissions. If this service account already exists, reset its password and wait until it syncs to Azure AD-DS. This reset will result in the creation of the Kerberos password hash, and it might take up to 30 minutes to sync to Azure AD-DS.
2626

2727
The service account needs the following privileges:
2828

@@ -32,36 +32,39 @@ The service account needs the following privileges:
3232
> [!NOTE]
3333
> Because Apache Zeppelin uses the domain name to authenticate the administrative service account, the service account *must* have the same domain name as its UPN suffix for Apache Zeppelin to function properly.
3434
35-
To learn more about OUs and how to manage them, see [Create an OU on an Azure AD DS managed domain](../../active-directory-domain-services/active-directory-ds-admin-guide-create-ou.md).
35+
To learn more about OUs and how to manage them, see [Create an OU on an Azure AD-DS managed domain](../../active-directory-domain-services/active-directory-ds-admin-guide-create-ou.md).
3636

37-
Secure LDAP is for an Azure AD DS managed domain. For more information, see [Configure secure LDAP for an Azure AD DS managed domain](../../active-directory-domain-services/active-directory-ds-admin-guide-configure-secure-ldap.md).
37+
Secure LDAP is for an Azure AD-DS managed domain. For more information, see [Configure secure LDAP for an Azure AD-DS managed domain](../../active-directory-domain-services/active-directory-ds-admin-guide-configure-secure-ldap.md).
3838

3939
## Create a HDInsight cluster with ESP
4040

41-
The next step is to create the HDInsight cluster by using Azure AD DS and the service account that you created in the previous section.
41+
The next step is to create the HDInsight cluster with ESP enabled using Azure AD-DS and the service account that you created in the previous section.
4242

43-
It's easier to place both the Azure AD DS instance and the HDInsight cluster in the same Azure virtual network. If you choose to put them in different virtual networks, you must peer those virtual networks so that HDInsight VMs have a line of sight to the domain controller for joining the VMs. For more information, see [Virtual network peering](../../virtual-network/virtual-network-peering-overview.md).
43+
It's easier to place both the Azure AD-DS instance and the HDInsight cluster in the same Azure virtual network. If you choose to put them in different virtual networks, you must peer those virtual networks so that HDInsight VMs have a line of sight to the domain controller for joining the VMs. For more information, see [Virtual network peering](../../virtual-network/virtual-network-peering-overview.md).
4444

45-
When you create a HDInsight cluster with ESP, you must supply the following parameters:
45+
When you create an HDInsight cluster, you have the option to enable Enterprise Security Package to connect your cluster with Azure AD-DS. ESP is only available in HDI 3.6+ for Spark, Interactive, Hadoop, and HBase cluster types.
4646

47-
- **Domain name**: The domain name that's associated with Azure AD DS. An example is contoso.onmicrosoft.com.
47+
![Azure HDInsight Security and networking](./media/apache-domain-joined-configure-using-azure-adds/hdinsight-create-cluster-security-networking.png)
4848

49-
- **Domain user name**: The service account in the Azure ADDS DC managed domain that you created in the previous section. An example is [email protected]. This domain user will be the administrator of this HDInsight cluster.
49+
Once you enable ESP, common misconfigurations related to Azure AD-DS will be automatically detected and validated.
5050

51-
- **Domain password**: The password of the service account.
51+
![Azure HDInsight Enterprise security package domain validation](./media/apache-domain-joined-configure-using-azure-adds/hdinsight-create-cluster-esp-domain-validate.png)
5252

53-
- **Organizational unit**: The distinguished name of the OU that you want to use with the HDInsight cluster. An example is OU=HDInsightOU,DC=contoso,DC=onmicrosoft,DC=com. If this OU does not exist, the HDInsight cluster tries to create the OU by using the privileges that the service account has. For example, if the service account is in the Azure AD DS Administrators group, it has the right permissions to create an OU. Otherwise, you need to create the OU first and give the service account full control over that OU. For more information, see [Create an OU on an Azure AD DS managed domain](../../active-directory-domain-services/active-directory-ds-admin-guide-create-ou.md).
53+
Early detection saves time by allowing you to fix errors before creating the cluster.
5454

55-
> [!IMPORTANT]
56-
> Include all of the DCs, separated by commas, after the OU (for example, OU=HDInsightOU,DC=contoso,DC=onmicrosoft,DC=com).
55+
![Azure HDInsight Enterprise security package failed domain validation](./media/apache-domain-joined-configure-using-azure-adds/hdinsight-create-cluster-esp-domain-validate-failed.png)
56+
57+
When you create a HDInsight cluster with ESP, you must supply the following parameters:
58+
59+
- **Cluster admin user**: Choose an admin for your cluster from your list of Active Directory users.
60+
61+
- **Cluster access groups**: The security groups whose users you want to sync to the cluster. For example, HiveUsers. If you want to specify multiple user groups, separate them by semicolon ‘;’. The group(s) must exist in the directory prior to provisioning. For more information, see [Create a group and add members in Azure Active Directory](../../active-directory/fundamentals/active-directory-groups-create-azure-portal.md). If the group does not exist, an error occurs: "Group HiveUsers not found in the Active Directory."
5762

5863
- **LDAPS URL**: An example is ldaps://contoso.onmicrosoft.com:636.
5964

6065
> [!IMPORTANT]
6166
> Enter the complete URL, including "ldaps://" and the port number (:636).
6267
63-
- **Access user group**: The security groups whose users you want to sync to the cluster. For example, HiveUsers. If you want to specify multiple user groups, separate them by semicolon ‘;’. The group(s) must exist in the directory prior to provisioning. For more information, see [Create a group and add members in Azure Active Directory](../../active-directory/fundamentals/active-directory-groups-create-azure-portal.md). If the group does not exist, an error occurs: "Group HiveUsers not found in the Active Directory."
64-
6568
The following screenshot shows the configurations in the Azure portal:
6669

6770
![Azure HDInsight ESP Active Directory Domain Services configuration](./media/apache-domain-joined-configure-using-azure-adds/hdinsight-domain-joined-configuration-azure-aads-portal.png).
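
The LDAPS URL and cluster access group formats described in the parameter list above can be checked before the values are entered in the portal. The following is a minimal Python sketch and is not part of the commit. The function names `validate_ldaps_url` and `split_access_groups` are hypothetical, and the only rules encoded are the ones the article text states: the `ldaps://` scheme, an explicit port such as `:636`, and semicolon-separated group names.

```python
from urllib.parse import urlsplit


def validate_ldaps_url(url: str) -> bool:
    """Return True if url has the ldaps:// scheme, a host, and an explicit port.

    Hypothetical helper: it checks only the shape of the URL, not whether
    the endpoint is reachable or the certificate is valid.
    """
    parts = urlsplit(url)
    try:
        port = parts.port  # raises ValueError when the port is malformed
    except ValueError:
        return False
    return parts.scheme == "ldaps" and bool(parts.hostname) and port is not None


def split_access_groups(value: str) -> list[str]:
    """Split a semicolon-separated group list and drop empty entries."""
    return [name.strip() for name in value.split(";") if name.strip()]


if __name__ == "__main__":
    print(validate_ldaps_url("ldaps://contoso.onmicrosoft.com:636"))
    print(split_access_groups("HiveUsers; SparkUsers"))
```

A check like this catches only formatting mistakes. Whether a group actually exists in the directory still has to be confirmed there, since a missing group produces the "Group HiveUsers not found in the Active Directory" error quoted above.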

articles/hdinsight/hadoop/TOC.yml

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@
1010
- name: Hadoop components on HDInsight
1111
href: ../hdinsight-component-versioning.md
1212
maintainContext: true
13+
- name: HDInsight 4.0
14+
href: ../hdinsight-version-release.md
15+
maintainContext: true
1316
- name: Quickstart
1417
items:
1518
- name: Create Hadoop cluster - Portal

articles/hdinsight/hbase/TOC.yml

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@
77
- name: Hadoop components on HDInsight
88
href: ../hdinsight-component-versioning.md
99
maintainContext: true
10+
- name: HDInsight 4.0
11+
href: ../hdinsight-version-release.md
12+
maintainContext: true
1013
- name: Get started
1114
items:
1215
- name: Start with HBase & NoSQL

articles/hdinsight/hdinsight-authorize-users-to-ambari.md

Lines changed: 6 additions & 6 deletions
@@ -1,6 +1,6 @@
11
---
22
title: Authorize users for Ambari Views - Azure HDInsight
3-
description: 'How to manage Ambari user and group permissions for domain-joined HDInsight clusters.'
3+
description: 'How to manage Ambari user and group permissions for HDInsight clusters with ESP enabled.'
44
services: hdinsight
55
author: maxluk
66
ms.reviewer: jasonh
@@ -13,14 +13,14 @@ ms.author: maxluk
1313
---
1414
# Authorize users for Ambari Views
1515

16-
[Domain-joined HDInsight clusters](./domain-joined/apache-domain-joined-introduction.md) provide enterprise-grade capabilities, including Azure Active Directory-based authentication. You can [synchronize new users](hdinsight-sync-aad-users-to-cluster.md) added to Azure AD groups that have been provided access to the cluster, allowing those specific users to perform certain actions. Working with users, groups, and permissions in Ambari is supported for both domain-joined HDInsight cluster and standard HDInsight cluster.
16+
[Enterprise Security Package (ESP) enabled HDInsight clusters](./domain-joined/apache-domain-joined-introduction.md) provide enterprise-grade capabilities, including Azure Active Directory-based authentication. You can [synchronize new users](hdinsight-sync-aad-users-to-cluster.md) added to Azure AD groups that have been provided access to the cluster, allowing those specific users to perform certain actions. Working with users, groups, and permissions in Ambari is supported for both ESP HDInsight clusters and standard HDInsight clusters.
1717

1818
Active Directory users can log on to the cluster nodes using their domain credentials. They can also use their domain credentials to authenticate cluster interactions with other approved endpoints like Hue, Ambari Views, ODBC, JDBC, PowerShell, and REST APIs.
1919

2020
> [!WARNING]
2121
> Do not change the password of the Ambari watchdog (hdinsightwatchdog) on your Linux-based HDInsight cluster. Changing the password breaks the ability to use script actions or perform scaling operations with your cluster.
2222
23-
If you have not already done so, follow [these instructions](./domain-joined/apache-domain-joined-configure.md) to provision a new domain-joined cluster.
23+
If you have not already done so, follow [these instructions](./domain-joined/apache-domain-joined-configure.md) to provision a new ESP cluster.
2424

2525
## Access the Ambari management page
2626

@@ -113,7 +113,7 @@ The List view provides quick editing capabilities in two categories: Users and G
113113

114114
![Roles list view - users](./media/hdinsight-authorize-users-to-ambari/roles-list-view-users.png)
115115

116-
* The Groups category of the List view displays all groups, and the role assigned to each group. In our example, the list of groups is synchronized from the Azure AD groups specified in the **Access user group** property of the cluster's Domain settings. See [Create a Domain-joined HDInsight cluster](./domain-joined/apache-domain-joined-configure-using-azure-adds.md#create-a-domain-joined-hdinsight-cluster).
116+
* The Groups category of the List view displays all groups, and the role assigned to each group. In our example, the list of groups is synchronized from the Azure AD groups specified in the **Access user group** property of the cluster's Domain settings. See [Create a HDInsight cluster with ESP enabled](./domain-joined/apache-domain-joined-configure-using-azure-adds.md#create-a-hdinsight-cluster-with-esp).
117117

118118
![Roles list view - groups](./media/hdinsight-authorize-users-to-ambari/roles-list-view-groups.png)
119119

@@ -133,7 +133,7 @@ We have assigned our Azure AD domain user "hiveuser2" to the *Cluster User* role
133133

134134
## Next steps
135135

136-
* [Configure Hive policies in Domain-joined HDInsight](./domain-joined/apache-domain-joined-run-hive.md)
137-
* [Manage Domain-joined HDInsight clusters](./domain-joined/apache-domain-joined-manage.md)
136+
* [Configure Hive policies in HDInsight with ESP](./domain-joined/apache-domain-joined-run-hive.md)
137+
* [Manage ESP HDInsight clusters](./domain-joined/apache-domain-joined-manage.md)
138138
* [Use the Hive View with Hadoop in HDInsight](hadoop/apache-hadoop-use-hive-ambari-view.md)
139139
* [Synchronize Azure AD users to the cluster](hdinsight-sync-aad-users-to-cluster.md)
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
1+
---
2+
title: HDInsight 4.0 overview (Preview) - Azure
3+
description: Compare HDInsight 3.6 to HDInsight 4.0 features, limitations, and upgrade recommendations.
4+
ms.service: hdinsight
5+
author: mamccrea
6+
ms.author: mamccrea
7+
ms.reviewer: mamccrea
8+
ms.topic: overview
9+
ms.date: 09/24/2018
10+
---
11+
12+
# HDInsight 4.0 overview (Preview)
13+
14+
Azure HDInsight is one of the most popular services among enterprise customers for open-source Hadoop and Spark analytics on Azure. HDInsight (HDI) 4.0 is a cloud distribution of the Hadoop components from the [Hortonworks Data Platform (HDP) 3.0](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/release-notes/content/relnotes.html). This article provides information about the most recent Azure HDInsight release and how to upgrade.
15+
16+
## What's new in HDI 4.0?
17+
18+
### Hive 3.0 and LLAP
19+
20+
Hive low-latency analytical processing (LLAP) uses persistent query servers and in-memory caching to deliver quick SQL query results on data in remote cloud storage. Hive LLAP leverages a set of persistent daemons that execute fragments of Hive queries. Query execution on LLAP is similar to Hive without LLAP, with worker tasks running inside LLAP daemons instead of containers.
21+
22+
Benefits of Hive LLAP include:
23+
24+
* Ability to perform deep SQL analytics, such as complex joins, subqueries, windowing functions, sorting, user-defined functions, and complex aggregations, without sacrificing performance and scalability.
25+
26+
* Interactive queries against data in the same storage where data is prepared, eliminating the need to move data from storage to another engine for analytical processing.
27+
28+
* Caching query results allows previously computed query results to be reused, which saves time and resources spent running the cluster tasks required for the query.
29+
30+
### Hive dynamic materialized views
31+
32+
Hive now supports dynamic materialized views, or pre-computation of relevant summaries, used to accelerate query processing in data warehouses. Materialized views can be stored natively in Hive, and can seamlessly use LLAP acceleration.
33+
34+
### Hive transactional tables
35+
36+
HDI 4.0 includes Apache Hive 3, which requires atomicity, consistency, isolation, and durability (ACID) compliance for transactional tables that reside in the Hive warehouse. ACID-compliant tables and table data are accessed and managed by Hive. Data in create, retrieve, update, and delete (CRUD) tables must be in Optimized Row Column (ORC) file format, but insert-only tables support all file formats.
37+
38+
* ACID v2 has performance improvements in both storage format and the execution engine.
39+
40+
* ACID is enabled by default to allow full support for data updates.
41+
42+
* Improved ACID capabilities allow you to update and delete at row level.
43+
44+
* No Performance overhead.
45+
46+
* No Bucketing required.
47+
48+
* Spark can read and write to Hive ACID tables via Hive Warehouse Connector.
49+
50+
Learn more about [Apache Hive 3](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/hive-overview/content/hive_whats_new_in_this_release_hive.html).
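
The file-format rule in this section (ORC for full CRUD tables, any format for insert-only tables) can be illustrated with a small helper that renders a `CREATE TABLE` statement. This is a hedged sketch and is not part of the commit. The helper `transactional_table_ddl` and the table and column names are hypothetical, and the `transactional` and `transactional_properties` table properties are given as they are commonly documented for Apache Hive 3, so verify them against the Hive version on your cluster.

```python
def transactional_table_ddl(table, columns, insert_only=False, file_format="ORC"):
    """Render a HiveQL CREATE TABLE statement for a transactional table.

    Hypothetical helper: full CRUD tables are restricted to ORC, while
    insert-only tables may use any file format.
    """
    fmt = file_format.upper()
    if not insert_only and fmt != "ORC":
        raise ValueError("full CRUD transactional tables must be stored as ORC")

    props = ["'transactional'='true'"]
    if insert_only:
        # Assumed property name for insert-only ACID tables in Hive 3.
        props.append("'transactional_properties'='insert_only'")

    cols = ", ".join(f"{name} {type_}" for name, type_ in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"STORED AS {fmt} "
        f"TBLPROPERTIES ({', '.join(props)})"
    )


if __name__ == "__main__":
    print(transactional_table_ddl("sales", [("id", "INT"), ("amount", "DOUBLE")]))
```

The generated statement would be run through Beeline or another Hive client. The helper only builds the text and does not connect to a cluster.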
51+
52+
### Apache Spark
53+
54+
Apache Spark gets updatable tables and ACID transactions with Hive Warehouse Connector. Hive Warehouse Connector allows you to register Hive transactional tables as external tables in Spark to access full transactional functionality. Previous versions only supported table partition manipulation. Hive Warehouse Connector also supports Streaming DataFrames for streaming reads and writes into transactional and streaming Hive tables from Spark.
55+
56+
Spark executors can connect directly to Hive LLAP daemons to retrieve and update data in a transactional manner, allowing Hive to keep control of the data.
57+
58+
Apache Spark on HDInsight 4.0 supports the following scenarios:
59+
60+
* Run machine learning model training over the same transactional table used for reporting.
61+
* Use ACID transactions to safely add columns from Spark ML to a Hive table.
62+
* Run a Spark streaming job on the change feed from a Hive streaming table.
63+
* Create ORC files directly from a Spark Structured Streaming job.
64+
65+
You no longer have to worry about accidentally trying to access Hive transactional tables directly from Spark, resulting in inconsistent results, duplicate data, or data corruption. In HDI 4.0, Spark tables and Hive tables are kept in separate Metastores. Use Hive Data Warehouse Connector to explicitly register Hive transactional tables as Spark external tables.
66+
67+
Learn more about [Apache Spark](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/spark-overview/content/analyzing_data_with_apache_spark.html).
68+
69+
70+
### Oozie
71+
72+
Apache Oozie 4.3.1 is included in HDI 4.0 with the following changes:
73+
74+
* Oozie no longer runs Hive actions. Hive CLI has been removed and replaced with BeeLine.
75+
76+
* You can exclude unwanted dependencies from share lib by including an exclude pattern in your **job.properties** file.
77+
78+
Learn more about [Apache Oozie](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/release-notes/content/patch_oozie.html).
79+
80+
## How to upgrade to HDI 4.0
81+
82+
As with any major release, it's important to thoroughly test your components before implementing the latest version in a production environment. HDI 4.0 is available for you to begin the upgrade process, but HDI 3.6 is the default option to prevent accidental mishaps.
83+
84+
There is no supported upgrade path from previous versions of HDI to HDI 4.0. Because Metastore and blob data formats have changed, HDI 4.0 is not compatible with previous versions. It is important that you keep your new HDI 4.0 environment separate from your current production environment. If you deploy HDI 4.0 to your current environment, your Metastore will be upgraded and cannot be reversed.
85+
86+
## Limitations
87+
88+
* HDI 4.0 does not support MapReduce. Use Tez instead. Learn more about [Apache Tez](https://tez.apache.org/).
89+
90+
* Hive View is no longer available in HDI 4.0.
91+
92+
## Next steps
93+
94+
* [Azure HDInsight Documentation](index.yml)
95+
* [Release Notes](hdinsight-release-notes.md)
