Commit bda2bed

Merge pull request #281523 from sreekzz/ADLS-Gen1-Change
Removed ADLS Gen1 text
2 parents 1fbb253 + 57e6cc6 commit bda2bed

33 files changed: 101 additions and 643 deletions

articles/hdinsight/.openpublishing.redirection.hdinsight.json

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,21 @@
99
"redirect_url": "/azure/hdinsight/kafka/apache-kafka-introduction",
1010
"redirect_document_id": false
1111
},
12+
{
13+
"source_path_from_root": "/articles/hdinsight/spark/apache-spark-use-with-data-lake-store.md",
14+
"redirect_url": "/azure/hdinsight/overview-data-lake-storage-gen2",
15+
"redirect_document_id": false
16+
},
17+
{
18+
"source_path_from_root": "/articles/hdinsight/overview-data-lake-storage-gen1.md",
19+
"redirect_url": "/azure/hdinsight/overview-data-lake-storage-gen2",
20+
"redirect_document_id": false
21+
},
22+
{
23+
"source_path_from_root": "/articles/hdinsight/hdinsight-hadoop-use-data-lake-storage-gen1.md",
24+
"redirect_url": "/azure/hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2",
25+
"redirect_document_id": false
26+
},
1227
{
1328
"source_path_from_root": "/articles/hdinsight/hdinsight-50-component-versioning.md",
1429
"redirect_url": "/azure/hdinsight/hdinsight-5x-component-versioning",

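The three entries added above all follow the same shape: a `source_path_from_root` that maps to a `redirect_url`. As a minimal sketch of how such entries could be looked up, the snippet below loads just those three entries as a bare JSON array and resolves a retired path. The array wrapper and the `resolve_redirect` helper are assumptions for illustration, not part of the publishing system.

```python
import json

# The three redirect entries added in this commit, wrapped in a bare JSON
# array (the wrapper is an assumption; the real file has more structure).
ENTRIES = json.loads("""
[
  {
    "source_path_from_root": "/articles/hdinsight/spark/apache-spark-use-with-data-lake-store.md",
    "redirect_url": "/azure/hdinsight/overview-data-lake-storage-gen2",
    "redirect_document_id": false
  },
  {
    "source_path_from_root": "/articles/hdinsight/overview-data-lake-storage-gen1.md",
    "redirect_url": "/azure/hdinsight/overview-data-lake-storage-gen2",
    "redirect_document_id": false
  },
  {
    "source_path_from_root": "/articles/hdinsight/hdinsight-hadoop-use-data-lake-storage-gen1.md",
    "redirect_url": "/azure/hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2",
    "redirect_document_id": false
  }
]
""")


def resolve_redirect(source_path, entries):
    """Return the redirect_url for source_path, or None if no entry matches.

    Hypothetical helper: a plain linear scan over the entries.
    """
    for entry in entries:
        if entry["source_path_from_root"] == source_path:
            return entry["redirect_url"]
    return None


if __name__ == "__main__":
    print(resolve_redirect(
        "/articles/hdinsight/overview-data-lake-storage-gen1.md", ENTRIES))
```

Both the Gen1 overview and the Spark article resolve to the Gen2 overview, while the Gen1 how-to resolves to its Gen2 counterpart.
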
articles/hdinsight/TOC.yml

Lines changed: 0 additions & 6 deletions
@@ -90,8 +90,6 @@ items:
9090
items:
9191
- name: Azure Storage overview
9292
href: overview-azure-storage.md
93-
- name: Azure Data Lake Storage Gen1 overview
94-
href: overview-data-lake-storage-gen1.md
9593
- name: Azure Data Lake Storage Gen2 overview
9694
href: overview-data-lake-storage-gen2.md
9795
- name: How-to guides
@@ -129,8 +127,6 @@ items:
129127
href: hdinsight-hadoop-use-data-lake-storage-gen2-portal.md
130128
- name: Use Data Lake Storage Gen2 with Azure CLI
131129
href: hdinsight-hadoop-use-data-lake-storage-gen2-azure-cli.md
132-
- name: Use Data Lake Storage Gen1
133-
href: hdinsight-hadoop-use-data-lake-storage-gen1.md
134130
- name: Extend clusters
135131
items:
136132
- name: Install HDInsight apps
@@ -483,8 +479,6 @@ items:
483479
href: ./spark/apache-spark-zeppelin-notebook.md
484480
- name: Use with other Azure services
485481
items:
486-
- name: Use with Data Lake Storage
487-
href: ./spark/apache-spark-use-with-data-lake-store.md
488482
- name: Connect to Azure SQL Database
489483
href: ./spark/apache-spark-connect-to-sql-database.md
490484
- name: Run Azure ML workloads with AutoML
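The three `href` values removed from the TOC correspond to the three source paths redirected in `.openpublishing.redirection.hdinsight.json` in this same commit. A review check for that pairing could look roughly like the sketch below. It assumes TOC hrefs are relative to `/articles/hdinsight` (the directory that holds `TOC.yml` in this diff), and the helper names are hypothetical.

```python
import posixpath

# Directory that holds TOC.yml, taken from the file path shown in this diff.
TOC_DIR = "/articles/hdinsight"

# hrefs deleted from TOC.yml in this commit.
REMOVED_HREFS = [
    "overview-data-lake-storage-gen1.md",
    "hdinsight-hadoop-use-data-lake-storage-gen1.md",
    "./spark/apache-spark-use-with-data-lake-store.md",
]

# source_path_from_root values added to the redirection file in this commit.
REDIRECT_SOURCES = {
    "/articles/hdinsight/spark/apache-spark-use-with-data-lake-store.md",
    "/articles/hdinsight/overview-data-lake-storage-gen1.md",
    "/articles/hdinsight/hdinsight-hadoop-use-data-lake-storage-gen1.md",
}


def to_root_path(href, toc_dir=TOC_DIR):
    """Convert a TOC-relative href into a repo-root path (hypothetical helper)."""
    return posixpath.normpath(posixpath.join(toc_dir, href))


def missing_redirects(removed_hrefs, redirect_sources):
    """Return the removed hrefs that have no matching redirect source."""
    return [h for h in removed_hrefs if to_root_path(h) not in redirect_sources]


if __name__ == "__main__":
    print(missing_redirects(REMOVED_HREFS, REDIRECT_SOURCES))
```

An empty result means every article dropped from the TOC has a redirect, so existing links to the retired pages do not break.
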

articles/hdinsight/domain-joined/domain-joined-authentication-issues.md

Lines changed: 8 additions & 8 deletions
@@ -3,16 +3,16 @@ title: Authentication issues in Azure HDInsight
33
description: Authentication issues in Azure HDInsight
44
ms.service: hdinsight
55
ms.topic: troubleshooting
6-
ms.date: 05/09/2024
6+
ms.date: 07/09/2024
77
---
88

99
# Authentication issues in Azure HDInsight
1010

1111
This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.
1212

13-
On secure clusters backed by Azure Data Lake (Gen1 or Gen2), when domain users sign in to the cluster services through HDI Gateway (like signing in to the Apache Ambari portal), HDI Gateway tries to obtain an OAuth token from Microsoft Entra first, and then get a Kerberos ticket from Microsoft Entra Domain Services. Authentication can fail in either of these stages. This article is aimed at debugging some of those issues.
13+
On secure clusters backed by Azure Data Lake Gen2, when domain users sign in to the cluster services through HDI Gateway (like signing in to the Apache Ambari portal), HDI Gateway tries to obtain an OAuth token from Microsoft Entra first, and then get a Kerberos ticket from Microsoft Entra Domain Services. Authentication can fail in either of these stages. This article is aimed at debugging some of those issues.
1414

15-
When the authentication fails, you gets prompted for credentials. If you cancel this dialog, the error message is printed. Here are some of the common error messages:
15+
When the authentication fails, you get prompted for credentials. If you cancel this dialog, the error message is printed. Here are some of the common error messages:
1616

1717
## invalid_grant or unauthorized_client, 50126
1818

@@ -118,7 +118,7 @@ Sign in denied.
118118

119119
### Cause
120120

121-
To get to this stage, your OAuth authentication isn't an issue, but Kerberos authentication is. If this cluster is backed by ADLS, OAuth sign in has succeeded before Kerberos auth is attempted. On WASB clusters, OAuth sign in isn't attempted. There could be many reasons for Kerberos failure - like password hashes are out of sync, user account locked out in Microsoft Entra Domain Services, and so on. Password hashes sync only when the user changes password. When you create the Microsoft Entra Domain Services instance, it will start syncing passwords that are changed after the creation. It can't retroactively sync passwords that were set before its inception.
121+
To get to this stage, your OAuth authentication isn't an issue, but Kerberos authentication is. If this cluster is backed by ADLS, OAuth sign-in succeeded before Kerberos authentication was attempted. On WASB clusters, OAuth sign-in isn't attempted. There can be many reasons for Kerberos failure, such as password hashes being out of sync or the user account being locked out in Microsoft Entra Domain Services. Password hashes sync only when the user changes their password. When you create the Microsoft Entra Domain Services instance, it starts syncing passwords that are changed after its creation. It can't retroactively sync passwords that were set before its inception.
122122

123123
### Resolution
124124

@@ -128,7 +128,7 @@ Try to SSH into a You need to try to authenticate (kinit) using the same user cr
128128

129129
---
130130

131-
## kinit fails
131+
## Kinit fails
132132

133133
### Issue
134134

@@ -154,7 +154,7 @@ Ways to find `sAMAccountName`:
154154

155155
---
156156

157-
## kinit fails with Preauthentication failure
157+
## Kinit fails with Preauthentication failure
158158

159159
### Issue
160160

@@ -194,13 +194,13 @@ User receives error message `Error fetching access token`.
194194

195195
### Cause
196196

197-
This error occurs intermittently when users try to access the ADLS Gen2 using ACLs and the Kerberos token has expired.
197+
This error occurs intermittently when users try to access ADLS Gen2 using ACLs and the Kerberos token has expired.
198198

199199
### Resolution
200200

201201
* For Azure Data Lake Storage Gen1, clean browser cache and log into Ambari again.
202202

203-
* For Azure Data Lake Storage Gen2, Run `/usr/lib/hdinsight-common/scripts/RegisterKerbTicketAndOAuth.sh <upn>` for the user the user is trying to login as
203+
* For Azure Data Lake Storage Gen2, run `/usr/lib/hdinsight-common/scripts/RegisterKerbTicketAndOAuth.sh <upn>` with the UPN of the account the user is trying to sign in as.
204204

205205
---
206206

articles/hdinsight/domain-joined/hdinsight-security-overview.md

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ title: Overview of enterprise security in Azure HDInsight
33
description: Learn the various methods to ensure enterprise security in Azure HDInsight.
44
ms.service: hdinsight
55
ms.topic: overview
6-
ms.date: 06/15/2024
6+
ms.date: 07/23/2024
77
#Customer intent: As a user of Azure HDInsight, I want to learn the means that Azure HDInsight offers to ensure security for the enterprise.
88
---
99

@@ -67,7 +67,7 @@ The following table provides links to resources for each type of security soluti
6767

6868
| Security area | Solutions available | Responsible party |
6969
|---|---|---|
70-
| Data Access Security | Configure [access control lists ACLs](../../storage/blobs/data-lake-storage-access-control.md) for Azure Data Lake Storage Gen1 and Gen2 | Customer |
70+
| Data Access Security | Configure [access control lists (ACLs)](../../storage/blobs/data-lake-storage-access-control.md) for Azure Data Lake Storage Gen2 | Customer |
7171
| | Enable the ["Secure transfer required"](../../storage/common/storage-require-secure-transfer.md) property on storage accounts. | Customer |
7272
| | Configure [Azure Storage firewalls](../../storage/common/storage-network-security.md) and virtual networks | Customer |
7373
| | Configure [Azure virtual network service endpoints](../../virtual-network/virtual-network-service-endpoints-overview.md) for Azure Cosmos DB and [Azure SQL DB](/azure/azure-sql/database/vnet-service-endpoint-rule-overview) | Customer |

articles/hdinsight/hadoop/apache-hadoop-introduction.md

Lines changed: 2 additions & 2 deletions
@@ -4,15 +4,15 @@ description: An introduction to HDInsight, and the Apache Hadoop technology stac
44
ms.service: hdinsight
55
ms.topic: overview
66
ms.custom: hdinsightactive, mvc
7-
ms.date: 05/09/2024
7+
ms.date: 07/23/2024
88
#Customer intent: As a data analyst, I want to understand what Hadoop is and how it is offered in Azure HDInsight so that I can decide on using HDInsight instead of on-premises clusters.
99
---
1010

1111
# What is Apache Hadoop in Azure HDInsight?
1212

1313
[Apache Hadoop](https://hadoop.apache.org/) was the original open-source framework for distributed processing and analysis of big data sets on clusters. The Hadoop ecosystem includes related software and utilities, including Apache Hive, Apache HBase, Spark, Kafka, and many others.
1414

15-
Azure HDInsight is a fully managed, full-spectrum, open-source analytics service in the cloud for enterprises. The Apache Hadoop cluster type in Azure HDInsight allows you to use the [Apache Hadoop Distributed File System (HDFS)](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html), [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) resource management, and a simple [MapReduce](https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html) programming model to process and analyze batch data in parallel. Hadoop clusters in HDInsight are compatible with [Azure Blob storage](../../storage/common/storage-introduction.md), [Azure Data Lake Storage Gen1](../../data-lake-store/data-lake-store-overview.md), or [Azure Data Lake Storage Gen2](../../storage/blobs/data-lake-storage-introduction.md).
15+
Azure HDInsight is a fully managed, full-spectrum, open-source analytics service in the cloud for enterprises. The Apache Hadoop cluster type in Azure HDInsight allows you to use the [Apache Hadoop Distributed File System (HDFS)](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html), [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) resource management, and a simple [MapReduce](https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html) programming model to process and analyze batch data in parallel. Hadoop clusters in HDInsight are compatible with [Azure Data Lake Storage Gen2](../../storage/blobs/data-lake-storage-introduction.md).
1616

1717
To see available Hadoop technology stack components on HDInsight, see [Components and versions available with HDInsight](../hdinsight-component-versioning.md). To read more about Hadoop in HDInsight, see the [Azure features page for HDInsight](https://azure.microsoft.com/services/hdinsight/).
1818

articles/hdinsight/hadoop/apache-hadoop-linux-create-cluster-get-started-portal.md

Lines changed: 5 additions & 5 deletions
@@ -41,9 +41,9 @@ In this section, you create a Hadoop cluster in HDInsight using the Azure portal
4141
|Region | From the drop-down list, select a region where the cluster is created. Choose a location closer to you for better performance. |
4242
|Cluster type| Select **Select cluster type**. Then select **Hadoop** as the cluster type.|
4343
|Version|From the drop-down list, select a **version**. Use the default version if you don't know what to choose.|
44-
|Cluster login username and password | The default login name is **admin**. The password must be at least 10 characters in length and must contain at least one digit, one uppercase, and one lower case letter, one non-alphanumeric character (except characters ```' ` "```). Make sure you **do not provide** common passwords such as "Pass@word1".|
44+
|Cluster sign-in username and password | The default sign-in name is **admin**. The password must be at least 10 characters long and must contain at least one digit, one uppercase letter, one lowercase letter, and one nonalphanumeric character (except characters ```' ` "```). Make sure you **do not provide** common passwords such as "Pass@word1".|
4545
|Secure Shell (SSH) username | The default username is `sshuser`. You can provide another name for the SSH username. |
46-
|Use cluster login password for SSH| Select this check box to use the same password for SSH user as the one you provided for the cluster login user.|
46+
|Use cluster sign-in password for SSH| Select this check box to use the same password for the SSH user as the one you provided for the cluster sign-in user.|
4747

4848
:::image type="content" source="./media/apache-hadoop-linux-create-cluster-get-started-portal/azure-portal-cluster-basics.png" alt-text="HDInsight Linux get started provide cluster basic values." border="true":::
4949

@@ -60,7 +60,7 @@ In this section, you create a Hadoop cluster in HDInsight using the Azure portal
6060

6161
:::image type="content" source="./media/apache-hadoop-linux-create-cluster-get-started-portal/azure-portal-cluster-storage.png" alt-text="HDInsight Linux get started provide cluster storage values." border="true":::
6262

63-
Each cluster has an [Azure Storage account](../hdinsight-hadoop-use-blob-storage.md), an [Azure Data Lake Gen1](../hdinsight-hadoop-use-data-lake-storage-gen1.md), or an [`Azure Data Lake Storage Gen2`](../hdinsight-hadoop-use-data-lake-storage-gen2.md) dependency. It's referred as the default storage account. HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting clusters doesn't delete the storage account.
63+
Each cluster has an [Azure Storage account](../hdinsight-hadoop-use-blob-storage.md) or an [`Azure Data Lake Storage Gen2`](../hdinsight-hadoop-use-data-lake-storage-gen2.md) dependency. It's referred to as the default storage account. The HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting clusters doesn't delete the storage account.
6464

6565
Select the **Review + create** tab.
6666

@@ -115,7 +115,7 @@ In this section, you create a Hadoop cluster in HDInsight using the Azure portal
115115

116116
:::image type="content" source="./media/apache-hadoop-linux-create-cluster-get-started-portal/hdinsight-linux-hive-view-save-results.png" alt-text="Save result of Apache Hive query." border="true":::
117117

118-
After you've completed a Hive job, you can [export the results to Azure SQL Database or SQL Server database](apache-hadoop-use-sqoop-mac-linux.md), you can also [visualize the results using Excel](apache-hadoop-connect-excel-power-query.md). For more information about using Hive in HDInsight, see [Use Apache Hive and HiveQL with Apache Hadoop in HDInsight to analyze a sample Apache log4j file](hdinsight-use-hive.md).
118+
After you've completed a Hive job, you can [export the results to Azure SQL Database or SQL Server database](apache-hadoop-use-sqoop-mac-linux.md). You can also [visualize the results using Excel](apache-hadoop-connect-excel-power-query.md). For more information about using Hive in HDInsight, see [Use Apache Hive and HiveQL with Apache Hadoop in HDInsight to analyze a sample Apache Log4j file](hdinsight-use-hive.md).
119119

120120
## Clean up resources
121121

@@ -130,7 +130,7 @@ After you complete the quickstart, you may want to delete the cluster. With HDIn
130130

131131
:::image type="content" source="./media/apache-hadoop-linux-create-cluster-get-started-portal/hdinsight-delete-cluster.png" alt-text="Azure HDInsight delete cluster." border="true":::
132132

133-
2. If you want to delete the cluster as well as the default storage account, select the resource group name (highlighted in the previous screenshot) to open the resource group page.
133+
2. If you want to delete the cluster and the default storage account, select the resource group name (highlighted in the previous screenshot) to open the resource group page.
134134

135135
3. Select **Delete resource group** to delete the resource group, which contains the cluster and the default storage account. Note deleting the resource group deletes the storage account. If you want to keep the storage account, choose to delete the cluster only.
136136
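The password rule in the cluster basics table in this file (at least 10 characters, with a digit, an uppercase letter, a lowercase letter, and a nonalphanumeric character other than `'`, `` ` ``, and `"`) can be pre-checked before submitting the form. The sketch below is one possible reading of that rule, not the portal's actual validation. It takes the stricter interpretation and rejects any password containing the three excepted characters, and its denylist holds only the single example named in the table.

```python
# Characters the table lists as exceptions. Rejecting them outright is an
# assumption (the stricter of two possible readings of the rule).
EXCLUDED = set("'`\"")

# Only the example named in the table; a real common-password list is not given.
COMMON_PASSWORDS = {"Pass@word1"}


def password_problems(password):
    """Return a list of rule violations; an empty list means the check passed."""
    problems = []
    if len(password) < 10:
        problems.append("shorter than 10 characters")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(not c.isalnum() and c not in EXCLUDED for c in password):
        problems.append("no allowed nonalphanumeric character")
    if any(c in EXCLUDED for c in password):
        problems.append("contains an excepted character")
    if password in COMMON_PASSWORDS:
        problems.append("common password")
    return problems


if __name__ == "__main__":
    for candidate in ("Pass@word1", "short1A!", "Tr1cky#Horse"):
        print(candidate, password_problems(candidate))
```

Returning a list of problems rather than a single boolean makes it easier to tell the user which part of the rule failed.
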

articles/hdinsight/hadoop/apache-hadoop-linux-tutorial-get-started.md

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ Two Azure resources are defined in the template:
6666

6767
## Review deployed resources
6868

69-
Once the cluster is created, you'll receive a **Deployment succeeded** notification with a **Go to resource** link. Your Resource group page will list your new HDInsight cluster and the default storage associated with the cluster. Each cluster has an [Azure Blob Storage](../hdinsight-hadoop-use-blob-storage.md) account, an [Azure Data Lake Storage Gen1](../hdinsight-hadoop-use-data-lake-storage-gen1.md), or an [`Azure Data Lake Storage Gen2`](../hdinsight-hadoop-use-data-lake-storage-gen2.md) dependency. It's referred as the default storage account. The HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting clusters doesn't delete the storage account.
69+
Once the cluster is created, you'll receive a **Deployment succeeded** notification with a **Go to resource** link. Your Resource group page will list your new HDInsight cluster and the default storage associated with the cluster. Each cluster has an [Azure Blob Storage](../hdinsight-hadoop-use-blob-storage.md) account or an [`Azure Data Lake Storage Gen2`](../hdinsight-hadoop-use-data-lake-storage-gen2.md) dependency. It's referred to as the default storage account. The HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting clusters doesn't delete the storage account.
7070

7171
> [!NOTE]
7272
> For other cluster creation methods and understanding the properties used in this quickstart, see [Create HDInsight clusters](../hdinsight-hadoop-provision-linux-clusters.md).

articles/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-storage.md

Lines changed: 1 addition & 10 deletions
@@ -4,7 +4,7 @@ description: Learn storage best practices for migrating on-premises Hadoop clust
44
ms.service: hdinsight
55
ms.topic: how-to
66
ms.custom: hdinsightactive
7-
ms.date: 05/22/2024
7+
ms.date: 07/24/2024
88
---
99

1010
# Migrate on-premises Apache Hadoop clusters to Azure HDInsight
@@ -71,15 +71,6 @@ For more information, see the following articles:
7171
- [Monitor, diagnose, and troubleshoot Microsoft Azure Storage](../../storage/common/storage-monitoring-diagnosing-troubleshooting.md)
7272
- [Monitor a storage account in the Azure portal](../../storage/common/manage-storage-analytics-logs.md)
7373

74-
### Azure Data Lake Storage Gen1
75-
76-
Azure Data Lake Storage Gen1 implements HDFS and POSIX style access control model. It provides first class integration with Microsoft Entra ID for fine grained access control. There are no limits to the size of data that it can store, or its ability to run massively parallel analytics.
77-
78-
For more information, see the following articles:
79-
80-
- [Create HDInsight clusters with Data Lake Storage Gen1 using the Azure portal](../../data-lake-store/data-lake-store-hdinsight-hadoop-use-portal.md)
81-
- [Use Data Lake Storage Gen1 with Azure HDInsight clusters](../hdinsight-hadoop-use-data-lake-storage-gen1.md)
82-
8374
### Azure Data Lake Storage Gen2
8475

8576
Azure Data Lake Storage Gen2 is the latest storage offering. It unifies the core capabilities from the first generation of Azure Data Lake Storage Gen1 with a Hadoop compatible file system endpoint directly integrated into Azure Blob Storage. This enhancement combines the scale and cost benefits of object storage with the reliability and performance typically associated only with on-premises file systems.
