articles/hdinsight/hdinsight-multiple-clusters-data-lake-store.md
ms.service: hdinsight
ms.topic: how-to
ms.custom: hdinsightactive
ms.date: 09/15/2023
---
# Use multiple HDInsight clusters with an Azure Data Lake Storage account
Starting with HDInsight version 3.5, you can create HDInsight clusters with Azure Data Lake Storage accounts as the default filesystem.
Data Lake Storage supports unlimited storage, which makes it ideal not only for hosting large amounts of data but also for hosting multiple HDInsight clusters that share a single Data Lake Storage account. For instructions on how to create an HDInsight cluster with Data Lake Storage as the storage, see [Quickstart: Set up clusters in HDInsight](./hdinsight-hadoop-provision-linux-clusters.md).

This article provides recommendations to the Data Lake Storage administrator for setting up a single, shared Data Lake Storage account that can be used across multiple **active** HDInsight clusters. These recommendations apply to hosting multiple secure and non-secure Apache Hadoop clusters on a shared Data Lake Storage account.
## Data Lake Storage file and folder level ACLs
|/clusters/finance | rwxr-x--t |admin |FINGRP |Service principal |`rwx`|- |- |

In the table,
## Limit on clusters sharing a single storage account
The limit on the number of clusters that can share a single Data Lake Storage account depends on the workload being run on those clusters. Too many clusters, or heavy workloads on the clusters that share a storage account, might cause the storage account's ingress and egress to be throttled.
## Support for Default-ACLs
When creating a service principal with named-user access (as shown in the table), we recommend **not** adding the named user with a default ACL. Provisioning named-user access by using default ACLs results in the assignment of 770 permissions (`rwxrwx---`) for the owning user, owning group, and others. While this default value of 770 doesn't take away permissions from the owning user (7) or owning group (7), it takes away all permissions for others (0). This results in a known issue with one particular use case, which is discussed in detail in the [Known issues and workarounds](#known-issues-and-workarounds) section.
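To see why a mode of 770 removes all access for others, the following Python sketch decodes a three-digit octal mode into its owner, group, and others `rwx` triplets. The helper names `octal_to_symbolic` and `others_bits` are hypothetical and aren't part of any Azure SDK.

```python
def octal_to_symbolic(mode: str) -> str:
    """Convert a three-digit octal mode such as "770" into rwx form
    (owner, group, others), as shown in POSIX-style permission listings."""
    if len(mode) != 3 or any(c not in "01234567" for c in mode):
        raise ValueError(f"expected three octal digits, got {mode!r}")
    out = []
    for digit in mode:
        bits = int(digit)
        out.append("r" if bits & 4 else "-")  # 4 = read
        out.append("w" if bits & 2 else "-")  # 2 = write
        out.append("x" if bits & 1 else "-")  # 1 = execute
    return "".join(out)


def others_bits(mode: str) -> str:
    """Return only the 'others' triplet of a three-digit octal mode."""
    return octal_to_symbolic(mode)[6:]


print(octal_to_symbolic("770"))  # rwxrwx---
print(others_bits("770"))        # ---
```

The last digit (0) maps to `---`, so others have no read, write, or execute permission.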
## Known issues and workarounds
This section lists known issues with using HDInsight with Data Lake Storage, along with their workarounds.

When a new Azure Data Lake Storage account is created, the root directory is automatically provisioned with Access-ACL permission bits set to 770 (`rwxrwx---`). The root folder’s owning user is set to the user that created the account (the Data Lake Storage admin), and the owning group is set to the primary group of the user that created the account. No access is provided for "others."

These settings are known to affect one specific HDInsight use case captured in [YARN 247](https://hwxmonarch.atlassian.net/browse/YARN-247). Job submissions can fail with an error similar to the following:

```output
Resource XXXX is not publicly accessible and as such cannot be part of the public cache.
```
As stated in the YARN JIRA linked earlier, while localizing public resources, the localizer validates that all the requested resources are indeed public by checking their permissions on the remote file system. Any LocalResource that doesn't meet that condition is rejected for localization. The permission check includes read access for "others" on the file itself and execute access for "others" on each parent folder. This scenario doesn't work out of the box when hosting HDInsight clusters on Azure Data Lake Storage, because Data Lake Storage denies all access to "others" at the root folder level.
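The check described above can be modeled in a few lines. The following Python sketch is a simplified model, not YARN's actual code. It assumes the rule is read access for "others" on the file plus execute access for "others" on every parent folder, and it represents permissions as 9-character symbolic strings. The function names, the file `app.jar`, and the permission values for `/clusters` and the file are hypothetical. The root value reflects the 770 default, and the `/clusters/finance` value comes from the table.

```python
from pathlib import PurePosixPath


def others_can(perm: str, action: str) -> bool:
    """Check one 'others' bit in a 9-character symbolic permission string."""
    others = perm[6:9]
    if action == "r":
        return others[0] == "r"
    if action == "x":
        # Lowercase 't' means sticky bit plus execute; uppercase 'T' means no execute.
        return others[2] in ("x", "t")
    raise ValueError(f"unsupported action: {action!r}")


def is_publicly_cacheable(path: str, perms: dict[str, str]) -> bool:
    """Simplified public-cache rule (assumption): the file must be readable by
    'others', and every parent folder up to the root must be executable by 'others'."""
    p = PurePosixPath(path)
    if not others_can(perms[str(p)], "r"):
        return False
    return all(others_can(perms[str(parent)], "x") for parent in p.parents)


# Hypothetical layout for illustration only.
as_provisioned = {
    "/": "rwxrwx---",                       # 770 root, as provisioned
    "/clusters": "rwxr-x--x",
    "/clusters/finance": "rwxr-x--t",
    "/clusters/finance/app.jar": "rw-r--r--",
}
# Same layout after granting r-x to others on the root.
after_workaround = {**as_provisioned, "/": "rwxrwxr-x"}

print(is_publicly_cacheable("/clusters/finance/app.jar", as_provisioned))    # False
print(is_publicly_cacheable("/clusters/finance/app.jar", after_workaround))  # True
```

With the root at 770, the walk up the hierarchy stops at `/`, which gives others no execute permission. That failure corresponds to the "not publicly accessible" error.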
#### Workaround
Set read-execute permissions for **others** at each level of the hierarchy, for example, at **/**, **/clusters**, and **/clusters/finance**, as shown in the table.
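The workaround adds read and execute for "others" on each folder from the root down to the cluster folder. In chmod's symbolic notation, that change is `o+rx`. The following Python sketch only computes the resulting permission strings and doesn't change anything in Data Lake Storage. The starting values for `/` (770) and `/clusters/finance` come from this article. The `/clusters` value and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def folder_chain(folder: str) -> list[str]:
    """List the folder and all of its ancestors, starting at the root."""
    p = PurePosixPath(folder)
    return [str(f) for f in reversed([p, *p.parents])]


def grant_others_rx(perm: str) -> str:
    """Return a 9-character symbolic permission with r and x set for 'others'.

    Owner and group triplets and the others-write bit are unchanged. A sticky
    bit in the others-execute slot ('t' or 'T') is kept and shown as 't',
    which means sticky plus execute.
    """
    owner_group, others = perm[:6], perm[6:]
    execute = "t" if others[2] in ("t", "T") else "x"
    return owner_group + "r" + others[1] + execute


# Hypothetical current permissions.
current = {
    "/": "rwxrwx---",
    "/clusters": "rwxr-x--x",
    "/clusters/finance": "rwxr-x--t",
}

for folder in folder_chain("/clusters/finance"):
    print(folder, current[folder], "->", grant_others_rx(current[folder]))
```

The sketch keeps the sticky bit on `/clusters/finance` because the workaround only opens read and execute access for others. It isn't meant to change who can delete or rename files in that folder.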