
Commit 96dbe91

authored
Update hdinsight-multiple-clusters-data-lake-store.md
1 parent 12df6cd commit 96dbe91


1 file changed, +8 -8 lines changed


articles/hdinsight/hdinsight-multiple-clusters-data-lake-store.md

Lines changed: 8 additions & 8 deletions
@@ -12,7 +12,7 @@ ms.date: 09/15/2023
 Starting with HDInsight version 3.5, you can create HDInsight clusters with Azure Data Lake Storage accounts as the default filesystem.
 Data Lake Storage supports unlimited storage that makes it ideal not only for hosting large amounts of data; but also for hosting multiple HDInsight clusters that share a single Data Lake Storage Account. For instructions on how to create an HDInsight cluster with Data Lake Storage as the storage, see [Quickstart: Set up clusters in HDInsight](./hdinsight-hadoop-provision-linux-clusters.md).

-This article provides recommendations to the Data Lake Storage administrator for setting up a single and shared Data Lake Storage Account that can be used across multiple **active** HDInsight clusters. These recommendations apply to hosting multiple secure as well as non-secure Apache Hadoop clusters on a shared Data Lake Storage account.
+This article provides recommendations to the Data Lake Storage administrator for setting up a single and shared Data Lake Storage Account that can be used across multiple **active** HDInsight clusters. These recommendations apply to host multiple secure and non-secure Apache Hadoop clusters on a shared Data Lake Storage account.

 ## Data Lake Storage file and folder level ACLs

@@ -28,7 +28,7 @@ To enable this folder structure to be effectively used by HDInsight clusters, th
 |---------|---------|---------|---------|---------|---------|---------|---------|
 |/ | rwxr-x--x |admin |admin |Service principal |--x |FINGRP |r-x |
 |/clusters | rwxr-x--x |admin |admin |Service principal |--x |FINGRP |r-x |
-|/clusters/finance | rwxr-x--t |admin |FINGRP |Service principal |rwx |- |- |
+|/clusters/finance | rwxr-x--t |admin |FINGRP |Service principal | `rwx` |- |- |

 In the table,

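The permission strings in the table follow POSIX symbolic notation: three `rwx` triads for the owning user, the owning group, and others, where a trailing `t` means the execute bit for others is set together with the sticky bit. As a sketch only, the hypothetical helper `symbolic_to_octal` below converts these strings to the octal form (such as `751`) used by chmod-style tools; it is not part of any Azure or Hadoop API.

```python
# Hypothetical helper for illustration, not an Azure or Hadoop API.
# (lowercase, uppercase, octal value) of the special bit for each triad:
# setuid for the owning user, setgid for the owning group, sticky for others.
SPECIAL_BITS = [("s", "S", 4), ("s", "S", 2), ("t", "T", 1)]


def symbolic_to_octal(mode: str) -> str:
    """Convert a 9-character symbolic mode such as "rwxr-x--t" to octal."""
    if len(mode) != 9:
        raise ValueError(f"expected 9 characters, got {mode!r}")
    special = 0
    digits = []
    for i, (lower, upper, bit) in enumerate(SPECIAL_BITS):
        r, w, x = mode[3 * i : 3 * i + 3]  # owning user, owning group, others
        if r not in ("r", "-") or w not in ("w", "-"):
            raise ValueError(f"malformed triad in {mode!r}")
        value = (4 if r == "r" else 0) + (2 if w == "w" else 0)
        if x in ("x", lower):
            value += 1  # execute bit set (lowercase s/t also imply execute)
        elif x not in ("-", upper):
            raise ValueError(f"unexpected execute flag {x!r} in {mode!r}")
        if x in (lower, upper):
            special |= bit  # setuid, setgid, or sticky bit
        digits.append(str(value))
    return (str(special) if special else "") + "".join(digits)


for folder, mode in [
    ("/", "rwxr-x--x"),
    ("/clusters", "rwxr-x--x"),
    ("/clusters/finance", "rwxr-x--t"),
]:
    print(folder, mode, symbolic_to_octal(mode))
# / rwxr-x--x 751
# /clusters rwxr-x--x 751
# /clusters/finance rwxr-x--t 1751
```

The leading `1` in `1751` is the sticky bit. In POSIX semantics it restricts deleting or renaming entries in that directory to their owner (or the directory owner).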
@@ -55,31 +55,31 @@ We recommend that input data to a job, and the outputs from a job be stored in a

 ## Limit on clusters sharing a single storage account

-The limit on the number of clusters that can share a single Data Lake Storage account depends on the workload being run on those clusters. Having too many clusters or very heavy workloads on the clusters that share a storage account might cause the storage account ingress/egress to get throttled.
+The limit on the number of clusters that can share a single Data Lake Storage account depends on the workload being run on those clusters. Having too many clusters or heavy workloads on the clusters that share a storage account might cause the storage account ingress/egress to get throttled.

 ## Support for Default-ACLs

-When creating a Service Principal with named-user access (as shown in the table above), we recommend **not** adding the named-user with a default-ACL. Provisioning named-user access using default-ACLs results in the assignment of 770 permissions for owning-user, owning-group, and others. While this default value of 770 doesn't take away permissions from owning-user (7) or owning-group (7), it takes away all permissions for others (0). This results in a known issue with one particular use-case that is discussed in detail in the [Known issues and workarounds](#known-issues-and-workarounds) section.
+When creating a Service Principal with named-user access (as shown in the table), we recommend **not** adding the named-user with a default-ACL. Provisioning named-user access using default-ACLs results in the assignment of 770 permissions for owning-user, owning-group, and others. While this default value of 770 doesn't take away permissions from owning-user (7) or owning-group (7), it takes away all permissions for others (0). This results in a known issue with one particular use-case that is discussed in detail in the [Known issues and workarounds](#known-issues-and-workarounds) section.

 ## Known issues and workarounds

 This section lists the known issues for using HDInsight with Data Lake Storage, and their workarounds.

 ### Publicly visible localized Apache Hadoop YARN resources

-When a new Azure Data Lake Storage account is created, the root directory is automatically provisioned with Access-ACL permission bits set to 770. The root folder’s owning user is set to the user that created the account (the Data Lake Storage admin) and the owning group is set to the primary group of the user that created the account. No access is provided for "others".
+When a new Azure Data Lake Storage account is created, the root directory is automatically provisioned with Access-ACL permission bits set to 770. The root folder’s owning user is set to the user that created the account (the Data Lake Storage admin) and the owning group is set to the primary group of the user that created the account. No access is provided for "others."

-These settings are known to affect one specific HDInsight use-case captured in [YARN 247](https://hwxmonarch.atlassian.net/browse/YARN-247). Job submissions could fail with an error message similar to this:
+These settings are known to affect one specific HDInsight use-case captured in [YARN 247](https://hwxmonarch.atlassian.net/browse/YARN-247). Job submissions could fail with an error message:

 ```output
 Resource XXXX is not publicly accessible and as such cannot be part of the public cache.
 ```

-As stated in the YARN JIRA linked earlier, while localizing public resources, the localizer validates that all the requested resources are indeed public by checking their permissions on the remote file-system. Any LocalResource that doesn't fit that condition is rejected for localization. The check for permissions, includes read-access to the file for "others". This scenario doesn't work out-of-the-box when hosting HDInsight clusters on Azure Data Lake, since Azure Data Lake denies all access to "others" at root folder level.
+As stated in the YARN JIRA linked earlier, while localizing public resources, the localizer validates that all the requested resources are indeed public by checking their permissions on the remote file-system. Any LocalResource that doesn't fit that condition is rejected for localization. The check for permissions includes read-access to the file for "others." This scenario doesn't work out-of-the-box when hosting HDInsight clusters on Azure Data Lake, since Azure Data Lake denies all access to "others" at root folder level.

 #### Workaround

-Set read-execute permissions for **others** through the hierarchy, for example, at **/**, **/clusters** and **/clusters/finance** as shown in the table above.
+Set read-execute permissions for **others** through the hierarchy, for example, at **/**, **/clusters** and **/clusters/finance** as shown in the table.

 ## See also

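The YARN failure and the workaround in this diff can be traced with a simplified model of the localizer's visibility check. The sketch assumes that a resource counts as public only when "others" can read the file and can traverse (execute) every ancestor directory. The ancestor rule is modeled on Hadoop's `FSDownload` localizer rather than taken from the article, and every mode except the root's 770 (including the `app.jar` file) is hypothetical. The sketch shows three cases: the 770 root of a new account blocks the check, granting read-execute to others through the hierarchy clears it, and a file that ends up with 770 (as described for named-user default ACLs) still fails.

```python
from pathlib import PurePosixPath

OTHERS_READ = 0o004  # "r" bit in the others triad
OTHERS_EXEC = 0o001  # "x" bit in the others triad


def is_public(path: str, modes: dict[str, int]) -> bool:
    """Simplified visibility check (an assumption, not YARN's actual code):
    others must be able to read the file and traverse every ancestor."""
    p = PurePosixPath(path)
    if not modes[str(p)] & OTHERS_READ:
        return False
    return all(modes[str(parent)] & OTHERS_EXEC for parent in p.parents)


# Hypothetical layout; only the root's 770 comes from the article.
as_created = {
    "/": 0o770,                          # new account: no access for others
    "/clusters": 0o775,
    "/clusters/finance": 0o775,
    "/clusters/finance/app.jar": 0o644,  # resource YARN tries to localize
}

# Workaround: read-execute for others at /, /clusters, /clusters/finance.
with_workaround = {**as_created, "/": 0o775}

# A file that ended up with 770 (no permissions for others).
file_770 = {**with_workaround, "/clusters/finance/app.jar": 0o770}

jar = "/clusters/finance/app.jar"
print(is_public(jar, as_created))       # False: "/" denies others
print(is_public(jar, with_workaround))  # True
print(is_public(jar, file_770))         # False: file not readable by others
```

The model checks only the "others" bits because, per the article, that is the condition the public cache depends on. Owner and group permissions never make a resource public.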