articles/hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2.md
94 additions & 14 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 08/27/2019
+ms.date: 11/04/2019
 ---
 
 # Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters
@@ -29,13 +29,13 @@ To create an HDInsight cluster that uses Data Lake Storage Gen2 for storage, fol
 
 ### Create a user-assigned managed identity
 
-Create a user-assigned managed identity, if you don’t already have one.
+Create a user-assigned managed identity, if you don’t already have one.
 
 1. Sign in to the [Azure portal](https://portal.azure.com).
 1. In the upper-left click **Create a resource**.
 1. In the search box, type **user assigned** and click **User Assigned Managed Identity**.
 1. Click **Create**.
-1. Enter a name for your managed identity, select the correct subscription, resource group and location.
+1. Enter a name for your managed identity, select the correct subscription, resource group, and location.
 1. Click **Create**.
 
 For more information on how managed identities work in Azure HDInsight, see [Managed identities in Azure HDInsight](hdinsight-managed-identities.md).
@@ -44,7 +44,7 @@ For more information on how managed identities work in Azure HDInsight, see [Man
 
 ### Create a Data Lake Storage Gen2 account
 
-Create an Azure Data Lake Storage Gen2 storage account.
+Create an Azure Data Lake Storage Gen2 storage account.
 
 1. Sign in to the [Azure portal](https://portal.azure.com).
 1. In the upper-left click **Create a resource**.
@@ -68,28 +68,28 @@ Assign the managed identity to the **Storage Blob Data Owner** role on the stora
 
 1. In the [Azure portal](https://portal.azure.com), go to your storage account.
 1. Select your storage account, then select **Access control (IAM)** to display the access control settings for the account. Select the **Role assignments** tab to see the list of role assignments.
-
+
 
-
+
 1. Select the **+ Add role assignment** button to add a new role.
 1. In the **Add role assignment** window, select the **Storage Blob Data Owner** role. Then, select the subscription that has the managed identity and storage account. Next, search to locate the user-assigned managed identity that you created previously. Finally, select the managed identity, and it will be listed under **Selected members**.
-
+
 
-
+
 1. Select **Save**. The user-assigned identity that you selected is now listed under the selected role.
 1. After this initial setup is complete, you can create a cluster through the portal. The cluster must be in the same Azure region as the storage account. In the **Storage** section of the cluster creation menu, select the following options:
-
+
 * For **Primary storage type**, select **Azure Data Lake Storage Gen2**.
 * Under **Select a Storage account**, search for and select the newly created Data Lake Storage Gen2 storage account.
-
+
 
-
+
 * Under **Identity**, select the correct subscription and the newly created user-assigned managed identity.
 
 
 
 > [!Note]
-> To add a secondary Data Lake Storage Gen2 account, at the storage account level, simply assign the managed identity created earlier to the new Data Lake Storage Gen2 storage account that you wish to add. Please be advised that adding a secondary Data Lake Storage Gen2 account via the "Additional storage accounts" blade on HDInsight is not supported.
+> To add a secondary Data Lake Storage Gen2 account, at the storage account level, simply assign the managed identity created earlier to the new Data Lake Storage Gen2 storage account that you wish to add. Please be advised that adding a secondary Data Lake Storage Gen2 account via the "Additional storage accounts" blade on HDInsight is not supported.
 
 ## Create a cluster with Data Lake Storage Gen2 through the Azure CLI
 
@@ -108,10 +108,10 @@ The code snippet below does the following initial steps:
 
 1. Logs in to your Azure account.
 1. Sets the active subscription where the create operations will be done.
-1. Creates a new resource group for the new deployment activities.
+1. Creates a new resource group for the new deployment activities.
 1. Creates a user-assigned managed identity.
 1. Adds an extension to the Azure CLI to use features for Data Lake Storage Gen2.
-1. Creates a new Data Lake Storage Gen2 account by using the `--hierarchical-namespace true` flag.
+1. Creates a new Data Lake Storage Gen2 account by using the `--hierarchical-namespace true` flag.
 
 ```azurecli
 az login
@@ -166,7 +166,87 @@ The lifecycle of a user-assigned identity is managed separately from the lifecyc
 
 To set permissions for users to query data, use Azure AD security groups as the assigned principal in ACLs. Don't directly assign file-access permissions to individual users or service principals. When you use Azure AD security groups to control the flow of permissions, you can add and remove users or service principals without reapplying ACLs to an entire directory structure. You only have to add or remove the users from the appropriate Azure AD security group. ACLs aren't inherited, so reapplying ACLs requires updating the ACL on every file and subdirectory.
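
As a minimal sketch of the group-based approach described above, the following shell snippet builds, and only prints, a recursive `hdfs dfs -setfacl` command for a security group. The function names, the `GROUP_OBJECT_ID` placeholder, and the `abfs:///example/data` path are hypothetical. Identifying the group by its Azure AD object ID in the ACL entry is an assumption, not something this article states.

```shell
#!/usr/bin/env bash
# Sketch only: prints a command for review instead of running it.

# Build an ACL entry for a group.
# Assumption: the group is identified by its Azure AD object ID.
group_acl_entry() {
  local group_id="$1" perms="$2"
  printf 'group:%s:%s\n' "$group_id" "$perms"
}

# Print a recursive setfacl command. -R is used because ACLs aren't
# inherited, so every existing file and subdirectory needs the entry.
print_setfacl_cmd() {
  local group_id="$1" perms="$2" path="$3"
  printf 'hdfs dfs -setfacl -R -m %s %s\n' \
    "$(group_acl_entry "$group_id" "$perms")" "$path"
}

# GROUP_OBJECT_ID and the path are placeholders to replace.
print_setfacl_cmd "GROUP_OBJECT_ID" "r-x" "abfs:///example/data"
```

Printing the command first makes it easy to check the path and permissions before running it from an SSH session on the cluster. After the entry is in place, membership changes happen in the security group and the ACLs stay untouched.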
 
+## Access files from the cluster
+
+There are several ways you can access the files in Data Lake Storage Gen2 from an HDInsight cluster.
+
+* **Using the fully qualified name**. With this approach, you provide the full path to the file that you want to access.
+* **Using the shortened path format**. With this approach, you replace the path up to the cluster root with:
+
+```
+abfs:///<file.path>/
+```
+
+* **Using the relative path**. With this approach, you only provide the relative path to the file that you want to access.
+
+```
+/<file.path>/
+```
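
To compare the three path forms side by side, here is a small sketch that prints each form for the same file. The helper function names and the `example/data/sample.log` path are hypothetical, and `CONTAINERNAME` and `STORAGEACCOUNT` are placeholders. The fully qualified form assumes the standard `abfs://<container>@<account>.dfs.core.windows.net/<path>` layout.

```shell
#!/usr/bin/env bash
# Sketch only: prints the three path forms; it does not contact storage.

# Fully qualified name: container, account, and file path are all explicit.
fully_qualified_uri() {
  local container="$1" account="$2" path="${3#/}"
  printf 'abfs://%s@%s.dfs.core.windows.net/%s\n' "$container" "$account" "$path"
}

# Shortened path format: the scheme is kept, the authority is left empty.
short_uri() {
  local path="${1#/}"
  printf 'abfs:///%s\n' "$path"
}

# Relative path: only the path from the cluster root.
relative_path() {
  local path="${1#/}"
  printf '/%s\n' "$path"
}

file_path="example/data/sample.log"  # hypothetical file

fully_qualified_uri "CONTAINERNAME" "STORAGEACCOUNT" "$file_path"
short_uri "$file_path"
relative_path "$file_path"
```

Each printed value can be passed to a command such as `hdfs dfs -ls` from an SSH session on the cluster. The shortened and relative forms only work for the cluster's default storage, because they omit the container and account.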
+
+### Data access examples
+
+Examples are based on an [ssh connection](./hdinsight-hadoop-linux-use-ssh-unix.md) to the head node of the cluster. The examples use all three URI schemes. Replace `CONTAINERNAME` and `STORAGEACCOUNT` with the relevant values.
 * [Azure HDInsight integration with Data Lake Storage Gen2 preview - ACL and security update](https://azure.microsoft.com/blog/azure-hdinsight-integration-with-data-lake-storage-gen-2-preview-acl-and-security-update/)
 * [Introduction to Azure Data Lake Storage Gen2](../storage/blobs/data-lake-storage-introduction.md)
+* [Tutorial: Extract, transform, and load data using Interactive Query in Azure HDInsight](./interactive-query/interactive-query-tutorial-analyze-flight-data.md)