Commit f850648

Merge pull request #94713 from dagiro/storage3
storage3
2 parents 445d9d2 + 771c956 commit f850648

1 file changed, +94 -14 lines changed

articles/hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2.md

Lines changed: 94 additions & 14 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
77
ms.service: hdinsight
88
ms.custom: hdinsightactive
99
ms.topic: conceptual
10-
ms.date: 08/27/2019
10+
ms.date: 11/04/2019
1111
---
1212

1313
# Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters
@@ -29,13 +29,13 @@ To create an HDInsight cluster that uses Data Lake Storage Gen2 for storage, fol
2929

3030
### Create a user-assigned managed identity
3131

32-
Create a user-assigned managed identity, if you don’t already have one.
32+
Create a user-assigned managed identity, if you don’t already have one.
3333

3434
1. Sign in to the [Azure portal](https://portal.azure.com).
3535
1. In the upper-left click **Create a resource**.
3636
1. In the search box, type **user assigned** and click **User Assigned Managed Identity**.
3737
1. Click **Create**.
38-
1. Enter a name for your managed identity, select the correct subscription, resource group and location.
38+
1. Enter a name for your managed identity, select the correct subscription, resource group, and location.
3939
1. Click **Create**.
4040

4141
For more information on how managed identities work in Azure HDInsight, see [Managed identities in Azure HDInsight](hdinsight-managed-identities.md).
@@ -44,7 +44,7 @@ For more information on how managed identities work in Azure HDInsight, see [Man
4444

4545
### Create a Data Lake Storage Gen2 account
4646

47-
Create an Azure Data Lake Storage Gen2 storage account.
47+
Create an Azure Data Lake Storage Gen2 storage account.
4848

4949
1. Sign in to the [Azure portal](https://portal.azure.com).
5050
1. In the upper-left click **Create a resource**.
@@ -68,28 +68,28 @@ Assign the managed identity to the **Storage Blob Data Owner** role on the stora
6868

6969
1. In the [Azure portal](https://portal.azure.com), go to your storage account.
7070
1. Select your storage account, then select **Access control (IAM)** to display the access control settings for the account. Select the **Role assignments** tab to see the list of role assignments.
71-
71+
7272
![Screenshot showing storage access control settings](./media/hdinsight-hadoop-use-data-lake-storage-gen2/portal-access-control.png)
73-
73+
7474
1. Select the **+ Add role assignment** button to add a new role.
7575
1. In the **Add role assignment** window, select the **Storage Blob Data Owner** role. Then, select the subscription that has the managed identity and storage account. Next, search to locate the user-assigned managed identity that you created previously. Finally, select the managed identity, and it will be listed under **Selected members**.
76-
76+
7777
![Screenshot showing how to assign an RBAC role](./media/hdinsight-hadoop-use-data-lake-storage-gen2/add-rbac-role3-window.png)
78-
78+
7979
1. Select **Save**. The user-assigned identity that you selected is now listed under the selected role.
8080
1. After this initial setup is complete, you can create a cluster through the portal. The cluster must be in the same Azure region as the storage account. In the **Storage** section of the cluster creation menu, select the following options:
81-
81+
8282
* For **Primary storage type**, select **Azure Data Lake Storage Gen2**.
8383
* Under **Select a Storage account**, search for and select the newly created Data Lake Storage Gen2 storage account.
84-
84+
8585
![Storage settings for using Data Lake Storage Gen2 with Azure HDInsight](./media/hdinsight-hadoop-use-data-lake-storage-gen2/primary-storage-type-adls-gen2.png)
86-
86+
8787
* Under **Identity**, select the correct subscription and the newly created user-assigned managed identity.
8888

8989
![Identity settings for using Data Lake Storage Gen2 with HDInsight](./media/hdinsight-hadoop-use-data-lake-storage-gen2/managed-identity-cluster-creation.png)
9090

9191
> [!Note]
92-
> To add a secondary Data Lake Storage Gen2 account, at the storage account level, simply assign the managed identity created earlier to the new Data Lake Storage Gen2 storage account that you wish to add. Please be advised that adding a secondary Data Lake Storage Gen2 account via the "Additional storage accounts" blade on HDInsight is not supported.
92+
> To add a secondary Data Lake Storage Gen2 account, at the storage account level, simply assign the managed identity created earlier to the new Data Lake Storage Gen2 storage account that you wish to add. Please be advised that adding a secondary Data Lake Storage Gen2 account via the "Additional storage accounts" blade on HDInsight is not supported.
9393
9494
## Create a cluster with Data Lake Storage Gen2 through the Azure CLI
9595

@@ -108,10 +108,10 @@ The code snippet below does the following initial steps:
108108

109109
1. Logs in to your Azure account.
110110
1. Sets the active subscription where the create operations will be done.
111-
1. Creates a new resource group for the new deployment activities.
111+
1. Creates a new resource group for the new deployment activities.
112112
1. Creates a user-assigned managed identity.
113113
1. Adds an extension to the Azure CLI to use features for Data Lake Storage Gen2.
114-
1. Creates a new Data Lake Storage Gen2 account by using the `--hierarchical-namespace true` flag.
114+
1. Creates a new Data Lake Storage Gen2 account by using the `--hierarchical-namespace true` flag.
115115

116116
```azurecli
117117
az login
@@ -166,7 +166,87 @@ The lifecycle of a user-assigned identity is managed separately from the lifecyc
166166

167167
To set permissions for users to query data, use Azure AD security groups as the assigned principal in ACLs. Don't directly assign file-access permissions to individual users or service principals. When you use Azure AD security groups to control the flow of permissions, you can add and remove users or service principals without reapplying ACLs to an entire directory structure. You only have to add or remove the users from the appropriate Azure AD security group. ACLs aren't inherited, so reapplying ACLs requires updating the ACL on every file and subdirectory.
168168

169+
## Access files from the cluster
170+
171+
There are several ways you can access the files in Data Lake Storage Gen2 from an HDInsight cluster.
172+
173+
* **Using the fully qualified name**. With this approach, you provide the full path to the file that you want to access.
174+
175+
```
176+
abfs://<containername>@<accountname>.dfs.core.windows.net/<file.path>/
177+
```
178+
179+
* **Using the shortened path format**. With this approach, you replace the path up to the cluster root with:
180+
181+
```
182+
abfs:///<file.path>/
183+
```
184+
185+
* **Using the relative path**. With this approach, you only provide the relative path to the file that you want to access.
186+
187+
```
188+
/<file.path>/
189+
```
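
As a rough illustration of the fully qualified format described above, the following bash sketch assembles the URI from its parts. The function name `abfs_uri` is hypothetical and is not part of HDInsight or Hadoop; the container and account values are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an HDInsight or Hadoop command): builds the fully
# qualified ABFS URI from a container name, an account name, and a file path.
abfs_uri() {
  local container="$1"
  local account="$2"
  local path="${3#/}"   # drop one leading slash so the URI has a single separator
  printf 'abfs://%s@%s.dfs.core.windows.net/%s\n' "$container" "$account" "$path"
}

# Placeholder values; replace them with your own container and account names.
abfs_uri "CONTAINERNAME" "STORAGEACCOUNT" "/sampledata1/"
```

The resulting string can be passed to commands such as `hdfs dfs -ls` in place of the shortened or relative form when you want to name the storage account explicitly.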
190+
191+
### Data access examples
192+
193+
Examples are based on an [SSH connection](./hdinsight-hadoop-linux-use-ssh-unix.md) to the head node of the cluster. The examples use all three URI schemes. Replace `CONTAINERNAME` and `STORAGEACCOUNT` with the relevant values.
194+
195+
#### A few hdfs commands
196+
197+
1. Create a simple file on local storage.
198+
199+
```bash
200+
touch testFile.txt
201+
```
202+
203+
1. Create directories on cluster storage.
204+
205+
```bash
206+
hdfs dfs -mkdir abfs://CONTAINERNAME@STORAGEACCOUNT.dfs.core.windows.net/sampledata1/
207+
hdfs dfs -mkdir abfs:///sampledata2/
208+
hdfs dfs -mkdir /sampledata3/
209+
```
210+
211+
1. Copy data from local storage to cluster storage.
212+
213+
```bash
214+
hdfs dfs -copyFromLocal testFile.txt abfs://CONTAINERNAME@STORAGEACCOUNT.dfs.core.windows.net/sampledata1/
215+
hdfs dfs -copyFromLocal testFile.txt abfs:///sampledata2/
216+
hdfs dfs -copyFromLocal testFile.txt /sampledata3/
217+
```
218+
219+
1. List directory contents on cluster storage.
220+
221+
```bash
222+
hdfs dfs -ls abfs://CONTAINERNAME@STORAGEACCOUNT.dfs.core.windows.net/sampledata1/
223+
hdfs dfs -ls abfs:///sampledata2/
224+
hdfs dfs -ls /sampledata3/
225+
```
226+
227+
#### Creating a Hive table
228+
229+
Three file locations are shown for illustrative purposes. For actual execution, use only one of the `LOCATION` entries.
230+
231+
```hql
232+
DROP TABLE myTable;
233+
CREATE EXTERNAL TABLE myTable (
234+
t1 string,
235+
t2 string,
236+
t3 string,
237+
t4 string,
238+
t5 string,
239+
t6 string,
240+
t7 string)
241+
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
242+
STORED AS TEXTFILE
243+
LOCATION 'abfs://CONTAINERNAME@STORAGEACCOUNT.dfs.core.windows.net/example/data/';
244+
LOCATION 'abfs:///example/data/';
245+
LOCATION '/example/data/';
246+
```
247+
169248
## Next steps
170249

171250
* [Azure HDInsight integration with Data Lake Storage Gen2 preview - ACL and security update](https://azure.microsoft.com/blog/azure-hdinsight-integration-with-data-lake-storage-gen-2-preview-acl-and-security-update/)
172251
* [Introduction to Azure Data Lake Storage Gen2](../storage/blobs/data-lake-storage-introduction.md)
252+
* [Tutorial: Extract, transform, and load data using Interactive Query in Azure HDInsight](./interactive-query/interactive-query-tutorial-analyze-flight-data.md)
