Commit aa99cc1

Merge pull request #112657 from dagiro/freshness_c41

freshness_c41

2 parents 8b8c2c9 + de200d0

File tree

1 file changed: +18 −21 lines

articles/hdinsight/hdinsight-hadoop-use-data-lake-store.md

Lines changed: 18 additions & 21 deletions
@@ -7,15 +7,15 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive,hdiseo17may2017
-ms.date: 03/01/2020
+ms.date: 04/24/2020
 ---

 # Use Data Lake Storage Gen1 with Azure HDInsight clusters

 > [!Note]
 > Deploy new HDInsight clusters using [Azure Data Lake Storage Gen2](hdinsight-hadoop-use-data-lake-storage-gen2.md) for improved performance and new features.

-To analyze data in HDInsight cluster, you can store the data either in [Azure Storage](../storage/common/storage-introduction.md), [Azure Data Lake Storage Gen 1](../data-lake-store/data-lake-store-overview.md), or [Azure Data Lake Storage Gen 2](../storage/blobs/data-lake-storage-introduction.md). All storage options enable you to safely delete HDInsight clusters that are used for computation without losing user data.
+To analyze data in an HDInsight cluster, you can store the data in [Azure Storage](../storage/common/storage-introduction.md), [Azure Data Lake Storage Gen 1](../data-lake-store/data-lake-store-overview.md), or [Azure Data Lake Storage Gen 2](../storage/blobs/data-lake-storage-introduction.md). All storage options enable you to safely delete HDInsight clusters that are used for computation without losing user data.

 In this article, you learn how Data Lake Storage Gen1 works with HDInsight clusters. To learn how Azure Storage works with HDInsight clusters, see [Use Azure Storage with Azure HDInsight clusters](hdinsight-hadoop-use-blob-storage.md). For more information about creating an HDInsight cluster, see [Create Apache Hadoop clusters in HDInsight](hdinsight-hadoop-provision-linux-clusters.md).
@@ -26,20 +26,20 @@ In this article, you learn how Data Lake Storage Gen1 works with HDInsight clust

 ## Availability for HDInsight clusters

-Apache Hadoop supports a notion of the default file system. The default file system implies a default scheme and authority. It can also be used to resolve relative paths. During the HDInsight cluster creation process, you can specify a blob container in Azure Storage as the default file system, or with HDInsight 3.5 and newer versions, you can select either Azure Storage or Azure Data Lake Storage Gen1 as the default files system with a few exceptions. Note that the cluster and the storage account must be hosted in the same region.
+Apache Hadoop supports a notion of the default file system. The default file system implies a default scheme and authority. It can also be used to resolve relative paths. During the HDInsight cluster creation process, specify a blob container in Azure Storage as the default file system. With HDInsight 3.5 and newer versions, you can instead select either Azure Storage or Azure Data Lake Storage Gen1 as the default file system, with a few exceptions. The cluster and the storage account must be hosted in the same region.

 HDInsight clusters can use Data Lake Storage Gen1 in two ways:

 * As the default storage
 * As additional storage, with Azure Storage Blob as default storage.

-As of now, only some of the HDInsight cluster types/versions support using Data Lake Storage Gen1 as default storage and additional storage accounts:
+Currently, only some of the HDInsight cluster types/versions support using Data Lake Storage Gen1 as default storage and additional storage accounts:

 | HDInsight cluster type | Data Lake Storage Gen1 as default storage | Data Lake Storage Gen1 as additional storage| Notes |
 |------------------------|------------------------------------|---------------------------------------|------|
 | HDInsight version 4.0 | No | No |ADLS Gen1 isn't supported with HDInsight 4.0 |
-| HDInsight version 3.6 | Yes | Yes | With the exception of HBase|
-| HDInsight version 3.5 | Yes | Yes | With the exception of HBase|
+| HDInsight version 3.6 | Yes | Yes | Except HBase|
+| HDInsight version 3.5 | Yes | Yes | Except HBase|
 | HDInsight version 3.4 | No | Yes | |
 | HDInsight version 3.3 | No | No | |
 | HDInsight version 3.2 | No | Yes | |
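
The default-file-system behavior described in this hunk can be sketched with a small illustration. `fs.defaultFS` is the real Hadoop setting; the account name, cluster name, and relative path below are assumptions for illustration only:

```shell
# Sketch: how Hadoop resolves a relative path against the default file system.
# With fs.defaultFS pointing at a Data Lake Storage Gen1 cluster root
# (values below are made up), a relative path gains that scheme and authority.
default_fs="adl://mydatalakestore/clusters/cluster1"   # hypothetical default file system
relative_path="data/input.csv"                         # relative path used in a job

# A relative path resolves against the default scheme and authority:
resolved="${default_fs}/${relative_path}"
echo "$resolved"
```

With Azure Storage as the default file system instead, the same relative path would resolve against a `wasb://` container URI rather than an `adl://` one.
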
@@ -48,7 +48,7 @@ As of now, only some of the HDInsight cluster types/versions support using Data
 > [!WARNING]
 > HDInsight HBase is not supported with Azure Data Lake Storage Gen1

-Using Data Lake Storage Gen1 as an additional storage account doesn't affect performance or the ability to read or write to Azure storage from the cluster.
+Using Data Lake Storage Gen1 as an additional storage account doesn't affect performance. Nor does it affect the ability to read or write to Azure storage from the cluster.

 ## Use Data Lake Storage Gen1 as default storage

@@ -57,9 +57,9 @@ When HDInsight is deployed with Data Lake Storage Gen1 as default storage, the c
 * Cluster1 can use the path `adl://mydatalakestore/cluster1storage`
 * Cluster2 can use the path `adl://mydatalakestore/cluster2storage`

-Notice that both the clusters use the same Data Lake Storage Gen1 account **mydatalakestore**. Each cluster has access to its own root filesystem in Data Lake Storage. The Azure portal deployment experience in particular prompts you to use a folder name such as **/clusters/\<clustername>** for the root path.
+Notice that both the clusters use the same Data Lake Storage Gen1 account **mydatalakestore**. Each cluster has access to its own root filesystem in Data Lake Storage. The Azure portal deployment experience prompts you to use a folder name such as **/clusters/\<clustername>** for the root path.

-To be able to use Data Lake Storage Gen1 as default storage, you must grant the service principal access to the following paths:
+To use Data Lake Storage Gen1 as default storage, you must grant the service principal access to the following paths:

 * The Data Lake Storage Gen1 account root. For example: adl://mydatalakestore/.
 * The folder for all cluster folders. For example: adl://mydatalakestore/clusters.
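
The grant targets listed above, plus the per-cluster root, can all be derived from the account name. A minimal sketch, reusing the hypothetical **mydatalakestore** account and a made-up cluster name from the article's examples:

```shell
# Build the paths a service principal needs for default storage.
# "mydatalakestore" and "cluster1" follow the article's illustrative names.
account="mydatalakestore"
cluster="cluster1"

account_root="adl://${account}/"                      # the account root
clusters_folder="adl://${account}/clusters"           # folder for all cluster folders
cluster_root="adl://${account}/clusters/${cluster}"   # this cluster's own root filesystem

printf '%s\n' "$account_root" "$clusters_folder" "$cluster_root"
```
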
@@ -69,7 +69,7 @@ For more information for creating service principal and grant access, see Config

 ### Extracting a certificate from Azure Keyvault for use in cluster creation

-If you want to set up Azure Data Lake Storage Gen1 as your default storage for a new cluster and the certificate for your service principal is stored in Azure Key Vault, there are a few additional steps required to convert the certificate to the correct format. The following code snippets show how to perform the conversion.
+If the certificate for your service principal is stored in Azure Key Vault, you must convert the certificate to the correct format. The following code snippets show how to do the conversion.

 First, download the certificate from Key Vault and extract the `SecretValueText`.
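
The overall shape of that conversion can be sketched outside PowerShell as well. A minimal sketch, assuming the Key Vault secret stores the PFX as a base64-encoded string; the secret value below is a stand-in, not a real certificate:

```shell
# Hypothetical stand-in for the secret value returned from Key Vault
# (base64 of the raw PFX bytes; a real certificate would be binary).
secret_value="Y2VydGlmaWNhdGUtYnl0ZXM="

# Decode the base64 secret back into a certificate file, the format
# the cluster-creation step expects.
printf '%s' "$secret_value" | base64 -d > certFile.pfx
```

The article's own snippets do the equivalent with Az PowerShell cmdlets; this only illustrates the decode step.
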
@@ -103,21 +103,21 @@ New-AzResourceGroupDeployment `

 ## Use Data Lake Storage Gen1 as additional storage

-You can use Data Lake Storage Gen1 as additional storage for the cluster as well. In such cases, the cluster default storage can either be an Azure Storage Blob or a Data Lake Storage account. If you're running HDInsight jobs against the data stored in Data Lake Storage as additional storage, you must use the fully qualified path to the files. For example:
+You can use Data Lake Storage Gen1 as additional storage for the cluster as well. In such cases, the cluster default storage can either be an Azure Storage Blob or a Data Lake Storage account. When running HDInsight jobs against the data stored in Data Lake Storage as additional storage, use the fully qualified path to the files. For example:

 adl://mydatalakestore.azuredatalakestore.net/<file_path>

-Note that there's no **cluster_root_path** in the URL now. That's because Data Lake Storage isn't a default storage in this case so all you need to do is provide the path to the files.
+There's no **cluster_root_path** in the URL because Data Lake Storage isn't the default storage in this case, so you only need to provide the path to the files.

-To be able to use a Data Lake Storage Gen1 as additional storage, you only need to grant the service principal access to the paths where your files are stored. For example:
+To use Data Lake Storage Gen1 as additional storage, grant the service principal access to the paths where your files are stored. For example:

 adl://mydatalakestore.azuredatalakestore.net/<file_path>

 For more information for creating service principal and grant access, see Configure Data Lake Storage access.

 ## Use more than one Data Lake Storage accounts

-Adding a Data Lake Storage account as additional and adding more than one Data Lake Storage accounts are accomplished by giving the HDInsight cluster permission on data in one ore more Data Lake Storage accounts. See Configure Data Lake Storage access.
+To add one or more Data Lake Storage accounts as additional storage, give the HDInsight cluster permission on data in those accounts. See Configure Data Lake Storage access.
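
The fully qualified form used for additional storage can be sketched as a small helper. The account name and file path below are illustrative only:

```shell
# Build the fully qualified adl:// URI needed when Data Lake Storage Gen1
# is additional (not default) storage. Values are illustrative.
account="mydatalakestore"
file_path="mydata/sample.log"

adl_uri="adl://${account}.azuredatalakestore.net/${file_path}"
echo "$adl_uri"
# On the cluster, you'd then reference the file fully qualified, for example:
#   hdfs dfs -ls "$adl_uri"
```
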

 ## Configure Data Lake Storage access

@@ -126,7 +126,7 @@ To configure Data Lake Storage access from your HDInsight cluster, you must have
 > [!NOTE]
 > If you are going to use Azure Data Lake Storage Gen1 as additional storage for HDInsight cluster, we strongly recommend that you do this while you create the cluster as described in this article. Adding Azure Data Lake Storage Gen1 as additional storage to an existing HDInsight cluster is not a supported scenario.

-For more information on the basics of the access control model for Data Lake Storage Gen1, see [Access control in Azure Data Lake Storage Gen1](../data-lake-store/data-lake-store-access-control.md).
+For more information on the access control model, see [Access control in Azure Data Lake Storage Gen1](../data-lake-store/data-lake-store-access-control.md).

 ## Access files from the cluster

@@ -156,7 +156,7 @@ Examples are based on an [ssh connection](./hdinsight-hadoop-linux-use-ssh-unix.
 #### A few hdfs commands

-1. Create a simple file on local storage.
+1. Create a file on local storage.

 ```bash
 touch testFile.txt
@@ -222,7 +222,7 @@ Use the following links for detailed instructions on how to create HDInsight clu

 ## Refresh the HDInsight certificate for Data Lake Storage Gen1 access

-The following example PowerShell code reads a certificate from a local file or Azure Key Vault, and updates your HDInsight cluster with the new certificate to access Azure Data Lake Storage Gen1. Provide your own HDInsight cluster name, resource group name, subscription ID, app ID, local path to the certificate. Type in the password when prompted.
+The following example PowerShell code reads a certificate from a local file or Azure Key Vault, and updates your HDInsight cluster with the new certificate to access Azure Data Lake Storage Gen1. Provide your own HDInsight cluster name, resource group name, subscription ID, app ID, and local path to the certificate. Type in the password when prompted.

 ```powershell-interactive
 $clusterName = '<clustername>'
@@ -296,14 +296,11 @@ Invoke-AzResourceAction `

 ## Next steps

-In this article, you learned how to use HDFS-compatible Azure Data Lake Storage Gen1 with HDInsight. This allows you to build scalable, long-term, archiving data acquisition solutions and use HDInsight to unlock the information inside the stored structured and unstructured data.
+In this article, you learned how to use HDFS-compatible Azure Data Lake Storage Gen1 with HDInsight. This storage lets you build adaptable, long-term archiving and data acquisition solutions, and use HDInsight to unlock the information inside the stored structured and unstructured data.

 For more information, see:

-* [Get started with Azure HDInsight](hadoop/apache-hadoop-linux-tutorial-get-started.md)
 * [Quickstart: Set up clusters in HDInsight](../storage/data-lake-storage/quickstart-create-connect-hdi-cluster.md)
 * [Create an HDInsight cluster to use Data Lake Storage Gen1 using the Azure PowerShell](../data-lake-store/data-lake-store-hdinsight-hadoop-use-powershell.md)
 * [Upload data to HDInsight](hdinsight-upload-data.md)
-* [Use Apache Hive with HDInsight](hadoop/hdinsight-use-hive.md)
 * [Use Azure Storage Shared Access Signatures to restrict access to data with HDInsight](hdinsight-storage-sharedaccesssignature-permissions.md)
-* [Tutorial: Extract, transform, and load data using Interactive Query in Azure HDInsight](./interactive-query/interactive-query-tutorial-analyze-flight-data.md)