Commit e75beb1

Merge pull request #101786 from dagiro/addStorage1
addStorage1
2 parents 0fcdcb1 + bd175fd commit e75beb1

2 files changed: 43 additions and 122 deletions

articles/hdinsight/hdinsight-hadoop-add-storage.md

Lines changed: 43 additions & 122 deletions
@@ -1,17 +1,17 @@
11
---
2-
title: Add additional Azure storage accounts to HDInsight
3-
description: Learn how to add additional Azure storage accounts to an existing HDInsight cluster.
2+
title: Add additional Azure Storage accounts to HDInsight
3+
description: Learn how to add additional Azure Storage accounts to an existing HDInsight cluster.
44
author: hrasheed-msft
55
ms.author: hrasheed
66
ms.reviewer: jasonh
77
ms.service: hdinsight
88
ms.topic: conceptual
9-
ms.date: 10/31/2019
9+
ms.date: 01/21/2020
1010
---
1111

1212
# Add additional storage accounts to HDInsight
1313

14-
Learn how to use script actions to add additional Azure storage *accounts* to HDInsight. The steps in this document add a storage *account* to an existing Linux-based HDInsight cluster. This article applies to storage *accounts* (not the default cluster storage account), and not additional storage such as [Azure Data Lake Storage Gen1](hdinsight-hadoop-use-data-lake-store.md) and [Azure Data Lake Storage Gen2](hdinsight-hadoop-use-data-lake-storage-gen2.md).
14+
Learn how to use script actions to add additional Azure Storage *accounts* to HDInsight. The steps in this document add a storage *account* to an existing HDInsight cluster. This article applies to storage *accounts* (not the default cluster storage account), and not additional storage such as [Azure Data Lake Storage Gen1](hdinsight-hadoop-use-data-lake-store.md) and [Azure Data Lake Storage Gen2](hdinsight-hadoop-use-data-lake-storage-gen2.md).
1515

1616
> [!IMPORTANT]
1717
> The information in this document is about adding additional storage account(s) to a cluster after it has been created. For information on adding storage accounts during cluster creation, see [Set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more](hdinsight-hadoop-provision-linux-clusters.md).
@@ -20,21 +20,10 @@ Learn how to use script actions to add additional Azure storage *accounts* to HD
2020

2121
* A Hadoop cluster on HDInsight. See [Get Started with HDInsight on Linux](./hadoop/apache-hadoop-linux-tutorial-get-started.md).
2222
* Storage account name and key. See [Manage storage account access keys](../storage/common/storage-account-keys-manage.md).
23-
* [Correctly cased cluster name](hdinsight-hadoop-manage-ambari-rest-api.md#identify-correctly-cased-cluster-name).
2423
* If using PowerShell, you'll need the Az module. See [Overview of Azure PowerShell](https://docs.microsoft.com/powershell/azure/overview).
25-
* If you haven't installed the Azure CLI, see [Azure Command-Line Interface (CLI)](https://docs.microsoft.com/cli/azure/?view=azure-cli-latest).
26-
* If using bash or a windows command prompt, you'll also need **jq**, a command-line JSON processor. See [https://stedolan.github.io/jq/](https://stedolan.github.io/jq/). For bash on Ubuntu on Windows 10 see [Windows Subsystem for Linux Installation Guide for Windows 10](https://docs.microsoft.com/windows/wsl/install-win10).
2724

2825
## How it works
2926

30-
This script takes the following parameters:
31-
32-
* __Azure storage account name__: The name of the storage account to add to the HDInsight cluster. After running the script, HDInsight can read and write data stored in this storage account.
33-
34-
* __Azure storage account key__: A key that grants access to the storage account.
35-
36-
* __-p__ (optional): If specified, the key isn't encrypted and is stored in the core-site.xml file as plain text.
37-
3827
During processing, the script performs the following actions:
3928

4029
* If the storage account already exists in the core-site.xml configuration for the cluster, the script exits and no further actions are performed.
@@ -50,80 +39,38 @@ During processing, the script performs the following actions:
5039
> [!WARNING]
5140
> Using a storage account in a different location than the HDInsight cluster is not supported.
5241
53-
## The script
54-
55-
__Script location__: [https://hdiconfigactions.blob.core.windows.net/linuxaddstorageaccountv01/add-storage-account-v01.sh](https://hdiconfigactions.blob.core.windows.net/linuxaddstorageaccountv01/add-storage-account-v01.sh)
56-
57-
__Requirements__: The script must be applied on the __Head nodes__. You don't need to mark this script as __Persisted__, as it directly updates the Ambari configuration for the cluster.
58-
59-
## To use the script
60-
61-
This script can be used from the Azure PowerShell, Azure CLI, or the Azure portal.
62-
63-
### PowerShell
64-
65-
Using [Submit-AzHDInsightScriptAction](https://docs.microsoft.com/powershell/module/az.hdinsight/submit-azhdinsightscriptaction). Replace `CLUSTERNAME`, `ACCOUNTNAME`, and `ACCOUNTKEY` with the appropriate values.
66-
67-
```powershell
68-
# Update these parameters
69-
$clusterName = "CLUSTERNAME"
70-
$parameters = "ACCOUNTNAME ACCOUNTKEY"
71-
72-
$scriptActionName = "addStorage"
73-
$scriptActionUri = "https://hdiconfigactions.blob.core.windows.net/linuxaddstorageaccountv01/add-storage-account-v01.sh"
74-
75-
# Execute script
76-
Submit-AzHDInsightScriptAction `
77-
-ClusterName $clusterName `
78-
-Name $scriptActionName `
79-
-Uri $scriptActionUri `
80-
-NodeTypes "headnode" `
81-
-Parameters $parameters
82-
```
42+
## Add storage account
8343

84-
### Azure CLI
85-
86-
Using [az hdinsight script-action execute](https://docs.microsoft.com/cli/azure/hdinsight/script-action?view=azure-cli-latest#az-hdinsight-script-action-execute). Replace `CLUSTERNAME`, `RESOURCEGROUP`, `ACCOUNTNAME`, and `ACCOUNTKEY` with the appropriate values.
87-
88-
```cli
89-
az hdinsight script-action execute ^
90-
--name CLUSTERNAME ^
91-
--resource-group RESOURCEGROUP ^
92-
--roles headnode ^
93-
--script-action-name addStorage ^
94-
--script-uri "https://hdiconfigactions.blob.core.windows.net/linuxaddstorageaccountv01/add-storage-account-v01.sh" ^
95-
--script-parameters "ACCOUNTNAME ACCOUNTKEY"
96-
```
44+
Use [Script Action](hdinsight-hadoop-customize-cluster-linux.md#apply-a-script-action-to-a-running-cluster) to apply the changes with the following considerations:
9745

98-
### Azure portal
46+
|Property | Value |
47+
|---|---|
48+
|Bash script URI|`https://hdiconfigactions.blob.core.windows.net/linuxaddstorageaccountv01/add-storage-account-v01.sh`|
49+
|Node type(s)|Head|
50+
|Parameters|`ACCOUNTNAME` `ACCOUNTKEY` `-p` (optional)|
9951

100-
See [Apply a script action to a running cluster](hdinsight-hadoop-customize-cluster-linux.md#apply-a-script-action-to-a-running-cluster).
52+
* `ACCOUNTNAME` is the name of the storage account to add to the HDInsight cluster.
53+
* `ACCOUNTKEY` is the access key for `ACCOUNTNAME`.
54+
* `-p` is optional. If specified, the key isn't encrypted and is stored in the core-site.xml file as plain text.
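
As a rough illustration of how the `Parameters` value in the table is composed, the following Python sketch joins the account name, the key, and the optional `-p` flag into one space-separated string. The helper name `build_script_parameters` is hypothetical and is not part of any Azure tooling; the layout simply follows the table above.

```python
def build_script_parameters(account_name: str, account_key: str, plaintext: bool = False) -> str:
    """Compose the script action parameter string: ACCOUNTNAME ACCOUNTKEY [-p].

    Hypothetical helper, for illustration only.
    """
    if not account_name or not account_key:
        raise ValueError("account name and account key are both required")
    parts = [account_name, account_key]
    if plaintext:
        # With -p the key is stored unencrypted in core-site.xml.
        parts.append("-p")
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder values; substitute the real account name and key.
    print(build_script_parameters("ACCOUNTNAME", "ACCOUNTKEY"))
    print(build_script_parameters("ACCOUNTNAME", "ACCOUNTKEY", plaintext=True))
```

Leaving `plaintext` at its default omits `-p`, so the key is stored encrypted.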
10155

102-
## Known issues
103-
104-
### Storage firewall
105-
106-
If you choose to secure your storage account with the **Firewalls and virtual networks** restrictions on **Selected networks**, be sure to enable the exception **Allow trusted Microsoft services...** so that HDInsight can access your storage account.
56+
## Verification
10757

108-
### Storage accounts not displayed in Azure portal or tools
58+
When viewing the HDInsight cluster in the Azure portal, selecting the __Storage Accounts__ entry under __Properties__ doesn't display storage accounts added through this script action. Azure PowerShell and Azure CLI don't display the additional storage account either. The storage information isn't displayed because the script only modifies the `core-site.xml` configuration for the cluster. This information isn't used when retrieving the cluster information using Azure management APIs.
10959

110-
When viewing the HDInsight cluster in the Azure portal, selecting the __Storage Accounts__ entry under __Properties__ doesn't display storage accounts added through this script action. Azure PowerShell and Azure CLI don't display the additional storage account either.
60+
To verify the additional storage, use one of the methods shown below:
11161

112-
The storage information isn't displayed because the script only modifies the core-site.xml configuration for the cluster. This information isn't used when retrieving the cluster information using Azure management APIs.
62+
### PowerShell
11363

114-
To view storage account information added to the cluster using this script, use the Ambari REST API. Use the following commands to retrieve this information for your cluster:
115-
116-
### PowerShell
117-
118-
Replace `CLUSTERNAME` with the properly cased cluster name. Replace `ACCOUNTNAME` with the actual names. When prompted, enter the cluster login password.
64+
The script returns the name of each storage account associated with the given cluster. Replace `CLUSTERNAME` with the actual cluster name, and then run the script.
11965

12066
```powershell
12167
# Update values
12268
$clusterName = "CLUSTERNAME"
123-
$accountName = "ACCOUNTNAME"
12469
12570
$creds = Get-Credential -UserName "admin" -Message "Enter the cluster login credentials"
12671
72+
$clusterName = $clusterName.ToLower();
73+
12774
# getting service_config_version
12875
$resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName`?fields=Clusters/desired_service_config_versions/HDFS" `
12976
-Credential $creds -UseBasicParsing
@@ -134,79 +81,53 @@ $configVersion=$respObj.Clusters.desired_service_config_versions.HDFS.service_co
13481
$resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/configurations/service_config_versions?service_name=HDFS&service_config_version=$configVersion" `
13582
-Credential $creds
13683
$respObj = ConvertFrom-Json $resp.Content
137-
$respObj.items.configurations.properties."fs.azure.account.key.$accountName.blob.core.windows.net"
84+
85+
# extract account names
86+
$value = ($respObj.items.configurations | Where type -EQ "core-site").properties | Get-Member -membertype properties | Where Name -Like "fs.azure.account.key.*"
87+
foreach ($name in $value ) { $name.Name.Split(".")[4]}
13888
```
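
The name-extraction step at the end of the PowerShell script can be sketched in Python as follows. This is a minimal illustration that assumes the `core-site` properties have already been fetched and parsed into a dictionary. The function name `extract_account_names` and the sample values are hypothetical; only the key pattern and the "fifth dot-separated field" rule come from the script above.

```python
KEY_PREFIX = "fs.azure.account.key."


def extract_account_names(core_site_properties: dict) -> list:
    """Return storage account names found in core-site property keys.

    Hypothetical helper: it mirrors the script's filter on
    "fs.azure.account.key.*" and its Split(".")[4] step.
    """
    names = []
    for prop in core_site_properties:
        # The trailing dot in the prefix excludes "keyprovider" entries.
        if prop.startswith(KEY_PREFIX):
            # fs.azure.account.key.<name>.blob.core.windows.net -> field 4
            names.append(prop.split(".")[4])
    return sorted(names)


if __name__ == "__main__":
    # Sample data with placeholder values, not real cluster output.
    sample = {
        "fs.azure.account.key.mystorage.blob.core.windows.net": "<encrypted key>",
        "fs.azure.account.keyprovider.mystorage.blob.core.windows.net": "<provider>",
        "fs.defaultFS": "<default file system>",
    }
    print(extract_account_names(sample))
```

Unlike PowerShell's `-Like`, `str.startswith` is case-sensitive, so this sketch assumes the property names are lowercase as shown.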
13989

140-
### bash
90+
### Apache Ambari
14191

142-
Replace `CLUSTERNAME` with the properly cased cluster name. Replace `PASSWORD` with the cluster admin password. Replace `STORAGEACCOUNT` with the actual storage account name.
92+
1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster.
14393

144-
```bash
145-
export clusterName="CLUSTERNAME"
146-
export password='PASSWORD'
147-
export storageAccount="STORAGEACCOUNT"
94+
1. Navigate to **HDFS** > **Configs** > **Advanced** > **Custom core-site**.
14895

149-
export ACCOUNTNAME='"'fs.azure.account.key.$storageAccount.blob.core.windows.net'"'
96+
1. Observe the keys that begin with `fs.azure.account.key`. The account name will be a part of the key as seen in this sample image:
15097

151-
export configVersion=$(curl --silent -u admin:$password -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName?fields=Clusters/desired_service_config_versions/HDFS" \
152-
| jq ".Clusters.desired_service_config_versions.HDFS[].service_config_version")
98+
![verification through Apache Ambari](./media/hdinsight-hadoop-add-storage/apache-ambari-verification.png)
15399

154-
curl --silent -u admin:$password -G "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/configurations/service_config_versions?service_name=HDFS&service_config_version=$configVersion" \
155-
| jq ".items[].configurations[].properties[$ACCOUNTNAME] | select(. != null)"
156-
```
100+
## Remove storage account
157101

158-
### cmd
102+
1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster.
159103

160-
Replace `CLUSTERNAME` with the properly cased cluster name in both scripts. First identify the service config version in use by entering the command below:
104+
1. Navigate to **HDFS** > **Configs** > **Advanced** > **Custom core-site**.
161105

162-
```cmd
163-
curl --silent -u admin -G "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME?fields=Clusters/desired_service_config_versions/HDFS" | ^
164-
jq-win64 ".Clusters.desired_service_config_versions.HDFS[].service_config_version"
165-
```
106+
1. Remove the following keys:
107+
* `fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net`
108+
* `fs.azure.account.keyprovider.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net`
166109

167-
Replace `ACCOUNTNAME` with the actual storage account name. Then replace `4` with the actual service config version and enter the command:
110+
After you remove these keys and save the configuration, restart Oozie, YARN, MapReduce2, HDFS, and Hive one at a time.
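
To find the exact entries in **Custom core-site**, it can help to generate the two key names for a given account. The Python sketch below does only that string substitution. The function name `keys_to_remove` is hypothetical, and the key patterns are the two listed in the removal step.

```python
def keys_to_remove(storage_account_name: str) -> list:
    """Return the two core-site key names for one storage account.

    Hypothetical helper, for illustration only.
    """
    if not storage_account_name:
        raise ValueError("storage account name is required")
    suffix = f"{storage_account_name}.blob.core.windows.net"
    return [
        f"fs.azure.account.key.{suffix}",
        f"fs.azure.account.keyprovider.{suffix}",
    ]


if __name__ == "__main__":
    for key in keys_to_remove("mystorage"):
        print(key)
```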
168111

169-
```cmd
170-
curl --silent -u admin -G "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/configurations/service_config_versions?service_name=HDFS&service_config_version=4" | ^
171-
jq-win64 ".items[].configurations[].properties["""fs.azure.account.key.ACCOUNTNAME.blob.core.windows.net"""] | select(. != null)"
172-
```
173-
174-
---
175-
176-
Information returned from this command appears similar to the following text:
112+
## Known issues
177113

178-
"MIIB+gYJKoZIhvcNAQcDoIIB6zCCAecCAQAxggFaMIIBVgIBADA+MCoxKDAmBgNVBAMTH2RiZW5jcnlwdGlvbi5henVyZWhkaW5zaWdodC5uZXQCEA6GDZMW1oiESKFHFOOEgjcwDQYJKoZIhvcNAQEBBQAEggEATIuO8MJ45KEQAYBQld7WaRkJOWqaCLwFub9zNpscrquA2f3o0emy9Vr6vu5cD3GTt7PmaAF0pvssbKVMf/Z8yRpHmeezSco2y7e9Qd7xJKRLYtRHm80fsjiBHSW9CYkQwxHaOqdR7DBhZyhnj+DHhODsIO2FGM8MxWk4fgBRVO6CZ5eTmZ6KVR8wYbFLi8YZXb7GkUEeSn2PsjrKGiQjtpXw1RAyanCagr5vlg8CicZg1HuhCHWf/RYFWM3EBbVz+uFZPR3BqTgbvBhWYXRJaISwssvxotppe0ikevnEgaBYrflB2P+PVrwPTZ7f36HQcn4ifY1WRJQ4qRaUxdYEfzCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAhRdscgRV3wmYBg3j/T1aEnO3wLWCRpgZa16MWqmfQPuansKHjLwbZjTpeirqUAQpZVyXdK/w4gKlK+t1heNsNo1Wwqu+Y47bSAX1k9Ud7+Ed2oETDI7724IJ213YeGxvu4Ngcf2eHW+FRK"
114+
### Storage firewall
179115

180-
This text is an example of an encrypted key, which is used to access the storage account.
116+
If you choose to secure your storage account with the **Firewalls and virtual networks** restrictions on **Selected networks**, be sure to enable the exception **Allow trusted Microsoft services...** so that HDInsight can access your storage account.
181117

182118
### Unable to access storage after changing key
183119

184120
If you change the key for a storage account, HDInsight can no longer access the storage account. HDInsight uses a cached copy of the key in the core-site.xml for the cluster. This cached copy must be updated to match the new key.
185121

186122
Running the script action again does __not__ update the key, as the script checks to see if an entry for the storage account already exists. If an entry already exists, it doesn't make any changes.
187123

188-
To work around this problem, you must remove the existing entry for the storage account. Use the following steps to remove the existing entry:
124+
To work around this problem:
125+
1. Remove the storage account.
126+
1. Add the storage account.
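
The following Python sketch models why the remove-then-add order matters. It is a simplified in-memory stand-in and not the actual script action: the configuration is a plain dictionary, and the function names are hypothetical. The only behavior taken from this article is that an add is skipped when an entry for the account already exists.

```python
def _prop(kind: str, account: str) -> str:
    # Build a core-site property name for the given account.
    return f"fs.azure.account.{kind}.{account}.blob.core.windows.net"


def add_storage_account(core_site: dict, account: str, key: str) -> bool:
    """Add the account key unless an entry already exists (simplified model)."""
    if _prop("key", account) in core_site:
        return False  # existing entry: no changes are made
    core_site[_prop("key", account)] = key
    return True


def remove_storage_account(core_site: dict, account: str) -> None:
    """Drop both entries for the account, if present."""
    for kind in ("key", "keyprovider"):
        core_site.pop(_prop(kind, account), None)


if __name__ == "__main__":
    config = {}
    add_storage_account(config, "mystorage", "old-key")
    # Re-adding with a rotated key is skipped because the entry exists.
    print(add_storage_account(config, "mystorage", "new-key"))
    remove_storage_account(config, "mystorage")
    # After removal, the add succeeds and stores the new key.
    print(add_storage_account(config, "mystorage", "new-key"))
```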
189127

190128
> [!IMPORTANT]
191129
> Rotating the storage key for the primary storage account attached to a cluster is not supported.
192130
193-
1. In a web browser, open the Ambari Web UI for your HDInsight cluster. The URI is `https://CLUSTERNAME.azurehdinsight.net`. Replace `CLUSTERNAME` with the name of your cluster.
194-
195-
When prompted, enter the HTTP login user and password for your cluster.
196-
197-
2. From the list of services on the left of the page, select __HDFS__. Then select the __Configs__ tab in the center of the page.
198-
199-
3. In the __Filter...__ field, enter a value of __fs.azure.account__. This returns entries for any additional storage accounts that have been added to the cluster. There are two types of entries; __keyprovider__ and __key__. Both contain the name of the storage account as part of the key name.
200-
201-
The following are example entries for a storage account named __mystorage__:
202-
203-
fs.azure.account.keyprovider.mystorage.blob.core.windows.net
204-
fs.azure.account.key.mystorage.blob.core.windows.net
205-
206-
4. After you've identified the keys for the storage account you need to remove, use the red '-' icon to the right of the entry to delete it. Then use the __Save__ button to save your changes.
207-
208-
5. After changes have been saved, use the script action to add the storage account and new key value to the cluster.
209-
210131
### Poor performance
211132

212133
If the storage account is in a different region than the HDInsight cluster, you may experience poor performance. Accessing data in a different region sends network traffic outside the regional Azure data center and across the public internet, which can introduce latency.