---
title: Azure storage solutions for ML Services on HDInsight - Azure
description: Learn about the different storage options available with ML Services on HDInsight
ms.service: hdinsight
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.custom: hdinsightactive
ms.topic: conceptual
ms.date: 01/02/2020
---

# Azure storage solutions for ML Services on Azure HDInsight

ML Services on HDInsight can use different storage solutions to persist data, code, or objects that contain results from analysis. These solutions include the following options:

- Azure Blob storage
- Azure Data Lake Storage Gen1
- Azure Files

You also have the option of accessing multiple Azure storage accounts or containers with your HDInsight cluster. Azure File storage is a convenient data storage option for use on the edge node that enables you to mount an Azure Storage file share to, for example, the Linux file system. But Azure File shares can be mounted and used by any system that has a supported operating system such as Windows or Linux.

When you create an Apache Hadoop cluster in HDInsight, you specify either an **Azure Storage** account or **Data Lake Storage**. A specific storage container from that account holds the file system for the cluster that you create (for example, the Hadoop Distributed File System). For more information and guidance, see:

- [Use Azure Storage with HDInsight](../hdinsight-hadoop-use-blob-storage.md)
- [Use Data Lake Storage with Azure HDInsight clusters](../hdinsight-hadoop-use-data-lake-store.md)

## Use Azure Blob storage accounts with ML Services cluster

If you specified more than one storage account when creating your ML Services cluster, the following instructions show how to use both the default account and an additional account for data access and operations. The examples assume two storage accounts: **storage1** with a default container called **container1**, and **storage2** with a container called **container2**.

### Use the default storage with ML Services on HDInsight

1. Using an SSH client, connect to the edge node of your cluster. For information on using SSH with HDInsight clusters, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md).

2. Copy a sample file, mysamplefile.csv, to the /share directory.

    ```bash
    hadoop fs -mkdir /share
    hadoop fs -copyFromLocal mysamplefile.csv /share
    ```

3. Switch to R Studio or another R console, and write R code to set the name node to **default** and location of the file you want to access.
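
    The exact script depends on your analysis, but a minimal sketch of this step might look like the following. It assumes the RevoScaleR functions that ship with ML Services (`RxSpark`, `rxSetComputeContext`, `RxHdfsFileSystem`, `RxTextData`); the variable names are illustrative, and the file path matches the mysamplefile.csv copied in step 2.

    ```r
    # Illustrative sketch: point RevoScaleR at the cluster's default storage.
    myNameNode <- "default"
    myPort <- 0

    # Location of the sample data copied in step 2:
    bigDataDirRoot <- "/share"

    # Define and set a Spark compute context for the cluster:
    mySparkCluster <- RxSpark(nameNode = myNameNode, port = myPort)
    rxSetComputeContext(mySparkCluster)

    # Define the HDFS file system and the input file to analyze:
    hdfsFS <- RxHdfsFileSystem(hostName = myNameNode, port = myPort)
    inputFile <- file.path(bigDataDirRoot, "mysamplefile.csv")
    mySampleData <- RxTextData(file = inputFile, fileSystem = hdfsFS)
    ```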

All the directory and file references point to the storage account `wasbs://container1@storage1.blob.core.windows.net`. This is the **default storage account** that's associated with the HDInsight cluster.

### Use the additional storage with ML Services on HDInsight

Now, suppose you want to process a file called mysamplefile1.csv that's located in the /private directory of **container2** in **storage2**.

In your R code, point the name node reference to the **storage2** storage account.
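
A minimal sketch of that change, reusing the illustrative variable names from the previous example (the **storage2** account and **container2** container come from this walkthrough; everything else in the script stays the same):

```r
# Illustrative sketch: point the name node at the additional storage account.
myNameNode <- "wasbs://container2@storage2.blob.core.windows.net"
myPort <- 0

# Compute context and file system definitions now use this name node:
mySparkCluster <- RxSpark(nameNode = myNameNode, port = myPort)
rxSetComputeContext(mySparkCluster)
hdfsFS <- RxHdfsFileSystem(hostName = myNameNode, port = myPort)

# The sample file is in the /private directory of container2:
inputFile <- file.path("/private", "mysamplefile1.csv")
mySampleData <- RxTextData(file = inputFile, fileSystem = hdfsFS)
```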

## Use Azure Data Lake Storage with ML Services cluster

To use Data Lake Storage with your HDInsight cluster, you need to give your cluster access to each Azure Data Lake Storage account that you want to use. For instructions on how to use the Azure portal to create an HDInsight cluster with an Azure Data Lake Storage account as the default storage or as additional storage, see [Create an HDInsight cluster with Data Lake Storage using Azure portal](../../data-lake-store/data-lake-store-hdinsight-hadoop-use-portal.md).

You then use the storage in your R script much like you did a secondary Azure storage account as described in the previous procedure.

### Add cluster access to your Azure Data Lake Storage

You access Data Lake Storage by using an Azure Active Directory (Azure AD) Service Principal that's associated with your HDInsight cluster.
1. When you create your HDInsight cluster, select **Cluster AAD Identity** from the **Data Source** tab.
After you give the Service Principal a name and create a password for it, click **Manage ADLS Access** to associate the Service Principal with your Data Lake Storage.

It's also possible to add cluster access to one or more Data Lake Storage accounts following cluster creation. Open the Azure portal entry for a Data Lake Storage account and go to **Data Explorer > Access > Add**.

### How to access Data Lake Storage Gen1 from ML Services on HDInsight

Once you've given access to Data Lake Storage Gen1, you can use the storage in the ML Services cluster on HDInsight the way you would a secondary Azure storage account. The only difference is that the prefix **wasbs://** changes to **adl://** as follows:
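
For example, here's a sketch of the earlier name node assignment with an `adl://` path; the account name `mydatalakestore` is a hypothetical placeholder for your Data Lake Storage Gen1 account:

```r
# Illustrative sketch: point the name node at a Data Lake Storage Gen1 account.
# "mydatalakestore" is a placeholder account name.
myNameNode <- "adl://mydatalakestore.azuredatalakestore.net"
myPort <- 0

mySparkCluster <- RxSpark(nameNode = myNameNode, port = myPort)
rxSetComputeContext(mySparkCluster)
hdfsFS <- RxHdfsFileSystem(hostName = myNameNode, port = myPort)
```
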
The following commands are used to configure the Data Lake Storage Gen1 account with the RevoShare directory and add the sample .csv file from the previous example:
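
A minimal sketch of what those commands could look like using the RevoScaleR Hadoop helpers `rxHadoopMakeDir` and `rxHadoopCopyFromLocal`; the `mydatalakestore` account name and the `sshuser` directory are illustrative placeholders:

```r
# Illustrative sketch: create the RevoShare directory on the Data Lake Storage
# Gen1 account and copy the sample file to it. "mydatalakestore" and "sshuser"
# are placeholders for your account and SSH user name.
rxHadoopMakeDir("adl://mydatalakestore.azuredatalakestore.net/user/RevoShare/sshuser")
rxHadoopCopyFromLocal("mysamplefile.csv",
                      "adl://mydatalakestore.azuredatalakestore.net/user/RevoShare/sshuser")
```
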
## Use Azure File storage with ML Services on HDInsight

There's also a convenient data storage option for use on the edge node called [Azure Files](https://azure.microsoft.com/services/storage/files/). It enables you to mount an Azure Storage file share to the Linux file system. This option can be handy for storing data files, R scripts, and result objects that might be needed later, especially when it makes sense to use the native file system on the edge node rather than HDFS.

A major benefit of Azure Files is that the file shares can be mounted and used by any system that has a supported OS such as Windows or Linux. For example, they can be used by another HDInsight cluster that you or someone on your team has, by an Azure VM, or even by an on-premises system. For more information, see:

- [How to use Azure File storage with Linux](../../storage/files/storage-how-to-use-files-linux.md)
- [How to use Azure File storage on Windows](../../storage/files/storage-dotnet-how-to-use-files.md)

## Next steps

- [Overview of ML Services cluster on HDInsight](r-server-overview.md)
- [Compute context options for ML Services cluster on HDInsight](r-server-compute-contexts.md)
- [Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters](../hdinsight-hadoop-use-data-lake-storage-gen2.md)