Commit 3b05ec9

Merge pull request #100037 from dagiro/freshness161
freshness161
2 parents b5cb3f6 + 67c1fc6

1 file changed: 80 additions, 71 deletions

---
title: Azure storage solutions for ML Services on HDInsight - Azure
description: Learn about the different storage options available with ML Services on HDInsight
ms.service: hdinsight
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.custom: hdinsightactive
ms.topic: conceptual
ms.date: 01/02/2020
---

# Azure storage solutions for ML Services on Azure HDInsight

ML Services on HDInsight can use different storage solutions to persist data, code, or objects that contain results from analysis. These solutions include the following options:

- [Azure Blob](https://azure.microsoft.com/services/storage/blobs/)
- [Azure Data Lake Storage](https://azure.microsoft.com/services/storage/data-lake-storage/)
- [Azure File storage](https://azure.microsoft.com/services/storage/files/)

You also have the option of accessing multiple Azure storage accounts or containers with your HDInsight cluster. Azure File storage is a convenient data storage option for use on the edge node that enables you to mount an Azure storage file share to, for example, the Linux file system. But Azure File shares can be mounted and used by any system that has a supported operating system such as Windows or Linux.

When you create an Apache Hadoop cluster in HDInsight, you specify either an **Azure Storage** account or **Data Lake Storage**. A specific storage container from that account holds the file system for the cluster that you create (for example, the Hadoop Distributed File System). The URI forms that address these file systems are sketched after the following links. For more information and guidance, see:

- [Use Azure Storage with HDInsight](../hdinsight-hadoop-use-blob-storage.md)
- [Use Data Lake Storage with Azure HDInsight clusters](../hdinsight-hadoop-use-data-lake-store.md)
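
The two kinds of storage are addressed with different URI schemes, as the examples later in this article show. As a quick orientation, here is a minimal sketch with placeholder account, container, and path names:

```bash
# List cluster storage on an Azure Storage (Blob) account
hadoop fs -ls wasbs://<container>@<account>.blob.core.windows.net/<path>

# List cluster storage on a Data Lake Storage Gen1 account
hadoop fs -ls adl://<account>.azuredatalakestore.net/<path>
```
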
## Use Azure Blob storage accounts with ML Services cluster

If you specified more than one storage account when creating your ML Services cluster, the following steps show how to use the default storage account:

1. Using an SSH client, connect to the edge node of your cluster. For information on using SSH with HDInsight clusters, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md).
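
    A minimal sketch of the connection, assuming the cluster is named `CLUSTERNAME` and the SSH user is `sshuser` (both placeholders); note that the edge node has its own SSH endpoint, distinct from the head node:

    ```bash
    # Connect to the edge node of an ML Services cluster (hypothetical names)
    ssh sshuser@CLUSTERNAME-ed-ssh.azurehdinsight.net
    ```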

2. Copy a sample file, mysamplefile.csv, to the /share directory.

    ```bash
    hadoop fs -mkdir /share
    hadoop fs -copyFromLocal mysamplefile.csv /share
    ```

3. Switch to R Studio or another R console, and write R code to set the name node to **default** and the location of the file you want to access.

    ```R
    myNameNode <- "default"
    myPort <- 0

    # Location of the data:
    bigDataDirRoot <- "/share"

    # Define Spark compute context:
    mySparkCluster <- RxSpark(nameNode=myNameNode, consoleOutput=TRUE)

    # Set compute context:
    rxSetComputeContext(mySparkCluster)

    # Define the Hadoop Distributed File System (HDFS) file system:
    hdfsFS <- RxHdfsFileSystem(hostName=myNameNode, port=myPort)

    # Specify the input file to analyze in HDFS:
    inputFile <- file.path(bigDataDirRoot, "mysamplefile.csv")
    ```

All the directory and file references point to the storage account `wasbs://container1@storage1.blob.core.windows.net`. This is the **default storage account** that's associated with the HDInsight cluster.
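
From here you can run RevoScaleR functions against the file. The following is a minimal sketch, with the assumption (not stated above) that `mysamplefile.csv` has a header row:

```R
# Wrap the HDFS file in a RevoScaleR text data source and summarize all columns
myData <- RxTextData(inputFile, fileSystem = hdfsFS)
rxSummary(~., data = myData)
```
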
### Use the additional storage with ML Services on HDInsight
Now, suppose you want to process a file called mysamplefile1.csv that's located in the /private directory of **container2** in **storage2**.
In your R code, point the name node reference to the **storage2** storage account.

```R
myNameNode <- "wasbs://container2@storage2.blob.core.windows.net"
myPort <- 0

# Location of the data:
bigDataDirRoot <- "/private"

# Define Spark compute context:
mySparkCluster <- RxSpark(consoleOutput=TRUE, nameNode=myNameNode, port=myPort)

# Set compute context:
rxSetComputeContext(mySparkCluster)

# Define HDFS file system:
hdfsFS <- RxHdfsFileSystem(hostName=myNameNode, port=myPort)

# Specify the input file to analyze in HDFS:
inputFile <- file.path(bigDataDirRoot, "mysamplefile1.csv")
```

All of the directory and file references now point to the storage account `wasbs://container2@storage2.blob.core.windows.net`. This is the **Name Node** that you've specified.

Configure the `/user/RevoShare/<SSH username>` directory on **storage2** as follows:

```bash
hadoop fs -mkdir wasbs://container2@storage2.blob.core.windows.net/user
hadoop fs -mkdir wasbs://container2@storage2.blob.core.windows.net/user/RevoShare
hadoop fs -mkdir wasbs://container2@storage2.blob.core.windows.net/user/RevoShare/<SSH username>
```

## Use Azure Data Lake Storage with ML Services cluster

To use Data Lake Storage with your HDInsight cluster, you need to give your cluster access to each Azure Data Lake Storage account that you want to use. For instructions on how to use the Azure portal to create an HDInsight cluster with an Azure Data Lake Storage account as the default storage or as additional storage, see [Create an HDInsight cluster with Data Lake Storage using Azure portal](../../data-lake-store/data-lake-store-hdinsight-hadoop-use-portal.md).

You then use the storage in your R script much as you did the secondary Azure storage account described in the previous procedure.

### Add cluster access to your Azure Data Lake Storage

You access Data Lake Storage by using an Azure Active Directory (Azure AD) Service Principal that's associated with your HDInsight cluster.

1. When you create your HDInsight cluster, select **Cluster AAD Identity** from the **Data Source** tab.
After you give the Service Principal a name and create a password for it, click **Manage ADLS Access** to associate the Service Principal with your Data Lake Storage.

It's also possible to add cluster access to one or more Data Lake Storage accounts after cluster creation. Open the Azure portal entry for a Data Lake Storage account and go to **Data Explorer > Access > Add**.

### How to access Data Lake Storage Gen1 from ML Services on HDInsight

Once you've given access to Data Lake Storage Gen1, you can use the storage in the ML Services cluster on HDInsight the way you would a secondary Azure storage account. The only difference is that the prefix **wasbs://** changes to **adl://** as follows:

```R
# Point to the ADL Storage (e.g. ADLtest)
myNameNode <- "adl://rkadl1.azuredatalakestore.net"
myPort <- 0

# Location of the data (assumes a /share directory on the ADL account)
bigDataDirRoot <- "/share"

# Define Spark compute context
mySparkCluster <- RxSpark(consoleOutput=TRUE, nameNode=myNameNode, port=myPort)

# Set compute context
rxSetComputeContext(mySparkCluster)

# Define HDFS file system
hdfsFS <- RxHdfsFileSystem(hostName=myNameNode, port=myPort)

# Specify the input file in HDFS to analyze
inputFile <- file.path(bigDataDirRoot, "mysamplefile.csv")
```
The following commands are used to configure the Data Lake Storage Gen1 account with the RevoShare directory and add the sample .csv file from the previous example:

```bash
hadoop fs -mkdir adl://rkadl1.azuredatalakestore.net/user
hadoop fs -mkdir adl://rkadl1.azuredatalakestore.net/user/RevoShare
hadoop fs -mkdir adl://rkadl1.azuredatalakestore.net/user/RevoShare/<user>

hadoop fs -mkdir adl://rkadl1.azuredatalakestore.net/share

hadoop fs -copyFromLocal "/usr/lib64/R Server-7.4.1/library/RevoScaleR/SampleData/mysamplefile.csv" adl://rkadl1.azuredatalakestore.net/share

hadoop fs -ls adl://rkadl1.azuredatalakestore.net/share
```
## Use Azure File storage with ML Services on HDInsight

There's also a convenient data storage option for use on the edge node called [Azure Files](https://azure.microsoft.com/services/storage/files/). It enables you to mount an Azure Storage file share to the Linux file system. This option can be handy for storing data files, R scripts, and result objects that might be needed later, especially when it makes sense to use the native file system on the edge node rather than HDFS.
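
As a minimal sketch, mounting a share on the edge node might look like the following; the storage account `mystorage`, share `myshare`, and mount point are placeholders, and the account key comes from the Azure portal:

```bash
# Create a mount point and mount the Azure file share over SMB 3.0 (requires cifs-utils)
sudo mkdir -p /mnt/myshare
sudo mount -t cifs //mystorage.file.core.windows.net/myshare /mnt/myshare \
    -o vers=3.0,username=mystorage,password=<storage-account-key>,dir_mode=0777,file_mode=0777
```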
A major benefit of Azure Files is that the file shares can be mounted and used by any system that has a supported OS such as Windows or Linux. For example, it can be used by another HDInsight cluster that you or someone on your team has, by an Azure VM, or even by an on-premises system. For more information, see:
- [How to use Azure File storage with Linux](../../storage/files/storage-how-to-use-files-linux.md)
- [How to use Azure File storage on Windows](../../storage/files/storage-dotnet-how-to-use-files.md)
## Next steps
- [Overview of ML Services cluster on HDInsight](r-server-overview.md)
- [Compute context options for ML Services cluster on HDInsight](r-server-compute-contexts.md)
- [Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters](../hdinsight-hadoop-use-data-lake-storage-gen2.md)
