
Commit e670ea7

Merge pull request #78926 from dagiro/freshness115

freshness115

2 parents c995133 + a9d1d01

File tree

1 file changed: +63 -39 lines changed

articles/hdinsight/hdinsight-using-spark-query-hbase.md

Lines changed: 63 additions & 39 deletions
@@ -7,19 +7,20 @@ ms.reviewer: jasonh
ms.service: hdinsight
ms.custom: hdinsightactive
ms.topic: conceptual
ms.date: 06/06/2019
---

# Use Apache Spark to read and write Apache HBase data

Apache HBase is typically queried either with its low-level API (scans, gets, and puts) or with a SQL syntax using Apache Phoenix. Apache also provides the Apache Spark HBase Connector, a convenient and performant alternative for querying and modifying data stored by HBase.
## Prerequisites

* Two separate HDInsight clusters deployed in the same virtual network: one HBase cluster, and one Spark cluster with at least Spark 2.1 (HDInsight 3.6) installed. For more information, see [Create Linux-based clusters in HDInsight using the Azure portal](hdinsight-hadoop-create-linux-clusters-portal.md).

* An SSH client. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](hdinsight-hadoop-linux-use-ssh-unix.md).

* The [URI scheme](hdinsight-hadoop-linux-information.md#URI-and-scheme) for your cluster's primary storage. This scheme is wasb:// for Azure Blob Storage, abfs:// for Azure Data Lake Storage Gen2, or adl:// for Azure Data Lake Storage Gen1. If secure transfer is enabled for Blob Storage or Data Lake Storage Gen2, the URI is wasbs:// or abfss://, respectively. See also [secure transfer](../storage/common/storage-require-secure-transfer.md). An example URI follows this list.
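For example, a fully formed path on secure-transfer-enabled Blob Storage might look like the following; the container and account names here are hypothetical, for illustration only:

```
wasbs://mycontainer@mystorageaccount.blob.core.windows.net/example/data/
```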
## Overall process
@@ -34,38 +35,47 @@ The high-level process for enabling your Spark cluster to query your HDInsight c

## Prepare sample data in Apache HBase

In this step, you create and populate a table in Apache HBase that you can then query using Spark.

1. Use the `ssh` command to connect to your HBase cluster. Edit the command below by replacing `HBASECLUSTER` with the name of your HBase cluster, and then enter the command:

    ```cmd
    ssh sshuser@HBASECLUSTER-ssh.azurehdinsight.net
    ```

2. Use the `hbase shell` command to start the HBase interactive shell. Enter the following command in your SSH connection:

    ```bash
    hbase shell
    ```

3. Use the `create` command to create an HBase table with two column families. Enter the following command:

    ```hbase
    create 'Contacts', 'Personal', 'Office'
    ```
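
    Optionally, you can confirm the table and its column families with the `describe` command (an extra check, not one of the original steps):

    ```hbase
    describe 'Contacts'
    ```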

4. Use the `put` command to insert values at a specified column in a specified row in a particular table. Enter the following commands:

    ```hbase
    put 'Contacts', '1000', 'Personal:Name', 'John Dole'
    put 'Contacts', '1000', 'Personal:Phone', '1-425-000-0001'
    put 'Contacts', '1000', 'Office:Phone', '1-425-000-0002'
    put 'Contacts', '1000', 'Office:Address', '1111 San Gabriel Dr.'
    put 'Contacts', '8396', 'Personal:Name', 'Calvin Raji'
    put 'Contacts', '8396', 'Personal:Phone', '230-555-0191'
    put 'Contacts', '8396', 'Office:Phone', '230-555-0191'
    put 'Contacts', '8396', 'Office:Address', '5415 San Gabriel Dr.'
    ```
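
    Optionally, use the `scan` command to verify the rows you just wrote (an extra check, not one of the original steps):

    ```hbase
    scan 'Contacts'
    ```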

5. Use the `exit` command to stop the HBase interactive shell. Enter the following command:

    ```hbase
    exit
    ```

## Copy hbase-site.xml to Spark cluster

Copy `hbase-site.xml` from local storage to the root of your Spark cluster's default storage. Edit the command below to reflect your configuration. Then, from your open SSH session to the HBase cluster, enter the command:
@@ -74,23 +84,27 @@ Copy the hbase-site.xml from local storage to the root of your Spark cluster's d

| Syntax value | New value |
|---|---|
|`SPARK_STORAGE_CONTAINER`|Replace with the default storage container name used for the Spark cluster.|
|`SPARK_STORAGE_ACCOUNT`|Replace with the default storage account name used for the Spark cluster.|

```bash
hdfs dfs -copyFromLocal /etc/hbase/conf/hbase-site.xml wasbs://SPARK_STORAGE_CONTAINER@SPARK_STORAGE_ACCOUNT.blob.core.windows.net/
```

Then exit your SSH connection to your HBase cluster.
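For example, with a hypothetical container named `mycontainer` and a hypothetical storage account named `mystorageaccount`, the edited command would read:

```bash
hdfs dfs -copyFromLocal /etc/hbase/conf/hbase-site.xml wasbs://mycontainer@mystorageaccount.blob.core.windows.net/
```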

## Put hbase-site.xml on your Spark cluster

1. Connect to the head node of your Spark cluster using SSH.

2. Enter the command below to copy `hbase-site.xml` from your Spark cluster's default storage to the Spark 2 configuration folder on the cluster's local storage:

    ```bash
    sudo hdfs dfs -copyToLocal /hbase-site.xml /etc/spark2/conf
    ```
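
    You can confirm the copy succeeded by listing the file (an optional check, not part of the original steps):

    ```bash
    ls -l /etc/spark2/conf/hbase-site.xml
    ```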

## Run Spark Shell referencing the Spark HBase Connector

1. From your open SSH session to the Spark cluster, enter the command below to start a Spark shell:

    ```bash
    spark-shell --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories https://repo.hortonworks.com/content/groups/public/
    ```
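
    In the full article, the next steps define a catalog object that maps a Spark schema onto the HBase table and then query it as a DataFrame. The following is a minimal sketch of that pattern with the shc-core connector, assuming the `Contacts` table created earlier; it is illustrative rather than the article's exact code, and it ends with the query whose results appear in step 9:

    ```scala
    // Paste into the Spark shell started above. Sketch only: the catalog JSON
    // maps Spark column names to the 'Contacts' table's column families/qualifiers.
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    def catalog = s"""{
        |"table":{"namespace":"default", "name":"Contacts"},
        |"rowkey":"key",
        |"columns":{
        |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
        |"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
        |"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
        |"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
        |"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
        |}
    |}""".stripMargin

    // Read the HBase table through the connector as a DataFrame.
    def withCatalog(cat: String): DataFrame = {
      spark.sqlContext
        .read
        .options(Map(HBaseTableCatalog.tableCatalog -> cat))
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .load()
    }

    val df = withCatalog(catalog)

    // Register a temp view and run the query whose output step 9 shows.
    df.createOrReplaceTempView("contacts")
    spark.sqlContext.sql("select personalName, officeAddress from contacts").show()
    ```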

@@ -179,12 +193,14 @@ In this step, you define a catalog object that maps the schema from Apache Spark

9. You should see results like these:

    ```output
    +-------------+--------------------+
    | personalName|       officeAddress|
    +-------------+--------------------+
    |    John Dole|1111 San Gabriel Dr.|
    |  Calvin Raji|5415 San Gabriel Dr.|
    +-------------+--------------------+
    ```

## Insert new data

@@ -223,13 +239,21 @@ In this step, you define a catalog object that maps the schema from Apache Spark
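In the full article, steps 1 through 4 build a DataFrame holding a new contact and write it to HBase through the connector. A minimal sketch of that write path, reusing the hypothetical `catalog` from the earlier sketch; the record values here are illustrative and simply mirror the output in step 5:

```scala
// Sketch only: append one row through the connector from the same Spark shell
// session; assumes the catalog defined in the earlier sketch is still in scope.
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

case class ContactRecord(
  rowkey: String,
  officeAddress: String,
  officePhone: String,
  personalName: String,
  personalPhone: String)

// Illustrative new contact matching the row that appears in the output below.
val newContact = ContactRecord("16891", "40 Ellis St.", "674-555-0110", "John Jackson", "230-555-0194")

// HBaseTableCatalog.newTable is the region count used only if the table
// must be created; writes to the existing table leave it unchanged.
spark.createDataFrame(Seq(newContact)).write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()
```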

5. You should see output like this:

    ```output
    +------+--------------------+--------------+------------+--------------+
    |rowkey|       officeAddress|   officePhone|personalName| personalPhone|
    +------+--------------------+--------------+------------+--------------+
    |  1000|1111 San Gabriel Dr.|1-425-000-0002|   John Dole|1-425-000-0001|
    | 16891|        40 Ellis St.|  674-555-0110|John Jackson|  230-555-0194|
    |  8396|5415 San Gabriel Dr.|  230-555-0191| Calvin Raji|  230-555-0191|
    +------+--------------------+--------------+------------+--------------+
    ```

6. Close the Spark shell by entering the following command:

    ```scala
    :q
    ```

## Next steps
