
Commit f9e7e42

Improved Acrolinx Score
1 parent 57a347f commit f9e7e42

File tree

1 file changed: +11 -11 lines changed


articles/hdinsight/hdinsight-using-spark-query-hbase.md

Lines changed: 11 additions & 11 deletions
@@ -4,7 +4,7 @@ description: Use the Spark HBase Connector to read and write data from a Spark c
 ms.service: hdinsight
 ms.topic: how-to
 ms.custom: hdinsightactive,seoapr2020
-ms.date: 04/01/2022
+ms.date: 12/09/2022
 ---

 # Use Apache Spark to read and write Apache HBase data
@@ -31,7 +31,7 @@ The high-level process for enabling your Spark cluster to query your HBase clust

 In this step, you create and populate a table in Apache HBase that you can then query using Spark.

-1. Use the `ssh` command to connect to your HBase cluster. Edit the command below by replacing `HBASECLUSTER` with the name of your HBase cluster, and then enter the command:
+1. Use the `ssh` command to connect to your HBase cluster. Edit the command by replacing `HBASECLUSTER` with the name of your HBase cluster, and then enter the command:

 ```cmd
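The command body for this step falls outside the hunk. On HDInsight it typically takes the following form, assuming the default `sshuser` account and the standard `-ssh.azurehdinsight.net` endpoint (both are assumptions, not text from this diff):

```cmd
ssh sshuser@HBASECLUSTER-ssh.azurehdinsight.net
```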
@@ -70,14 +70,14 @@ In this step, you create and populate a table in Apache HBase that you can then

 ## Run scripts to set up connection between clusters

-To set up the communication between clusters, follow the below steps to run two scripts on your clusters. These scripts will automate the process of file copying described in 'Set up communication manually' section below.
+To set up the communication between clusters, follow these steps to run two scripts on your clusters. The scripts automate the file copying described in the 'Set up communication manually' section.

 * The script you run from the HBase cluster will upload `hbase-site.xml` and HBase IP-mapping information to the default storage attached to your Spark cluster.
 * The script that you run from the Spark cluster sets up two cron jobs to run two helper scripts periodically:
     1. HBase cron job – downloads new `hbase-site.xml` files and HBase IP mapping from the Spark default storage account to the local node
     2. Spark cron job – checks whether a Spark scaling occurred and whether the cluster is secure. If so, edits `/etc/hosts` to include the HBase IP mapping stored locally

-__NOTE__: Before proceeding, make sure you have added the Spark cluster’s storage account to your HBase cluster as secondary storage account. Make sure you the scripts in order as indicated below.
+__NOTE__: Before proceeding, make sure you've added the Spark cluster’s storage account to your HBase cluster as a secondary storage account. Make sure you run the scripts in the order indicated.


 1. Use [Script Action](hdinsight-hadoop-customize-cluster-linux.md#script-action-to-a-running-cluster) on your HBase cluster to apply the changes with the following considerations:
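Once the Spark-side script has run, the two cron jobs described in this hunk would look roughly like the following crontab entries. The script names and paths here are purely illustrative assumptions; the Script Action installs its own helper scripts:

```bash
# Spark cron (default -s "*/1 * * * *"): check for scaling and, on secure clusters, refresh /etc/hosts
*/1 * * * * /usr/local/bin/update-hbase-hosts.sh
# HBase cron (only if enabled, for example -h "*/30 * * * *"): pull new hbase-site.xml and IP mapping from storage
*/30 * * * * /usr/local/bin/download-hbase-info.sh
```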
@@ -104,19 +104,19 @@ __NOTE__: Before proceeding, make sure you have added the Spark cluster’s stor


 * You can specify how often you want this cluster to automatically check for updates. Default: `-s "*/1 * * * *" -h 0` (in this example, the Spark cron runs every minute, while the HBase cron doesn't run).
-* Since HBase cron is not set up by default, you need to rerun this script when perform scaling to your HBase cluster. If your HBase cluster scales often, you may choose to set up HBase cron job automatically. For example: `-h "*/30 * * * *"` configures the script to perform checks every 30 minutes. This will run HBase cron schedule periodically to automate downloading of new HBase information on the common storage account to local node.
+* Since the HBase cron job isn't set up by default, you need to rerun this script whenever you scale your HBase cluster. If your HBase cluster scales often, you may choose to set up the HBase cron job automatically. For example, `-h "*/30 * * * *"` configures the script to perform checks every 30 minutes. This runs the HBase cron schedule periodically and automates downloading new HBase information from the common storage account to the local node.


 ## Set up communication manually (Optional, if the provided script in the above step fails)

 __NOTE:__ These steps need to be performed every time one of the clusters undergoes a scaling activity.

-1. Copy the hbase-site.xml from local storage to the root of your Spark cluster's default storage. Edit the command below to reflect your configuration. Then, from your open SSH session to the HBase cluster, enter the command:
+1. Copy the `hbase-site.xml` file from local storage to the root of your Spark cluster's default storage. Edit the command to reflect your configuration. Then, from your open SSH session to the HBase cluster, enter the command:

 | Syntax value | New value |
 |---|---|
-|[URI scheme](hdinsight-hadoop-linux-information.md#URI-and-scheme) | Modify to reflect your storage. The syntax below is for blob storage with secure transfer enabled.|
+|[URI scheme](hdinsight-hadoop-linux-information.md#URI-and-scheme) | Modify to reflect your storage. The syntax shown is for blob storage with secure transfer enabled.|
 |`SPARK_STORAGE_CONTAINER`|Replace with the default storage container name used for the Spark cluster.|
 |`SPARK_STORAGE_ACCOUNT`|Replace with the default storage account name used for the Spark cluster.|
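The copy command for step 1 isn't visible in this hunk. Combined with the table above, it would plausibly look like the following, assuming the standard HDInsight location of `hbase-site.xml` and a secure-transfer blob endpoint (both assumptions; substitute your own values):

```bash
hdfs dfs -copyFromLocal /etc/hbase/conf/hbase-site.xml wasbs://SPARK_STORAGE_CONTAINER@SPARK_STORAGE_ACCOUNT.blob.core.windows.net/
```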
@@ -131,13 +131,13 @@ __NOTE:__ These steps need to perform every time one of the clusters undergoes a
 ```


-3. Connect to the head node of your Spark cluster using SSH. Edit the command below by replacing `SPARKCLUSTER` with the name of your Spark cluster, and then enter the command:
+3. Connect to the head node of your Spark cluster using SSH. Edit the command by replacing `SPARKCLUSTER` with the name of your Spark cluster, and then enter the command:

 ```cmd

 ```

-4. Enter the command below to copy `hbase-site.xml` from your Spark cluster's default storage to the Spark 2 configuration folder on the cluster's local storage:
+4. Enter the command to copy `hbase-site.xml` from your Spark cluster's default storage to the Spark 2 configuration folder on the cluster's local storage:

 ```bash
 sudo hdfs dfs -copyToLocal /hbase-site.xml /etc/spark2/conf
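The `cmd` block for step 3 is empty in this hunk; the command presumably mirrors the earlier HBase one, again assuming the default account and endpoint:

```cmd
ssh sshuser@SPARKCLUSTER-ssh.azurehdinsight.net
```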
@@ -159,7 +159,7 @@ As an example, the following table lists two versions and the corresponding comm

 2. Keep this Spark shell instance open and continue to [Define a catalog and query](#define-a-catalog-and-query). If you don't find the jars that correspond to your versions in the SHC Core repository, continue reading.

-For subsequent combinations of Spark and HBase versions, these artifacts are no longer published at above repo. You can build the jars directly from the [spark-hbase-connector](https://github.com/hortonworks-spark/shc) GitHub branch. For example, if you are running with Spark 2.4 and HBase 2.1, complete these steps:
+For subsequent combinations of Spark and HBase versions, these artifacts are no longer published in the repo above. You can build the jars directly from the [spark-hbase-connector](https://github.com/hortonworks-spark/shc) GitHub branch. For example, if you're running with Spark 2.4 and HBase 2.1, complete these steps:

 1. Clone the repo:
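The clone command itself is cut off by the hunk boundary; given the repository linked above, it would presumably be:

```bash
git clone https://github.com/hortonworks-spark/shc
```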
@@ -224,7 +224,7 @@ In this step, you define a catalog object that maps the schema from Apache Spark
 1. Identifies the rowkey as `key`, and maps the column names used in Spark to the column family, column name, and column type as used in HBase.
 1. Defines the rowkey in detail as a named column (`rowkey`), which has a specific column family `cf` of `rowkey`.

-1. Enter the command below to define a method that provides a DataFrame around your `Contacts` table in HBase:
+1. Enter the command to define a method that provides a DataFrame around your `Contacts` table in HBase:

 ```scala
 def withCatalog(cat: String): DataFrame = {
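The method body is cut off by the hunk boundary. The step it belongs to describes the usual SHC pattern, sketched below under the assumptions that `spark` is the active SparkSession and `HBaseTableCatalog` comes from the shc-core package (a sketch of the standard connector idiom, not text from this diff):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Returns a DataFrame backed by the HBase table described in the catalog JSON string.
def withCatalog(cat: String): DataFrame = {
  spark.sqlContext
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
}
```

Calling `withCatalog(catalog).show()` with the catalog defined in the article would then return rows from the `Contacts` table.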
