ms.reviewer: jasonh
ms.service: hdinsight
ms.custom: hdinsightactive
ms.topic: conceptual
ms.date: 06/06/2019
---

# Use Apache Spark to read and write Apache HBase data

Apache HBase is typically queried either with its low-level API (scans, gets, and puts) or with a SQL syntax using Apache Phoenix. Apache also provides the Apache Spark HBase Connector, a convenient and performant alternative for querying and modifying data stored by HBase.

## Prerequisites

* Two separate HDInsight clusters deployed in the same virtual network: one HBase cluster, and one Spark cluster with at least Spark 2.1 (HDInsight 3.6) installed. For more information, see [Create Linux-based clusters in HDInsight using the Azure portal](hdinsight-hadoop-create-linux-clusters-portal.md).

* An SSH client. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](hdinsight-hadoop-linux-use-ssh-unix.md).

* The [URI scheme](hdinsight-hadoop-linux-information.md#URI-and-scheme) for your cluster's primary storage. This would be wasb:// for Azure Blob Storage, abfs:// for Azure Data Lake Storage Gen2, or adl:// for Azure Data Lake Storage Gen1. If secure transfer is enabled for Blob Storage or Data Lake Storage Gen2, the URI would be wasbs:// or abfss://, respectively. See also [secure transfer](../storage/common/storage-require-secure-transfer.md).

## Overall process

The high-level process for enabling your Spark cluster to query your HDInsight cluster is covered in the sections that follow: prepare sample data in HBase, copy hbase-site.xml from your HBase cluster, and put that copy on your Spark cluster.

## Prepare sample data in Apache HBase

In this step, you create and populate a table in Apache HBase that you can then query using Spark.

1. Use the `ssh` command to connect to your HBase cluster. Edit the command below by replacing `HBASECLUSTER` with the name of your HBase cluster, and then enter the command:
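
    A minimal sketch of that command, assuming the default `sshuser` account name and the standard HDInsight SSH endpoint pattern:

    ```bash
    # Replace HBASECLUSTER with your HBase cluster name; sshuser is an assumed default.
    ssh sshuser@HBASECLUSTER-ssh.azurehdinsight.net
    ```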

2. Use the `hbase shell` command to start the HBase interactive shell. Enter the following command in your SSH connection:

    ```bash
    hbase shell
    ```

3. Use the `create` command to create an HBase table with two column families. Enter the following command:

    ```hbase
    create 'Contacts', 'Personal', 'Office'
    ```

4. Use the `put` command to insert values at a specified column in a specified row in a particular table. Enter the following commands:

    ```hbase
    put 'Contacts', '1000', 'Personal:Name', 'John Dole'
    put 'Contacts', '1000', 'Personal:Phone', '1-425-000-0001'
    put 'Contacts', '1000', 'Office:Phone', '1-425-000-0002'
    put 'Contacts', '1000', 'Office:Address', '1111 San Gabriel Dr.'
    put 'Contacts', '8396', 'Personal:Name', 'Calvin Raji'
    put 'Contacts', '8396', 'Personal:Phone', '230-555-0191'
    put 'Contacts', '8396', 'Office:Phone', '230-555-0191'
    put 'Contacts', '8396', 'Office:Address', '5415 San Gabriel Dr.'
    ```

5. Use the `exit` command to stop the HBase interactive shell. Enter the following command:

    ```hbase
    exit
    ```

## Copy hbase-site.xml to Spark cluster

Copy the hbase-site.xml from your HBase cluster's local storage to the root of your Spark cluster's default storage. Edit the command below to reflect your configuration. Then, from your open SSH session to the HBase cluster, enter the command:

| Syntax value | New value |
|---|---|
|`SPARK_STORAGE_CONTAINER`|Replace with the default storage container name used for the Spark cluster.|
|`SPARK_STORAGE_ACCOUNT`|Replace with the default storage account name used for the Spark cluster.|
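
A sketch of that copy command, assuming hbase-site.xml lives in the standard /etc/hbase/conf folder and the Spark cluster uses wasbs:// Blob Storage as default storage; substitute the placeholders per the table above:

```bash
# Copies the HBase client configuration to the root of the Spark cluster's default storage.
# SPARK_STORAGE_CONTAINER and SPARK_STORAGE_ACCOUNT come from the table above.
hdfs dfs -copyFromLocal /etc/hbase/conf/hbase-site.xml wasbs://SPARK_STORAGE_CONTAINER@SPARK_STORAGE_ACCOUNT.blob.core.windows.net/
```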

Then exit your SSH connection to your HBase cluster.

## Put hbase-site.xml on your Spark cluster

1. Connect to the head node of your Spark cluster using SSH.
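
    A sketch under the same assumptions as the earlier `ssh` step, with `SPARKCLUSTER` as a hypothetical placeholder for your Spark cluster's name:

    ```bash
    # Replace SPARKCLUSTER with your Spark cluster name; sshuser is an assumed default.
    ssh sshuser@SPARKCLUSTER-ssh.azurehdinsight.net
    ```
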
2. Enter the command below to copy `hbase-site.xml` from your Spark cluster's default storage to the Spark 2 configuration folder on the cluster's local storage:
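
    A sketch of that copy, assuming the file sits at the root of default storage and that /etc/spark2/conf is the Spark 2 configuration folder (both paths are assumptions):

    ```bash
    # Pulls hbase-site.xml from default storage down to the local Spark 2 config folder.
    sudo hdfs dfs -copyToLocal /hbase-site.xml /etc/spark2/conf
    ```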