
Commit 80a5ae7

Merge pull request #105551 from dagiro/freshness202
freshness202
2 parents bb78fb5 + 5bcbf5c commit 80a5ae7

File tree

1 file changed: +39 −35 lines changed


articles/hdinsight/hadoop/apache-hadoop-use-hive-beeline.md

Lines changed: 39 additions & 35 deletions
@@ -6,14 +6,14 @@ ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
-ms.date: 12/12/2019
+ms.date: 02/25/2020
 ---
 
 # Use the Apache Beeline client with Apache Hive
 
 Learn how to use [Apache Beeline](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Beeline–NewCommandLineShell) to run Apache Hive queries on HDInsight.
 
-Beeline is a Hive client that is included on the head nodes of your HDInsight cluster. To install Beeline locally, see [Install beeline client](#install-beeline-client), below. Beeline uses JDBC to connect to HiveServer2, a service hosted on your HDInsight cluster. You can also use Beeline to access Hive on HDInsight remotely over the internet. The following examples provide the most common connection strings used to connect to HDInsight from Beeline:
+Beeline is a Hive client that is included on the head nodes of your HDInsight cluster. To install Beeline locally, see [Install beeline client](#install-beeline-client), below. Beeline uses JDBC to connect to HiveServer2, a service hosted on your HDInsight cluster. You can also use Beeline to access Hive on HDInsight remotely over the internet. The following examples provide the most common connection strings used to connect to HDInsight from Beeline.
 
 ## Types of connections
 
@@ -54,7 +54,9 @@ Replace `<username>` with the name of an account on the domain with permissions
 
 ### Over public or private endpoints
 
-When connecting to a cluster using the public or private endpoints, you must provide the cluster login account name (default `admin`) and password. For example, using Beeline from a client system to connect to the `clustername.azurehdinsight.net` address. This connection is made over port `443`, and is encrypted using SSL:
+When connecting to a cluster using the public or private endpoints, you must provide the cluster login account name (default `admin`) and password. For example, using Beeline from a client system to connect to the `clustername.azurehdinsight.net` address. This connection is made over port `443`, and is encrypted using SSL.
+
+Replace `clustername` with the name of your HDInsight cluster. Replace `admin` with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, [email protected]). Replace `password` with the password for the cluster login account.
 
 ```bash
 beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/hive2' -n admin -p 'password'
@@ -66,19 +68,17 @@ or for private endpoint:
 beeline -u 'jdbc:hive2://clustername-int.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/hive2' -n admin -p 'password'
 ```
 
-Replace `clustername` with the name of your HDInsight cluster. Replace `admin` with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, [email protected]). Replace `password` with the password for the cluster login account.
-
 Private endpoints point to a basic load balancer, which can only be accessed from the VNETs peered in the same region. See [constraints on global VNet peering and load balancers](../../virtual-network/virtual-networks-faq.md#what-are-the-constraints-related-to-global-vnet-peering-and-load-balancers) for more info. You can use the `curl` command with `-v` option to troubleshoot any connectivity problems with public or private endpoints before using beeline.
 
 ---
 
-### <a id="sparksql"></a>Use Beeline with Apache Spark
+### Use Beeline with Apache Spark
 
 Apache Spark provides its own implementation of HiveServer2, which is sometimes referred to as the Spark Thrift server. This service uses Spark SQL to resolve queries instead of Hive, and may provide better performance depending on your query.
 
 #### Through public or private endpoints
 
-The connection string used is slightly different. Instead of containing `httpPath=/hive2` it's `httpPath/sparkhive2`:
+The connection string used is slightly different. Instead of containing `httpPath=/hive2` it's `httpPath=/sparkhive2`. Replace `clustername` with the name of your HDInsight cluster. Replace `admin` with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, [email protected]). Replace `password` with the password for the cluster login account.
 
 ```bash
 beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'
@@ -90,8 +90,6 @@ or for private endpoint:
 beeline -u 'jdbc:hive2://clustername-int.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'
 ```
 
-Replace `clustername` with the name of your HDInsight cluster. Replace `admin` with the cluster login account for your cluster. For ESP clusters, use the full UPN (e.g. [email protected]). Replace `password` with the password for the cluster login account.
-
 Private endpoints point to a basic load balancer, which can only be accessed from the VNETs peered in the same region. See [constraints on global VNet peering and load balancers](../../virtual-network/virtual-networks-faq.md#what-are-the-constraints-related-to-global-vnet-peering-and-load-balancers) for more info. You can use the `curl` command with `-v` option to troubleshoot any connectivity problems with public or private endpoints before using beeline.
 
 ---
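
The hunks above recommend probing the endpoints with `curl -v` before trying Beeline. As a minimal sketch (not part of the original article), the script below assembles verbose probe commands for the public and private gateways from a hypothetical cluster name `myhdicluster`; the `-u admin` basic-auth flag is an assumption, matching the cluster login account described above:

```shell
#!/bin/sh
# Hypothetical cluster name; replace with your own HDInsight cluster.
CLUSTERNAME="myhdicluster"

# Print the verbose curl probes to run before Beeline; -v shows the DNS,
# TLS, and HTTP details of any connection failure, and -u supplies the
# cluster login account (assumed here, per the paragraphs above).
echo "curl -v -u admin https://${CLUSTERNAME}.azurehdinsight.net/"
echo "curl -v -u admin https://${CLUSTERNAME}-int.azurehdinsight.net/"
```

Paste the printed commands into a terminal; for a private endpoint, run them from a VM inside a peered VNet, since the basic load balancer is not reachable from elsewhere.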
@@ -106,7 +104,7 @@ When connecting directly from the cluster head node, or from a resource inside t
 
 ---
 
-## <a id="prereq"></a>Prerequisites
+## Prerequisites for examples
 
 * A Hadoop cluster on HDInsight. See [Get Started with HDInsight on Linux](./apache-hadoop-linux-tutorial-get-started.md).
 
@@ -116,7 +114,7 @@ When connecting directly from the cluster head node, or from a resource inside t
 
 * Option 2: A local Beeline client.
 
-## <a id="beeline"></a>Run a Hive query
+## Run a Hive query
 
 This example is based on using the Beeline client from an SSH connection.
 
@@ -183,24 +181,21 @@ This example is based on using the Beeline client from an SSH connection.
 t7 string)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
 STORED AS TEXTFILE LOCATION 'wasbs:///example/data/';
-SELECT t4 AS sev, COUNT(*) AS count FROM log4jLogs
-WHERE t4 = '[ERROR]' AND INPUT__FILE__NAME LIKE '%.log'
+SELECT t4 AS sev, COUNT(*) AS count FROM log4jLogs
+WHERE t4 = '[ERROR]' AND INPUT__FILE__NAME LIKE '%.log'
 GROUP BY t4;
 ```
 
 These statements do the following actions:
 
-* `DROP TABLE` - If the table exists, it's deleted.
-
-* `CREATE EXTERNAL TABLE` - Creates an **external** table in Hive. External tables only store the table definition in Hive. The data is left in the original location.
-
-* `ROW FORMAT` - How the data is formatted. In this case, the fields in each log are separated by a space.
-
-* `STORED AS TEXTFILE LOCATION` - Where the data is stored and in what file format.
-
-* `SELECT` - Selects a count of all rows where column **t4** contains the value **[ERROR]**. This query returns a value of **3** as there are three rows that contain this value.
-
-* `INPUT__FILE__NAME LIKE '%.log'` - Hive attempts to apply the schema to all files in the directory. In this case, the directory contains files that don't match the schema. To prevent garbage data in the results, this statement tells Hive that it should only return data from files ending in .log.
+|Statement |Description |
+|---|---|
+|DROP TABLE|If the table exists, it's deleted.|
+|CREATE EXTERNAL TABLE|Creates an **external** table in Hive. External tables only store the table definition in Hive. The data is left in the original location.|
+|ROW FORMAT|How the data is formatted. In this case, the fields in each log are separated by a space.|
+|STORED AS TEXTFILE LOCATION|Where the data is stored and in what file format.|
+|SELECT|Selects a count of all rows where column **t4** contains the value **[ERROR]**. This query returns a value of **3** as there are three rows that contain this value.|
+|INPUT__FILE__NAME LIKE '%.log'|Hive attempts to apply the schema to all files in the directory. In this case, the directory contains files that don't match the schema. To prevent garbage data in the results, this statement tells Hive that it should only return data from files ending in .log.|
 
 > [!NOTE]
 > External tables should be used when you expect the underlying data to be updated by an external source. For example, an automated data upload process or a MapReduce operation.
@@ -231,7 +226,11 @@ This example is based on using the Beeline client from an SSH connection.
 +----------+--------+--+
 1 row selected (47.351 seconds)
 
-6. To exit Beeline, use `!exit`.
+6. Exit Beeline:
+
+    ```bash
+    !exit
+    ```
 
 ## Run a HiveQL file
 
@@ -243,7 +242,7 @@ This is a continuation from the prior example. Use the following steps to create
 nano query.hql
 ```
 
-2. Use the following text as the contents of the file. This query creates a new 'internal' table named **errorLogs**:
+1. Use the following text as the contents of the file. This query creates a new 'internal' table named **errorLogs**:
 
 ```hiveql
 CREATE TABLE IF NOT EXISTS errorLogs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) STORED AS ORC;
@@ -252,16 +251,18 @@ This is a continuation from the prior example. Use the following steps to create
 
 These statements do the following actions:
 
-* **CREATE TABLE IF NOT EXISTS** - If the table doesn't already exist, it's created. Since the **EXTERNAL** keyword isn't used, this statement creates an internal table. Internal tables are stored in the Hive data warehouse and are managed completely by Hive.
-* **STORED AS ORC** - Stores the data in Optimized Row Columnar (ORC) format. ORC format is a highly optimized and efficient format for storing Hive data.
-* **INSERT OVERWRITE ... SELECT** - Selects rows from the **log4jLogs** table that contain **[ERROR]**, then inserts the data into the **errorLogs** table.
+|Statement |Description |
+|---|---|
+|CREATE TABLE IF NOT EXISTS|If the table doesn't already exist, it's created. Since the **EXTERNAL** keyword isn't used, this statement creates an internal table. Internal tables are stored in the Hive data warehouse and are managed completely by Hive.|
+|STORED AS ORC|Stores the data in Optimized Row Columnar (ORC) format. ORC format is a highly optimized and efficient format for storing Hive data.|
+|INSERT OVERWRITE ... SELECT|Selects rows from the **log4jLogs** table that contain **[ERROR]**, then inserts the data into the **errorLogs** table.|
 
 > [!NOTE]
 > Unlike external tables, dropping an internal table deletes the underlying data as well.
 
-3. To save the file, use **Ctrl**+**X**, then enter **Y**, and finally **Enter**.
+1. To save the file, use **Ctrl**+**X**, then enter **Y**, and finally **Enter**.
 
-4. Use the following to run the file using Beeline:
+1. Use the following to run the file using Beeline:
 
 ```bash
 beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http' -i query.hql
@@ -270,7 +271,7 @@ This is a continuation from the prior example. Use the following steps to create
 > [!NOTE]
 > The `-i` parameter starts Beeline and runs the statements in the `query.hql` file. Once the query completes, you arrive at the `jdbc:hive2://headnodehost:10001/>` prompt. You can also run a file using the `-f` parameter, which exits Beeline after the query completes.
 
-5. To verify that the **errorLogs** table was created, use the following statement to return all the rows from **errorLogs**:
+1. To verify that the **errorLogs** table was created, use the following statement to return all the rows from **errorLogs**:
 
 ```hiveql
 SELECT * from errorLogs;
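
The note above distinguishes `-i` (run the file, then stay at the prompt) from `-f` (run the file, then exit). As a sketch of the non-interactive variant, the script below assembles a `-f` invocation of the same `query.hql` from the steps above; the URL is the head-node connection string already used in this article:

```shell
#!/bin/sh
# Connection string and file name from the steps above.
JDBC_URL='jdbc:hive2://headnodehost:10001/;transportMode=http'
QUERY_FILE="query.hql"

# Print the -f form: Beeline runs the file and exits when the query
# completes, which suits scripted or scheduled runs better than -i.
echo "beeline -u '${JDBC_URL}' -f ${QUERY_FILE}"
```

Run the printed command from an SSH session on the cluster head node, where `headnodehost` resolves.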
@@ -305,7 +306,9 @@ Although Beeline is included on the head nodes of your HDInsight cluster, you ma
 sudo apt install openjdk-11-jre-headless
 ```
 
-1. Amend the bashrc file (usually found in ~/.bashrc). Open the file with `nano ~/.bashrc` and then add the following line at the end of the file:
+1. Open the bashrc file (usually found in ~/.bashrc): `nano ~/.bashrc`.
+
+1. Amend the bashrc file. Add the following line at the end of the file:
 
 ```bash
 export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
@@ -330,11 +333,12 @@ Although Beeline is included on the head nodes of your HDInsight cluster, you ma
 1. Further amend the bashrc file. You'll need to identify the path to where the archives were unpacked. If using the [Windows Subsystem for Linux](https://docs.microsoft.com/windows/wsl/install-win10), and you followed the steps exactly, your path would be `/mnt/c/Users/user/`, where `user` is your user name.
 
 1. Open the file: `nano ~/.bashrc`
+
 1. Modify the commands below with the appropriate path and then enter them at the end of the bashrc file:
 
 ```bash
-export HADOOP_HOME=/$(path_where_the_archives_were_unpacked)/hadoop-2.7.3
-export HIVE_HOME=/$(path_where_the_archives_were_unpacked)/apache-hive-1.2.1-bin
+export HADOOP_HOME=/path_where_the_archives_were_unpacked/hadoop-2.7.3
+export HIVE_HOME=/path_where_the_archives_were_unpacked/apache-hive-1.2.1-bin
 PATH=$PATH:$HIVE_HOME/bin
 ```
 
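
After the bashrc edits in the hunk above, a quick sanity check is to evaluate the same exports and confirm where the Beeline launcher should land on the `PATH`. This sketch reuses the article's placeholder unpack location (substitute your real path; in a login shell you would instead reload with `source ~/.bashrc`):

```shell
#!/bin/sh
# Placeholder unpack location from the article; substitute your real path.
export HADOOP_HOME=/path_where_the_archives_were_unpacked/hadoop-2.7.3
export HIVE_HOME=/path_where_the_archives_were_unpacked/apache-hive-1.2.1-bin
PATH=$PATH:$HIVE_HOME/bin

# This is the directory the PATH entry adds; the beeline script lives here.
echo "$HIVE_HOME/bin"
```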
