
Commit 32a9289

Merge pull request #116688 from hrasheed-msft/hdi_underperf_docs

HDInsight: breaking up beeline connection details

2 parents 0630aad + fb324ee

File tree

3 files changed: +165 -157 lines changed

articles/hdinsight/TOC.yml

Lines changed: 2 additions & 0 deletions

@@ -772,6 +772,8 @@
      href: ./hadoop/hdinsight-use-hive.md
  - name: Use the Apache Hive View
      href: ./hadoop/apache-hadoop-use-hive-ambari-view.md
+ - name: Connect to Apache Beeline
+     href: ./hadoop/connect-install-beeline.md
  - name: Use Apache Hive Beeline
      href: ./hadoop/apache-hadoop-use-hive-beeline.md
  - name: Use Grafana
articles/hdinsight/hadoop/apache-hadoop-use-hive-beeline.md

Lines changed: 1 addition & 157 deletions

@@ -9,107 +9,11 @@ ms.topic: conceptual
  ms.custom: seoapr2020
  ms.date: 04/17/2020
  ---
-
  # Use the Apache Beeline client with Apache Hive

  Learn how to use [Apache Beeline](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Beeline–NewCommandLineShell) to run Apache Hive queries on HDInsight.

- Beeline is a Hive client that is included on the head nodes of your HDInsight cluster. To install Beeline locally, see [Install beeline client](#install-beeline-client), below. Beeline uses JDBC to connect to HiveServer2, a service hosted on your HDInsight cluster. You can also use Beeline to access Hive on HDInsight remotely over the internet. The following examples provide the most common connection strings used to connect to HDInsight from Beeline.
-
- ## Types of connections
-
- ### From an SSH session
-
- When connected to a cluster head node over an SSH session, you can connect to the `headnodehost` address on port `10001`:
-
- ```bash
- beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http'
- ```
-
- ---
-
- ### Over an Azure Virtual Network
-
- When connecting from a client to HDInsight over an Azure Virtual Network, you must provide the fully qualified domain name (FQDN) of a cluster head node. Since this connection is made directly to the cluster nodes, the connection uses port `10001`:
-
- ```bash
- beeline -u 'jdbc:hive2://<headnode-FQDN>:10001/;transportMode=http'
- ```
-
- Replace `<headnode-FQDN>` with the fully qualified domain name of a cluster head node. To find the FQDN of a head node, use the information in the [Manage HDInsight using the Apache Ambari REST API](../hdinsight-hadoop-manage-ambari-rest-api.md#get-the-fqdn-of-cluster-nodes) document.
-
- ---
-
- ### To HDInsight Enterprise Security Package (ESP) cluster using Kerberos
-
- When connecting from a client to an Enterprise Security Package (ESP) cluster joined to Azure Active Directory Domain Services (AAD-DS), on a machine in the same realm as the cluster, you must also specify the domain name `<AAD-Domain>` and the name of a domain user account with permissions to access the cluster, `<username>`:
-
- ```bash
- kinit <username>
- beeline -u 'jdbc:hive2://<headnode-FQDN>:10001/default;principal=hive/_HOST@<AAD-Domain>;auth-kerberos;transportMode=http' -n <username>
- ```
-
- Replace `<username>` with the name of an account on the domain with permissions to access the cluster. Replace `<AAD-DOMAIN>` with the name of the Azure Active Directory (AAD) domain that the cluster is joined to. Use an uppercase string for the `<AAD-DOMAIN>` value; otherwise the credential won't be found. Check `/etc/krb5.conf` for the realm names if needed.
-
- To find the JDBC URL from Ambari:
-
- 1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/#/main/services/HIVE/summary`, where `CLUSTERNAME` is the name of your cluster. Ensure that HiveServer2 is running.
-
- 1. Copy the HiveServer2 JDBC URL to the clipboard.
-
- ---
-
- ### Over public or private endpoints
-
- When connecting to a cluster using the public or private endpoints, you must provide the cluster login account name (default `admin`) and password. For example, you can use Beeline from a client system to connect to the `clustername.azurehdinsight.net` address. This connection is made over port `443` and is encrypted using TLS/SSL.
-
- Replace `clustername` with the name of your HDInsight cluster. Replace `admin` with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, [email protected]). Replace `password` with the password for the cluster login account.
-
- ```bash
- beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/hive2' -n admin -p 'password'
- ```
-
- Or, for the private endpoint:
-
- ```bash
- beeline -u 'jdbc:hive2://clustername-int.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/hive2' -n admin -p 'password'
- ```
-
- Private endpoints point to a basic load balancer, which can only be accessed from VNets peered within the same region. See [constraints on global VNet peering and load balancers](../../virtual-network/virtual-networks-faq.md#what-are-the-constraints-related-to-global-vnet-peering-and-load-balancers) for more information. You can use the `curl` command with the `-v` option to troubleshoot any connectivity problems with public or private endpoints before using Beeline.
-
- ---
-
- ### Use Beeline with Apache Spark
-
- Apache Spark provides its own implementation of HiveServer2, which is sometimes referred to as the Spark Thrift server. This service uses Spark SQL to resolve queries instead of Hive, and may provide better performance depending on your query.
-
- #### Through public or private endpoints
-
- The connection string used is slightly different. Instead of containing `httpPath=/hive2`, it uses `httpPath=/sparkhive2`. Replace `clustername` with the name of your HDInsight cluster. Replace `admin` with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, [email protected]). Replace `password` with the password for the cluster login account.
-
- ```bash
- beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'
- ```
-
- Or, for the private endpoint:
-
- ```bash
- beeline -u 'jdbc:hive2://clustername-int.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'
- ```
-
- Private endpoints point to a basic load balancer, which can only be accessed from VNets peered within the same region. See [constraints on global VNet peering and load balancers](../../virtual-network/virtual-networks-faq.md#what-are-the-constraints-related-to-global-vnet-peering-and-load-balancers) for more information. You can use the `curl` command with the `-v` option to troubleshoot any connectivity problems with public or private endpoints before using Beeline.
-
- ---
-
- #### From cluster head node or inside Azure Virtual Network with Apache Spark
-
- When connecting directly from the cluster head node, or from a resource inside the same Azure Virtual Network as the HDInsight cluster, port `10002` should be used for the Spark Thrift server instead of `10001`. The following example shows how to connect directly to the head node:
-
- ```bash
- /usr/hdp/current/spark2-client/bin/beeline -u 'jdbc:hive2://headnodehost:10002/;transportMode=http'
- ```
-
- ---
+ Beeline is a Hive client that is included on the head nodes of your HDInsight cluster. To connect to the Beeline client installed on your HDInsight cluster, or to install Beeline locally, see [Connect to or install Apache Beeline](connect-install-beeline.md). Beeline uses JDBC to connect to HiveServer2, a service hosted on your HDInsight cluster. You can also use Beeline to access Hive on HDInsight remotely over the internet. The following examples provide the most common connection strings used to connect to HDInsight from Beeline.

  ## Prerequisites for examples

@@ -295,66 +199,6 @@ This example is a continuation from the prior example. Use the following steps t
  +---------------+---------------+---------------+---------------+---------------+---------------+---------------+--+
  3 rows selected (0.813 seconds)

- ## Install beeline client
-
- Although Beeline is included on the head nodes, you may want to install it locally. The install steps for a local machine are based on a [Windows Subsystem for Linux](https://docs.microsoft.com/windows/wsl/install-win10).
-
- 1. Update package lists. Enter the following command in your bash shell:
-
-     ```bash
-     sudo apt-get update
-     ```
-
- 1. Install Java if it isn't already installed. You can check with the `which java` command.
-
- 1. If no Java package is installed, enter the following command:
-
-     ```bash
-     sudo apt install openjdk-11-jre-headless
-     ```
-
- 1. Open the bashrc file (often found in ~/.bashrc): `nano ~/.bashrc`.
-
- 1. Amend the bashrc file. Add the following line at the end of the file:
-
-     ```bash
-     export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
-     ```
-
-     Then press **Ctrl+X**, then **Y**, then **Enter**.
-
- 1. To download the Hadoop and Beeline archives, enter the following commands:
-
-     ```bash
-     wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
-     wget https://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
-     ```
-
- 1. To unpack the archives, enter the following commands:
-
-     ```bash
-     tar -xvzf hadoop-2.7.3.tar.gz
-     tar -xvzf apache-hive-1.2.1-bin.tar.gz
-     ```
-
- 1. Further amend the bashrc file. You'll need to identify the path to where the archives were unpacked. If you're using the [Windows Subsystem for Linux](https://docs.microsoft.com/windows/wsl/install-win10) and followed the steps exactly, your path would be `/mnt/c/Users/user/`, where `user` is your user name.
-
- 1. Open the file: `nano ~/.bashrc`
-
- 1. Modify the commands below with the appropriate path, and then enter them at the end of the bashrc file:
-
-     ```bash
-     export HADOOP_HOME=/path_where_the_archives_were_unpacked/hadoop-2.7.3
-     export HIVE_HOME=/path_where_the_archives_were_unpacked/apache-hive-1.2.1-bin
-     PATH=$PATH:$HIVE_HOME/bin
-     ```
-
- 1. Then press **Ctrl+X**, then **Y**, then **Enter**.
-
- 1. Close and then reopen your bash session.
-
- 1. Test your connection. Use the connection format from [Over public or private endpoints](#over-public-or-private-endpoints), above.

  ## Next steps

  * For more general information on Hive in HDInsight, see [Use Apache Hive with Apache Hadoop on HDInsight](hdinsight-use-hive.md)
articles/hdinsight/hadoop/connect-install-beeline.md

Lines changed: 162 additions & 0 deletions

@@ -0,0 +1,162 @@
---
title: Connect to or install Apache Beeline - Azure HDInsight
description: Learn how to connect to the Apache Beeline client to run Hive queries with Hadoop on HDInsight. Beeline is a utility for working with HiveServer2 over JDBC.
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.custom: seoapr2020
ms.date: 05/27/2020
---
# Connect to Apache Beeline on HDInsight or install it locally

[Apache Beeline](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Beeline–NewCommandLineShell) is a Hive client that is included on the head nodes of your HDInsight cluster. This article describes how to connect to the Beeline client installed on your HDInsight cluster across different types of connections. It also discusses how to [install the Beeline client locally](#install-beeline-client).

## Types of connections

### From an SSH session

When connected to a cluster head node over an SSH session, you can connect to the `headnodehost` address on port `10001`:

```bash
beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http'
```
### Over an Azure Virtual Network

When connecting from a client to HDInsight over an Azure Virtual Network, you must provide the fully qualified domain name (FQDN) of a cluster head node. Since this connection is made directly to the cluster nodes, the connection uses port `10001`:

```bash
beeline -u 'jdbc:hive2://<headnode-FQDN>:10001/;transportMode=http'
```

Replace `<headnode-FQDN>` with the fully qualified domain name of a cluster head node. To find the FQDN of a head node, use the information in the [Manage HDInsight using the Apache Ambari REST API](../hdinsight-hadoop-manage-ambari-rest-api.md#get-the-fqdn-of-cluster-nodes) document.
### To HDInsight Enterprise Security Package (ESP) cluster using Kerberos

When connecting from a client to an Enterprise Security Package (ESP) cluster joined to Azure Active Directory Domain Services (AAD-DS), on a machine in the same realm as the cluster, you must also specify the domain name `<AAD-Domain>` and the name of a domain user account with permissions to access the cluster, `<username>`:

```bash
kinit <username>
beeline -u 'jdbc:hive2://<headnode-FQDN>:10001/default;principal=hive/_HOST@<AAD-Domain>;auth-kerberos;transportMode=http' -n <username>
```

Replace `<username>` with the name of an account on the domain with permissions to access the cluster. Replace `<AAD-DOMAIN>` with the name of the Azure Active Directory (AAD) domain that the cluster is joined to. Use an uppercase string for the `<AAD-DOMAIN>` value; otherwise the credential won't be found. Check `/etc/krb5.conf` for the realm names if needed.
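The uppercase requirement is easy to trip over when scripting this connection. As a minimal sketch (the domain below is a placeholder, not a value from this article), the domain can be normalized before building the connection string:

```shell
# Normalize the AAD-DS domain to uppercase before using it as a Kerberos realm.
# "contoso.onmicrosoft.com" is a placeholder; substitute your own domain.
aad_domain="contoso.onmicrosoft.com"

# Kerberos realm names are conventionally uppercase; a lowercase realm here
# leads to the "credential won't be found" failure described above.
realm=$(printf '%s' "$aad_domain" | tr '[:lower:]' '[:upper:]')

echo "jdbc:hive2://<headnode-FQDN>:10001/default;principal=hive/_HOST@${realm};auth-kerberos;transportMode=http"
```

The same `tr` one-liner can be reused to double-check the realm names found in `/etc/krb5.conf`.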
To find the JDBC URL from Ambari:

1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/#/main/services/HIVE/summary`, where `CLUSTERNAME` is the name of your cluster. Ensure that HiveServer2 is running.

1. Copy the HiveServer2 JDBC URL to the clipboard.
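If you'd rather script the host lookup than click through Ambari, the REST API in the document linked above returns JSON describing the cluster hosts. The following is a hedged sketch that parses a hand-written sample rather than a live response; the response shape and the `hn` head-node prefix are assumptions based on that documentation, not verified against your cluster:

```shell
# Sketch: extract head-node FQDNs from an Ambari-style hosts response.
# In practice the JSON would come from something like:
#   curl -u admin -sS "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/hosts"
# The sample below is hand-written for illustration only.
sample='{"items":[{"Hosts":{"host_name":"hn0-cluster.internal.cloudapp.net"}},{"Hosts":{"host_name":"wn0-cluster.internal.cloudapp.net"}}]}'

# Pull out the host_name values, then keep only head nodes (hn* prefix).
printf '%s' "$sample" \
  | grep -o '"host_name":"[^"]*"' \
  | sed 's/.*:"\(.*\)"/\1/' \
  | grep '^hn'
```

For real responses, a proper JSON parser such as `jq` is more robust than `grep`/`sed`.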
### Over public or private endpoints

When connecting to a cluster using the public or private endpoints, you must provide the cluster login account name (default `admin`) and password. For example, you can use Beeline from a client system to connect to the `clustername.azurehdinsight.net` address. This connection is made over port `443` and is encrypted using TLS/SSL.

Replace `clustername` with the name of your HDInsight cluster. Replace `admin` with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, [email protected]). Replace `password` with the password for the cluster login account.

```bash
beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/hive2' -n admin -p 'password'
```

Or, for the private endpoint:

```bash
beeline -u 'jdbc:hive2://clustername-int.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/hive2' -n admin -p 'password'
```

Private endpoints point to a basic load balancer, which can only be accessed from VNets peered within the same region. See [constraints on global VNet peering and load balancers](../../virtual-network/virtual-networks-faq.md#what-are-the-constraints-related-to-global-vnet-peering-and-load-balancers) for more information. You can use the `curl` command with the `-v` option to troubleshoot any connectivity problems with public or private endpoints before using Beeline.
### Use Beeline with Apache Spark

Apache Spark provides its own implementation of HiveServer2, which is sometimes referred to as the Spark Thrift server. This service uses Spark SQL to resolve queries instead of Hive, and may provide better performance depending on your query.

#### Through public or private endpoints

The connection string used is slightly different. Instead of containing `httpPath=/hive2`, it uses `httpPath=/sparkhive2`. Replace `clustername` with the name of your HDInsight cluster. Replace `admin` with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, [email protected]). Replace `password` with the password for the cluster login account.

```bash
beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'
```

Or, for the private endpoint:

```bash
beeline -u 'jdbc:hive2://clustername-int.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'
```

Private endpoints point to a basic load balancer, which can only be accessed from VNets peered within the same region. See [constraints on global VNet peering and load balancers](../../virtual-network/virtual-networks-faq.md#what-are-the-constraints-related-to-global-vnet-peering-and-load-balancers) for more information. You can use the `curl` command with the `-v` option to troubleshoot any connectivity problems with public or private endpoints before using Beeline.
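The four HTTPS connection strings above differ only in the host suffix (`-int` for private endpoints) and the `httpPath` value (`/hive2` for HiveServer2, `/sparkhive2` for the Spark Thrift server). A small sketch that derives them from two switches; the function name and flag values are illustrative, not part of Beeline:

```shell
# Sketch: build the HTTPS Beeline JDBC URL from a cluster name plus two
# switches, so the public/private and Hive/Spark variants need not be retyped.
build_url() {
  cluster="$1"    # HDInsight cluster name (placeholder in the examples below)
  endpoint="$2"   # "public" or "private"
  engine="$3"     # "hive" (HiveServer2) or "spark" (Spark Thrift server)

  host="${cluster}.azurehdinsight.net"
  [ "$endpoint" = "private" ] && host="${cluster}-int.azurehdinsight.net"

  path="/hive2"
  [ "$engine" = "spark" ] && path="/sparkhive2"

  echo "jdbc:hive2://${host}:443/;ssl=true;transportMode=http;httpPath=${path}"
}

build_url mycluster public hive
build_url mycluster private spark
```

The resulting string is what you would pass to `beeline -u '...'` together with `-n` and `-p` as shown above.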
#### From cluster head node or inside Azure Virtual Network with Apache Spark

When connecting directly from the cluster head node, or from a resource inside the same Azure Virtual Network as the HDInsight cluster, port `10002` should be used for the Spark Thrift server instead of `10001`. The following example shows how to connect directly to the head node:

```bash
/usr/hdp/current/spark2-client/bin/beeline -u 'jdbc:hive2://headnodehost:10002/;transportMode=http'
```
## Install Beeline client

Although Beeline is included on the head nodes, you may want to install it locally. The install steps for a local machine are based on a [Windows Subsystem for Linux](https://docs.microsoft.com/windows/wsl/install-win10).

1. Update package lists. Enter the following command in your bash shell:

   ```bash
   sudo apt-get update
   ```

1. Install Java if it isn't already installed. You can check with the `which java` command.

1. If no Java package is installed, enter the following command:

   ```bash
   sudo apt install openjdk-11-jre-headless
   ```

1. Open the bashrc file (often found in ~/.bashrc): `nano ~/.bashrc`.

1. Amend the bashrc file. Add the following line at the end of the file:

   ```bash
   export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64
   ```

   Then press **Ctrl+X**, then **Y**, then **Enter**.

1. To download the Hadoop and Beeline archives, enter the following commands:

   ```bash
   wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
   wget https://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
   ```

1. To unpack the archives, enter the following commands:

   ```bash
   tar -xvzf hadoop-2.7.3.tar.gz
   tar -xvzf apache-hive-1.2.1-bin.tar.gz
   ```

1. Further amend the bashrc file. You'll need to identify the path to where the archives were unpacked. If you're using the [Windows Subsystem for Linux](https://docs.microsoft.com/windows/wsl/install-win10) and followed the steps exactly, your path would be `/mnt/c/Users/user/`, where `user` is your user name.

1. Open the file: `nano ~/.bashrc`

1. Modify the commands below with the appropriate path, and then enter them at the end of the bashrc file:

   ```bash
   export HADOOP_HOME=/path_where_the_archives_were_unpacked/hadoop-2.7.3
   export HIVE_HOME=/path_where_the_archives_were_unpacked/apache-hive-1.2.1-bin
   PATH=$PATH:$HIVE_HOME/bin
   ```

1. Then press **Ctrl+X**, then **Y**, then **Enter**.

1. Close and then reopen your bash session.

1. Test your connection. Use the connection format from [Over public or private endpoints](#over-public-or-private-endpoints), above.
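Before testing the connection in the last step, it can help to confirm that the environment variables and `PATH` change from the steps above actually took effect in the new session. The following is a hypothetical helper, not part of the documented steps:

```shell
# Sketch: sanity-check the local Beeline environment configured above.
# Verifies each exported variable points at an existing directory and
# that the beeline launcher is reachable on PATH.
check_env() {
  for var in JAVA_HOME HADOOP_HOME HIVE_HOME; do
    eval "val=\$$var"
    if [ -z "$val" ] || [ ! -d "$val" ]; then
      echo "MISSING: $var"
      return 1
    fi
  done
  command -v beeline >/dev/null || { echo "MISSING: beeline on PATH"; return 1; }
  echo "OK"
}

if check_env; then
  echo "Ready to run beeline"
fi
```

If any `MISSING:` line is printed, re-check the corresponding export in `~/.bashrc` and reopen the shell.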
## Next steps

* For examples using the Beeline client with Apache Hive, see [Use Apache Beeline with Apache Hive](apache-hadoop-use-hive-beeline.md)
* For more general information on Hive in HDInsight, see [Use Apache Hive with Apache Hadoop on HDInsight](hdinsight-use-hive.md)
