Commit 3fb46be

Merge pull request #97395 from dagiro/freshness81
freshness81
2 parents aa6a93b + a2de446 commit 3fb46be

1 file changed: +83 -23 lines changed

articles/hdinsight/hadoop/apache-hadoop-use-sqoop-mac-linux.md

Lines changed: 83 additions & 23 deletions
@@ -1,15 +1,13 @@
11
---
22
title: Apache Sqoop with Apache Hadoop - Azure HDInsight
33
description: Learn how to use Apache Sqoop to import and export data between Apache Hadoop on HDInsight and Azure SQL Database.
4-
keywords: hadoop sqoop,sqoop
5-
64
author: hrasheed-msft
75
ms.author: hrasheed
86
ms.reviewer: jasonh
97
ms.service: hdinsight
10-
ms.custom: hdinsightactive,hdiseo17may2017
118
ms.topic: conceptual
12-
ms.date: 04/15/2019
9+
ms.custom: hdinsightactive,hdiseo17may2017
10+
ms.date: 11/28/2019
1311
---
1412

1513
# Use Apache Sqoop to import and export data between Apache Hadoop on HDInsight and SQL Database
@@ -22,58 +20,120 @@ Learn how to use Apache Sqoop to import and export between an Apache Hadoop clus
2220

2321
* Completion of [Set up test environment](./hdinsight-use-sqoop.md#create-cluster-and-sql-database) from [Use Apache Sqoop with Hadoop in HDInsight](./hdinsight-use-sqoop.md).
2422

25-
* A client to query the Azure SQL database. Consider using [SQL Server Management Studio](../../sql-database/sql-database-connect-query-ssms.md) or [Visual Studio Code](../../sql-database/sql-database-connect-query-vscode.md).
26-
2723
* An SSH client. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
2824

29-
## Sqoop export
25+
* Familiarity with Sqoop. For more information, see [Sqoop User Guide](https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html).
3026

31-
From Hive to SQL Server.
27+
## Set up
3228

33-
1. Use SSH to connect to the HDInsight cluster. Replace `CLUSTERNAME` with the name of your cluster, then enter the command:
29+
1. Use the [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your cluster. Edit the command below by replacing `CLUSTERNAME` with the name of your cluster, and then enter the command:
3430

3531
```cmd
3632
3733
```
3834
39-
2. Replace `MYSQLSERVER` with the name of your SQL Server. To verify that Sqoop can see your SQL Database, enter the command below in your open SSH connection. Enter the password for the SQL Server login when prompted. This command returns a list of databases.
35+
1. For ease of use, set variables. Replace `PASSWORD`, `MYSQLSERVER`, and `MYDATABASE` with the relevant values, and then enter the commands below:
4036
4137
```bash
42-
sqoop list-databases --connect jdbc:sqlserver://MYSQLSERVER.database.windows.net:1433 --username sqluser -P
38+
export password='PASSWORD'
39+
export sqlserver="MYSQLSERVER"
40+
export database="MYDATABASE"
41+
42+
43+
export serverConnect="jdbc:sqlserver://$sqlserver.database.windows.net:1433;user=sqluser;password=$password"
44+
export serverDbConnect="jdbc:sqlserver://$sqlserver.database.windows.net:1433;user=sqluser;password=$password;database=$database"
4345
```
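
The setup step above builds two JDBC connection strings by hand. The following is a minimal sketch, not part of this commit, of one way to build the same strings from a single helper. The function name `build_jdbc_url` is hypothetical; the `sqluser` login and the placeholder values come from the article.

```bash
# Hypothetical helper (not part of the article): prints the JDBC URL used by
# the Sqoop commands. Arguments: server name, password, optional database.
# The body runs in a subshell, so its variables do not leak into the session.
build_jdbc_url() (
    server="$1"
    password="$2"
    database="${3:-}"
    url="jdbc:sqlserver://${server}.database.windows.net:1433;user=sqluser;password=${password}"
    # Append the database only when one was supplied.
    if [ -n "$database" ]; then
        url="${url};database=${database}"
    fi
    printf '%s\n' "$url"
)

# Placeholder values, as in the article; replace them before use.
serverConnect="$(build_jdbc_url 'MYSQLSERVER' 'PASSWORD')"
serverDbConnect="$(build_jdbc_url 'MYSQLSERVER' 'PASSWORD' 'MYDATABASE')"
export serverConnect serverDbConnect
```

When these variables are passed to Sqoop, quoting the expansion (for example, `--connect "$serverDbConnect"`) keeps the string as a single argument even if the password contains spaces or glob characters.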
4446
45-
3. Replace `MYSQLSERVER` with the name of your SQL Server, and `MYDATABASE` with the name of your SQL database. To export data from the Hive `hivesampletable` table to the `mobiledata` table in SQL Database, enter the command below in your open SSH connection. Enter the password for the SQL Server login when prompted
47+
## Sqoop export
48+
49+
From Hive to SQL Server.
50+
51+
1. To verify that Sqoop can see your SQL Database, enter the command below in your open SSH connection. This command returns a list of databases.
52+
53+
```bash
54+
sqoop list-databases --connect $serverConnect
55+
```
56+
57+
1. Enter the following command to see a list of tables for the specified database:
4658
4759
```bash
48-
sqoop export --connect 'jdbc:sqlserver://MYSQLSERVER.database.windows.net:1433;database=MYDATABASE' --username sqluser -P -table 'mobiledata' --hcatalog-table hivesampletable
60+
sqoop list-tables --connect $serverDbConnect
4961
```
5062
51-
4. To verify that data was exported, use the following queries from your SQL client to view the exported data:
63+
1. To export data from the Hive `hivesampletable` table to the `mobiledata` table in SQL Database, enter the command below in your open SSH connection:
5264
53-
```sql
54-
SELECT COUNT(*) FROM [dbo].[mobiledata] WITH (NOLOCK);
55-
SELECT TOP(25) * FROM [dbo].[mobiledata] WITH (NOLOCK);
65+
```bash
66+
sqoop export --connect $serverDbConnect \
67+
--table mobiledata \
68+
--hcatalog-table hivesampletable
69+
```
70+
71+
1. To verify that data was exported, use the following queries from your SSH connection to view the exported data:
72+
73+
```bash
74+
sqoop eval --connect $serverDbConnect \
75+
--query "SELECT COUNT(*) from dbo.mobiledata WITH (NOLOCK)"
76+
77+
78+
sqoop eval --connect $serverDbConnect \
79+
--query "SELECT TOP(10) * from dbo.mobiledata WITH (NOLOCK)"
5680
```
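
The two verification queries above differ only in their SQL text. As a rough sketch under stated assumptions, they could be wrapped in one function. The name `run_eval` and the `DRY_RUN` switch are hypothetical and exist only for this illustration; `serverDbConnect` is the variable set earlier in the article.

```bash
# Hypothetical wrapper: runs one query through `sqoop eval`.
# With DRY_RUN=1 (an assumption made for this sketch) it only prints the
# command, and it masks the connection string so the password is not echoed.
run_eval() {
    query="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'sqoop eval --connect <serverDbConnect> --query "%s"\n' "$query"
    else
        sqoop eval --connect "$serverDbConnect" --query "$query"
    fi
}

# Dry run by default so the sketch can be tried without a cluster.
DRY_RUN=1
run_eval "SELECT COUNT(*) FROM dbo.mobiledata WITH (NOLOCK)"
run_eval "SELECT TOP(10) * FROM dbo.mobiledata WITH (NOLOCK)"
```

Setting `DRY_RUN=0` in an SSH session on the cluster would run the same two queries against the database.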
5781
5882
## Sqoop import
5983
6084
From SQL Server to Azure storage.
6185
62-
1. Replace `MYSQLSERVER` with the name of your SQL Server, and `MYDATABASE` with the name of your SQL database. Enter the command below in your open SSH connection to import data from the `mobiledata` table in SQL Database, to the `wasb:///tutorials/usesqoop/importeddata` directory on HDInsight. Enter the password for the SQL Server login when prompted. The fields in the data are separated by a tab character, and the lines are terminated by a new-line character.
86+
1. Enter the command below in your open SSH connection to import data from the `mobiledata` table in SQL Database, to the `wasb:///tutorials/usesqoop/importeddata` directory on HDInsight. The fields in the data are separated by a tab character, and the lines are terminated by a new-line character.
87+
88+
```bash
89+
sqoop import --connect $serverDbConnect \
90+
--table mobiledata \
91+
--target-dir 'wasb:///tutorials/usesqoop/importeddata' \
92+
--fields-terminated-by '\t' \
93+
--lines-terminated-by '\n' -m 1
94+
```
95+
96+
1. Alternatively, you can specify a Hive table:
6397
6498
```bash
65-
sqoop import --connect 'jdbc:sqlserver://MYSQLSERVER.database.windows.net:1433;database=MYDATABASE' --username sqluser -P --table 'mobiledata' --target-dir 'wasb:///tutorials/usesqoop/importeddata' --fields-terminated-by '\t' --lines-terminated-by '\n' -m 1
99+
sqoop import --connect $serverDbConnect \
100+
--table mobiledata \
101+
--target-dir 'wasb:///tutorials/usesqoop/importeddata2' \
102+
--fields-terminated-by '\t' \
103+
--lines-terminated-by '\n' \
104+
--create-hive-table \
105+
--hive-table mobiledata_imported2 \
106+
--hive-import -m 1
66107
```
67108
68-
2. Once the import has completed, enter the following command in your open SSH connection to list out the data in the new directory:
109+
1. Once the import has completed, enter the following command in your open SSH connection to list out the data in the new directory:
69110
70111
```bash
71-
hdfs dfs -text /tutorials/usesqoop/importeddata/part-m-00000
112+
hadoop fs -tail /tutorials/usesqoop/importeddata/part-m-00000
72113
```
73114
115+
1. Use [beeline](./apache-hadoop-use-hive-beeline.md) to verify that the table has been created in Hive.
116+
117+
1. Connect
118+
119+
```bash
120+
beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http'
121+
```
122+
123+
1. Execute each query below one at a time and review the output:
124+
125+
```hql
126+
show tables;
127+
describe mobiledata_imported2;
128+
SELECT COUNT(*) FROM mobiledata_imported2;
129+
SELECT * FROM mobiledata_imported2 LIMIT 10;
130+
```
131+
132+
1. Exit beeline with `!exit`.
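
The beeline checks above are run interactively. As a minimal sketch, assuming beeline's `-e` option for passing a query on the command line, the same queries could be run without opening a session. The function name `run_hive_checks` is hypothetical; the JDBC URL and table name are the ones used in the article.

```bash
# JDBC URL from the article's beeline step.
hive_url='jdbc:hive2://headnodehost:10001/;transportMode=http'

# Hypothetical helper: runs each argument as a separate Hive query.
run_hive_checks() {
    for query in "$@"; do
        beeline -u "$hive_url" -e "$query"
    done
}

# Run only where beeline is installed (for example, on the cluster head node).
if command -v beeline >/dev/null 2>&1; then
    run_hive_checks \
        "show tables;" \
        "describe mobiledata_imported2;" \
        "SELECT COUNT(*) FROM mobiledata_imported2;" \
        "SELECT * FROM mobiledata_imported2 LIMIT 10;"
else
    echo "beeline not found; skipping Hive checks" >&2
fi
```

Each query opens its own connection in this sketch, which is slower than a single interactive session but easier to script.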
133+
74134
## Limitations
75135
76-
* Bulk export - With Linux-based HDInsight, the Sqoop connector used to export data to Microsoft SQL Server or Azure SQL Database does not support bulk inserts.
136+
* Bulk export - With Linux-based HDInsight, the Sqoop connector used to export data to Microsoft SQL Server or Azure SQL Database doesn't support bulk inserts.
77137
78138
* Batching - With Linux-based HDInsight, when you use the `-batch` switch while performing inserts, Sqoop makes multiple inserts instead of batching the insert operations.
79139
@@ -91,7 +151,7 @@ From SQL Server to Azure storage.
91151
92152
## Next steps
93153
94-
Now you have learned how to use Sqoop. To learn more, see:
154+
Now you've learned how to use Sqoop. To learn more, see:
95155
96156
* [Use Apache Oozie with HDInsight](../hdinsight-use-oozie-linux-mac.md): Use Sqoop action in an Oozie workflow.
97157
* [Analyze flight delay data using HDInsight](../interactive-query/interactive-query-tutorial-analyze-flight-data.md): Use Interactive Query to analyze flight delay data, and then use Sqoop to export data to an Azure SQL database.
