articles/hdinsight/hadoop/hdinsight-use-sqoop.md
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.date: 12/06/2019
---
# Use Apache Sqoop with Hadoop in HDInsight
Although Apache Hadoop is a natural choice for processing unstructured and semi-structured data, such as logs and files, there may also be a need to process structured data that is stored in relational databases.
[Apache Sqoop](https://sqoop.apache.org/docs/1.99.7/user.html) is a tool designed to transfer data between Hadoop clusters and relational databases. You can use it to import data from a relational database management system (RDBMS) such as SQL Server, MySQL, or Oracle into the Hadoop distributed file system (HDFS), transform the data in Hadoop with MapReduce or Apache Hive, and then export the data back into an RDBMS. In this article, you're using a SQL Server database for your relational database.
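To make the import direction concrete, the sketch below assembles a typical `sqoop import` invocation as a string so it can be inspected before running it on a cluster head node. The server, database, credentials, table, and target directory here are hypothetical placeholders, not values confirmed by this article:

```shell
# Hypothetical connection details -- substitute your own server name,
# database, and credentials. These values are not from this article.
SQL_SERVER="myserver.database.windows.net"
SQL_DB="sqooptest"
SQL_USER="sqluser"

# Assemble a Sqoop import command: copy rows from a SQL table into HDFS.
# Each mapper opens a JDBC connection and writes its slice of the table
# to the target directory; -m 1 keeps it to a single mapper.
# Note: when actually executing, quote the --connect value so the shell
# does not interpret the semicolon.
SQOOP_IMPORT="sqoop import \
--connect jdbc:sqlserver://${SQL_SERVER};database=${SQL_DB} \
--username ${SQL_USER} -P \
--table mobiledata \
--target-dir /tutorials/usesqoop/importeddata \
-m 1"

# Print the assembled command so it can be reviewed before running it.
echo "$SQOOP_IMPORT"
```

The `-P` flag prompts for the password interactively, which keeps credentials out of shell history.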
> [!IMPORTANT]
> This article sets up a test environment to perform the data transfer. You then choose a data transfer method for this environment from one of the options in the [Run Sqoop jobs](#run-sqoop-jobs) section below.
An HDInsight cluster comes with some sample data. In this article, you use two of these sample datasets to test Sqoop import and export.
## <a name="create-cluster-and-sql-database"></a>Set up test environment
The cluster, SQL database, and other objects are created through the Azure portal using an Azure Resource Manager template. The template can be found in [Azure quickstart templates](https://azure.microsoft.com/resources/templates/101-hdinsight-linux-with-sql-database/). The Resource Manager template calls a bacpac package to deploy the table schemas to a SQL database. The bacpac package is located in a public blob container, https://hditutorialdata.blob.core.windows.net/usesqoop/SqoopTutorial-2016-2-23-11-2.bacpac. If you want to use a private container for the bacpac files, use the following values in the template:
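The exact JSON values are not reproduced here. As a hypothetical sketch only, a private-container configuration would supply the bacpac file name plus the storage account's access key and key type; the property names below are illustrative placeholders, so check the quickstart template itself for the real parameter names:

```json
{
  "bacpacFileName": "SqoopTutorial-2016-2-23-11-2.bacpac",
  "storageKeyType": "Primary",
  "storageKey": "<your-storage-account-access-key>"
}
```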
HDInsight can run Sqoop jobs by using a variety of methods.
## Limitations
* Bulk export - With Linux-based HDInsight, the Sqoop connector used to export data to Microsoft SQL Server or Azure SQL Database doesn't currently support bulk inserts.
* Batching - With Linux-based HDInsight, when you use the `-batch` switch to perform inserts, Sqoop performs multiple inserts instead of batching the insert operations.
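To see what these limitations mean in practice, here is a hedged sketch of an export invocation, again assembled as a string for review. The connection string, table, and export directory are illustrative placeholders, not values confirmed by this article:

```shell
# Illustrative values only -- the connection string, table, and paths
# are placeholders, not values taken from this article.
SQL_CONN="jdbc:sqlserver://myserver.database.windows.net;database=sqooptest"

# Assemble a Sqoop export command: push HDFS data back into a SQL table.
# Per the limitations above, on Linux-based HDInsight the SQL Server
# connector does not perform bulk inserts, and the -batch switch still
# results in multiple individual inserts rather than batched operations.
SQOOP_EXPORT="sqoop export \
--connect ${SQL_CONN} \
--username sqluser -P \
--table log4jlogs \
--export-dir /example/data/sample.log \
--input-fields-terminated-by \t \
-m 1 -batch"

echo "$SQOOP_EXPORT"
```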
## Next steps
Now you've learned how to use Sqoop. To learn more, see:
* [Use Apache Hive with HDInsight](../hdinsight-use-hive.md)
* [Upload data to HDInsight](../hdinsight-upload-data.md): Find other methods for uploading data to HDInsight/Azure Blob storage.
* [Use Apache Sqoop to import and export data between Apache Hadoop on HDInsight and SQL Database](./apache-hadoop-use-sqoop-mac-linux.md)