articles/hdinsight/hadoop/hdinsight-use-sqoop.md
---
description: Learn how to use Azure PowerShell from a workstation to run Sqoop import and export jobs between an HDInsight cluster and an Azure SQL database.
ms.service: hdinsight
ms.custom: devx-track-azurepowershell
ms.topic: how-to
ms.date: 09/18/2023
---

# Use Apache Sqoop with Hadoop in HDInsight

Although Apache Hadoop is a natural choice for processing unstructured and semi-structured data, there's also a need to process structured data that's stored in relational databases.

[Apache Sqoop](https://sqoop.apache.org/docs/1.99.7/user.html) is a tool designed to transfer data between Hadoop clusters and relational databases. You can use it to import data from a relational database management system (RDBMS), such as SQL Server, MySQL, or Oracle, into the Hadoop distributed file system (HDFS), transform the data in Hadoop with MapReduce or Apache Hive, and then export the data back into an RDBMS. In this article, you use Azure SQL Database as your relational database.
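As a concrete sketch of the import and export directions described above, the following Python snippet composes typical `sqoop import` and `sqoop export` command lines. The JDBC URL, table name, and HDFS paths are placeholder values for illustration, not the ones used later in this article:

```python
def sqoop_import_cmd(jdbc_url: str, table: str, target_dir: str, mappers: int = 1) -> list[str]:
    """Build a `sqoop import` command line: RDBMS table -> HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC connection string for the RDBMS
        "--table", table,             # source table to import
        "--target-dir", target_dir,   # HDFS directory to write to
        "-m", str(mappers),           # number of parallel map tasks
    ]

def sqoop_export_cmd(jdbc_url: str, table: str, export_dir: str, mappers: int = 1) -> list[str]:
    """Build a `sqoop export` command line: HDFS directory -> RDBMS table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,             # destination table in the RDBMS
        "--export-dir", export_dir,   # HDFS directory to read from
        "-m", str(mappers),
    ]

# Placeholder server/database/path values, assembled for display only:
cmd = sqoop_import_cmd(
    "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb",
    "mytable", "/tutorials/usesqoop/importeddata")
print(" ".join(cmd))
```

Running either command requires a cluster with Sqoop installed; the snippet only shows the shape of the command lines that the walkthroughs in this article build up.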

> [!IMPORTANT]
> This article sets up a test environment to perform the data transfer. You then choose a data transfer method for this environment from one of the methods in the [Run Sqoop jobs](#run-sqoop-jobs) section.

For Sqoop versions that are supported on HDInsight clusters, see [What's new in the cluster versions provided by HDInsight?](../hdinsight-component-versioning.md)

## Understand the scenario

An HDInsight cluster comes with some sample data. You use the following two samples:

* An Apache `Log4j` log file, which is located at `/example/data/sample.log`. The following logs are extracted from the file:

  ```text
  2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 577725851
  ```
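
Each line of the sample file follows a simple `date time class [LEVEL] message` layout. As a quick illustration of that structure (this parsing helper is not part of the article, only a sketch), a line can be split like this:

```python
import re

# One capture group per field of the sample.log line layout.
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<cls>\S+) \[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)

def parse_log4j_line(line: str) -> dict:
    """Split one sample.log line into its fields; raises ValueError on mismatch."""
    m = LOG_PATTERN.match(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return m.groupdict()

fields = parse_log4j_line(
    "2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 577725851")
print(fields["level"], fields["cls"])  # INFO SampleClass6
```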

In this article, you use these two datasets to test Sqoop import and export.

## <a name="create-cluster-and-sql-database"></a>Set up test environment

The cluster, SQL database, and other objects are created through the Azure portal using an Azure Resource Manager template. The template can be found in [Azure quickstart templates](https://azure.microsoft.com/resources/templates/hdinsight-linux-with-sql-database/). The Resource Manager template calls a bacpac package to deploy the table schemas to an SQL database. If you want to use a private container for the bacpac files, use the following values in the template:

```json
"storageKeyType": "Primary",
```

|Property |Value |
|---|---|
|Bacpac File Name |Use the default value unless you want to use your own bacpac file.|
|Location |Use the default value.|

The [logical SQL server](/azure/azure-sql/database/logical-servers) name is `<ClusterName>dbserver`. The database name is `<ClusterName>db`. The default storage account name is `e6qhezrh2pdqu`.
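
The naming convention above is mechanical, so the resource names can be derived from the cluster name alone. A small sketch (the cluster name here is hypothetical):

```python
def resource_names(cluster_name: str) -> dict:
    """Derive the template's SQL server and database names from the cluster name."""
    return {
        "sql_server": f"{cluster_name}dbserver",
        "database": f"{cluster_name}db",
    }

names = resource_names("mycluster")  # hypothetical cluster name
print(names["sql_server"], names["database"])  # myclusterdbserver myclusterdb
```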

3. Select **I agree to the terms and conditions stated above**.

4. Select **Purchase**. You see a new tile titled **Submitting deployment for Template deployment**. It takes about 20 minutes to create the cluster and SQL database.

## Run Sqoop jobs
HDInsight can run Sqoop jobs by using various methods. Use the following table to decide which method is right for you, then follow the link for a walkthrough.

|**Use this** if you want... | ...an **interactive** shell | ...**batch** processing | ...from this **client operating system**|
|---|---|---|---|