
Commit c2aae9b

Update hdinsight-use-sqoop.md
1 parent 6e836c3

1 file changed

articles/hdinsight/hadoop/hdinsight-use-sqoop.md

Lines changed: 6 additions & 6 deletions
@@ -4,7 +4,7 @@ description: Learn how to use Azure PowerShell from a workstation to run Sqoop i
 ms.service: hdinsight
 ms.custom: devx-track-azurepowershell
 ms.topic: how-to
-ms.date: 09/14/2023
+ms.date: 09/18/2023
 ---

 # Use Apache Sqoop with Hadoop in HDInsight
@@ -18,15 +18,15 @@ Although Apache Hadoop is a natural choice for processing unstructured and semi-
 [Apache Sqoop](https://sqoop.apache.org/docs/1.99.7/user.html) is a tool designed to transfer data between Hadoop clusters and relational databases. You can use it to import data from a relational database management system (RDBMS) such as SQL Server, MySQL, or Oracle into the Hadoop distributed file system (HDFS), transform the data in Hadoop with MapReduce or Apache Hive, and then export the data back into an RDBMS. In this article, you're using Azure SQL Database for your relational database.

 > [!IMPORTANT]
-> This article sets up a test environment to perform the data transfer. You then choose a data transfer method for this environment from one of the methods in section [Run Sqoop jobs](#run-sqoop-jobs), further below.
+> This article sets up a test environment to perform the data transfer. You then choose a data transfer method for this environment from one of the methods in section [Run Sqoop jobs](#run-sqoop-jobs).

 For Sqoop versions that are supported on HDInsight clusters, see [What's new in the cluster versions provided by HDInsight?](../hdinsight-component-versioning.md)

 ## Understand the scenario

 HDInsight cluster comes with some sample data. You use the following two samples:

-* An Apache Log4j log file, which is located at `/example/data/sample.log`. The following logs are extracted from the file:
+* An Apache `Log4j` log file, which is located at `/example/data/sample.log`. The following logs are extracted from the file:

 ```text
 2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 577725851
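
The paragraph in this hunk describes Sqoop's import direction: pulling a table from an RDBMS such as Azure SQL Database into HDFS. Because the article drives Sqoop from Azure PowerShell, a minimal sketch of that import as a Sqoop job definition might look like the following. The server and database names reuse the `<ClusterName>dbserver` / `<ClusterName>db` pattern that appears later in the article, while the login, table name, and target directory are placeholder assumptions, not values from the article.

```powershell
# Minimal sketch, not the article's exact job: the login, table name, and
# target directory below are placeholder assumptions.
$sqlServer   = "<ClusterName>dbserver.database.windows.net"
$sqlDatabase = "<ClusterName>db"
$sqlLogin    = "<sqlAdminLogin>"
$sqlPassword = "<sqlAdminPassword>"

# Sqoop command that imports one (hypothetical) table from the database into HDFS.
$sqoopCommand = "import " +
    "--connect jdbc:sqlserver://${sqlServer};database=${sqlDatabase} " +
    "--username $sqlLogin --password $sqlPassword " +
    "--table <tableName> --target-dir /tutorial/import -m 1"

# Wrap the command in an HDInsight Sqoop job definition (Az.HDInsight module).
$sqoopJobDefinition = New-AzHDInsightSqoopJobDefinition -Command $sqoopCommand
```

An export in the opposite direction would follow the same pattern with `export --connect ... --table ... --export-dir ...`.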
@@ -55,7 +55,7 @@ In this article, you use these two datasets to test Sqoop import and export.

 ## <a name="create-cluster-and-sql-database"></a>Set up test environment

-The cluster, SQL database, and other objects are created through the Azure portal using an Azure Resource Manager template. The template can be found in [Azure quickstart templates](https://azure.microsoft.com/resources/templates/hdinsight-linux-with-sql-database/). The Resource Manager template calls a bacpac package to deploy the table schemas to a SQL database. If you want to use a private container for the bacpac files, use the following values in the template:
+The cluster, SQL database, and other objects are created through the Azure portal using an Azure Resource Manager template. The template can be found in [Azure quickstart templates](https://azure.microsoft.com/resources/templates/hdinsight-linux-with-sql-database/). The Resource Manager template calls a bacpac package to deploy the table schemas to an SQL database. If you want to use a private container for the bacpac files, use the following values in the template:

 ```json
 "storageKeyType": "Primary",
@@ -88,15 +88,15 @@ The cluster, SQL database, and other objects are created through the Azure porta
 |Bacpac File Name |Use the default value unless you want to use your own bacpac file.|
 |Location |Use the default value.|

-The [logical SQL server](/azure/azure-sql/database/logical-servers) name will be `<ClusterName>dbserver`. The database name will be `<ClusterName>db`. The default storage account name will be `e6qhezrh2pdqu`.
+The [logical SQL server](/azure/azure-sql/database/logical-servers) name is `<ClusterName>dbserver`. The database name is `<ClusterName>db`. The default storage account name is `e6qhezrh2pdqu`.

 3. Select **I agree to the terms and conditions stated above**.

 4. Select **Purchase**. You see a new tile titled Submitting deployment for Template deployment. It takes about around 20 minutes to create the cluster and SQL database.

 ## Run Sqoop jobs

-HDInsight can run Sqoop jobs by using a variety of methods. Use the following table to decide which method is right for you, then follow the link for a walkthrough.
+HDInsight can run Sqoop jobs by using various methods. Use the following table to decide which method is right for you, then follow the link for a walkthrough.

 | **Use this** if you want... | ...an **interactive** shell | ...**batch** processing | ...from this **client operating system** |
 |:--- |:---:|:---:|:--- |:--- |
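
The table touched by this hunk contrasts interactive and batch ways of running Sqoop jobs. As one hedged illustration of the batch route from a workstation, the job definition sketched earlier could be submitted with the Az.HDInsight cmdlets shown below; the cluster name and cluster login credentials are placeholders.

```powershell
# Sketch only: cluster name and cluster login credentials are placeholders,
# and $sqoopJobDefinition comes from the earlier sketch.
$clusterName    = "<ClusterName>"
$httpCredential = Get-Credential -Message "Enter the cluster login (admin) credential"

# Submit the Sqoop job to the cluster, then block until it completes.
$job = Start-AzHDInsightJob -ClusterName $clusterName `
    -JobDefinition $sqoopJobDefinition `
    -HttpCredential $httpCredential

Wait-AzHDInsightJob -ClusterName $clusterName `
    -JobId $job.JobId `
    -HttpCredential $httpCredential
```

Get-AzHDInsightJobOutput can then be used to retrieve the job's output and error streams from the cluster's default storage account.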
