Commit 6d9c1b1

Merge pull request #101171 from dagiro/freshness176
freshness176
2 parents 155c6d3 + 454fa1f commit 6d9c1b1

File tree

1 file changed

+107
-82
lines changed

articles/hdinsight/hadoop/apache-hadoop-use-sqoop-dotnet-sdk.md

Lines changed: 107 additions & 82 deletions
Original file line number | Diff line number | Diff line change
@@ -1,117 +1,142 @@
11
---
22
title: Run Apache Sqoop jobs by using .NET and HDInsight - Azure
3-
description: Learn how to use the HDInsight .NET SDK to run Apache Sqoop import and export between an Apache Hadoop cluster and an Azure SQL database.
4-
keywords: sqoop job
5-
ms.reviewer: jasonh
3+
description: Learn how to use the HDInsight .NET SDK to run Apache Sqoop import and export between an Apache Hadoop cluster and an Azure SQL Database.
64
author: hrasheed-msft
7-
5+
ms.author: hrasheed
6+
ms.reviewer: jasonh
87
ms.service: hdinsight
9-
ms.custom: hdinsightactive,hdiseo17may2017
108
ms.topic: conceptual
11-
ms.date: 05/16/2018
12-
ms.author: hrasheed
13-
9+
ms.custom: hdinsightactive,hdiseo17may2017
10+
ms.date: 01/14/2020
1411
---
12+
1513
# Run Apache Sqoop jobs by using .NET SDK for Apache Hadoop in HDInsight
16-
[!INCLUDE [sqoop-selector](../../../includes/hdinsight-selector-use-sqoop.md)]
1714

18-
Learn how to use the Azure HDInsight .NET SDK to run Apache Sqoop jobs in HDInsight to import and export between an HDInsight cluster and an Azure SQL database or SQL Server database.
15+
[!INCLUDE [sqoop-selector](../../../includes/hdinsight-selector-use-sqoop.md)]
1916

20-
> [!NOTE]
21-
> Although you can use the procedures in this article with either a Windows-based or Linux-based HDInsight cluster, they work only from a Windows client. To choose other methods, use the tab selector at the top of this article.
17+
Learn how to use the Azure HDInsight .NET SDK to run Apache Sqoop jobs in HDInsight to import and export between an HDInsight cluster and an Azure SQL Database or SQL Server database.
2218

2319
## Prerequisites
24-
Before you begin this article, you must have the following item:
2520

26-
* An Apache Hadoop cluster in HDInsight. For more information, see [Create a cluster and a SQL database](hdinsight-use-sqoop.md#create-cluster-and-sql-database).
21+
* Completion of [Set up test environment](./hdinsight-use-sqoop.md#create-cluster-and-sql-database) from [Use Apache Sqoop with Hadoop in HDInsight](./hdinsight-use-sqoop.md).
22+
23+
* [Visual Studio](https://visualstudio.microsoft.com/vs/community/).
24+
25+
* Familiarity with Sqoop. For more information, see [Sqoop User Guide](https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html).
2726

2827
## Use Sqoop on HDInsight clusters with the .NET SDK
29-
The HDInsight .NET SDK provides .NET client libraries, so that it's easier to work with HDInsight clusters from .NET. In this section, you create a C# console application to export the hivesampletable to the Azure SQL Database table that you created earlier in this article.
3028

31-
## Submit a Sqoop job
29+
The HDInsight .NET SDK provides .NET client libraries, so that it's easier to work with HDInsight clusters from .NET. In this section, you create a C# console application to export the `hivesampletable` to the Azure SQL Database table that you created from the prerequisites.
30+
31+
## Set up
32+
33+
1. Start Visual Studio and create a C# console application.
3234

33-
1. Create a C# console application in Visual Studio.
35+
1. Navigate to **Tools** > **NuGet Package Manager** > **Package Manager Console** and run the following command:
3436

35-
2. From the Visual Studio Package Manager console, import the package by running the following NuGet command:
36-
37-
Install-Package Microsoft.Azure.Management.HDInsight.Job
37+
```
38+
Install-Package Microsoft.Azure.Management.HDInsight.Job
39+
```
3840
39-
3. Use the following code in the Program.cs file:
40-
41-
using System.Collections.Generic;
42-
using Microsoft.Azure.Management.HDInsight.Job;
43-
using Microsoft.Azure.Management.HDInsight.Job.Models;
44-
using Hyak.Common;
45-
46-
namespace SubmitHDInsightJobDotNet
41+
## Sqoop export
42+
43+
This example exports data from Hive to SQL Database. It copies the Hive `hivesampletable` table to the `mobiledata` table in SQL Database.
44+
45+
1. Use the following code in the Program.cs file. Edit the code to set the values for `ExistingClusterName` and `ExistingClusterPassword`.
46+
47+
```csharp
48+
using Microsoft.Azure.Management.HDInsight.Job;
49+
using Microsoft.Azure.Management.HDInsight.Job.Models;
50+
using Hyak.Common;
51+
52+
namespace SubmitHDInsightJobDotNet
53+
{
54+
class Program
4755
{
48-
class Program
56+
private static HDInsightJobManagementClient _hdiJobManagementClient;
57+
58+
private const string ExistingClusterName = "<Your HDInsight Cluster Name>";
59+
private const string ExistingClusterPassword = "<Cluster User Password>";
60+
private const string ExistingClusterUri = ExistingClusterName + ".azurehdinsight.net";
61+
private const string ExistingClusterUsername = "admin";
62+
63+
static void Main(string[] args)
4964
{
50-
private static HDInsightJobManagementClient _hdiJobManagementClient;
51-
52-
private const string ExistingClusterName = "<Your HDInsight Cluster Name>";
53-
private const string ExistingClusterUri = ExistingClusterName + ".azurehdinsight.net";
54-
private const string ExistingClusterUsername = "<Cluster Username>";
55-
private const string ExistingClusterPassword = "<Cluster User Password>";
56-
57-
static void Main(string[] args)
58-
{
59-
System.Console.WriteLine("The application is running ...");
60-
61-
var clusterCredentials = new BasicAuthenticationCloudCredentials { Username = ExistingClusterUsername, Password = ExistingClusterPassword };
62-
_hdiJobManagementClient = new HDInsightJobManagementClient(ExistingClusterUri, clusterCredentials);
63-
64-
SubmitSqoopJob();
65-
66-
System.Console.WriteLine("Press ENTER to continue ...");
67-
System.Console.ReadLine();
68-
}
69-
70-
private static void SubmitSqoopJob()
65+
System.Console.WriteLine("The application is running ...");
66+
67+
var clusterCredentials = new BasicAuthenticationCloudCredentials { Username = ExistingClusterUsername, Password = ExistingClusterPassword };
68+
_hdiJobManagementClient = new HDInsightJobManagementClient(ExistingClusterUri, clusterCredentials);
69+
70+
SubmitSqoopJob();
71+
72+
System.Console.WriteLine("Press ENTER to continue ...");
73+
System.Console.ReadLine();
74+
}
75+
76+
private static void SubmitSqoopJob()
77+
{
78+
var sqlDatabaseServerName = ExistingClusterName + "dbserver";
79+
var sqlDatabaseLogin = "sqluser";
80+
var sqlDatabaseLoginPassword = ExistingClusterPassword;
81+
var sqlDatabaseDatabaseName = ExistingClusterName + "db";
82+
83+
// Connection string for using Azure SQL Database; Comment if using SQL Server
84+
var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ".database.windows.net;user=" + sqlDatabaseLogin + "@" + sqlDatabaseServerName + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
85+
86+
// Connection string for using SQL Server; Uncomment if using SQL Server
87+
// var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ";user=" + sqlDatabaseLogin + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
88+
89+
//sqoop start
90+
var tableName = "mobiledata";
91+
92+
var parameters = new SqoopJobSubmissionParameters
7193
{
72-
var sqlDatabaseServerName = "<SQLDatabaseServerName>";
73-
var sqlDatabaseLogin = "<SQLDatabaseLogin>";
74-
var sqlDatabaseLoginPassword = "<SQLDatabaseLoginPassword>";
75-
var sqlDatabaseDatabaseName = "<DatabaseName>";
76-
77-
var tableName = "<TableName>";
78-
var exportDir = "/tutorials/usesqoop/data";
79-
80-
// Connection string for using Azure SQL Database.
81-
// Comment if using SQL Server
82-
var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ".database.windows.net;user=" + sqlDatabaseLogin + "@" + sqlDatabaseServerName + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
83-
// Connection string for using SQL Server.
84-
// Uncomment if using SQL Server
85-
//var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ";user=" + sqlDatabaseLogin + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
86-
87-
var parameters = new SqoopJobSubmissionParameters
88-
{
89-
Files = new List<string> { "/user/oozie/share/lib/sqoop/sqljdbc41.jar" }, // This line is required for Linux-based cluster.
90-
Command = "export --connect " + connectionString + " --table " + tableName + "_mobile --export-dir " + exportDir + "_mobile --fields-terminated-by \\t -m 1"
91-
};
92-
93-
System.Console.WriteLine("Submitting the Sqoop job to the cluster...");
94-
var response = _hdiJobManagementClient.JobManagement.SubmitSqoopJob(parameters);
95-
System.Console.WriteLine("Validating that the response is as expected...");
96-
System.Console.WriteLine("Response status code is " + response.StatusCode);
97-
System.Console.WriteLine("Validating the response object...");
98-
System.Console.WriteLine("JobId is " + response.JobSubmissionJsonResponse.Id);
99-
}
94+
Command = "export --connect " + connectionString + " --table " + tableName + " --hcatalog-table hivesampletable"
95+
};
96+
//sqoop end
97+
98+
System.Console.WriteLine("Submitting the Sqoop job to the cluster...");
99+
var response = _hdiJobManagementClient.JobManagement.SubmitSqoopJob(parameters);
100+
System.Console.WriteLine("Validating that the response is as expected...");
101+
System.Console.WriteLine("Response status code is " + response.StatusCode);
102+
System.Console.WriteLine("Validating the response object...");
103+
System.Console.WriteLine("JobId is " + response.JobSubmissionJsonResponse.Id);
100104
}
101105
}
106+
}
107+
```
108+
109+
1. To run the program, select the **F5** key.
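
The two connection strings in `SubmitSqoopJob` differ only in the host suffix and the `user` value. As a minimal sketch, the helper below builds either form from the same inputs. The names `SqoopSketches`, `JdbcConnectionStrings`, and `Build` are hypothetical and chosen for illustration; they aren't part of the HDInsight .NET SDK.

```csharp
using System;

namespace SqoopSketches
{
    // Hypothetical helper for illustration; not part of the HDInsight .NET SDK.
    public static class JdbcConnectionStrings
    {
        // Builds a JDBC connection string in the same shape as the ones
        // concatenated by hand in SubmitSqoopJob.
        public static string Build(
            string server, string login, string password, string database, bool azureSqlDatabase)
        {
            // Azure SQL Database: the host gets the ".database.windows.net" suffix
            // and the user is written as "login@server".
            // SQL Server: the server name and login are used unchanged.
            var host = azureSqlDatabase ? server + ".database.windows.net" : server;
            var user = azureSqlDatabase ? login + "@" + server : login;

            return "jdbc:sqlserver://" + host
                + ";user=" + user
                + ";password=" + password
                + ";database=" + database;
        }
    }

    public static class ConnectionStringDemo
    {
        public static void Main()
        {
            // Placeholder values; replace them with your own server, login, and database.
            Console.WriteLine(JdbcConnectionStrings.Build(
                "myclusterdbserver", "sqluser", "<password>", "myclusterdb", azureSqlDatabase: true));
        }
    }
}
```

Passing `azureSqlDatabase: false` yields the SQL Server form that is commented out in the sample, so one flag replaces the comment/uncomment step.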
102110
103-
4. To run the program, select the **F5** key.
111+
## Sqoop import
112+
113+
This example imports data from SQL Database to Azure Storage, and it depends on the export above having been run first. It imports the `mobiledata` table in SQL Database into the `wasb:///tutorials/usesqoop/importeddata` directory on the cluster's default storage account.
114+
115+
1. In the code above, replace the block between the `//sqoop start` and `//sqoop end` comments with the following code:
116+
117+
```csharp
118+
var tableName = "mobiledata";
119+
var exportDir = "/tutorials/usesqoop/importeddata";
120+
121+
var parameters = new SqoopJobSubmissionParameters
122+
{
123+
Command = "import --connect " + connectionString + " --table " + tableName + " --target-dir " + exportDir + " --fields-terminated-by \\t --lines-terminated-by \\n -m 1"
124+
};
125+
```
126+
127+
1. To run the program, select the **F5** key.
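
The export and import samples differ only in the `Command` string passed to `SqoopJobSubmissionParameters`. The sketch below composes both strings with the same Sqoop arguments used in this article, so the two variants can be compared side by side. `SqoopCommandSketches`, `SqoopCommands`, `Export`, and `Import` are hypothetical names for illustration, not SDK members, and the code only builds strings; it doesn't submit a job.

```csharp
using System;

namespace SqoopCommandSketches
{
    // Hypothetical helper for illustration; not part of the HDInsight .NET SDK.
    public static class SqoopCommands
    {
        // Same arguments as the export sample: SQL table name plus the Hive (HCatalog) table.
        public static string Export(string connectionString, string tableName, string hcatalogTable)
        {
            return "export --connect " + connectionString
                + " --table " + tableName
                + " --hcatalog-table " + hcatalogTable;
        }

        // Same arguments as the import sample: target directory, tab-separated fields,
        // newline-terminated lines, and a single mapper (-m 1).
        public static string Import(string connectionString, string tableName, string targetDir)
        {
            return "import --connect " + connectionString
                + " --table " + tableName
                + " --target-dir " + targetDir
                + " --fields-terminated-by \\t --lines-terminated-by \\n -m 1";
        }
    }

    public static class SqoopCommandDemo
    {
        public static void Main()
        {
            // Placeholder connection string; use the one built in SubmitSqoopJob.
            var connectionString = "<connection string>";

            Console.WriteLine(SqoopCommands.Export(connectionString, "mobiledata", "hivesampletable"));
            Console.WriteLine(SqoopCommands.Import(connectionString, "mobiledata", "/tutorials/usesqoop/importeddata"));
        }
    }
}
```

Either returned string could be assigned to the `Command` property in the samples above.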
104128
105129
## Limitations
130+
106131
Linux-based HDInsight presents the following limitations:
107132
108-
* Bulk export: The Sqoop connector that's used to export data to Microsoft SQL Server or Azure SQL Database does not currently support bulk inserts.
133+
* Bulk export: The Sqoop connector that's used to export data to Microsoft SQL Server or Azure SQL Database doesn't currently support bulk inserts.
109134
110135
* Batching: Even with the `-batch` switch, Sqoop performs multiple inserts instead of batching the insert operations.
111136
112137
## Next steps
113-
Now you have learned how to use Sqoop. To learn more, see:
138+
139+
Now you've learned how to use Sqoop. To learn more, see:
114140
115141
* [Use Apache Oozie with HDInsight](../hdinsight-use-oozie-linux-mac.md): Use Sqoop action in an Oozie workflow.
116142
* [Upload data to HDInsight](../hdinsight-upload-data.md): Find other methods for uploading data to HDInsight or Azure Blob storage.
117-
