articles/hdinsight/hdinsight-use-oozie-linux-mac.md
17 additions & 24 deletions
@@ -6,7 +6,7 @@ ms.author: omidm
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
-ms.date: 10/30/2019
+ms.date: 04/23/2020
---

# Use Apache Oozie with Apache Hadoop to define and run a workflow on Linux-based Azure HDInsight
@@ -21,7 +21,7 @@ Learn how to use Apache Oozie with Apache Hadoop on Azure HDInsight. Oozie is a
You can also use Oozie to schedule jobs that are specific to a system, like Java programs or shell scripts.

> [!NOTE]
-> Another option to define workflows with HDInsight is to use Azure Data Factory. To learn more about Data Factory, see [Use Apache Pig and Apache Hive with Data Factory][azure-data-factory-pig-hive]. To use Oozie on clusters with Enterprise Security Package please see [Run Apache Oozie in HDInsight Hadoop clusters with Enterprise Security Package](domain-joined/hdinsight-use-oozie-domain-joined-clusters.md).
+> Another option to define workflows with HDInsight is to use Azure Data Factory. To learn more about Data Factory, see [Use Apache Pig and Apache Hive with Data Factory](../data-factory/transform-data.md). To use Oozie on clusters with Enterprise Security Package, see [Run Apache Oozie in HDInsight Hadoop clusters with Enterprise Security Package](domain-joined/hdinsight-use-oozie-domain-joined-clusters.md).

## Prerequisites
@@ -31,7 +31,7 @@ You can also use Oozie to schedule jobs that are specific to a system, like Java

* **An Azure SQL Database**. See [Create an Azure SQL database in the Azure portal](../sql-database/sql-database-get-started.md). This article uses a database named **oozietest**.

-* The [URI scheme](./hdinsight-hadoop-linux-information.md#URI-and-scheme) for your clusters primary storage. This would be `wasb://` for Azure Storage, `abfs://` for Azure Data Lake Storage Gen2 or `adl://` for Azure Data Lake Storage Gen1. If secure transfer is enabled for Azure Storage, the URI would be `wasbs://`. See also, [secure transfer](../storage/common/storage-require-secure-transfer.md).
+* The URI scheme for your cluster's primary storage: `wasb://` for Azure Storage, `abfs://` for Azure Data Lake Storage Gen2, or `adl://` for Azure Data Lake Storage Gen1. If secure transfer is enabled for Azure Storage, the URI is `wasbs://`. See also [secure transfer](../storage/common/storage-require-secure-transfer.md).

## Example workflow
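As a quick illustration of the URI forms that prerequisite describes, the sketch below composes a primary-storage URI in bash. The scheme follows the rules above; the container and account names are hypothetical placeholders, not values from this article:

```shell
# Compose a primary-storage URI for an HDInsight cluster.
# CONTAINER and ACCOUNT are hypothetical placeholders; substitute your own.
SCHEME="wasbs"   # wasb/wasbs for Azure Storage, abfs for Gen2, adl for Gen1
CONTAINER="mycontainer"
ACCOUNT="mystorageaccount"
STORAGE_URI="${SCHEME}://${CONTAINER}@${ACCOUNT}.blob.core.windows.net"
echo "${STORAGE_URI}/tutorials/useoozie"
# → wasbs://mycontainer@mystorageaccount.blob.core.windows.net/tutorials/useoozie
```

With secure transfer disabled, setting `SCHEME="wasb"` yields the plain `wasb://` form instead.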
@@ -49,10 +49,10 @@ The workflow used in this document contains two actions. Actions are definitions

For more information about Hive, see [Use Apache Hive with HDInsight][hdinsight-use-hive].

-2. A Sqoop action exports the contents of the new Hive table to a table created in Azure SQL Database. For more information about Sqoop, see [Use Apache Sqoop with HDInsight][hdinsight-use-sqoop].
+2. A Sqoop action exports the contents of the new Hive table to a table created in Azure SQL Database. For more information about Sqoop, see [Use Apache Sqoop with HDInsight](hadoop/apache-hadoop-use-sqoop-mac-linux.md).

> [!NOTE]
-> For supported Oozie versions on HDInsight clusters, see [What's new in the Hadoop cluster versions provided by HDInsight][hdinsight-versions].
+> For supported Oozie versions on HDInsight clusters, see [What's new in the Hadoop cluster versions provided by HDInsight](hdinsight-component-versioning.md).

## Create the working directory
@@ -84,7 +84,7 @@ Oozie expects you to store all the resources required for a job in the same dire

## Add a database driver

-Because this workflow uses Sqoop to export data to the SQL database, you must provide a copy of the JDBC driver used to interact with the SQL database. To copy the JDBC driver to the working directory, use the following command from the SSH session:
+This workflow uses Sqoop to export data to the SQL database. So you must provide a copy of the JDBC driver used to interact with the SQL database. To copy the JDBC driver to the working directory, use the following command from the SSH session:
@@ -269,7 +269,7 @@ Oozie workflow definitions are written in Hadoop Process Definition Language (hP

## Create the job definition

-The job definition describes where to find the workflow.xml. It also describes where to find other files used by the workflow, such as `useooziewf.hql`. In addition, it defines the values for properties used within the workflow and the associated files.
+The job definition describes where to find the workflow.xml. It also describes where to find other files used by the workflow, such as `useooziewf.hql`. Also, it defines the values for properties used within the workflow and the associated files.

1. To get the full address of the default storage, use the following command. This address is used in the configuration file you create in the next step.
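To make the shape of that job definition concrete, here is a minimal, hedged sketch of creating a `job.xml` from bash. The property names (`nameNode`, `oozie.wf.application.path`) are standard Oozie job-configuration properties; the storage URI and application path are placeholders, not this article's actual values:

```shell
# Sketch: write a minimal Oozie job definition.
# The nameNode value and application path below are hypothetical placeholders.
cat > job.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>nameNode</name>
    <value>wasbs://mycontainer@mystorageaccount.blob.core.windows.net</value>
  </property>
  <property>
    <name>oozie.wf.application.path</name>
    <value>/tutorials/useoozie</value>
  </property>
</configuration>
EOF
grep -c '<property>' job.xml   # count the properties defined
```

The real `job.xml` built in this article defines additional properties consumed by the workflow and its associated files, but it follows this same `<property>`/`<name>`/`<value>` structure.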
@@ -400,7 +400,7 @@ The following steps use the Oozie command to submit and manage Oozie workflows o
   export OOZIE_URL=http://HOSTNAME:11000/oozie
   ```

-3. To submit the job, use the following:
+3. To submit the job, use the following code:

   ```bash
   oozie job -config job.xml -submit
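On submission, the Oozie CLI prints the new workflow's ID on a line of the form `job: <id>`, and later commands take that ID. The sketch below shows one way to capture it in bash; the sample output line is illustrative (the ID is made up), and the `oozie` calls are commented out because they need a live cluster:

```shell
# Example shape of `oozie job -config job.xml -submit` output (made-up ID):
SUBMIT_OUTPUT="job: 0000005-200423102325841-oozie-oozi-W"

# Strip the "job: " prefix to get the bare workflow ID.
JOBID="${SUBMIT_OUTPUT#job: }"
echo "$JOBID"

# With a live cluster, you would then run:
# oozie job -start "$JOBID"   # start the workflow
# oozie job -info "$JOBID"    # check its status
```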
@@ -471,7 +471,7 @@ For more information on the Oozie command, see [Apache Oozie command-line tool](

## Oozie REST API

-With the Oozie REST API, you can build your own tools that work with Oozie. The following is HDInsight-specific information about the use of the Oozie REST API:
+With the Oozie REST API, you can build your own tools that work with Oozie. The following is HDInsight-specific information about using the Oozie REST API:

* **URI**: You can access the REST API from outside the cluster at `https://CLUSTERNAME.azurehdinsight.net/oozie`.
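As a sketch of calling that external endpoint, the snippet below builds the REST URL and shows a commented `curl` against Oozie's v2 `jobs` resource. The cluster name and credentials are placeholders, and the `curl` line needs a real cluster and its HTTP (gateway) login:

```shell
# Build the external Oozie REST endpoint for an HDInsight cluster.
CLUSTERNAME="mycluster"   # hypothetical cluster name
OOZIE_API="https://${CLUSTERNAME}.azurehdinsight.net/oozie/v2/jobs"
echo "$OOZIE_API"

# List workflow jobs through the cluster gateway (requires HTTP credentials):
# curl -u admin:PASSWORD "$OOZIE_API?jobtype=wf&len=10"
```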
@@ -521,7 +521,7 @@ To access the Oozie web UI, complete the following steps:

* **Job DAG**: The DAG is a graphical overview of the data paths taken through the workflow.

7. If you select one of the actions from the **Job Info** tab, it brings up information for the action. For example, select the **RunSqoopExport** action.
@@ -648,9 +648,9 @@ With the Oozie UI, you can view Oozie logs. The Oozie UI also contains links to

3. If available, use the URL from the action to view more details, such as the JobTracker logs, for the action.

-The following are specific errors you might encounter and how to resolve them.
+The following are specific errors you might come across and how to resolve them.

-### JA009: Cannot initialize cluster
+### JA009: Can't initialize cluster

**Symptoms**: The job status changes to **SUSPENDED**. Details for the job show the `RunHiveScript` status as **START_MANUAL**. Selecting the action displays the following error message:
@@ -660,15 +660,15 @@ The following are specific errors you might encounter and how to resolve them.

**Resolution**: Change the Blob storage addresses that the job uses.

-### JA002: Oozie is not allowed to impersonate <USER>
+### JA002: Oozie isn't allowed to impersonate <USER>

**Symptoms**: The job status changes to **SUSPENDED**. Details for the job show the `RunHiveScript` status as **START_MANUAL**. If you select the action, it shows the following error message:

    JA002: User: oozie is not allowed to impersonate <USER>

**Cause**: The current permission settings don't allow Oozie to impersonate the specified user account.

-**Resolution**: Oozie can impersonate users in the **users** group. Use the `groups USERNAME` to see the groups that the user account is a member of. If the user isn't a member of the **users** group, use the following command to add the user to the group:
+**Resolution**: Oozie can impersonate users in the **`users`** group. Use the `groups USERNAME` command to see the groups that the user account is a member of. If the user isn't a member of the **`users`** group, use the following command to add the user to the group:

    sudo adduser USERNAME users
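A defensive sketch of that resolution step: check the account's group membership first, and only suggest the `adduser` when it's actually missing. The account name is a hypothetical placeholder:

```shell
# USERNAME is a hypothetical placeholder; substitute the failing account.
USERNAME="exampleuser"
if id -nG "$USERNAME" 2>/dev/null | tr ' ' '\n' | grep -qx "users"; then
  echo "$USERNAME is already in the users group"
else
  echo "run: sudo adduser $USERNAME users"
fi
```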
@@ -703,13 +703,6 @@ For example, for the job in this document, you would use the following steps:

In this article, you learned how to define an Oozie workflow and how to run an Oozie job. To learn more about how to work with HDInsight, see the following articles:

-* [Upload data for Apache Hadoop jobs in HDInsight][hdinsight-upload-data]
-* [Use Apache Sqoop with Apache Hadoop in HDInsight][hdinsight-use-sqoop]
-* [Use Apache Hive with Apache Hadoop on HDInsight][hdinsight-use-hive]
-* [Develop Java MapReduce programs for HDInsight](hadoop/apache-hadoop-develop-deploy-java-mapreduce-linux.md)
0 commit comments