---
description: Secure Apache Oozie workflows using the Azure HDInsight Enterprise Security Package. Learn how to define an Oozie workflow and submit an Oozie job.
author: omidm1
ms.author: omidm
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.custom: hdinsightactive,seodec18
ms.date: 12/09/2019
---
# Run Apache Oozie in HDInsight Hadoop clusters with Enterprise Security Package
Apache Oozie is a workflow and coordination system that manages Apache Hadoop jobs. Oozie is integrated with the Hadoop stack, and it supports the following jobs:
- Apache MapReduce
- Apache Pig
- Apache Hive

You can also use Oozie to schedule jobs that are specific to a system, like Java programs or shell scripts.

## Prerequisite
An Azure HDInsight Hadoop cluster with Enterprise Security Package (ESP). See [Configure HDInsight clusters with ESP](./apache-domain-joined-configure-using-azure-adds.md).
> [!NOTE]
> For detailed instructions on how to use Oozie on non-ESP clusters, see [Use Apache Oozie workflows in Linux-based Azure HDInsight](../hdinsight-use-oozie-linux-mac.md).
## Connect to an ESP cluster
For more information on Secure Shell (SSH), see [Connect to HDInsight (Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
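On an ESP cluster, jobs run as the Kerberos-authenticated domain user, so you typically obtain a ticket right after connecting. A minimal sketch, where the principal below is a placeholder rather than something from this article:

```shell
# Authenticate to the domain; replace the placeholder principal with your own.
kinit sampleuser@CONTOSO.COM

# Confirm that a ticket-granting ticket was issued.
klist
```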

A status response code of **200 OK** indicates successful registration. If you receive an unauthorized response, such as **401**, check the username and password.
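The request that produced this response is elided from this excerpt. As a sketch only (the gateway URL, endpoint path, and credentials are assumptions, not taken from this article), a status call against the Oozie REST API through the cluster gateway might look like:

```shell
# Hypothetical status check; CLUSTERNAME and the admin credentials are placeholders.
# A 200 OK response indicates success; a 401 means the username or password is wrong.
curl -i -u admin:'PASSWORD' \
    "https://CLUSTERNAME.azurehdinsight.net/oozie/v1/admin/status"
```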
## Define the workflow
Oozie workflow definitions are written in Apache Hadoop Process Definition Language (hPDL). hPDL is an XML process definition language. Take the following steps to define the workflow:
</kill>
<end name="end" />
</workflow-app>
```
4. Replace `clustername` with the name of the cluster.
5. To save the file, select **Ctrl+X**, enter **Y**, and then select **Enter**.
The workflow is divided into two parts:
- **Credential.** This section takes in the credentials that are used for authenticating Oozie actions:
    This example uses authentication for Hive actions. To learn more, see [Action Authentication](https://oozie.apache.org/docs/4.2.0/DG_ActionAuthentication.html).
    The credential service allows Oozie actions to impersonate the user for accessing Hadoop services.
- **Action.** This section has three actions: map-reduce, Hive server 2, and Hive server 1:
  - The map-reduce action runs an example from an Oozie package for map-reduce that outputs the aggregated word count.
The Hive actions use the credentials defined in the credentials section for authentication by using the keyword `cred` in the action element.
6. Use the following command to copy the `workflow.xml` file to `/user/<domainuser>/examples/apps/map-reduce/workflow.xml`:
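The copy command itself is elided from this excerpt. A plausible form, assuming the default HDFS-compatible storage is in place and `<domainuser>` is a placeholder for your domain username, is:

```shell
# Copy the workflow definition into the user's application path on cluster storage.
# <domainuser> is a placeholder; substitute your own username.
hdfs dfs -put workflow.xml /user/<domainuser>/examples/apps/map-reduce/workflow.xml
```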
- Use the `adl://home` URI for the `nameNode` property if you have Azure Data Lake Storage Gen1 as your primary cluster storage. If you're using Azure Blob Storage, then change this to `wasb://home`. If you're using Azure Data Lake Storage Gen2, then change this to `abfs://home`.
- Replace `domainuser` with your username for the domain.
- Replace `ClusterShortName` with the short name for the cluster. For example, if the cluster name is `https://sechadoopcontoso.azurehdinsight.net`, the `clustershortname` is the first six characters of the cluster: **sechad**.
- Replace `jdbcurlvalue` with the JDBC URL from the Hive configuration. An example is `jdbc:hive2://headnodehost:10001/;transportMode=http`.
- To save the file, select **Ctrl+X**, enter `Y`, and then select **Enter**.
This properties file needs to be present locally when running Oozie jobs.
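Pulling the substitutions above together, a `job.properties` sketch might look like the following. Every value is illustrative; the property names other than `nameNode` and the JDBC URL are assumptions based on common Oozie map-reduce examples, so substitute your own storage URI, domain user, and JDBC URL:

```
nameNode=adl://home
jobTracker=headnodehost:8050
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/domainuser/examples/apps/map-reduce
hivejdbcurl=jdbc:hive2://headnodehost:10001/;transportMode=http
```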
You can create the two Hive scripts for Hive server 1 and Hive server 2 as shown in the following sections.
### Hive server 1 file
1. Create and edit a file for Hive server 1 actions:
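As a sketch of this step (the file name and query are placeholders, not taken from this excerpt), you could create the script in one shot from the SSH session:

```shell
# Create a minimal Hive script for the Hive server 1 action.
# countrowshive1.hql and hivesampletable are illustrative names.
cat > countrowshive1.hql <<'EOF'
select count(*) from hivesampletable;
EOF
```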
Submitting Oozie jobs for ESP clusters is like submitting Oozie jobs in non-ESP clusters.

For more information, see [Use Apache Oozie with Apache Hadoop to define and run a workflow on Linux-based Azure HDInsight](../hdinsight-use-oozie-linux-mac.md).
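The submission command is elided here; as a sketch based on the standard Oozie CLI (the endpoint below is Oozie's usual default on the head node, an assumption rather than something stated in this excerpt):

```shell
# Submit and start the workflow using the local properties file.
# http://localhost:11000/oozie is Oozie's default server URL on the head node.
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
```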
## Results from an Oozie job submission
Oozie jobs are run for the user. So both Apache Hadoop YARN and Apache Ranger audit logs show the jobs being run as the impersonated user. The command-line interface output of an Oozie job looks like the following code:

The Ranger audit logs for Hive server 2 actions show Oozie running the action for the user. The Ranger and YARN views are visible only to the cluster admin.

The Oozie web UI provides a web-based view into the status of Oozie jobs on the cluster.
2. Follow the [Oozie web UI](../hdinsight-use-oozie-linux-mac.md) steps to enable SSH tunneling to the edge node and access the web UI.
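The tunneling steps themselves live in the linked article; one common shape, with placeholder host name and port, is a SOCKS proxy over SSH:

```shell
# Open a dynamic (SOCKS) tunnel on local port 9876, then point your browser's
# proxy settings at it to reach the Oozie web UI on the cluster head node.
ssh -C2qTnNf -D 9876 sshuser@CLUSTERNAME-ssh.azurehdinsight.net
```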
## Next steps
- [Use Apache Oozie with Apache Hadoop to define and run a workflow on Linux-based Azure HDInsight](../hdinsight-use-oozie-linux-mac.md).
- [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md#domainjoined).