Commit 833ff86

Merge pull request #251606 from v-akarnase/patch-18
Update python-udf-hdinsight.md
2 parents ee25a00 + a75bcc9 commit 833ff86

File tree: 1 file changed (+10, −10 lines)


articles/hdinsight/hadoop/python-udf-hdinsight.md

Lines changed: 10 additions & 10 deletions
@@ -3,7 +3,7 @@ title: Python UDF with Apache Hive and Apache Pig - Azure HDInsight
 description: Learn how to use Python User Defined Functions (UDF) from Apache Hive and Apache Pig in HDInsight, the Apache Hadoop technology stack on Azure.
 ms.service: hdinsight
 ms.topic: how-to
-ms.date: 08/21/2022
+ms.date: 09/15/2023
 ms.custom: H1Hack27Feb2017,hdinsightactive, devx-track-python, devx-track-azurepowershell
 ---

@@ -13,7 +13,7 @@ Learn how to use Python user-defined functions (UDF) with Apache Hive and Apache
 
 ## <a name="python"></a>Python on HDInsight
 
-Python2.7 is installed by default on HDInsight 3.0 and later. Apache Hive can be used with this version of Python for stream processing. Stream processing uses STDOUT and STDIN to pass data between Hive and the UDF.
+`Python2.7` is installed by default on HDInsight 3.0 and later. Apache Hive can be used with this version of Python for stream processing. Stream processing uses STDOUT and STDIN to pass data between Hive and the UDF.
 
 HDInsight also includes Jython, which is a Python implementation written in Java. Jython runs directly on the Java Virtual Machine and doesn't use streaming. Jython is the recommended Python interpreter when using Python with Pig.
@@ -23,14 +23,14 @@ HDInsight also includes Jython, which is a Python implementation written in Java
 
 * **An SSH client**. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
 * The [URI scheme](../hdinsight-hadoop-linux-information.md#URI-and-scheme) for your clusters primary storage. This would be `wasb://` for Azure Storage, `abfs://` for Azure Data Lake Storage Gen2 or adl:// for Azure Data Lake Storage Gen1. If secure transfer is enabled for Azure Storage, the URI would be wasbs://. See also, [secure transfer](../../storage/common/storage-require-secure-transfer.md).
 * **Possible change to storage configuration.** See [Storage configuration](#storage-configuration) if using storage account kind `BlobStorage`.
-* Optional. If Planning to use PowerShell, you'll need the [AZ module](/powershell/azure/new-azureps-module-az) installed.
+* Optional. If planning to use PowerShell, you need the [AZ module](/powershell/azure/new-azureps-module-az) installed.
 
 > [!NOTE]
 > The storage account used in this article was Azure Storage with [secure transfer](../../storage/common/storage-require-secure-transfer.md) enabled and thus `wasbs` is used throughout the article.
 
 ## Storage configuration
 
-No action is required if the storage account used is of kind `Storage (general purpose v1)` or `StorageV2 (general purpose v2)`. The process in this article will produce output to at least `/tezstaging`. A default hadoop configuration will contain `/tezstaging` in the `fs.azure.page.blob.dir` configuration variable in `core-site.xml` for service `HDFS`. This configuration will cause output to the directory to be page blobs, which aren't supported for storage account kind `BlobStorage`. To use `BlobStorage` for this article, remove `/tezstaging` from the `fs.azure.page.blob.dir` configuration variable. The configuration can be accessed from the [Ambari UI](../hdinsight-hadoop-manage-ambari.md). Otherwise, you'll receive the error message: `Page blob is not supported for this account type.`
+No action is required if the storage account used is of kind `Storage (general purpose v1)` or `StorageV2 (general purpose v2)`. The process in this article produces output to at least `/tezstaging`. A default hadoop configuration contains `/tezstaging` in the `fs.azure.page.blob.dir` configuration variable in `core-site.xml` for service `HDFS`. This configuration causes output to the directory to be page blobs, which aren't supported for storage account kind `BlobStorage`. To use `BlobStorage` for this article, remove `/tezstaging` from the `fs.azure.page.blob.dir` configuration variable. The configuration can be accessed from the [Ambari UI](../hdinsight-hadoop-manage-ambari.md). Otherwise, you receive the error message: `Page blob is not supported for this account type.`
 
 > [!WARNING]
 > The steps in this document make the following assumptions:
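The `fs.azure.page.blob.dir` change described in that hunk amounts to editing one comma-separated directory list in `core-site.xml` (via the Ambari UI). A sketch of the before and after states follows; the exact default value shown here is an assumption — check your cluster's actual list and remove only `/tezstaging` from it.

```xml
<!-- core-site.xml sketch (hypothetical default value; verify on your cluster). -->
<!-- Before: /tezstaging is in the page-blob directory list, so Tez staging
     output is written as page blobs, which BlobStorage accounts reject. -->
<property>
  <name>fs.azure.page.blob.dir</name>
  <value>/mapreducestaging,/atshistory,/tezstaging</value>
</property>

<!-- After: /tezstaging removed, so staging output is written as block blobs. -->
<property>
  <name>fs.azure.page.blob.dir</name>
  <value>/mapreducestaging,/atshistory</value>
</property>
```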
@@ -99,15 +99,15 @@ The script output is a concatenation of the input values for `devicemake` and `d
 
 ### Upload file (shell)
 
-In the commands below, replace `sshuser` with the actual username if different. Replace `mycluster` with the actual cluster name. Ensure your working directory is where the file is located.
+In the following commands, replace `sshuser` with the actual username if different. Replace `mycluster` with the actual cluster name. Ensure your working directory is where the file is located.
 
-1. Use `scp` to copy the files to your HDInsight cluster. Edit and enter the command below:
+1. Use `scp` to copy the files to your HDInsight cluster. Edit and enter the command:
 
    ```cmd
   scp hiveudf.py sshuser@mycluster-ssh.azurehdinsight.net:
   ```
 
-2. Use SSH to connect to the cluster. Edit and enter the command below:
+2. Use SSH to connect to the cluster. Edit and enter the command:
 
   ```cmd
@@ -140,7 +140,7 @@ In the commands below, replace `sshuser` with the actual username if different.
     ORDER BY clientid LIMIT 50;
   ```
 
-3. After entering the last line, the job should start. Once the job completes, it returns output similar to the following example:
+3. After you enter the last line, the job should start. Once the job completes, it returns output similar to the following example:
 
   ```output
   100041 RIM 9650 d476f3687700442549a83fac4560c51c
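The Hive statement that ends with `ORDER BY clientid LIMIT 50;` in this hunk follows Hive's standard `TRANSFORM ... USING` streaming pattern. The sketch below is illustrative, not the article's exact query: the table name `hivesampletable` and the `add file` path are assumptions, while the column names come from the sample output and surrounding text.

```sql
-- Sketch: stream rows through the uploaded Python script with Hive's
-- TRANSFORM clause (table name and file path are assumptions).
add file wasbs:///hiveudf.py;

SELECT TRANSFORM (clientid, devicemake, devicemodel)
    USING 'python hiveudf.py' AS
    (clientid string, phoneLabel string, phoneHash string)
FROM hivesampletable
ORDER BY clientid LIMIT 50;
```

The `AS (...)` list names the tab-separated columns the script writes to STDOUT, which is why the output rows shown above carry a client ID, a device label, and a hash.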
@@ -357,13 +357,13 @@ When the data is returned to Pig, it has a consistent schema as defined in the `
 
 In the commands below, replace `sshuser` with the actual username if different. Replace `mycluster` with the actual cluster name. Ensure your working directory is where the file is located.
 
-1. Use `scp` to copy the files to your HDInsight cluster. Edit and enter the command below:
+1. Use `scp` to copy the files to your HDInsight cluster. Edit and enter the command:
 
    ```cmd
    scp pigudf.py sshuser@mycluster-ssh.azurehdinsight.net:
    ```
 
-2. Use SSH to connect to the cluster. Edit and enter the command below:
+2. Use SSH to connect to the cluster. Edit and enter the command:
 
    ```cmd
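On the Pig side, `pigudf.py` is registered through Jython rather than streamed, since Jython runs on the JVM. A hypothetical sketch of that pattern follows; the input path, function name, and schema are assumptions, not taken from this diff.

```pig
-- Sketch: register a Python UDF via Jython and apply it per record.
-- Path, alias, and function name below are illustrative assumptions.
REGISTER wasbs:///pigudf.py USING jython AS myfuncs;
LOGS = LOAD 'wasbs:///example/data/sample.log' AS (LINE:chararray);
DETAILS = FOREACH LOGS GENERATE myfuncs.create_structure(LINE);
DUMP DETAILS;
```

Because Jython executes inside the JVM, the registered function returns data with a schema Pig can use directly, consistent with the hunk header's note about the `@outputSchema` statement.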
