articles/hdinsight/domain-joined/apache-domain-joined-create-configure-enterprise-security-cluster.md
3 additions & 3 deletions
@@ -4,7 +4,7 @@ description: Learn how to create and configure Enterprise Security Package clust
 services: hdinsight
 ms.service: hdinsight
 ms.topic: how-to
-ms.date: 06/22/2023
+ms.date: 06/14/2024
 ms.custom: devx-track-azurepowershell
 ---
@@ -164,7 +164,7 @@ Create an Active Directory tenant administrator.
 **Groups and roles**
 1. Select **0 groups selected**.
-1. Select **AAD DC Administrators**, and then **Select**.
+1. Select **`AAD DC` Administrators**, and then **Select**.

 :::image type="content" source="./media/apache-domain-joined-create-configure-enterprise-security-cluster/azure-ad-add-group-member.png" alt-text="The Microsoft Entra groups dialog box." border="true":::
@@ -266,7 +266,7 @@ Follow these steps to enable Microsoft Entra Domain Services. For more informati
-1. On the **Administrator group** page, you should see a notification that a group named **AAD DC Administrators** has already been created to administer this group. You can modify the membership of this group if you want to, but in this case you don't need to change it. Select **OK**.
+1. On the **Administrator group** page, you should see a notification that a group named **`AAD DC` Administrators** has already been created to administer this group. You can modify the membership of this group if you want to, but in this case you don't need to change it. Select **OK**.

 :::image type="content" source="./media/apache-domain-joined-create-configure-enterprise-security-cluster/hdinsight-image-0088.png" alt-text="View the Microsoft Entra administrator group." border="true":::
# Use C# user-defined functions with Apache Hive and Apache Pig on Apache Hadoop in HDInsight
@@ -14,7 +14,7 @@ Learn how to use C# user-defined functions (UDF) with [Apache Hive](https://hive
 > [!IMPORTANT]
 > The steps in this document work with Linux-based HDInsight clusters. Linux is the only operating system used on HDInsight version 3.4 or greater. For more information, see [HDInsight component versioning](../hdinsight-component-versioning.md).

-Both Hive and Pig can pass data to external applications for processing. This process is known as _streaming_. When using a .NET application, the data is passed to the application on STDIN, and the application returns the results on STDOUT. To read and write from STDIN and STDOUT, you can use `Console.ReadLine()` and `Console.WriteLine()` from a console application.
+Both Hive and Pig can pass data to external applications for processing. This process is known as _streaming_. When you use a .NET application, the data is passed to the application on STDIN, and the application returns the results on STDOUT. To read and write from STDIN and STDOUT, you can use `Console.ReadLine()` and `Console.WriteLine()` from a console application.

 ## Prerequisites
@@ -178,7 +178,7 @@ Next, upload the Hive and Pig UDF applications to storage on a HDInsight cluster
@@ -249,7 +249,7 @@ You can also run a Pig job that uses your Pig UDF application.
 > [!NOTE]
 > The application name that is used for streaming must be surrounded by the `` ` `` (backtick) character when aliased, and by the `'` (single quote) character when used with `SHIP`.
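
The streaming contract described in this file's first hunk (rows arrive on STDIN, results are returned on STDOUT) does not depend on the language of the external application. As a minimal sketch, and not the article's C# UDF, the shell function below plays the role of that application: it reads rows line by line and writes each row back with its character count appended after a tab. The function name `stream_append_length` and the "append the length" transformation are illustrative assumptions, chosen only to make the read/write loop visible.

```bash
# Hypothetical stand-in for a streaming application:
# rows come in on STDIN, transformed rows go out on STDOUT.
stream_append_length() {
    local line
    # `read -r` plays the role of Console.ReadLine(); the `|| [ -n "$line" ]`
    # clause also processes a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # printf plays the role of Console.WriteLine():
        # emit the original row, a tab, and the row's length in characters.
        printf '%s\t%s\n' "$line" "${#line}"
    done
}
```

For example, `printf 'abc\nhello\n' | stream_append_length` emits `abc` and `hello`, each followed by a tab and its length (3 and 5). A real UDF would replace the `printf` line with its own per-row logic.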
articles/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-data-migration.md
2 additions & 2 deletions
@@ -4,7 +4,7 @@ description: Learn data migration best practices for migrating on-premises Hadoo
 ms.service: hdinsight
 ms.topic: how-to
 ms.custom: hdinsightactive
-ms.date: 06/22/2023
+ms.date: 06/14/2024
 ---

 # Migrate on-premises Apache Hadoop clusters to Azure HDInsight - data migration best practices
@@ -52,7 +52,7 @@ DistCp tries to create map tasks so that each one copies roughly the same number
 * DistCp's lowest granularity is a single file. Specifying a number of Mappers more than the number of source files doesn't help and will waste the available cluster resources.

-* Consider the available Yarn memory on the cluster to determine the number of Mappers. Each Map task is launched as a Yarn container. Assuming that no other heavy workloads are running on the cluster, the number of Mappers can be determined by the following formula: m = (number of worker nodes \* YARN memory for each worker node) / YARN container size. However, If other applications are using memory, then choose to only use a portion of YARN memory for DistCp jobs.
+* Consider the available Yarn memory on the cluster to determine the number of Mappers. Each Map task is launched as a Yarn container. Assuming that no other heavy workloads are running on the cluster, the number of Mappers can be determined by the following formula: m = (number of worker nodes \* YARN memories for each worker node) / YARN container size. However, If other applications are using memory, then choose to only use a portion of YARN memory for DistCp jobs.
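
The mapper formula in the changed line above can be checked with a few lines of shell arithmetic. This is a sketch under stated assumptions: the function name `estimate_mappers`, the optional "percentage of YARN memory" argument (modelling the advice to use only a portion of memory when other applications are running), and the example cluster sizes are all hypothetical and do not come from the diff.

```bash
# m = (worker nodes * YARN memory per worker node) / YARN container size
# Arguments: nodes, memory per node, container size, [percent of memory to use].
# Memory and container size must be integers in the same unit (for example, GB).
estimate_mappers() {
    local nodes=$1 mem_per_node=$2 container=$3
    local pct=${4:-100}   # hypothetical knob: share of YARN memory given to DistCp
    # Shell arithmetic is integer-only, so each division rounds down.
    echo $(( nodes * mem_per_node * pct / 100 / container ))
}

# Hypothetical cluster: 4 worker nodes, 96 GB of YARN memory each, 3 GB containers.
estimate_mappers 4 96 3      # prints 128
# Same cluster, but only half of the YARN memory is set aside for DistCp.
estimate_mappers 4 96 3 50   # prints 64
```

Because DistCp's lowest granularity is a single file, a result larger than the number of source files should be capped at that number.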
articles/hdinsight/hadoop/connect-install-beeline.md
10 additions & 10 deletions
@@ -3,7 +3,7 @@ title: Connect to HiveServer2 using Beeline or install Beeline locally to connec
 description: Learn how to connect to the Apache Beeline client to run Hive queries with Hadoop on HDInsight. Beeline is a utility for working with HiveServer2 over JDBC.
 ms.service: hdinsight
 ms.topic: how-to
-ms.date: 06/12/2023
+ms.date: 06/14/2024
 ---
 # Connect to HiveServer2 using Beeline or install Beeline locally to connect from your local
@@ -13,25 +13,25 @@ ms.date: 06/12/2023
 ### From an SSH session

-When connecting from an SSH session to a cluster headnode, you can then connect to the `headnodehost` address on port `10001`:
+When you connect from an SSH session to a cluster headnode, you can then connect to the `headnodehost` address on port `10001`:
-When connecting from a client to HDInsight over an Azure Virtual Network, you must provide the fully qualified domain name (FQDN) of a cluster head node. Since this connection is made directly to the cluster nodes, the connection uses port `10001`:
+When you connect from a client to HDInsight over an Azure Virtual Network, you must provide the fully qualified domain name (FQDN) of a cluster head node. Since this connection is made directly to the cluster nodes, the connection uses port `10001`:
-Replace `<headnode-FQDN>` with the fully qualified domain name of a cluster headnode. To find the fully qualified domain name of a headnode, use the information in the [Manage HDInsight using the Apache Ambari REST API](../hdinsight-hadoop-manage-ambari-rest-api.md#get-the-fqdn-of-cluster-nodes) document.
+Replace `<headnode-FQDN>` with the fully qualified domain name of a cluster headnode. To find the fully qualified domain name of a headnode, use the information in the [Managed HDInsight using the Apache Ambari REST API](../hdinsight-hadoop-manage-ambari-rest-api.md#get-the-fqdn-of-cluster-nodes) document.

 ### To HDInsight Enterprise Security Package (ESP) cluster using Kerberos

-When connecting from a client to an Enterprise Security Package (ESP) cluster joined to Microsoft Entra Domain Services on a machine in same realm of the cluster, you must also specify the domain name `<AAD-Domain>` and the name of a domain user account with permissions to access the cluster `<username>`:
+When you connect from a client to an Enterprise Security Package (ESP) cluster joined to Microsoft Entra Domain Services on a machine in same realm of the cluster, you must also specify the domain name `<AAD-Domain>` and the name of a domain user account with permissions to access the cluster `<username>`:

 ```bash
 kinit <username>
@@ -48,15 +48,15 @@ To find the JDBC URL from Ambari:
 ### Over public or private endpoints

-When connecting to a cluster using the public or private endpoints, you must provide the cluster login account name (default `admin`) and password. For example, using Beeline from a client system to connect to the `clustername.azurehdinsight.net` address. This connection is made over port `443`, and is encrypted using TLS/SSL.
+When you connect to a cluster using the public or private endpoints, you must provide the cluster login account name (default `admin`) and password. For example, using Beeline from a client system to connect to the `clustername.azurehdinsight.net` address. This connection is made over port `443`, and is encrypted using TLS/SSL.

 Replace `clustername` with the name of your HDInsight cluster. Replace `admin` with the cluster login account for your cluster. For ESP clusters, use the full UPN (for example, [email protected]). Replace `password` with the password for the cluster login account.
@@ -88,7 +88,7 @@ Private endpoints point to a basic load balancer, which can only be accessed fro
 #### From cluster head node or inside Azure Virtual Network with Apache Spark

-When connecting directly from the cluster head node, or from a resource inside the same Azure Virtual Network as the HDInsight cluster, port `10002` should be used for Spark Thrift server instead of `10001`. The following example shows how to connect directly to the head node:
+When you connect directly from the cluster head node, or from a resource inside the same Azure Virtual Network as the HDInsight cluster, port `10002` should be used for Spark Thrift server instead of `10001`. The following example shows how to connect directly to the head node:
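
The command blocks that follow these sentences are not part of the diff, so the sketch below is a reconstruction under assumptions, not the article's text. It only encodes what the changed lines state: direct connections go to `headnodehost` (or a head node FQDN) on port `10001` for HiveServer2 and on port `10002` for the Spark Thrift server. The function name `direct_jdbc_url` is hypothetical, and the `;transportMode=http` suffix is an assumption about how the cluster's Thrift endpoints are exposed; verify it against the JDBC URL shown in Ambari for your cluster.

```bash
# Build a JDBC URL for a direct connection (from a head node or the same virtual network).
# Arguments: [host] [service], where service is "hive" (default) or "spark".
direct_jdbc_url() {
    local host=${1:-headnodehost} service=${2:-hive} port
    case "$service" in
        hive)  port=10001 ;;   # HiveServer2, per the changed lines above
        spark) port=10002 ;;   # Spark Thrift server, per the changed lines above
        *) echo "unknown service: $service" >&2; return 1 ;;
    esac
    # Assumption: HTTP transport mode; not shown anywhere in this diff.
    printf 'jdbc:hive2://%s:%s/;transportMode=http\n' "$host" "$port"
}

# Usage (requires Beeline on the PATH, so it is left commented out here):
# beeline -u "$(direct_jdbc_url headnodehost spark)"
```

Keeping the port choice in one place makes the Hive/Spark difference explicit; an unknown service name returns a non-zero status instead of silently falling back to a default port.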