Commit 400fb4b

Author: Sreekanth Iyer (Ushta Te Consultancy Services)
Commit message: Improved Correctness Score
1 parent: 8f59847

7 files changed, 23 insertions(+), 23 deletions(-)

articles/hdinsight/hdinsight-hadoop-create-linux-clusters-azure-powershell.md

Lines changed: 7 additions & 7 deletions

@@ -26,23 +26,23 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
 [!INCLUDE [delete-cluster-warning](includes/hdinsight-delete-cluster-warning.md)]
-To create an HDInsight cluster by using Azure PowerShell, you must complete the following procedures:
+To create a HDInsight cluster by using Azure PowerShell, you must complete the following procedures:
 * Create an Azure resource group
 * Create an Azure Storage account
 * Create an Azure Blob container
-* Create an HDInsight cluster
+* Create a HDInsight cluster
 > [!NOTE]
-> Using PowerShell to create an HDInsight cluster with Azure Data Lake Storage Gen2 is not currently supported.
+> Using PowerShell to create a HDInsight cluster with Azure Data Lake Storage Gen2 is not currently supported.
 The following script demonstrates how to create a new cluster:
-[!code-powershell[main](../../azure_powershell_scripts/hdinsight/create-cluster/create-cluster.ps1?range=5-74)]
+[!Code-powershell[main](../../azure_powershell_scripts/hdinsight/create-cluster/create-cluster.ps1?range=5-74)]
 The values you specify for the cluster login are used to create the Hadoop user account for the cluster. Use this account to connect to services hosted on the cluster such as web UIs or REST APIs.
-The values you specify for the SSH user are used to create the SSH user for the cluster. Use this account to start a remote SSH session on the cluster and run jobs. For more information, see the [Use SSH with HDInsight](hdinsight-hadoop-linux-use-ssh-unix.md) document.
+The values you specify for the SSH user are used to create the SSH user for the cluster. Use this account to start a remote SSH session on the cluster and run jobs. For more information, see [Use SSH with HDInsight](hdinsight-hadoop-linux-use-ssh-unix.md) document.
 > [!IMPORTANT]
 > If you plan to use more than 32 worker nodes (either at cluster creation or by scaling the cluster after creation), you must also specify a head node size with at least 8 cores and 14 GB of RAM.
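The head-node sizing rule quoted in the IMPORTANT note above can be captured as a small validation helper. This is an illustrative sketch only; the function name and signature are hypothetical and not part of the article or the Az PowerShell module:

```python
def head_node_size_ok(worker_nodes: int, head_cores: int, head_ram_gb: float) -> bool:
    """Check the sizing rule from the note above: with more than 32 worker
    nodes, the head node needs at least 8 cores and 14 GB of RAM."""
    if worker_nodes > 32:
        return head_cores >= 8 and head_ram_gb >= 14
    # The note imposes no extra head-node constraint for 32 or fewer workers.
    return True

print(head_node_size_ok(40, 8, 14))  # → True: large cluster, adequately sized head node
print(head_node_size_ok(40, 4, 7))   # → False: large cluster, undersized head node
```

A check like this could run before invoking the creation script, failing fast instead of waiting out the up-to-20-minute cluster creation the article mentions.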
@@ -53,7 +53,7 @@ It can take up to 20 minutes to create a cluster.
 ## Create cluster: Configuration object
-You can also create an HDInsight configuration object using [`New-AzHDInsightClusterConfig`](/powershell/module/az.hdinsight/new-azhdinsightclusterconfig) cmdlet. You can then modify this configuration object to enable additional configuration options for your cluster. Finally, use the `-Config` parameter of the [`New-AzHDInsightCluster`](/powershell/module/az.hdinsight/new-azhdinsightcluster) cmdlet to use the configuration.
+You can also create a HDInsight configuration object using [`New-AzHDInsightClusterConfig`](/powershell/module/az.hdinsight/new-azhdinsightclusterconfig) cmdlet. You can then modify this configuration object to enable additional configuration options for your cluster. Finally, use the `-Config` parameter of the [`New-AzHDInsightCluster`](/powershell/module/az.hdinsight/new-azhdinsightcluster) cmdlet to use the configuration.
 ## Customize clusters
@@ -70,7 +70,7 @@ If you run into issues with creating HDInsight clusters, see [access control req
 ## Next steps
-Now that you've successfully created an HDInsight cluster, use the following resources to learn how to work with your cluster.
+Now that you've successfully created a HDInsight cluster, use the following resources to learn how to work with your cluster.
 ### Apache Hadoop clusters

articles/hdinsight/hdinsight-log-management.md

Lines changed: 4 additions & 4 deletions

@@ -1,15 +1,15 @@
 ---
-title: Manage logs for an HDInsight cluster - Azure HDInsight
+title: Manage logs for a HDInsight cluster - Azure HDInsight
 description: Determine the types, sizes, and retention policies for HDInsight activity log files.
 ms.service: azure-hdinsight
 ms.topic: how-to
 ms.custom: hdinsightactive
 ms.date: 01/02/2025
 ---
-# Manage logs for an HDInsight cluster
+# Manage logs for a HDInsight cluster
-An HDInsight cluster produces various log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.
+a HDInsight cluster produces various log files. For example, Apache Hadoop and related services, such as Apache Spark, produce detailed job execution logs. Log file management is part of maintaining a healthy HDInsight cluster. There can also be regulatory requirements for log archiving. Due to the number and size of log files, optimizing log storage and archiving helps with service cost management.
 Managing HDInsight cluster logs includes retaining information about all aspects of the cluster environment. This information includes all associated Azure Service logs, cluster configuration, job execution information, any error states, and other data as needed.
@@ -157,7 +157,7 @@ Alternatively, you can script log archiving with PowerShell.
 ### Accessing Azure Storage metrics
 Azure Storage can be configured to log storage operations and access. You can use these detailed logs for capacity monitoring and planning, and for auditing requests to storage. The logged information includes latency details, enabling you to monitor and fine-tune the performance of your solutions.
-You can use the .NET SDK for Hadoop to examine the log files generated for the Azure Storage that holds the data for an HDInsight cluster.
+You can use the .NET SDK for Hadoop to examine the log files generated for the Azure Storage that holds the data for a HDInsight cluster.
 ### Control the size and number of backup indexes for old log files

articles/hdinsight/hdinsight-operationalize-data-pipeline.md

Lines changed: 3 additions & 3 deletions

@@ -29,7 +29,7 @@ The following diagram illustrates the example pipeline.
 ## Apache Oozie solution overview
-This pipeline uses Apache Oozie running on an HDInsight Hadoop cluster.
+This pipeline uses Apache Oozie running on a HDInsight Hadoop cluster.
 Oozie describes its pipelines in terms of *actions*, *workflows*, and *coordinators*. Actions determine the actual work to perform, such as running a Hive query. Workflows define the sequence of actions. Coordinators define the schedule for when the workflow is run. Coordinators can also wait on the availability of new data before launching an instance of the workflow.
@@ -39,7 +39,7 @@ The following diagram shows the high-level design of this example Oozie pipeline
 ## Provision Azure resources
-This pipeline requires an Azure SQL Database and an HDInsight Hadoop cluster in the same location. The Azure SQL Database stores both the summary data produced by the pipeline and the Oozie Metadata Store.
+This pipeline requires an Azure SQL Database and a HDInsight Hadoop cluster in the same location. The Azure SQL Database stores both the summary data produced by the pipeline and the Oozie Metadata Store.
 ### Provision Azure SQL Database
@@ -70,7 +70,7 @@ Your Azure SQL Database is now ready.
 ### Provision an Apache Hadoop Cluster
-Create an Apache Hadoop cluster with a custom metastore. During cluster creation from the portal, from the **Storage** tab, ensure you select your SQL Database under **Metastore settings**. For more information on selecting a metastore, see [Select a custom metastore during cluster creation](./hdinsight-use-external-metadata-stores.md#select-a-custom-metastore-during-cluster-creation). For more information on cluster creation, see [Get Started with HDInsight on Linux](hadoop/apache-hadoop-linux-tutorial-get-started.md).
+Create an Apache Hadoop cluster with a custom metastore. During cluster creation from the portal, from the **Storage** tab, ensures you select your SQL Database under **Metastore settings**. For more information on selecting a metastore, see [Select a custom metastore during cluster creation](./hdinsight-use-external-metadata-stores.md#select-a-custom-metastore-during-cluster-creation). For more information on cluster creation, see [Get Started with HDInsight on Linux](hadoop/apache-hadoop-linux-tutorial-get-started.md).
 ## Verify SSH tunneling set up

articles/hdinsight/interactive-query/apache-hive-warehouse-connector.md

Lines changed: 3 additions & 3 deletions

@@ -60,9 +60,9 @@ Hive Warehouse Connector needs separate clusters for Spark and Interactive Query
 ### Create clusters
-1. Create an HDInsight Spark **4.0** cluster with a storage account and a custom Azure virtual network. For information on creating a cluster in an Azure virtual network, see [Add HDInsight to an existing virtual network](../../hdinsight/hdinsight-plan-virtual-network-deployment.md#existingvnet).
+1. Create a HDInsight Spark **4.0** cluster with a storage account and a custom Azure virtual network. For information on creating a cluster in an Azure virtual network, see [Add HDInsight to an existing virtual network](../../hdinsight/hdinsight-plan-virtual-network-deployment.md#existingvnet).
-1. Create an HDInsight Interactive Query (LLAP) **4.0** cluster with the same storage account and Azure virtual network as the Spark cluster.
+1. Create a HDInsight Interactive Query (LLAP) **4.0** cluster with the same storage account and Azure virtual network as the Spark cluster.
 ### Configure HWC settings
@@ -172,7 +172,7 @@ This is a way to run Spark interactively through a modified version of the Scala
 Spark-submit is a utility to submit any Spark program (or job) to Spark clusters.
-The spark-submit job will set up and configure Spark and Hive Warehouse Connector as per our instructions, execute the program we pass to it, then cleanly release the resources that were being used.
+The spark-submited job will set up and configure Spark and Hive Warehouse Connector as per our instructions, execute the program we pass to it, then cleanly release the resources that were being used.
 Once you build the scala/java code along with the dependencies into an assembly jar, use the below command to launch a Spark application. Replace `<VERSION>`, and `<APP_JAR_PATH>` with the actual values.
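As a sketch of the launch step described in the hunk above, the following assembles a spark-submit argument list. The assembly-jar naming pattern, class name, and paths are hypothetical placeholders standing in for `<VERSION>` and `<APP_JAR_PATH>`, not values taken from the article:

```python
def build_spark_submit(version: str, app_jar_path: str, main_class: str) -> list:
    """Assemble a spark-submit command line for a job that uses the
    Hive Warehouse Connector assembly jar (illustrative names only)."""
    return [
        "spark-submit",
        "--class", main_class,
        # Hypothetical HWC assembly-jar name built from the version placeholder.
        "--jars", f"hive-warehouse-connector-assembly-{version}.jar",
        app_jar_path,
    ]

cmd = build_spark_submit("1.0.0", "app.jar", "com.example.Main")
print(" ".join(cmd))
```

Building the argument list programmatically keeps the placeholder substitution in one place instead of editing a shell one-liner by hand.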

articles/hdinsight/interactive-query/quickstart-resource-manager-template.md

Lines changed: 3 additions & 3 deletions

@@ -31,7 +31,7 @@ The template used in this quickstart is from [Azure Quickstart Templates](https:
 Two Azure resources are defined in the template:
 * [Microsoft.Storage/storageAccounts](/azure/templates/microsoft.storage/storageaccounts): create an Azure Storage Account.
-* [Microsoft.HDInsight/cluster](/azure/templates/microsoft.hdinsight/clusters): create an HDInsight cluster.
+* [Microsoft.HDInsight/cluster](/azure/templates/microsoft.hdinsight/clusters): create a HDInsight cluster.
 ### Deploy the template
@@ -48,7 +48,7 @@ Two Azure resources are defined in the template:
 |Location|The value will autopopulate with the location used for the resource group.|
 |Cluster Name|Enter a globally unique name. For this template, use only lowercase letters, and numbers.|
 |Cluster Login User Name|Provide the username, default is `admin`.|
-|Cluster Login Password|Provide a password. The password must be at least 10 characters in length and must contain at least one digit, one uppercase, and one lower case letter, one non-alphanumeric character (except characters ```' ` "``` ). |
+|Cluster Login Password|Provide a password. The password must be at least 10 characters in length and must contain at least one digit, one uppercase, and one lower case letter, one nonalphanumeric character (except characters ```' ` "``` ). |
 |Ssh User Name|Provide the username, default is sshuser|
 |Ssh Password|Provide the password.|
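The password rule quoted in the table row above can be sketched as a validator. This is a hypothetical illustration that reads the parenthetical as meaning the characters ' ` " may not appear at all; the real portal validation may differ:

```python
def password_meets_policy(pw: str) -> bool:
    """Sketch of the cluster login password rule quoted above: at least 10
    characters, with at least one digit, one uppercase letter, one lowercase
    letter, and one non-alphanumeric character, excluding ' ` and "
    (the reading of the parenthetical is an assumption)."""
    if any(c in "'`\"" for c in pw):
        return False
    return (
        len(pw) >= 10
        and any(c.isdigit() for c in pw)
        and any(c.isupper() for c in pw)
        and any(c.islower() for c in pw)
        and any(not c.isalnum() for c in pw)
    )

print(password_meets_policy("Passw0rd!x"))  # → True
print(password_meets_policy("short1!A"))    # → False: fewer than 10 characters
```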

@@ -62,7 +62,7 @@ Once the cluster is created, you'll receive a **Deployment succeeded** notificat
 ## Clean up resources
-After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use.
+After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for a HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use.
 From the Azure portal, navigate to your cluster, and select **Delete**.

articles/hdinsight/kafka/kafka-troubleshoot-insufficient-domains.md

Lines changed: 2 additions & 2 deletions

@@ -16,15 +16,15 @@ Receive error message similar to `not sufficient fault domains in region` when a
 ## Cause
-A fault domain is a logical grouping of underlying hardware in an Azure data center. Each fault domain shares a common power source and network switch. The virtual machines and managed disks that implement the nodes within an HDInsight cluster are distributed across these fault domains. This architecture limits the potential impact of physical hardware failures.
+A fault domain is a logical grouping of underlying hardware in an Azure data center. Each fault domain shares a common power source and network switch. The virtual machines and managed disks that implement the nodes within a HDInsight cluster are distributed across these fault domains. This architecture limits the potential impact of physical hardware failures.
 Each Azure region has a specific number of fault domains. For a list of domains and the number of fault domains they contain, refer to documentation on [Availability Sets](/azure/virtual-machines/availability).
 In HDInsight, Kafka clusters are required to be provisioned in a region with at least three Fault domains.
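The three-fault-domain requirement stated above can be expressed as a simple eligibility check. The region names and fault-domain counts below are made-up illustrations; consult the Availability Sets documentation for real per-region values:

```python
def kafka_region_ok(fault_domains: int) -> bool:
    """Kafka on HDInsight requires a region with at least three fault domains."""
    return fault_domains >= 3

# Hypothetical per-region fault-domain counts, for illustration only.
regions = {"region-a": 3, "region-b": 2}
eligible = [name for name, fd in regions.items() if kafka_region_ok(fd)]
print(eligible)  # → ['region-a']
```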

 ## Resolution
-If the region you wish to create the cluster does not have sufficient fault domains, reach out to product team to allow provisioning of the cluster even if there are not three fault domains.
+If the region you wish to create the cluster doesn't have sufficient fault domains, reach out to product team to allow provisioning of the cluster even if there aren't three fault domains.
 ## Next steps

articles/hdinsight/spark/apache-spark-microsoft-cognitive-toolkit.md

Lines changed: 1 addition & 1 deletion

@@ -25,7 +25,7 @@ In this article, you do the following steps.
 This solution is divided between this article and a Jupyter Notebook that you upload as part of this article. In this article, you complete the following steps:
-* Run a script action on an HDInsight Spark cluster to install Microsoft Cognitive Toolkit and Python packages.
+* Run a script action on a HDInsight Spark cluster to install Microsoft Cognitive Toolkit and Python packages.
 * Upload the Jupyter Notebook that runs the solution to the HDInsight Spark cluster.
 The following remaining steps are covered in the Jupyter Notebook.
