Commit 6f1d06e

Author: Sreekanth Iyer (Ushta Te Consultancy Services)
Commit message: Improved Correctness Score
1 parent: 28b6424

7 files changed: +23 / -23 lines

articles/hdinsight/hdinsight-hadoop-customize-cluster-linux.md
Lines changed: 3 additions & 3 deletions

@@ -201,7 +201,7 @@ In this section, you use the [Add-AzHDInsightScriptAction](/powershell/module/az
 
 The following script shows how to apply a script action when you create a cluster by using PowerShell:
 
-[!code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=5-90)]
+[!Code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=5-90)]
 
 It can take several minutes before the cluster is created.
 
@@ -245,7 +245,7 @@ This section explains how to apply script actions on a running cluster.
 
 To use these PowerShell commands, you need the [AZ Module](/powershell/azure/). The following example shows how to apply a script action to a running cluster:
 
-[!code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=105-117)]
+[!Code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=105-117)]
 
 After the operation finishes, you receive information similar to the following text:
 
@@ -317,7 +317,7 @@ For an example of using the .NET SDK to apply scripts to a cluster, see [Apply a
 
 The following example script demonstrates using the cmdlets to promote and then demote a script.
 
-[!code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=123-140)]
+[!Code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=123-140)]
 
 ### Azure CLI
 
articles/hdinsight/hdinsight-supported-node-configuration.md
Lines changed: 4 additions & 4 deletions

@@ -18,7 +18,7 @@ The following tables list default and recommended virtual machine (VM) sizes for
 
 If you need more than 32 worker nodes in a cluster, select a head node size with at least 8 cores and 14 GB of RAM.
 
-The only cluster types that have data disks are Kafka and HBase clusters with the Accelerated Writes feature enabled. HDInsight supports P30 and S30 disk sizes in these scenarios. For all other cluster types, HDInsight provides managed disk space with the cluster. Starting 11/07/2019, the managed disk size of each node in the newly created cluster is 128 GB. This can't be changed.
+The only cluster types that have data disks are Kafka and HBase clusters with the Accelerated Writes feature enabled. HDInsight supports P30 and S30 disk sizes in these scenarios. For all other cluster types, HDInsight provides managed disk space with the cluster. From 11/07/2019 onwards, the managed disk size of each node in the newly created cluster is 128 GB. This can't be changed.
 
 The specifications of all minimum recommended VM types used in this document are summarized in the following table.
 
@@ -36,9 +36,9 @@ The specifications of all minimum recommended VM types used in this document are
 
 For more information on the specifications of each VM type, see the following documents:
 
-* [General purpose virtual machine sizes: Dv2 series 1-5](../virtual-machines/dv2-dsv2-series.md)
-* [Memory optimized virtual machine sizes: Dv2 series 11-15](../virtual-machines/dv2-dsv2-series-memory.md)
-* [General purpose virtual machine sizes: Av2 series 1-8](../virtual-machines/av2-series.md)
+* [General purpose virtual machine sizes: `Dv2` series 1-5](../virtual-machines/dv2-dsv2-series.md)
+* [Memory optimized virtual machine sizes: `Dv2` series 11-15](../virtual-machines/dv2-dsv2-series-memory.md)
+* [General purpose virtual machine sizes: `Av2` series 1-8](../virtual-machines/av2-series.md)
 
 ### All supported regions
 
articles/hdinsight/hdinsight-troubleshoot-failed-cluster.md
Lines changed: 3 additions & 3 deletions

@@ -131,7 +131,7 @@ An HDInsight Gateway times out responses that take longer than two minutes, retu
 
 In this case, review the following logs in the `/var/log/webhcat` directory:
 
-* **webhcat.log** is the log4j log to which server writes logs
+* **webhcat.log** is the Log4j log to which server writes logs
 * **webhcat-console.log** is the stdout of the server when started
 * **webhcat-console-error.log** is the stderr of the server process
 
@@ -166,9 +166,9 @@ At the YARN level, there are two types of timeouts:
 
 If you open the `/var/log/webhcat/webhcat.log` log file and search for "queued job", you may see multiple entries where the execution time is excessively long (>2000 ms), with entries showing increasing wait times.
 
-The time for the queued jobs continues to increase because the rate at which new jobs get submitted is higher than the rate at which the old jobs are completed. Once the YARN memory is 100% used, the *joblauncher queue* can no longer borrow capacity from the *default queue*. Therefore, no more new jobs can be accepted into the joblauncher queue. This behavior can cause the waiting time to become longer and longer, causing a timeout error that is usually followed by many others.
+The time for the queued jobs continues to increase because the rate at which new jobs get submitted is higher than the rate at which the old jobs are completed. Once the YARN memory is 100% used, the `joblauncher queue` can no longer borrow capacity from the *default queue*. Therefore, no more new jobs can be accepted into the job launcher queue. This behavior can cause the waiting time to become longer and longer, causing a timeout error that is usually followed by many others.
 
-The following image shows the joblauncher queue at 714.4% overused. This is acceptable so long as there is still free capacity in the default queue to borrow from. However, when the cluster is fully utilized and the YARN memory is at 100% capacity, new jobs must wait, which eventually causes timeouts.
+The following image shows the job launcher queue at 714.4% overused. This is acceptable so long as there is still free capacity in the default queue to borrow from. However, when the cluster is fully utilized and the YARN memory is at 100% capacity, new jobs must wait, which eventually causes timeouts.
 
 :::image type="content" source="./media/hdinsight-troubleshoot-failed-cluster/hdi-job-launcher-queue.png" alt-text="HDInsight Job launcher queue view.":::
 
articles/hdinsight/hdinsight-use-external-metadata-stores.md
Lines changed: 2 additions & 2 deletions

@@ -59,9 +59,9 @@ HDInsight also supports custom metastores, which are recommended for production
 
 Create or have an existing Azure SQL Database before setting up a custom Hive metastore for a HDInsight cluster. For more information, see [Quickstart: Create a single database in Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart?tabs=azure-portal).
 
-While creating the cluster, HDInsight service needs to connect to the external metastore and verify your credentials. Configure Azure SQL Database firewall rules to allow Azure services and resources to access the server. Enable this option in the Azure portal by selecting **Set server firewall**. Then select **No** underneath **Deny public network access**, and **Yes** underneath **Allow Azure services and resources to access this server** for Azure SQL Database. For more information, see [Create and manage IP firewall rules](/azure/azure-sql/database/firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules)
+When you create the cluster, HDInsight service needs to connect to the external metastore and verify your credentials. Configure Azure SQL Database firewall rules to allow Azure services and resources to access the server. Enable this option in the Azure portal by selecting **Set server firewall**. Then select **No** underneath **Deny public network access**, and **Yes** underneath **Allow Azure services and resources to access this server** for Azure SQL Database. For more information, see [Create and manage IP firewall rules](/azure/azure-sql/database/firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules)
 
-Private endpoints for SQL stores is only supported on the clusters created with `outbound` ResourceProviderConnection. To learn more, see this [documentation](./hdinsight-private-link.md).
+Private endpoints for SQL stores are only supported on the clusters created with `outbound` ResourceProviderConnection. To learn more, see this [documentation](./hdinsight-private-link.md).
 
 :::image type="content" source="./media/hdinsight-use-external-metadata-stores/configure-azure-sql-database-firewall1.png" alt-text="set server firewall button.":::
 
articles/hdinsight/interactive-query/troubleshoot-gateway-timeout.md
Lines changed: 3 additions & 3 deletions

@@ -25,9 +25,9 @@ Cannot create property 'errors' on string '<!DOCTYPE html PUBLIC '-//W3C//DTD XH
 
 A Gateway timeout.
 
-The Gateway timeout value is 2 minutes. Queries from Ambari Hive View are submitted to the `/hive2` endpoint through the gateway. Once the query is successfully compiled and accepted, the HiveServer returns a `queryid`. Clients then keep polling for the status of the query. During this process, if the HiveServer doesn't return an HTTP response within 2 minutes, the HDI Gateway throws a 502.3 Gateway timeout error to the caller. The errors could happen when the query is submitted for processing (more likely) and also in the get status call (less likely). Users could see either of them.
+The Gateway timeout value is 2 minutes. Queries from Ambari Hive View are submitted to the `/hive2` endpoint through the gateway. Once the query is successfully compiled and accepted, the HiveServer returns a `queryid`. Clients then keep polling for the status of the query. During this process, if the HiveServer doesn't return an HTTP response within 2 minutes, the HDI Gateway throws a 502.3 Gateway timeout error to the caller. The errors could happen when the query is submitted for processing (more likely) and also in the got status call (less likely). Users could see either of them.
 
-The http handler thread is supposed to be quick: prepare the job and return a `queryid`. However, due to several reasons, all the handler threads could be busy resulting in timeouts for new queries and the get status calls.
+The http handler thread is supposed to be quick: prepare the job and return a `queryid`. However, due to several reasons, all the handler threads could be busy resulting in timeouts for new queries and the got status calls.
 
 ### Responsibilities of the HTTP handler thread
 
@@ -46,7 +46,7 @@ Some general recommendations to you to improve the situation:
 
 * If using an external hive metastore, check the DB metrics and make sure that the database isn't overloaded. Consider scaling the metastore database layer.
 
-* Ensure that parallel ops is turned on (this enables the HTTP handler threads to run in parallel). To verify the value, launch [Apache Ambari](../hdinsight-hadoop-manage-ambari.md) and navigate to **Hive** > **Configs** > **Advanced** > **Custom hive-site**. The value for `hive.server2.parallel.ops.in.session` should be `true`.
+* Ensure that parallel ops are turned on (this enables the HTTP handler threads to run in parallel). To verify the value, launch [Apache Ambari](../hdinsight-hadoop-manage-ambari.md) and navigate to **Hive** > **Configs** > **Advanced** > **Custom hive-site**. The value for `hive.server2.parallel.ops.in.session` should be `true`.
 
 * Ensure that the cluster's VM SKU isn't too small for the load. Consider to splitting the work among multiple clusters. For more information, see [Choose a cluster type](../hdinsight-capacity-planning.md#choose-a-cluster-type).
 
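For reference, the `hive.server2.parallel.ops.in.session` setting mentioned in the hunk above is a standard HiveServer2 property; expressed as a `hive-site.xml` fragment (value as the article recommends), it would look like this:

```xml
<!-- Allow multiple concurrent operations within a single HiveServer2
     session, so handler threads are not serialized per session. -->
<property>
  <name>hive.server2.parallel.ops.in.session</name>
  <value>true</value>
</property>
```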
articles/hdinsight/spark/apache-spark-load-data-run-query.md
Lines changed: 2 additions & 2 deletions

@@ -8,7 +8,7 @@ ms.date: 07/12/2024
 #Customer intent: As a developer new to Apache Spark and to Apache Spark in Azure HDInsight, I want to learn how to load data into a Spark cluster, so I can run interactive SQL queries against the data.
 ---
 
-# Tutorial: Load data and run queries on an Apache Spark cluster in Azure HDInsight
+# Tutorial: Load data, and run queries on an Apache Spark cluster in Azure HDInsight
 
 In this tutorial, you learn how to create a dataframe from a csv file, and how to run interactive Spark SQL queries against an [Apache Spark](https://spark.apache.org/) cluster in Azure HDInsight. In Spark, a dataframe is a distributed collection of data organized into named columns. Dataframe is conceptually equivalent to a table in a relational database or a data frame in R/Python.
 
@@ -50,7 +50,7 @@ Applications can create dataframes directly from files or folders on the remote
 from pyspark.sql.types import *
 ```
 
-When running an interactive query in Jupyter, the web browser window or tab caption shows a **(Busy)** status along with the notebook title. You also see a solid circle next to the **PySpark** text in the top-right corner. After the job is completed, it changes to a hollow circle.
+When you run an interactive query in Jupyter, the web browser window or tab caption shows a **(Busy)** status along with the notebook title. You also see a solid circle next to the **PySpark** text in the top-right corner. After the job is completed, it changes to a hollow circle.
 
 :::image type="content" source="./media/apache-spark-load-data-run-query/hdinsight-spark-interactive-spark-query-status.png " alt-text="Status of interactive Spark SQL query." border="true":::
 
articles/hdinsight/spark/apache-spark-resource-manager.md
Lines changed: 6 additions & 6 deletions

@@ -36,7 +36,7 @@ The three configuration parameters can be configured at the cluster level (for a
 
 ### Change the parameters using Ambari UI
 
-1. From the Ambari UI navigate to **Spark2** > **Configs** > **Custom spark2-defaults**.
+1. From the Ambari UI navigate to **Spark 2** > **Configs** > **Custom spark2-defaults**.
 
    :::image type="content" source="./media/apache-spark-resource-manager/ambari-ui-spark2-configs.png " alt-text="Set parameters using Ambari custom." border="true":::
 
@@ -106,15 +106,15 @@ Because of Spark dynamic allocation, the only resources that are consumed by thr
 
 1. From the Ambari UI, from the left pane, select **Spark2**.
 
-2. In the next page, select **Spark2 Thrift Servers**.
+2. In the next page, select **Spark 2 Thrift Servers**.
 
    :::image type="content" source="./media/apache-spark-resource-manager/ambari-ui-spark2-thrift-servers.png " alt-text="Restart thrift server1." border="true":::
 
-3. You should see the two headnodes on which the Spark2 Thrift Server is running. Select one of the headnodes.
+3. You should see the two headnodes on which the Spark 2 Thrift Server is running. Select one of the headnodes.
 
   :::image type="content" source="./media/apache-spark-resource-manager/restart-thrift-server-2.png " alt-text="Restart thrift server2." border="true":::
 
-4. The next page lists all the services running on that headnode. From the list, select the drop-down button next to Spark2 Thrift Server, and then select **Stop**.
+4. The next page lists all the services running on that headnode. From the list, select the drop-down button next to Spark 2 Thrift Server, and then select **Stop**.
 
  :::image type="content" source="./media/apache-spark-resource-manager/ambari-ui-spark2-thriftserver-restart.png " alt-text="Restart thrift server3." border="true":::
 5. Repeat these steps on the other headnode as well.
 
@@ -135,11 +135,11 @@ Launch the Yarn UI as shown in the beginning of the article. In Cluster Metrics
 
 1. In the Yarn UI, from the left panel, select **Running**. From the list of running applications, determine the application to be killed and select the **ID**.
 
-   :::image type="content" source="./media/apache-spark-resource-manager/apache-ambari-kill-app1.png " alt-text="Kill App1." border="true":::
+   :::image type="content" source="./media/apache-spark-resource-manager/apache-ambari-kill-app1.png " alt-text="Kill App 1." border="true":::
 
 2. Select **Kill Application** on the top-right corner, then select **OK**.
 
-   :::image type="content" source="./media/apache-spark-resource-manager/apache-ambari-kill-app2.png " alt-text="Kill App2." border="true":::
+   :::image type="content" source="./media/apache-spark-resource-manager/apache-ambari-kill-app2.png " alt-text="Kill App 2." border="true":::
 
 ## See also
 