Commit 6f1d06e

Author: Sreekanth Iyer (Ushta Te Consultancy Services)
Commit message: Improved Correctness Score
1 parent: 28b6424

7 files changed: +23 / -23 lines

articles/hdinsight/hdinsight-hadoop-customize-cluster-linux.md
Lines changed: 3 additions & 3 deletions

@@ -201,7 +201,7 @@ In this section, you use the [Add-AzHDInsightScriptAction](/powershell/module/az
 
 The following script shows how to apply a script action when you create a cluster by using PowerShell:
 
-[!code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=5-90)]
+[!Code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=5-90)]
 
 It can take several minutes before the cluster is created.
 
@@ -245,7 +245,7 @@ This section explains how to apply script actions on a running cluster.
 
 To use these PowerShell commands, you need the [AZ Module](/powershell/azure/). The following example shows how to apply a script action to a running cluster:
 
-[!code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=105-117)]
+[!Code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=105-117)]
 
 After the operation finishes, you receive information similar to the following text:
 
@@ -317,7 +317,7 @@ For an example of using the .NET SDK to apply scripts to a cluster, see [Apply a
 
 The following example script demonstrates using the cmdlets to promote and then demote a script.
 
-[!code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=123-140)]
+[!Code-powershell[main](../../powershell_scripts/hdinsight/use-script-action/use-script-action.ps1?range=123-140)]
 
 ### Azure CLI
 
articles/hdinsight/hdinsight-supported-node-configuration.md
Lines changed: 4 additions & 4 deletions

@@ -18,7 +18,7 @@ The following tables list default and recommended virtual machine (VM) sizes for
 
 If you need more than 32 worker nodes in a cluster, select a head node size with at least 8 cores and 14 GB of RAM.
 
-The only cluster types that have data disks are Kafka and HBase clusters with the Accelerated Writes feature enabled. HDInsight supports P30 and S30 disk sizes in these scenarios. For all other cluster types, HDInsight provides managed disk space with the cluster. Starting 11/07/2019, the managed disk size of each node in the newly created cluster is 128 GB. This can't be changed.
+The only cluster types that have data disks are Kafka and HBase clusters with the Accelerated Writes feature enabled. HDInsight supports P30 and S30 disk sizes in these scenarios. For all other cluster types, HDInsight provides managed disk space with the cluster. From 11/07/2019 onwards, the managed disk size of each node in the newly created cluster is 128 GB. This can't be changed.
 
 The specifications of all minimum recommended VM types used in this document are summarized in the following table.
 
@@ -36,9 +36,9 @@ The specifications of all minimum recommended VM types used in this document are
 
 For more information on the specifications of each VM type, see the following documents:
 
-* [General purpose virtual machine sizes: Dv2 series 1-5](../virtual-machines/dv2-dsv2-series.md)
-* [Memory optimized virtual machine sizes: Dv2 series 11-15](../virtual-machines/dv2-dsv2-series-memory.md)
-* [General purpose virtual machine sizes: Av2 series 1-8](../virtual-machines/av2-series.md)
+* [General purpose virtual machine sizes: `Dv2` series 1-5](../virtual-machines/dv2-dsv2-series.md)
+* [Memory optimized virtual machine sizes: `Dv2` series 11-15](../virtual-machines/dv2-dsv2-series-memory.md)
+* [General purpose virtual machine sizes: `Av2` series 1-8](../virtual-machines/av2-series.md)
 
 ### All supported regions
 
articles/hdinsight/hdinsight-troubleshoot-failed-cluster.md
Lines changed: 3 additions & 3 deletions

@@ -131,7 +131,7 @@ An HDInsight Gateway times out responses that take longer than two minutes, retu
 
 In this case, review the following logs in the `/var/log/webhcat` directory:
 
-* **webhcat.log** is the log4j log to which server writes logs
+* **webhcat.log** is the Log4j log to which server writes logs
 * **webhcat-console.log** is the stdout of the server when started
 * **webhcat-console-error.log** is the stderr of the server process
 
@@ -166,9 +166,9 @@ At the YARN level, there are two types of timeouts:
 
 If you open the `/var/log/webhcat/webhcat.log` log file and search for "queued job", you may see multiple entries where the execution time is excessively long (>2000 ms), with entries showing increasing wait times.
 
-The time for the queued jobs continues to increase because the rate at which new jobs get submitted is higher than the rate at which the old jobs are completed. Once the YARN memory is 100% used, the *joblauncher queue* can no longer borrow capacity from the *default queue*. Therefore, no more new jobs can be accepted into the joblauncher queue. This behavior can cause the waiting time to become longer and longer, causing a timeout error that is usually followed by many others.
+The time for the queued jobs continues to increase because the rate at which new jobs get submitted is higher than the rate at which the old jobs are completed. Once the YARN memory is 100% used, the `joblauncher queue` can no longer borrow capacity from the *default queue*. Therefore, no more new jobs can be accepted into the job launcher queue. This behavior can cause the waiting time to become longer and longer, causing a timeout error that is usually followed by many others.
 
-The following image shows the joblauncher queue at 714.4% overused. This is acceptable so long as there is still free capacity in the default queue to borrow from. However, when the cluster is fully utilized and the YARN memory is at 100% capacity, new jobs must wait, which eventually causes timeouts.
+The following image shows the job launcher queue at 714.4% overused. This is acceptable so long as there is still free capacity in the default queue to borrow from. However, when the cluster is fully utilized and the YARN memory is at 100% capacity, new jobs must wait, which eventually causes timeouts.
 
 :::image type="content" source="./media/hdinsight-troubleshoot-failed-cluster/hdi-job-launcher-queue.png" alt-text="HDInsight Job launcher queue view.":::
 
articles/hdinsight/hdinsight-use-external-metadata-stores.md
Lines changed: 2 additions & 2 deletions

@@ -59,9 +59,9 @@ HDInsight also supports custom metastores, which are recommended for production
 
 Create or have an existing Azure SQL Database before setting up a custom Hive metastore for a HDInsight cluster. For more information, see [Quickstart: Create a single database in Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart?tabs=azure-portal).
 
-While creating the cluster, HDInsight service needs to connect to the external metastore and verify your credentials. Configure Azure SQL Database firewall rules to allow Azure services and resources to access the server. Enable this option in the Azure portal by selecting **Set server firewall**. Then select **No** underneath **Deny public network access**, and **Yes** underneath **Allow Azure services and resources to access this server** for Azure SQL Database. For more information, see [Create and manage IP firewall rules](/azure/azure-sql/database/firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules)
+When you create the cluster, HDInsight service needs to connect to the external metastore and verify your credentials. Configure Azure SQL Database firewall rules to allow Azure services and resources to access the server. Enable this option in the Azure portal by selecting **Set server firewall**. Then select **No** underneath **Deny public network access**, and **Yes** underneath **Allow Azure services and resources to access this server** for Azure SQL Database. For more information, see [Create and manage IP firewall rules](/azure/azure-sql/database/firewall-configure#use-the-azure-portal-to-manage-server-level-ip-firewall-rules)
 
-Private endpoints for SQL stores is only supported on the clusters created with `outbound` ResourceProviderConnection. To learn more, see this [documentation](./hdinsight-private-link.md).
+Private endpoints for SQL stores are only supported on the clusters created with `outbound` ResourceProviderConnection. To learn more, see this [documentation](./hdinsight-private-link.md).
 
 :::image type="content" source="./media/hdinsight-use-external-metadata-stores/configure-azure-sql-database-firewall1.png" alt-text="set server firewall button.":::
 
articles/hdinsight/interactive-query/troubleshoot-gateway-timeout.md
Lines changed: 3 additions & 3 deletions

@@ -25,9 +25,9 @@ Cannot create property 'errors' on string '<!DOCTYPE html PUBLIC '-//W3C//DTD XH
 
 A Gateway timeout.
 
-The Gateway timeout value is 2 minutes. Queries from Ambari Hive View are submitted to the `/hive2` endpoint through the gateway. Once the query is successfully compiled and accepted, the HiveServer returns a `queryid`. Clients then keep polling for the status of the query. During this process, if the HiveServer doesn't return an HTTP response within 2 minutes, the HDI Gateway throws a 502.3 Gateway timeout error to the caller. The errors could happen when the query is submitted for processing (more likely) and also in the get status call (less likely). Users could see either of them.
+The Gateway timeout value is 2 minutes. Queries from Ambari Hive View are submitted to the `/hive2` endpoint through the gateway. Once the query is successfully compiled and accepted, the HiveServer returns a `queryid`. Clients then keep polling for the status of the query. During this process, if the HiveServer doesn't return an HTTP response within 2 minutes, the HDI Gateway throws a 502.3 Gateway timeout error to the caller. The errors could happen when the query is submitted for processing (more likely) and also in the got status call (less likely). Users could see either of them.
 
-The http handler thread is supposed to be quick: prepare the job and return a `queryid`. However, due to several reasons, all the handler threads could be busy resulting in timeouts for new queries and the get status calls.
+The http handler thread is supposed to be quick: prepare the job and return a `queryid`. However, due to several reasons, all the handler threads could be busy resulting in timeouts for new queries and the got status calls.
 
 ### Responsibilities of the HTTP handler thread
 
@@ -46,7 +46,7 @@ Some general recommendations to you to improve the situation:
 
 * If using an external hive metastore, check the DB metrics and make sure that the database isn't overloaded. Consider scaling the metastore database layer.
 
-* Ensure that parallel ops is turned on (this enables the HTTP handler threads to run in parallel). To verify the value, launch [Apache Ambari](../hdinsight-hadoop-manage-ambari.md) and navigate to **Hive** > **Configs** > **Advanced** > **Custom hive-site**. The value for `hive.server2.parallel.ops.in.session` should be `true`.
+* Ensure that parallel ops are turned on (this enables the HTTP handler threads to run in parallel). To verify the value, launch [Apache Ambari](../hdinsight-hadoop-manage-ambari.md) and navigate to **Hive** > **Configs** > **Advanced** > **Custom hive-site**. The value for `hive.server2.parallel.ops.in.session` should be `true`.
 
 * Ensure that the cluster's VM SKU isn't too small for the load. Consider to splitting the work among multiple clusters. For more information, see [Choose a cluster type](../hdinsight-capacity-planning.md#choose-a-cluster-type).
 
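For reference, the `hive.server2.parallel.ops.in.session` setting mentioned in the hunk above is a standard HiveServer2 property; expressed as a `hive-site.xml` fragment (value as the article recommends), it would look like this:

```xml
<!-- Allow multiple concurrent operations within a single HiveServer2
     session, so handler threads are not serialized per session. -->
<property>
  <name>hive.server2.parallel.ops.in.session</name>
  <value>true</value>
</property>
```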
articles/hdinsight/spark/apache-spark-load-data-run-query.md
Lines changed: 2 additions & 2 deletions

@@ -8,7 +8,7 @@ ms.date: 07/12/2024
 #Customer intent: As a developer new to Apache Spark and to Apache Spark in Azure HDInsight, I want to learn how to load data into a Spark cluster, so I can run interactive SQL queries against the data.
 ---
 
-# Tutorial: Load data and run queries on an Apache Spark cluster in Azure HDInsight
+# Tutorial: Load data, and run queries on an Apache Spark cluster in Azure HDInsight
 
 In this tutorial, you learn how to create a dataframe from a csv file, and how to run interactive Spark SQL queries against an [Apache Spark](https://spark.apache.org/) cluster in Azure HDInsight. In Spark, a dataframe is a distributed collection of data organized into named columns. Dataframe is conceptually equivalent to a table in a relational database or a data frame in R/Python.
 
@@ -50,7 +50,7 @@ Applications can create dataframes directly from files or folders on the remote
 from pyspark.sql.types import *
 ```
 
-When running an interactive query in Jupyter, the web browser window or tab caption shows a **(Busy)** status along with the notebook title. You also see a solid circle next to the **PySpark** text in the top-right corner. After the job is completed, it changes to a hollow circle.
+When you run an interactive query in Jupyter, the web browser window or tab caption shows a **(Busy)** status along with the notebook title. You also see a solid circle next to the **PySpark** text in the top-right corner. After the job is completed, it changes to a hollow circle.
 
 :::image type="content" source="./media/apache-spark-load-data-run-query/hdinsight-spark-interactive-spark-query-status.png " alt-text="Status of interactive Spark SQL query." border="true":::
 
articles/hdinsight/spark/apache-spark-resource-manager.md
Lines changed: 6 additions & 6 deletions

@@ -36,7 +36,7 @@ The three configuration parameters can be configured at the cluster level (for a
 
 ### Change the parameters using Ambari UI
 
-1. From the Ambari UI navigate to **Spark2** > **Configs** > **Custom spark2-defaults**.
+1. From the Ambari UI navigate to **Spark 2** > **Configs** > **Custom spark2-defaults**.
 
    :::image type="content" source="./media/apache-spark-resource-manager/ambari-ui-spark2-configs.png " alt-text="Set parameters using Ambari custom." border="true":::
 
@@ -106,15 +106,15 @@ Because of Spark dynamic allocation, the only resources that are consumed by thr
 
 1. From the Ambari UI, from the left pane, select **Spark2**.
 
-2. In the next page, select **Spark2 Thrift Servers**.
+2. In the next page, select **Spark 2 Thrift Servers**.
 
    :::image type="content" source="./media/apache-spark-resource-manager/ambari-ui-spark2-thrift-servers.png " alt-text="Restart thrift server1." border="true":::
 
-3. You should see the two headnodes on which the Spark2 Thrift Server is running. Select one of the headnodes.
+3. You should see the two headnodes on which the Spark 2 Thrift Server is running. Select one of the headnodes.
 
   :::image type="content" source="./media/apache-spark-resource-manager/restart-thrift-server-2.png " alt-text="Restart thrift server2." border="true":::
 
-4. The next page lists all the services running on that headnode. From the list, select the drop-down button next to Spark2 Thrift Server, and then select **Stop**.
+4. The next page lists all the services running on that headnode. From the list, select the drop-down button next to Spark 2 Thrift Server, and then select **Stop**.
 
  :::image type="content" source="./media/apache-spark-resource-manager/ambari-ui-spark2-thriftserver-restart.png " alt-text="Restart thrift server3." border="true":::
 5. Repeat these steps on the other headnode as well.
 
@@ -135,11 +135,11 @@ Launch the Yarn UI as shown in the beginning of the article. In Cluster Metrics
 
 1. In the Yarn UI, from the left panel, select **Running**. From the list of running applications, determine the application to be killed and select the **ID**.
 
-   :::image type="content" source="./media/apache-spark-resource-manager/apache-ambari-kill-app1.png " alt-text="Kill App1." border="true":::
+   :::image type="content" source="./media/apache-spark-resource-manager/apache-ambari-kill-app1.png " alt-text="Kill App 1." border="true":::
 
 2. Select **Kill Application** on the top-right corner, then select **OK**.
 
-   :::image type="content" source="./media/apache-spark-resource-manager/apache-ambari-kill-app2.png " alt-text="Kill App2." border="true":::
+   :::image type="content" source="./media/apache-spark-resource-manager/apache-ambari-kill-app2.png " alt-text="Kill App 2." border="true":::
 
 ## See also
 