
Commit bcdc125

Merge pull request #228624 from sreekzz/patch-150
Freshness MS Date change
2 parents fc68c63 + c1ba7f9

File tree

1 file changed: +9 −9 lines


articles/hdinsight/hdinsight-troubleshoot-yarn.md

Lines changed: 9 additions & 9 deletions
@@ -3,7 +3,7 @@ title: Troubleshoot YARN in Azure HDInsight
 description: Get answers to common questions about working with Apache Hadoop YARN and Azure HDInsight.
 ms.service: hdinsight
 ms.topic: troubleshooting
-ms.date: 08/15/2019
+ms.date: 02/27/2023
 ---
 
 # Troubleshoot Apache Hadoop YARN by using Azure HDInsight
@@ -50,15 +50,15 @@ In this example, two existing queues (**default** and **thriftsvr**) both are ch
 
 These changes are visible immediately on the YARN Scheduler UI.
 
-### Additional reading
+### Further reading
 
 - [Apache Hadoop YARN CapacityScheduler](https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
 
 ## How do I download YARN logs from a cluster?
 
 ### Resolution steps
 
-1. Connect to the HDInsight cluster by using a Secure Shell (SSH) client. For more information, see [Additional reading](#additional-reading-2).
+1. Connect to the HDInsight cluster by using a Secure Shell (SSH) client. For more information, see [Further reading](#additional-reading-2).
 
 1. To list all the application IDs of the YARN applications that are currently running, run the following command:
 
@@ -131,22 +131,22 @@ These changes are visible immediately on the YARN Scheduler UI.
 
 ### Yarn UI isn't loading
 
-If your YARN UI isn't loading or is unreachable, and it returns "HTTP Error 502.3 - Bad Gateway," it highly indicates your ResourceManager service is unhealthy. To mitigate the issue, follow these steps:
+If your YARN UI isn't loading or is unreachable, and it returns "HTTP Error 502.3 - Bad Gateway," it strongly indicates that your Resource Manager service is unhealthy. To mitigate the issue, follow these steps:
 
-1. Go to **Ambari UI** > **YARN** > **SUMMARY** and check to see if only the active ResourceManager is in the **Started** state. If not, try to mitigate by restarting the unhealthy or stopped ResourceManager.
-2. If step 1 doesn't resolve the issue, SSH the active ResourceManager head node and check the garbage collection status using `jstat -gcutil <ResourceManager pid> 1000 100`. If you see the **FGCT** increase significantly in just a few seconds, it indicates ResourceManager is busy in *Full GC*, and is unable to process the other requests.
-3. Go to **Ambari UI** > **YARN** > **CONFIGS** > **Advanced** and increase `ResourceManager java heap size`.
+1. Go to **Ambari UI** > **YARN** > **SUMMARY** and check whether only the active Resource Manager is in the **Started** state. If not, try restarting the unhealthy or stopped Resource Manager.
+2. If step 1 doesn't resolve the issue, SSH to the active Resource Manager head node and check the garbage collection status by using `jstat -gcutil <Resource Manager pid> 1000 100`. If you see the **FGCT** value increase significantly within just a few seconds, the Resource Manager is busy with *Full GC* and is unable to process other requests.
+3. Go to **Ambari UI** > **YARN** > **CONFIGS** > **Advanced** and increase `Resource Manager java heap size`.
 4. Restart required services in Ambari UI.
 
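Step 2 in the hunk above hinges on reading `jstat -gcutil` output. As a rough, hedged sketch of that "FGCT increasing significantly" check (the sample rows are invented; FGCT is the 10th column in the standard `-gcutil` layout, and the 5-second threshold is an arbitrary example, not a documented limit):

```shell
# `jstat -gcutil <pid> 1000 100` prints one sample per second; FGCT is the
# cumulative full-GC time in seconds (column 10). The two sample rows below
# are invented; on a real head node you would pipe jstat itself into awk.
samples='  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  97.02  68.12  66.10  95.12  90.01   1030   12.345    40    3.12   15.47
  0.00  97.02  70.55  66.10  95.12  90.01   1030   12.345    44   11.87   24.22'

printf '%s\n' "$samples" | awk '
  NR > 1 {
    if (prev != "") {
      delta = $10 - prev
      # A large FGCT jump between one-second samples suggests a Full GC storm.
      if (delta > 5) printf "FGCT jumped by %.2fs: likely Full GC\n", delta
    }
    prev = $10
  }'
```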
 ### Both resource managers are in standby
 
-1. Check ResourceManager log to see if below similar error exists.
+1. Check the Resource Manager log to see whether a similar error exists.
 ```
 Service RMActiveServices failed in state STARTED; cause: org.apache.hadoop.service.ServiceStateException: com.google.protobuf.InvalidProtocolBufferException: Could not obtain block: BP-452067264-10.0.0.16-1608006815288:blk_1074235266_494491 file=/yarn/node-labels/nodelabel.mirror
 ```
 2. If the error exists, check to see if some files are under replication or if there are missing blocks in the HDFS. You can run `hdfs fsck hdfs://mycluster/`
 
-3. Run `hdfs fsck hdfs://mycluster/ -delete` to forcefully clean up the HDFS and to get rid of the standby RM issue. Alternatively, run [PatchYarnNodeLabel](https://hdiconfigactions.blob.core.windows.net/hadoopcorepatchingscripts/PatchYarnNodeLabel.sh) on one of headnodes to patch the cluster.
+3. Run `hdfs fsck hdfs://mycluster/ -delete` to forcefully clean up the HDFS and get rid of the standby RM issue. Alternatively, run [PatchYarnNodeLabel](https://hdiconfigactions.blob.core.windows.net/hadoopcorepatchingscripts/PatchYarnNodeLabel.sh) on one of the head nodes to patch the cluster.
 
 ## Next steps
 
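Steps 2 and 3 of the standby-RM hunk can be sketched as a small shell check. The real `hdfs fsck` invocations need a live cluster, so the fsck report below is an invented sample; the script only scans for the two symptoms the steps call out:

```shell
# Live commands from the steps above (run on a head node):
#   hdfs fsck hdfs://mycluster/            # report under-replicated/missing blocks
#   hdfs fsck hdfs://mycluster/ -delete    # destructive: removes corrupt files
#
# Offline illustration: scan a captured fsck report for the two symptoms.
# The report text is invented for this sketch.
report='Status: CORRUPT
 Total size: 1099511627776 B
 Under-replicated blocks: 12 (0.5 %)
 Missing blocks: 3'

printf '%s\n' "$report" | awk -F': ' '
  /Under-replicated blocks/ { print "under-replicated blocks: " $2 }
  /Missing blocks/          { print "missing blocks: " $2 }'
```

Because `-delete` permanently removes corrupt files, checking the plain `fsck` report first (or using the PatchYarnNodeLabel script the step mentions) is the safer order of operations.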