
Commit 304399e

Merge pull request #239087 from sreekzz/patch-186
MS Freshness Date change
2 parents 56f38a1 + 2900785

File tree

1 file changed: +12 −12 lines changed

articles/hdinsight/spark/apache-spark-troubleshoot-outofmemory.md

Lines changed: 12 additions & 12 deletions
@@ -3,7 +3,7 @@ title: OutOfMemoryError exceptions for Apache Spark in Azure HDInsight
 description: Various OutOfMemoryError exceptions for Apache Spark cluster in Azure HDInsight
 ms.service: hdinsight
 ms.topic: troubleshooting
-ms.date: 03/31/2022
+ms.date: 05/24/2023
 ---

 # OutOfMemoryError exceptions for Apache Spark in Azure HDInsight
@@ -50,13 +50,13 @@ The most likely cause of this exception is that not enough heap memory is allocated

 ### Resolution

-1. Determine the maximum size of the data the Spark application will handle. Make an estimate of the size based on the maximum of the size of input data, the intermediate data produced by transforming the input data and the output data produced further transforming the intermediate data. If the initial estimate is not sufficient, increase the size slightly, and iterate until the memory errors subside.
+1. Determine the maximum size of the data the Spark application handles. Estimate the size based on the maximum of the size of the input data, the intermediate data produced by transforming the input data, and the output data produced by further transforming the intermediate data. If the initial estimate isn't sufficient, increase the size slightly, and iterate until the memory errors subside.

 1. Make sure that the HDInsight cluster to be used has enough resources in terms of memory and also cores to accommodate the Spark application. This can be determined by viewing the Cluster Metrics section of the YARN UI of the cluster for the values of **Memory Used** vs. **Memory Total** and **VCores Used** vs. **VCores Total**.

    :::image type="content" source="./media/apache-spark-ts-outofmemory/yarn-core-memory-view.png" alt-text="yarn core memory view" border="true":::

-1. Set the following Spark configurations to appropriate values. Balance the application requirements with the available resources in the cluster. These values should not exceed 90% of the available memory and cores as viewed by YARN, and should also meet the minimum memory requirement of the Spark application:
+1. Set the following Spark configurations to appropriate values. Balance the application requirements with the available resources in the cluster. These values shouldn't exceed 90% of the available memory and cores as viewed by YARN, and should also meet the minimum memory requirement of the Spark application:

    ```
    spark.executor.instances (Example: 8 for 8 executor count)
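To make that configuration step concrete, here is a minimal, hypothetical sketch of passing such values to `spark-submit`. The executor counts, memory sizes, class name, and jar name are illustrative assumptions, not values from this diff:

```bash
# Illustrative numbers only -- balance them against YARN's Memory Total and
# VCores Total, staying below ~90% of available capacity as the article advises.
# com.example.SparkApp and sparkapp.jar are placeholder names.
spark-submit \
  --conf spark.executor.instances=8 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=12g \
  --conf spark.driver.memory=8g \
  --class com.example.SparkApp sparkapp.jar
```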
@@ -96,7 +96,7 @@ scala.MatchError: java.lang.OutOfMemoryError: Java heap space (of class java.lang.OutOfMemoryError)

 This issue is often caused by a lack of resources when opening large spark-event files. The Spark heap size is set to 1 GB by default, but large Spark event files may require more than this.

-If you would like to verify the size of the files that you are trying to load, you can perform the following commands:
+If you would like to verify the size of the files that you're trying to load, you can perform the following commands:

 ```bash
 hadoop fs -du -s -h wasb:///hdp/spark2-events/application_1503957839788_0274_1/
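If those event files turn out to be larger than the 1 GB default heap, one hedged way to raise the limit is the standard Spark daemon heap setting, assuming an Ambari-managed spark2-env template on HDInsight:

```bash
# Hypothetical spark2-env addition: raises the Spark daemon (history server)
# heap from the 1 GB default to 4 GB. Restart the affected services afterwards.
export SPARK_DAEMON_MEMORY=4g
```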
@@ -126,7 +126,7 @@ Make sure to restart all affected services from Ambari.

 ### Issue

-Livy Server cannot be started on an Apache Spark [(Spark 2.1 on Linux (HDI 3.6)]. Attempting to restart results in the following error stack, from the Livy logs:
+Livy Server can't be started on an Apache Spark cluster (Spark 2.1 on Linux (HDI 3.6)). Attempting to restart results in the following error stack, from the Livy logs:

 ```log
 17/07/27 17:52:50 INFO CuratorFrameworkImpl: Starting
@@ -186,35 +186,35 @@ Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread

 ### Cause

-`java.lang.OutOfMemoryError: unable to create new native thread` highlights OS cannot assign more native threads to JVMs. Confirmed that this Exception is caused by the violation of per-process thread count limit.
+`java.lang.OutOfMemoryError: unable to create new native thread` highlights that the OS can't assign more native threads to JVMs. This exception is caused by a violation of the per-process thread count limit.

-When Livy Server terminates unexpectedly, all the connections to Spark Clusters are also terminated, which means that all the jobs and related data will be lost. In HDP 2.6 session recovery mechanism was introduced, Livy stores the session details in Zookeeper to be recovered after the Livy Server is back.
+When Livy Server terminates unexpectedly, all the connections to Spark clusters are also terminated, which means that all the jobs and related data are lost. A session recovery mechanism was introduced in HDP 2.6: Livy stores the session details in Zookeeper so that they can be recovered after the Livy Server is back.

-When large number of jobs are submitted via Livy, as part of High Availability for Livy Server stores these session states in ZK (on HDInsight clusters) and recover those sessions when the Livy service is restarted. On restart after unexpected termination, Livy creates one thread per session and this accumulates a certain number of to-be-recovered sessions causing too many threads being created.
+When a large number of jobs are submitted via Livy, the Livy Server (as part of High Availability) stores these session states in ZK on HDInsight clusters and recovers those sessions when the Livy service is restarted. On restart after unexpected termination, Livy creates one thread per session; this backlog of to-be-recovered sessions causes too many threads to be created.
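To check whether the per-process thread limit is actually the bottleneck, a quick hedged check with standard Linux tools (the `livy` process pattern is an assumption; adjust it for your cluster):

```bash
# Maximum processes/threads the current user may create:
ulimit -u
# Approximate count of native threads owned by the Livy server process
# (the [l]ivy pattern keeps grep from matching its own process):
ps -eLf | grep -i "[l]ivy" | wc -l
```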

 ### Resolution

-Delete all entries using steps detailed below.
+Delete all entries using the following steps.

 1. Get the IP address of the zookeeper Nodes using

    ```bash
    grep -R zk /etc/hadoop/conf
    ```

-1. Above command listed all the zookeepers for my cluster
+1. The above command lists all the zookeepers for a cluster

    ```bash
    /etc/hadoop/conf/core-site.xml: <value><zookeepername1>.lnuwp5akw5ie1j2gi2amtuuimc.dx.internal.cloudapp.net:2181,<zookeepername2>.lnuwp5akw5ie1j2gi2amtuuimc.dx.internal.cloudapp.net:2181,<zookeepername3>.lnuwp5akw5ie1j2gi2amtuuimc.dx.internal.cloudapp.net:2181</value>
    ```

-1. Get all the IP address of the zookeeper nodes using ping Or you can also connect to zookeeper from headnode using zk name
+1. Get all the IP addresses of the zookeeper nodes using ping, or you can also connect to zookeeper from the headnode using the zookeeper name

    ```bash
    /usr/hdp/current/zookeeper-client/bin/zkCli.sh -server <zookeepername1>:2181
    ```

-1. Once you are connected to zookeeper execute the following command to list all the sessions that are attempted to restart.
+1. Once you're connected to zookeeper, execute the following command to list all the sessions that attempted to restart.

 1. Most of the cases this could be a list more than 8000 sessions ####
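As a sketch of that listing-and-cleanup step: the commands below run inside `zkCli.sh`, and the `/livy/v1` path is an assumption about where Livy keeps its recovery state on a given cluster, so verify it with `ls` before deleting anything:

```bash
# Inside zkCli.sh -- inspect Livy's recovery znodes first (path is an assumption):
ls /livy
ls /livy/v1
# Remove the stale to-be-recovered sessions so Livy doesn't spawn one thread each.
# Older ZooKeeper clients use rmr; newer ones replace it with deleteall.
rmr /livy/v1
```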
