articles/hdinsight/spark/apache-spark-troubleshoot-outofmemory.md
Lines changed: 11 additions & 11 deletions
@@ -50,13 +50,13 @@ The most likely cause of this exception is that not enough heap memory is alloca
### Resolution
- 1. Determine the maximum size of the data the Spark application will handle. Make an estimate of the size based on the maximum of the size of input data, the intermediate data produced by transforming the input data and the output data produced further transforming the intermediate data. If the initial estimate is not sufficient, increase the size slightly, and iterate until the memory errors subside.
+ 1. Determine the maximum size of the data the Spark application handles. Estimate the size based on the maximum of the size of the input data, the intermediate data produced by transforming the input data, and the output data produced by further transforming the intermediate data. If the initial estimate isn't sufficient, increase the size slightly, and iterate until the memory errors subside.
1. Make sure that the HDInsight cluster to be used has enough resources in terms of memory and also cores to accommodate the Spark application. This can be determined by viewing the Cluster Metrics section of the YARN UI of the cluster for the values of **Memory Used** vs. **Memory Total** and **VCores Used** vs. **VCores Total**.
- 1. Set the following Spark configurations to appropriate values. Balance the application requirements with the available resources in the cluster. These values should not exceed 90% of the available memory and cores as viewed by YARN, and should also meet the minimum memory requirement of the Spark application:
+ 1. Set the following Spark configurations to appropriate values. Balance the application requirements with the available resources in the cluster. These values shouldn't exceed 90% of the available memory and cores as viewed by YARN, and should also meet the minimum memory requirement of the Spark application:
```
spark.executor.instances (Example: 8 for 8 executor count)
```
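
As a hedged illustration of the configuration step above, these settings can be supplied at submit time. The values, class name, and jar name below are placeholders for illustration, not recommendations:

```bash
# Sketch only: placeholder values. Size them against YARN's
# "Memory Total" and "VCores Total" (staying under ~90% of each).
# com.example.MySparkApp and my-spark-app.jar are hypothetical names.
spark-submit \
  --master yarn \
  --conf spark.executor.instances=8 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=4g \
  --conf spark.driver.memory=4g \
  --class com.example.MySparkApp \
  my-spark-app.jar
```

Values chosen this way should still be validated against the YARN UI metrics mentioned in the preceding step.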
@@ -96,7 +96,7 @@ scala.MatchError: java.lang.OutOfMemoryError: Java heap space (of class java.lan
This issue is often caused by a lack of resources when opening large spark-event files. The Spark heap size is set to 1 GB by default, but large Spark event files may require more than this.
- If you would like to verify the size of the files that you are trying to load, you can perform the following commands:
+ If you would like to verify the size of the files that you're trying to load, you can perform the following commands:
@@ -126,7 +126,7 @@ Make sure to restart all affected services from Ambari.
### Issue
- Livy Server cannot be started on an Apache Spark [(Spark 2.1 on Linux (HDI 3.6)]. Attempting to restart results in the following error stack, from the Livy logs:
+ Livy Server can't be started on an Apache Spark cluster (Spark 2.1 on Linux (HDI 3.6)). Attempting to restart results in the following error stack, from the Livy logs:
```log
17/07/27 17:52:50 INFO CuratorFrameworkImpl: Starting
```
@@ -186,35 +186,35 @@ Exception in thread "main" java.lang.OutOfMemoryError: unable to create new nati
### Cause
- `java.lang.OutOfMemoryError: unable to create new native thread` highlights OS cannot assign more native threads to JVMs. Confirmed that this Exception is caused by the violation of per-process thread count limit.
+ `java.lang.OutOfMemoryError: unable to create new native thread` indicates that the OS can't assign more native threads to JVMs. This exception is caused by a violation of the per-process thread count limit.
- When Livy Server terminates unexpectedly, all the connections to Spark Clusters are also terminated, which means that all the jobs and related data will be lost. In HDP 2.6 session recovery mechanism was introduced, Livy stores the session details in Zookeeper to be recovered after the Livy Server is back.
+ When Livy Server terminates unexpectedly, all the connections to Spark clusters are also terminated, which means that all the jobs and related data are lost. In HDP 2.6, a session recovery mechanism was introduced: Livy stores the session details in Zookeeper so that sessions can be recovered after the Livy Server is back.
- When large number of jobs are submitted via Livy, as part of High Availability for Livy Server stores these session states in ZK (on HDInsight clusters) and recover those sessions when the Livy service is restarted. On restart after unexpected termination, Livy creates one thread per session and this accumulates a certain number of to-be-recovered sessions causing too many threads being created.
+ When a large number of jobs are submitted via Livy, Livy Server (as part of its High Availability setup) stores these session states in ZK (on HDInsight clusters) and recovers those sessions when the Livy service is restarted. On restart after unexpected termination, Livy creates one thread per session, and this accumulation of to-be-recovered sessions causes too many threads to be created.
### Resolution
- Delete all entries using steps detailed below.
+ Delete all entries using the following steps.
1. Get the IP address of the zookeeper Nodes using
```bash
grep -R zk /etc/hadoop/conf
```
- 1. Above command listed all the zookeepers for my cluster
+ 1. The above command lists all the zookeepers for a cluster.