articles/hdinsight/spark/apache-spark-settings.md (6 additions, 7 deletions)
@@ -7,14 +7,14 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 04/15/2020
+ms.date: 04/24/2020
 ---

 # Configure Apache Spark settings

-An HDInsight Spark cluster includes an installation of the [Apache Spark](https://spark.apache.org/) library. Each HDInsight cluster includes default configuration parameters for all its installed services, including Spark. A key aspect of managing an HDInsight Apache Hadoop cluster is monitoring workload, including Spark jobs. To best run Spark jobs, consider the physical cluster configuration when determining the cluster's logical configuration.
+An HDInsight Spark cluster includes an installation of the Apache Spark library. Each HDInsight cluster includes default configuration parameters for all its installed services, including Spark. A key aspect of managing an HDInsight Apache Hadoop cluster is monitoring workload, including Spark jobs. To best run Spark jobs, consider the physical cluster configuration when determining the cluster's logical configuration.

-The default HDInsight Apache Spark cluster includes the following nodes: three [Apache ZooKeeper](https://zookeeper.apache.org/) nodes, two head nodes, and one or more worker nodes:
+The default HDInsight Apache Spark cluster includes the following nodes: three Apache ZooKeeper nodes, two head nodes, and one or more worker nodes:
@@ -97,7 +97,7 @@ Depending on your Spark workload, you may determine that a non-default Spark con
 |---|---|
 |--num-executors|Sets the number of executors.|
 |--executor-cores|Sets the number of cores for each executor. We recommend using middle-sized executors, as other processes also consume some portion of the available memory.|
-|--executor-memory|Controls the memory size (heap size) of each executor on [Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html), and you'll need to leave some memory for execution overhead.|
+|--executor-memory|Controls the memory size (heap size) of each executor on Apache Hadoop YARN, and you'll need to leave some memory for execution overhead.|

 Here is an example of two worker nodes with different configuration values:
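
The article's two-node example isn't shown in this hunk. As a minimal sketch of how the three switches above combine on the command line (the jar path, class name, and sizes are hypothetical placeholders, not values from the article):

```bash
# Hypothetical submission: six mid-sized executors, leaving headroom for
# YARN overhead. Tune the numbers to your worker-node cores and memory.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 6 \
  --executor-cores 4 \
  --executor-memory 8g \
  --class com.example.SparkJob \
  wasbs://container@account.blob.core.windows.net/jars/spark-job.jar
```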
@@ -124,8 +124,8 @@ Spark clusters in HDInsight include a number of components by default. Each of t
-|[Apache Livy](https://livy.incubator.apache.org/)|The Apache Spark REST API, used to submit remote jobs to an HDInsight Spark cluster.|
-|[Jupyter](https://jupyter.org/) and [Apache Zeppelin](https://zeppelin.apache.org/) notebooks|Interactive browser-based UI for interacting with your Spark cluster.|
+|Apache Livy|The Apache Spark REST API, used to submit remote jobs to an HDInsight Spark cluster.|
+|Jupyter and Apache Zeppelin notebooks|Interactive browser-based UI for interacting with your Spark cluster.|
 |ODBC driver|Connects Spark clusters in HDInsight to business intelligence (BI) tools such as Microsoft Power BI and Tableau.|
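
For context on the Livy row: a remote batch submission is a single REST call against the cluster gateway. A minimal sketch, assuming a hypothetical cluster name, admin account, and jar path (none of these come from the article):

```bash
# Hypothetical cluster and storage paths; on HDInsight, Livy is exposed
# through the gateway under /livy. curl prompts for the password since
# only the user name is supplied.
curl -k --user "admin" \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"file": "wasbs://container@account.blob.core.windows.net/jars/spark-job.jar", "className": "com.example.SparkJob"}' \
  "https://mycluster.azurehdinsight.net/livy/batches"
```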
 For applications running in the Jupyter notebook, use the `%%configure` command to make configuration changes from within the notebook itself. These configuration changes will be applied to the Spark jobs run from your notebook instance. Make such changes at the beginning of the application, before you run your first code cell. The changed configuration is applied to the Livy session when it gets created.
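
As a minimal sketch of that pattern (the session values are illustrative, not recommendations), the first cell of the notebook might be:

```
%%configure -f
{"executorMemory": "4g", "executorCores": 2, "numExecutors": 4}
```

The `-f` flag forces the Livy session to be dropped and re-created with the new settings; it only matters if a session has already started, which is why the paragraph above recommends configuring before the first code cell runs.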
@@ -148,6 +148,5 @@ Monitor core configuration settings to ensure your Spark jobs run in a predictab
 * [Apache Hadoop components and versions available with HDInsight?](../hdinsight-component-versioning.md)
 * [Manage resources for an Apache Spark cluster on HDInsight](apache-spark-resource-manager.md)
-* [Set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more](../hdinsight-hadoop-provision-linux-clusters.md)